For those of you who don’t normally think in data, that means that past a certain point, the return on adding more data diminishes until you’re only wasting time gathering more.
One reason: the “bigger” your data, the more false positives turn up when you look for correlations. As data scientist Vincent Granville wrote in “The curse of big data,” it’s not hard, even with a data set that includes just 1,000 items, to get into a situation in which “we are dealing with many, many millions of correlations.” And that means that “out of all these correlations, a few will be extremely high just by chance: if you use such a correlation for predictive modeling, you will lose.”
(Source: qz.com)
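The false-positive effect described in the excerpt above can be illustrated with a small simulation. What follows is a minimal sketch, not Granville's own method: it assumes purely random, independent series, and the function names, series counts and 20-point series length are all illustrative choices. It counts how many pairs of series there are to compare and reports the strongest correlation that shows up by chance alone.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(num_series, num_points, seed=0):
    """Generate independent noise series (an assumption of this sketch)
    and return (number of pairs, largest absolute correlation found)."""
    rng = random.Random(seed)
    series = [
        [rng.gauss(0.0, 1.0) for _ in range(num_points)]
        for _ in range(num_series)
    ]
    # Every unordered pair of series is one candidate "correlation".
    pairs = list(itertools.combinations(range(num_series), 2))
    best = max(abs(pearson(series[i], series[j])) for i, j in pairs)
    return len(pairs), best


if __name__ == "__main__":
    for num_series in (10, 50, 200):
        pair_count, best = strongest_chance_correlation(num_series, num_points=20)
        print(f"{num_series:>4} series -> {pair_count:>6} pairs, "
              f"strongest |r| = {best:.2f}")
```

Because every series here is independent noise, any strong correlation the script reports is spurious. The number of pairs grows roughly with the square of the number of series, which is why the pool of chance "discoveries" gets so large so quickly.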
The firm also sounds a note of caution about whether the search giant will ever embark on a nationwide effort: building out gigabit Internet and TV service to another 20 million homes, a medium-to-large rollout that would compete with other providers, could cost up to $11 billion.