“There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every two days.”

Eric Schmidt, executive chairman of Google, stated this in 2010 — four years ago.

More than at any time in history, we are drowning in data. Marketers have access to abundant streams of information – customer satisfaction scores, credit card swipes, web clicks, in-app behaviors, social media trends, etc. – that are being collected and stored in massive data centers.

The promise of big data is that by intelligently analyzing these data points, companies can achieve unprecedented insights into their customers and prospects, helping grow the business through Moneyball-style decision making.

But with this great promise also comes significant peril. In the hands of unskilled practitioners, big data can not only fail to meet expectations but also produce erroneous conclusions that lead companies astray.

The problem is that big data can easily find relationships that are completely meaningless.

For example, consider a chart from a terrific website called Spurious Correlations. It shows a fairly high correlation (.66) between the number of Nicolas Cage films and the number of people who drowned by falling into a swimming pool.

An even more extreme example is the near-perfect correlation (.99) between the divorce rate in Maine and per capita consumption of margarine in the U.S.

Clearly, statistics alone can lead us astray. So how can marketers ensure that their big data efforts are intelligent and actionable without leading them to nonsensical conclusions? Here are three basic rules to follow:

1) Don’t equate correlation with causation.

Most of us remember the classic saying that “correlation does not imply causation,” and the examples above are excellent real-world reminders. But it is still surprising how many people infer causation from correlations. When data sets are all focused on a specific topic, say customer purchase habits, it is easy to rationalize how a specific correlation must be causal. If we can create a logical rationalization, and are unable to identify alternative explanations, then we may allow ourselves to forget the old adage and believe causation is at work. Marketers need to be vigilant and not fall into this trap.
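To make the trap concrete, here is a minimal Python sketch using simulated data. Everything in it is an assumption made for illustration: the metric names (`coupon_redemptions`, `parking_complaints`), the hidden driver (`traffic`), and all of the numbers are invented. Two metrics that never influence each other still correlate strongly because both depend on store traffic, and the correlation largely disappears once traffic is accounted for.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical hidden driver: daily store traffic over 500 days.
traffic = [random.gauss(1000, 200) for _ in range(500)]

# Two made-up metrics that each depend on traffic, but not on each other.
coupon_redemptions = [0.05 * t + random.gauss(0, 5) for t in traffic]
parking_complaints = [0.01 * t + random.gauss(0, 1) for t in traffic]

# Raw correlation: looks like coupons "cause" complaints (or vice versa).
r_raw = pearson_r(coupon_redemptions, parking_complaints)

# Correlation after removing the shared dependence on traffic.
r_partial = pearson_r(residuals(coupon_redemptions, traffic),
                      residuals(parking_complaints, traffic))

print(f"raw correlation:              {r_raw:.2f}")
print(f"after controlling for traffic: {r_partial:.2f}")
```

In real data the hidden driver is rarely this easy to name, which is exactly why a tidy rationalization is not enough to establish causation.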

2) Don’t equate statistical with substantive significance.

As any Stats 101 student knows, the larger your sample size, the easier it is to find statistically significant results. But big data is HUGE, and data sets of that size provide a level of statistical power that can be highly misleading: even a trivially small difference can easily be statistically significant. Many analysts are trained to focus on significance levels, but just because something is statistically significant does not mean it is meaningful in the real world. Marketers have to bring their business acumen to the table and resist the urge to blindly accept statistically significant results.
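A rough Python sketch of this effect follows. The scenario is hypothetical: two customer groups whose average order value differs by ten cents ($50.00 vs. $50.10) with an assumed shared standard deviation of $20. The helper `two_sample_z_test` is written for this illustration and uses a textbook two-sample z-test with equal group sizes; it is not taken from any particular analytics tool.

```python
import math

def two_sample_z_test(mean_a, mean_b, sd, n_per_group):
    """Two-sided z-test for a difference in means.

    Assumes a known standard deviation shared by both groups
    and the same number of observations in each group.
    """
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = (mean_b - mean_a) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: a ten-cent difference in average order value.
MEAN_A, MEAN_B, SD = 50.00, 50.10, 20.0

# Standardized effect size (difference measured in standard deviations).
effect_size = (MEAN_B - MEAN_A) / SD

z_small, p_small = two_sample_z_test(MEAN_A, MEAN_B, SD, n_per_group=1_000)
z_large, p_large = two_sample_z_test(MEAN_A, MEAN_B, SD, n_per_group=1_000_000)

print(f"effect size: {effect_size:.3f} standard deviations")
print(f"n = 1,000 per group:     z = {z_small:.2f}, p = {p_small:.4f}")
print(f"n = 1,000,000 per group: z = {z_large:.2f}, p = {p_large:.4f}")
```

Under these assumed numbers, the same ten-cent difference is nowhere near significant with a thousand customers per group but clears p < 0.001 with a million, even though the effect is only 0.5% of a standard deviation. Whether ten cents per order matters is a business question that the p-value cannot answer.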

3) Don’t neglect hypothesis testing.

Big data is best utilized to answer specific business questions, such as whether customers active on social media spend more than non-social media customers (as a simplistic example). But the allure of big data can also entice marketers to “go fishing” and “see what sticks” from a statistical perspective. Psychologists call this “dust bowl empiricism” – research driven by no theory – and it is generally regarded as inferior, not to mention lazy. Hypothesis testing counters dust bowl empiricism by posing specific business questions that can be tested and, potentially, proven false.
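The cost of “going fishing” can be sketched with a small simulation. The setup is entirely hypothetical: 200 metrics are compared between social and non-social customers, and every metric is pure noise, so any “finding” is a false alarm. The helper `p_value_for_difference` is an illustrative large-sample approximation, and the stricter threshold at the end is a Bonferroni correction, one common guard that is offered here as an assumption rather than a recommendation from the original argument.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

N_CUSTOMERS = 200  # customers per group (hypothetical)
N_METRICS = 200    # unrelated metrics "fished" through
ALPHA = 0.05       # conventional significance threshold

def p_value_for_difference(group_a, group_b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    z = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return math.erfc(abs(z) / math.sqrt(2))

p_values = []
for _ in range(N_METRICS):
    # Both groups come from the same distribution: there is no real effect.
    social = [random.gauss(0, 1) for _ in range(N_CUSTOMERS)]
    non_social = [random.gauss(0, 1) for _ in range(N_CUSTOMERS)]
    p_values.append(p_value_for_difference(social, non_social))

# "Findings" that a fishing expedition would report at the usual threshold.
false_positives = sum(p < ALPHA for p in p_values)

# Same p-values judged against a Bonferroni-corrected threshold.
corrected_hits = sum(p < ALPHA / N_METRICS for p in p_values)

print(f"metrics tested:                 {N_METRICS}")
print(f"'significant' at p < {ALPHA}:     {false_positives}")
print(f"significant after correction:   {corrected_hits}")
```

With a 5% threshold, roughly one in twenty noise metrics is expected to look significant by chance alone. Starting from a single, pre-stated business question keeps the number of tests, and therefore the number of chance findings, small.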

Big data holds tremendous promise and opportunity for those who wield it responsibly. By keeping the potential pitfalls in mind, marketers can successfully reap the benefits for their companies.

(This article originally appeared on The Agency Post)