Data analysis informs our daily business decisions, but that analysis is only as good as the data supporting it. Organizations should invest in ensuring that their decision-making is based on accurate information.
As data analysis becomes increasingly sophisticated, machine learning is becoming a sought-after technology. Organizations that haven't already incorporated machine learning predictions should consider doing so, as the benefits can be significant. These benefits, however, are critically dependent on the data the machine learning algorithms are learning from.
At its heart, machine learning makes predictions based on patterns found in data. Those patterns are the "learning" in "machine learning." For the predictions drawn from those learned patterns to be accurate, the underlying data that are the source of those patterns need to be clean, that is, accurate.
Accurate data are the exception, not the rule. A recent study found that only a very small portion of organizations' data meets even basic quality standards.
In a report in Harvard Business Review, researchers worked with 75 executives over two years to review the accuracy of small samples of their data. Only 3% found that their data met their own basic quality standards for accuracy. Half of those executives found that more than 40% of their records were contaminated with inaccuracies.
The report also found wide variation in data quality across organizations. Data are used to make a variety of predictions that in turn shape broader sets of business decisions. For these reasons, the full costs of even a single instance of dirty data are difficult to estimate, but they are likely to be substantial.
What do these findings imply for organizations? Like the sample of executives in the study, most are probably unaware of just how dirty the data they rely on are. As you use data to make important HR decisions, it may be in your interest to approach the data with skepticism. Don't be afraid to take a second look.
The authors of the HBR article also propose that organizations conduct their own assessments: carefully check 100 records upon which you rely and calculate the proportion with errors. If errors are found, outsourcing for regulatory compliance, such as pay equity statutes or the Affordable Care Act (ACA), may be a cost-effective solution. Because bad data provide little defense against regulatory non-compliance, involving an independent third party in that process brings data-cleaning expertise to your data sources. Clean, reliable data are a natural by-product of such expert compliance processing.
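The 100-record self-assessment described above can be sketched in a few lines of code. This is a minimal illustration, not a prescribed method: the field names (`employee_id`, `pay_rate`, `email`) and the specific error checks are assumptions standing in for whatever quality rules apply to your own records.

```python
import random

def has_error(record):
    """Flag a record as erroneous if any basic quality check fails.

    These checks are illustrative; substitute your own rules.
    """
    return (
        not record.get("employee_id")           # missing identifier
        or record.get("pay_rate", 0) <= 0       # implausible pay value
        or "@" not in record.get("email", "")   # malformed email address
    )

def error_rate(records, sample_size=100, seed=None):
    """Sample up to `sample_size` records and return the share with errors."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    return sum(has_error(r) for r in sample) / len(sample)

# Example: two of these four records fail a check, so the rate is 0.5.
records = [
    {"employee_id": "E1", "pay_rate": 52000, "email": "a@example.com"},
    {"employee_id": "",   "pay_rate": 48000, "email": "b@example.com"},
    {"employee_id": "E3", "pay_rate": -1,    "email": "c@example.com"},
    {"employee_id": "E4", "pay_rate": 61000, "email": "d@example.com"},
]
print(error_rate(records))  # 0.5
```

If the computed rate approaches the 40%-plus contamination reported in the study, that is a strong signal to bring in outside data-quality expertise before relying on the records for compliance or analytics.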
How great would it be if there were a process that automatically assessed, consolidated, and validated your data for any purpose? That day may be coming. In the meantime, as you rely on data to provide the foundations for your critical business decisions, be aware of the likelihood of errors in those foundations. As the phrase goes: garbage in, garbage out. Especially in business, "garbage" decisions are likely to be expensive decisions.