Fundamentally, “data quality” refers to the reliability and value of data in terms of the purpose for which it is being used. For some uses, data can be approximate, inconsistent, and even inaccurate, but still serve its purpose effectively. For other uses, data must be absolutely pristine—clean, consistent, and perfectly formatted—in order to have value. Every organization should identify the level(s) of data quality it actually needs, and commit to maintaining that level.
Data quality is a broad concept that breaks out into several key components. Among the most important:
- Correctness: How well does the data correspond to the reality it represents?
- Suitability: Is the data appropriate for its potential uses?
- Consistency: Are the same facts represented the same way throughout the data aggregate?
- Cleanness: Is the data aggregate free of inaccurate or outdated data, duplicated data, etc.?
The first two characteristics refer to the data itself, while the last two refer to the total content of the database or data warehouse. A complete data quality strategy needs both perspectives.
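As a rough illustration, each of the four dimensions above can be expressed as a simple rule and checked programmatically. The sketch below is a minimal example, assuming a hypothetical list of customer records; the field names and the email rule are illustrative, not from the original text.

```python
import re

# Hypothetical customer records; field names and values are illustrative.
records = [
    {"id": 1, "name": "Ann Lee",  "state": "NY",       "email": "ann@example.com"},
    {"id": 2, "name": "Bob Cruz", "state": "New York", "email": "bob@example"},
    {"id": 3, "name": "Ann Lee",  "state": "NY",       "email": "ann@example.com"},
]

# Correctness/suitability: does each record pass a basic validity rule?
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
invalid_emails = [r["id"] for r in records if not EMAIL_RE.match(r["email"])]

# Consistency: is the same fact (a U.S. state) always encoded the same way?
inconsistent_states = [r["id"] for r in records if len(r["state"]) != 2]

# Cleanness: are there duplicate records (same name and email)?
seen, duplicates = set(), []
for r in records:
    key = (r["name"], r["email"])
    if key in seen:
        duplicates.append(r["id"])
    seen.add(key)

print(invalid_emails)       # records failing the email rule
print(inconsistent_states)  # records encoding state differently
print(duplicates)           # duplicate records
```

In this toy dataset, record 2 fails both the correctness and consistency checks, and record 3 is flagged as a duplicate of record 1.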
Problems with data quality can arise at any stage of the data collection and management process. To ensure a high level of data quality, organizations must:
- Plan for data quality in the design of databases and the development of all data-related projects
- Gather data effectively (ask the right questions in the right way, and collect the answers in a usable format)
- Define and enforce data quality rules throughout acquisition and management processes
- Dedicate resources to maintain and improve data quality
In the real world, of course, data collection is rarely perfect, and when data is integrated from different sources (e.g., multiple divisions, locations, vendors), inconsistencies and duplications are common. Consequently, most organizations use specialized tools and processes for cleansing and standardizing data.
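To make the cleansing and standardization step concrete, here is a minimal sketch of what such tools do when merging records from two sources. The source data, the `STATE_MAP` lookup table, and the normalization rules are all assumptions for illustration, not a real product's behavior.

```python
# Hypothetical records merged from two sources; values are illustrative.
source_a = [{"name": "ACME Corp.", "state": "New York"}]
source_b = [{"name": "Acme Corp",  "state": "NY"}]

# Assumed lookup table mapping variant spellings to a standard code.
STATE_MAP = {"new york": "NY", "ny": "NY"}

def standardize(record):
    """Normalize casing, trailing punctuation, and state codes."""
    name = record["name"].rstrip(".").strip().title()
    state = STATE_MAP.get(record["state"].strip().lower(), record["state"])
    return {"name": name, "state": state}

def cleanse(*sources):
    """Standardize every record, then drop exact duplicates."""
    seen, cleaned = set(), []
    for record in (standardize(r) for src in sources for r in src):
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned

print(cleanse(source_a, source_b))
```

Note the order of operations: standardizing first makes the two variant records identical, which is what allows the duplicate to be detected at all.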
B: Why does it matter?
To borrow an old phrase: "Garbage in, garbage out." Business analytics is all about business data—so obviously, if the data is bad, the analysis will be flawed. But poor data quality is not necessarily obvious. Information may look perfectly reasonable and believable while still being wrong, by a little or a lot.
In the days when most business intelligence work was done within IT, there was a fair chance that data quality problems would be spotted by knowledgeable analysts. But now that data analytics is widely distributed in many organizations, and end users often take information at face value, poor enforcement of data quality could become increasingly problematic.
Even if data quality processes were perfect, they would still be strained by the huge volume of data that many organizations deal with today. And as businesses are becoming ever more reliant on analytics, data quality is rapidly emerging as a significant issue.
C: What’s next?
In their most recent Magic Quadrant for Data Integration, Gartner noted that “during 2010, buyer demand showed a clear preference for solutions which offer both data integration and data quality functionality.” It seems likely that these functional areas will continue to become more tightly linked. Meanwhile, broader conceptual views of data quality and its role in information management are developing. For interesting perspectives on the present and future of data quality, check out these two white papers: Information Quality Management: Assessing Your IQM Practice from Trillium and Understanding the Financial Value of Data Quality Improvement from Informatica and Knowledge Integrity.