Trends and Outliers

TIBCO Spotfire's Business Intelligence Blog


The ABCs of Data Quality

A:  What is it?

Fundamentally, “data quality” refers to the reliability and value of data in terms of the purpose for which it is being used.  For some uses, data can be approximate, inconsistent, and even inaccurate, but still serve its purpose effectively.  For other uses, data must be absolutely pristine—clean, consistent, and perfectly formatted—in order to have value.  Every organization should identify the level(s) of data quality it actually needs, and commit to maintaining that level.

Data quality is a broad concept that breaks out into several key components.  Among the most important:

  1. Correctness:  How well does the data correlate with the reality it represents?
  2. Suitability:  Is the data appropriate for its potential uses?
  3. Consistency:  Are the same facts represented the same way throughout the data aggregate?
  4. Cleanness:  Is the data aggregate free of inaccurate or outdated data, duplicated data, etc.?

The first two characteristics refer to the data itself, while the last two refer to the total content of the database or data warehouse.  A complete data quality strategy needs both perspectives.

Problems with data quality can arise at any stage of the data collection and management process.  To ensure a high level of data quality, organizations must:

  1. Plan for data quality in the design of databases and the development of all data-related projects
  2. Gather data effectively (ask the right questions in the right way, and collect the answers in a usable format)
  3. Define and enforce data quality rules throughout acquisition and management processes
  4. Dedicate resources to maintain and improve data quality
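As a rough illustration of step 3, data quality rules can be written down as small named checks and applied to every record at the point of acquisition.  The sketch below is only one way to do this; the field names (email, country, age) and the specific rules are hypothetical and would differ for any real data set.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes


# Hypothetical rules for a hypothetical customer record.
RULES = [
    Rule("email_present_and_well_formed",
         lambda r: bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r.get("email") or ""))),
    Rule("country_is_two_letter_code",
         lambda r: bool(re.fullmatch(r"[A-Z]{2}", r.get("country") or ""))),
    Rule("age_in_plausible_range",
         lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120),
]


def violations(record: dict) -> List[str]:
    """Return the names of every rule the record fails."""
    return [rule.name for rule in RULES if not rule.check(record)]


def accept(records):
    """Split incoming records into clean ones and rejected ones with reasons."""
    clean, rejected = [], []
    for record in records:
        failed = violations(record)
        if failed:
            rejected.append((record, failed))
        else:
            clean.append(record)
    return clean, rejected
```

Keeping the rejected records together with the names of the rules they failed gives whoever maintains the data something concrete to review, rather than silently dropping bad rows.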

In the real world, of course, data collection is rarely perfect–and when data is integrated from different sources (e.g., multiple divisions, locations, vendors), there are often inconsistencies and duplications.  Consequently, most organizations use specialized tools and processes for cleansing and standardizing data.
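To make "cleansing and standardizing" a little more concrete, here is a minimal sketch that normalizes a few fields and then removes duplicates that only became visible after normalization.  The record layout and the alias table are assumptions made up for this example, and the matching is exact; commercial cleansing tools typically go much further (fuzzy matching, address validation, and so on).

```python
# Hypothetical mapping of spelling variants to one canonical country code.
COUNTRY_ALIASES = {
    "usa": "US", "u.s.": "US", "united states": "US", "us": "US",
    "uk": "GB", "united kingdom": "GB", "gb": "GB",
}


def standardize(record: dict) -> dict:
    """Return a copy of the record with name, email and country normalized."""
    # Collapse repeated whitespace and use consistent capitalization.
    name = " ".join(record.get("name", "").split()).title()
    email = record.get("email", "").strip().lower()
    country_raw = record.get("country", "").strip()
    # Fall back to the upper-cased input when no alias is known.
    country = COUNTRY_ALIASES.get(country_raw.lower(), country_raw.upper())
    return {"name": name, "email": email, "country": country}


def deduplicate(records, key="email"):
    """Standardize every record, then keep the first record seen per key."""
    seen = {}
    for record in map(standardize, records):
        seen.setdefault(record[key], record)
    return list(seen.values())
```

For example, two rows for the same person entered by different divisions as " Ann@Example.com " / "usa" and "ann@example.com" / "United States" collapse into a single record once both are standardized.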

B:  Why does it matter?

To borrow an old phrase . . . “Garbage in, garbage out.”  Business Analytics is all about business data—so obviously, if the data is bad, the analysis will be flawed.  But poor data quality is not necessarily obvious.  Information may look perfectly reasonable and believable, while still being wrong (by a little, or a lot).

In the days when most business intelligence work was done within IT, there was a fair chance that data quality problems would be spotted by knowledgeable analysts.  But now that data analytics is widely distributed in many organizations–and end-users often take information at face value–poor enforcement of data quality could become increasingly problematic.

Even if data quality processes were perfect, they would still be strained by the huge volume of data that many organizations deal with today.  And as businesses are becoming ever more reliant on analytics, data quality is rapidly emerging as a significant issue.

C:  What’s next?

In its most recent Magic Quadrant for Data Integration, Gartner noted that “during 2010, buyer demand showed a clear preference for solutions which offer both data integration and data quality functionality.”  It seems likely that these functional areas will continue to become more tightly linked.  Meanwhile, broader conceptual views of data quality and its role in information management are developing.  For interesting perspectives on the present and future of data quality, check out these two white papers:  “Information Quality Management: Assessing Your IQM Practice” from Trillium, and “Understanding the Financial Value of Data Quality Improvement” from Informatica and Knowledge Integrity.
