Have you ever tried to fill every corner of a box with a single ball? The ball might fit but there will always be gaps.
Have you ever compared the characteristics of an apple to those of an orange? However many times you repeat the comparison, the only conclusion you can draw is that they are different. You can’t glean any additional information from the apple-to-orange comparison.
Now think of a data warehouse design. The data warehouse will leave gaps, make comparative analysis difficult, and it won’t lend itself to self-service business intelligence if it’s built without:
- a properly formatted physical structure
- data that’s been subjected to a rigorous filtering and transformation process
- a data warehousing schema that’s easy for an end user to use and understand
In order to make use of something, you must first understand its purpose, its limitations, and its structure. When evaluating anything there are guidelines and standards to help determine its usefulness.
Bin Jiang, a distinguished professor at a large university in China, suggests that the infrastructure of the data warehouse is an extremely important component. The infrastructure includes the system hardware and software that make up the data warehouse.
Jiang is correct, but I have worked on multiple data warehousing projects where infrastructure components (CPUs, memory, storage, etc.) were chosen by other teams without consulting or coordinating with the data warehousing team. Because of this lack of coordination, the infrastructure later had to be reworked and modified. This problem could be avoided if teams worked collaboratively and followed proven data warehousing standards.
Jiang lists two other noteworthy points regarding the data warehouse infrastructure – it should be unique and it should provide functionalities suited for data analysis with little or no data manipulation required by the end user. These functionalities include:
- Data integration – All types of data including data that is structurally and semantically different should be integrated.
- Data collection – The data warehouse should be the only infrastructure in the organization that keeps the collected snapshots available online to everyone for as long as the business requires.
- Data preparation – Appropriate filtering and transformation should be in place to make the data useful.
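To make these three functionalities concrete, here is a minimal Python sketch. It is not taken from Jiang's work: the source systems, field names, and the `Fact` record are hypothetical, chosen only to show how integration, preparation, and snapshot collection might fit together.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical source records: two systems describing customers with
# different field names and units (assumed purely for illustration).
crm_rows = [{"cust_id": 1, "revenue_usd": 1200.0},
            {"cust_id": 2, "revenue_usd": None}]
erp_rows = [{"customer": 3, "revenue_cents": 45000}]


@dataclass(frozen=True)
class Fact:
    customer_id: int
    revenue_usd: float
    snapshot_date: date


def integrate(crm, erp):
    """Data integration: map both source layouts onto one shared schema."""
    unified = [{"customer_id": r["cust_id"], "revenue_usd": r["revenue_usd"]}
               for r in crm]
    unified += [{"customer_id": r["customer"],
                 "revenue_usd": r["revenue_cents"] / 100}
                for r in erp]
    return unified


def prepare(rows):
    """Data preparation: filter out rows that are unusable for analysis."""
    return [r for r in rows if r["revenue_usd"] is not None]


def collect(warehouse, rows, snapshot_date):
    """Data collection: append a dated snapshot; earlier snapshots are kept."""
    warehouse.extend(
        Fact(r["customer_id"], r["revenue_usd"], snapshot_date) for r in rows
    )
    return warehouse


warehouse = []
collect(warehouse, prepare(integrate(crm_rows, erp_rows)), date(2024, 1, 31))
```

Because each load is appended with its own `snapshot_date` rather than overwriting earlier rows, the history stays available for comparison, and the end user queries one consistent schema instead of reconciling the sources by hand.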
A data warehouse is built to support data analysis. It includes a historical snapshot of the data, and it must allow users to quickly and easily retrieve the data. To accomplish this, your data warehouse development process must follow a set of standards and guidelines that ensure efficiency, quality and speed.
Dennis Earl Hardy
Spotfire Blogging Team