When your company decides to undertake the initiative of cleaning their data, you will have to choose between using standard data quality tools or custom building a data quality system to fit their unique data quality requirements. Data Quality tools may exist as standalone tools or may be packaged within Master Data Management (MDM) tools. In either case, they will come with a prepackaged set of capabilities that can be used to clean your data. Standard data quality tools are often the best option; however, when the need is to custom code a data quality rule due to unique requirements, Extract-Transform-Load (ETL) tools may be a better option.
Examples of Prepackaged Data Quality Capabilities:
- Fixing Zip Codes inconsistencies
- Getting phone numbers to display in a defined format
- Parsing address records to the correct fields
- Validating Email addresses and format
Advantages & Disadvantages of ETL Tools
The clear advantage of ETL tools is that most companies already have them as part of their database system, therefore do not need to purchase any additional tools. They also have resources with the required skillset to code the data quality rules.
The disadvantage is that the ETL tools will sometimes be used to develop standard data quality functions that are already available in the data quality tools, ultimately reinventing the wheel. Companies should assess what the data quality tools offer out-of-the-box and not try to recreate that functionality with the ETL tools.
Before choosing ETL tools, companies should profile their data to assess the current state of their data. The best way to do this is to utilize Data Profiling tools, which are often prebuilt into the Data Quality tools, but they can also be purchased standalone.
Analyzing Quality with Data Profiling
Data profiling is the process of examining the data to develop statistics on the quality of the data. Historically, data profiling is performed by a skilled practitioner utilizing SQL scripts and conducting source data analysis. The practitioner would need to have a strong understanding of the source system and the various data fields.
When utilizing data profiling tools, companies can perform a much more comprehensive analysis of the quality of the data and provide results in a standardized format that can be measured and tracked over time, providing a lot of value at the start of a data quality effort. After analyzing the quality of the data, there may be a need for a specific quick-hit data cleanup effort where you might not leverage data quality tools. The value of this type of an initiative is valuable, focused and short-term, but should not be utilized enterprise-wide for all data quality issues.
Data Quality Tools Next Steps
Setting yourself up for success with clean data does not mean building every component of your data quality toolset. Having someone guide you through the data quality aisles will help you pick out the tools that best fit your needs. Whether you are just starting on your journey towards clean data or looking for more information, we can help. Our eBook, The Clean Data Initiative, explores data quality, data governance, and master data management, in addition to data insights. The eBook compares dirty data to dirty water, highlighting the parallels between the two, and exploring the difference in how we handle the quality of water vs. data. For more information on choosing a data quality tool, contact a subject matter expert.