“Garbage in, garbage out” was first uttered by an IBM employee back in 1965, and this computer science term is just as valid today as it was nearly 55 years ago. In fact, we are swimming in more data than ever, generated by new software applications and IoT devices that add to the noise every second. With all of this data comes the need to ensure that it is cleaned, summarized, or prepared for further analysis or action.
Many software applications lack data validation capabilities, or those capabilities weren’t configured when the systems first went live. As a result, records are often duplicated, with multiple entries for the same customer, vendor, product, complaint, ticket, and so on. Most of us have experienced this, and duplicates likely live in nearly all of our systems, creating havoc when we try to roll data up for analysis or associate different data elements.
It doesn’t have to stay this way. Rule sets within Decisions can be used to clean operational and reporting data. There are typically three methods for cleaning and maintaining data in both operational and reporting systems. In each of the three methods described below, a data set is run through a set of validation rules, each operating on a different field or attribute within that data set.
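To make the idea of a rule set concrete, here is a minimal Python sketch. It is not the Decisions rule engine or its API; the field names (`name`, `email`, `postal_code`) and the three rules are hypothetical, chosen only to show several independent rules each checking a different field of one record.

```python
import re

# Hypothetical rules for illustration only. Each rule inspects one field
# of a record (a dict) and returns an error message, or None if it passes.

def require_name(record):
    if not record.get("name", "").strip():
        return "name is required"
    return None

def valid_email(record):
    # Deliberately loose pattern: something@something.something
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", record.get("email", "")):
        return "email is malformed"
    return None

def valid_postal_code(record):
    # Assumes US ZIP or ZIP+4; other countries would need other rules.
    if not re.fullmatch(r"\d{5}(-\d{4})?", record.get("postal_code", "")):
        return "postal_code is not a US ZIP code"
    return None

RULE_SET = [require_name, valid_email, valid_postal_code]

def validate(record, rules=RULE_SET):
    """Run every rule against the record and collect the failures."""
    failures = []
    for rule in rules:
        message = rule(record)
        if message is not None:
            failures.append(message)
    return failures
```

An empty list from `validate` means the record passed every rule. Because the rules are plain functions in a list, the same set could be called at data entry, in a nightly job, or while loading a warehouse.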
Method 1 – Validate Upon Entry: This is always the ideal case, although some software applications cannot call external services and therefore cannot use this method. If the application does support it, data can be passed to the rules engine for processing before it is saved to the application database.
Method 2 – Clean & Replace: Using this method, nightly jobs grab newly entered data from your operational system (CRM, ERP, and so on) and process it through a rule set within Decisions. These rules can compare the data against previously generated rules that validate addresses, company names, and any other commonly duplicated or misspelled data items. The data can then be standardized to a common spelling. Once cleaned, the data can be replaced, deleted, or repaired in the operational system.
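As a rough illustration of standardizing to a common spelling and then dropping duplicates, consider the Python sketch below. It is an assumption-laden example, not how Decisions implements this: the `CANONICAL_NAMES` lookup table, the `company` and `postal_code` fields, and the choice to treat "same company plus same postal code" as a duplicate are all hypothetical.

```python
import re

# Hypothetical lookup from a normalized key to the preferred spelling.
CANONICAL_NAMES = {
    "ibm": "IBM",
    "international business machines": "IBM",
    "acme": "Acme Corporation",
    "acme corporation": "Acme Corporation",
}

def normalize_key(name):
    """Lowercase, drop periods/commas and common legal suffixes, collapse spaces."""
    key = name.lower()
    key = re.sub(r"[.,]", "", key)
    key = re.sub(r"\b(inc|corp|llc|ltd)\b", "", key)
    return " ".join(key.split())

def standardize(name):
    """Return the canonical spelling if known, else the trimmed original."""
    return CANONICAL_NAMES.get(normalize_key(name), name.strip())

def clean_batch(records):
    """Standardize company names, then keep the first record per duplicate key."""
    seen = set()
    cleaned = []
    for record in records:
        fixed = {**record, "company": standardize(record["company"])}
        key = (fixed["company"], fixed.get("postal_code"))
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        cleaned.append(fixed)
    return cleaned
```

A nightly job could pass the day's new records through `clean_batch` and write the result back to the operational system. Real-world matching usually needs more than an exact lookup (fuzzy matching, address verification), so treat the table as a stand-in for whatever rules you maintain.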
Method 3 – Clean for Reporting: In this method, operational data is run through the rule set and cleaned before it is added to a data warehouse or data lake. The data is left as-is in the operational system but is cleaned for entry into the data warehouse. If the data warehouse is where business people look to make operational decisions, this can be perfectly acceptable.
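The key property of this method is that cleaning happens on a copy, so the operational rows are never modified. A minimal Python sketch of that idea follows; the `company` and `country` fields, the cleaning steps, and the use of a plain list as the "warehouse" are hypothetical stand-ins for a real load process.

```python
import copy

def clean_record(record):
    """Return a cleaned copy of the record; the original is not touched."""
    cleaned = copy.deepcopy(record)
    # Hypothetical cleaning steps: collapse whitespace and fix casing.
    cleaned["company"] = " ".join(cleaned["company"].split()).title()
    cleaned["country"] = cleaned["country"].strip().upper()
    return cleaned

def load_for_reporting(operational_rows, warehouse):
    """Append cleaned copies to the warehouse, leaving the source rows as-is."""
    for row in operational_rows:
        warehouse.append(clean_record(row))
    return warehouse
```

After the load, reports see the cleaned values while the operational system still holds whatever was originally typed in.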
In each method above, the rules themselves are the same. The only difference is when and where the data validation takes place in your process. If you have a particularly tricky data cleaning project, we would love to hear about it. Please feel free to reach out to firstname.lastname@example.org.