A Data Quality Riot Act

As the son of a former Marine Chief Warrant Officer (CW2), I have not only seen the occasional riot act doled out to my brother; I have been on the receiving end of a few myself. I have discovered that for your data quality initiatives to succeed, you need someone who is willing to play the role of drill sergeant on occasion. There will be times when someone needs to take control because a business process or system leaves room for bad data to be introduced into the environment. This person needs to be motivated and passionate about rooting out bad data and the processes that allow it to be introduced.

Recently I received an email from a business analyst asking about a discrepancy between the operational system and the analytical application that his team had recently distributed across the enterprise: a name change in the operational system was not being reflected in the analytical application. On the surface, this seemed fairly benign. We looked at the table responsible for this entity; the table below shows what we found.

Id        First Name  Last Name  Employee Id  User Id
10087632  Olivia      Johnson    257303       johnsono
10108465  Olivia      Matthews   0257303      matthews

Since this table behaves as a Type 4 slowly changing dimension, an update of the last name and user id should simply have copied the original record to an audit table and overwritten the current record with the correct last name and user id, keeping the same surrogate key. Instead, a new record was created, which not only introduced a new surrogate key for the same person but also bypassed the unique constraint on the employee id by inserting a leading zero into the field. Surely this had been done by a user who was not properly trained on handling employee name changes in the operational system. A well-written email could prevent it from happening again, and the user could even be instructed to go back into the system, delete the new record, and update the existing record properly. But that was not the case.
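For readers less familiar with the pattern, here is a minimal sketch of how a Type 4 name change might be handled, using Python's built-in sqlite3 module. The table names (employee, employee_audit), the column names, and the change_name helper are hypothetical, modeled on the table above rather than taken from the actual operational system. The point it illustrates: the surrogate key and the employee id never change; only the audit table grows.

```python
import sqlite3

# Hypothetical Type 4 layout: "employee" holds only the current version of
# each person; "employee_audit" holds the prior versions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    id          INTEGER PRIMARY KEY,   -- surrogate key
    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL,
    employee_id TEXT NOT NULL UNIQUE,  -- natural key from HR
    user_id     TEXT NOT NULL
);
CREATE TABLE employee_audit (
    id          INTEGER NOT NULL,      -- same surrogate key as the current row
    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL,
    employee_id TEXT NOT NULL,
    user_id     TEXT NOT NULL,
    archived_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
""")
conn.execute(
    "INSERT INTO employee VALUES (10087632, 'Olivia', 'Johnson', '257303', 'johnsono')"
)


def change_name(conn, employee_id, new_last_name, new_user_id):
    """Archive the current row, then overwrite it in place (same surrogate key)."""
    with conn:  # one transaction: archive and update succeed or fail together
        conn.execute(
            """INSERT INTO employee_audit (id, first_name, last_name, employee_id, user_id)
               SELECT id, first_name, last_name, employee_id, user_id
               FROM employee WHERE employee_id = ?""",
            (employee_id,),
        )
        conn.execute(
            "UPDATE employee SET last_name = ?, user_id = ? WHERE employee_id = ?",
            (new_last_name, new_user_id, employee_id),
        )


change_name(conn, "257303", "Matthews", "matthews")
print(conn.execute("SELECT * FROM employee").fetchall())
# [(10087632, 'Olivia', 'Matthews', '257303', 'matthews')]
print(conn.execute("SELECT id, last_name FROM employee_audit").fetchall())
# [(10087632, 'Johnson')]
```

Because the surrogate key survives the change, every report or fact row that already points at 10087632 keeps pointing at the same person, under her new name.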

The users responsible for managing user accounts in the operational system have been instructed to create a new user account when someone submits a name change. The instructions also explicitly tell them to insert a leading zero in the employee id to bypass the uniqueness constraint on that field. What happens when an employee gets married, then divorces and no longer wishes to keep her ex-husband’s last name, or marries again? Will they add a second leading zero to the employee id?
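The sketch below, again using Python's sqlite3 module with a hypothetical schema, shows why the workaround succeeds: a UNIQUE constraint on a text column compares strings, and '257303', '0257303' and '00257303' are three different strings. The third row is invented to illustrate the repeated name change asked about above. The normalize and find_disguised_duplicates helpers are likewise hypothetical; they are one simple data quality rule that catches what the constraint cannot.

```python
import sqlite3

# Hypothetical schema: employee_id is stored as TEXT with a UNIQUE constraint.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"            # surrogate key
    " last_name TEXT NOT NULL,"
    " employee_id TEXT NOT NULL UNIQUE)"  # natural key from HR
)
rows = [
    (10087632, "Johnson", "257303"),
    (10108465, "Matthews", "0257303"),  # the documented workaround
    (10200001, "Smith", "00257303"),    # hypothetical second name change
]
# Every insert succeeds: as strings, the three values are all distinct.
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)


def normalize(emp_id: str) -> str:
    """Canonical form of an employee id: trim spaces, drop leading zeros."""
    return emp_id.strip().lstrip("0") or "0"


def find_disguised_duplicates(conn):
    """Group surrogate keys by normalized employee id; keep groups of 2+."""
    groups = {}
    for surrogate, emp_id in conn.execute(
        "SELECT id, employee_id FROM employee ORDER BY id"
    ):
        groups.setdefault(normalize(emp_id), []).append(surrogate)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


print(find_disguised_duplicates(conn))
# {'257303': [10087632, 10108465, 10200001]}
```

A check like this only reports the damage after the fact. Assuming HR never issues employee ids with leading zeros, a constraint such as CHECK (employee_id NOT LIKE '0%') would reject the workaround at insert time, though the real fix is the business process itself.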

Hearing the details of this business process, I could feel my blood pressure rise. This operational system has been live enterprise-wide for less than six months, and the business processes that support it are already showing signs of failure. History did not provide the learning experience you would have hoped for; it will take someone putting on their drill instructor uniform to break the privates in the platoon of their bad habits. Only then can they be built back up with the understanding that they are to protect the enterprise’s data as if it were their homeland.

This entry was posted in Blog, Data Quality, Databases.

14 Responses to A Data Quality Riot Act

  1. Phil Simon says:

    Rob

    Really good stuff. I have unfortunately seen this happen so many times that I have developed a bit of an immunity to it, as outrageous as it is.

    A well written email can prevent this from happening again.

    Perhaps coupled with some electric shocks?

  2. Jill Wanless (aka sheezaredhead) says:

    Great post Rob. I know exactly what you mean about the blood-boiling bit! Happens to me all the time (my co-workers say: “Jill’s ranting again”)! :)
    I see it when business requirements do not clearly articulate how the data will be used to support a business decision. If that line of sight is not explicit, it can easily lead to bad processes that create poor data quality.

    Thanks!

  3. Jim Harris says:

    Users responsible for managing the operational system have been “instructed” to intentionally circumvent database protocols specifically put in place to ensure proper data management and data quality?

    I hereby order a DQ-CR!

    (For the DQ civilians – that stands for Data Quality – Code Red!)

    Shut Your Mouth – Open Your Ears – Listen and listen well!

    For every leading zero I catch you adding to the “Employee Id” field, I am going to personally remove a trailing zero from your “Salary” field – how’s that sound, sport? That sound like a best practice you can get behind? Or do I need to add a swift kick in your behind to make sure that it really sinks in?

    . . .

    Excellent post Rob – Best Regards, Jim

  4. Pingback: Tweets that mention A Data Quality Riot Act | Rob Paller -- Topsy.com

  5. Rob Paller says:

    @Phil – Electric shock therapy might be what it takes to drive the point home. Thanks for commenting.

    @Jill – The business process was what put me over the edge in this situation. I can understand a user making a mistake, but when the rules themselves introduce this, it is mind-blowing. Thank you for commenting.

    @Jim – I like your concept. I might suggest it as a solution to the problem. Great comment, thank you.

  6. Dylan Jones says:

    Great post Rob.

    The scary thing is that in most businesses this kind of thing stays hidden. I’m forever in awe of the creativity that goes into figuring out ways to inject defects into apps.

    Love examples like this because they demonstrate to DB designers that no matter how many constraints you build into the schema, you still need a data quality rule system on top to police the exceptions. It’s surprising how many people still don’t grasp how easy it is to create defective data in even the most well-designed database; this post should be circulated to all database designers as a warning.

    I can just imagine a downstream ETL feed taking that employee ID and loading it into an integer field in, say, a salary payment system. All of a sudden the duplicate returns and the employee is taxed twice, remunerated twice, etc.

    Great example of data quality being played out in reality Rob, nice post.

  7. Jim Tepin says:

    Rob, I hope you don’t mind if I fan the fire a little bit. This problem was reported by THE employee (who manages a “case load”). They could not understand why none of their past work was showing up on a report. The first reason is obvious: the relationship between the worker and their work was broken. The “business” solution is even more maddening. The worker’s supervisor has to enter the system and transfer all of the past work/cases from the old worker id to the new one… (where’s that banging-head emoticon when you need it).

  8. Rob Paller says:

    @Dylan – Great comment, thank you. As for downstream ETL, in this particular instance the system doesn’t feed employee payroll. The data warehouse was designed to retain the data types from the source system rather than assume anything. A double-edged sword, I suppose.

    @Jim – Thanks for commenting, throwing a little more fuel on the fire, and tying a few loose ends together.

  9. Thorsten says:

    Rob,
    a great example of a broken data process. I understand it needs someone who is “motivated and passionate about rooting out bad data and the processes that allow it to be introduced.” (Great job description, btw!)
    But a riot act?
    In my experience, this may lead to an adversarial relationship with the people entering all the data, which doesn’t help data quality. In most cases, people are just solving a problem they have been left alone with. They came up with something that worked (at least for them). Instead of being mad at them, it’s much more helpful to show them the negative effect their “solution” had and offer to help them come up with a better one. Usually, they are quite thankful for the help they get.
    Only if they’re too stupid to ask for help might a riot act be in order. Even then, it’s better if the riot act is read by their boss instead of you!
    What’s your experience?
    Thorsten

    • Rob Paller says:

      @Thorsten – You’re correct: reading a riot act to a user who unknowingly introduces bad data into the environment would probably be a career-defining moment you would soon want to forget. However, in this particular situation the business process was created with the intent to bypass the natural changes that occur in this dimension. Furthermore, the process instructed users on how to bypass the constraints placed on the employee id from the HR system. The user in the field who introduced this new record was not at fault or deserving of the riot act. The people responsible for drafting this business process, on the other hand, need to be shown the light.

      Thank you for your comment. You raised a great point: as data quality professionals we have to exercise restraint, not look down from the mountain as we proselytize, but come down the mountain and teach.

  10. Adrian Noble says:

    Rob,

    Terrific example. I agree with Thorsten, though, that another low-key conversation with the business is in order. There must be some perceived benefit for the business unit in doing it this way, and if you can identify this and show them a better way (i.e. one that does not screw up downstream users) of achieving it, that would be preferable.
    Of course, if they are deliberately ignoring IT and, especially, DM, then the riot act may be in order.

  11. Rob Paller says:

    @Adrian – Thank you for taking the time to comment. We are in a great position to help them understand that what they perceive as throwing a pebble into the pond will cause ripples that affect everyone else in the pond. (Pardon the cliché…)

  12. Pingback: Heute schon das Vertrauen in DQM erhöht? : SmarterSoftware Blog

  13. Pingback: Have you built your DQ trust today? : SmarterSoftware Blog