Archive for December 2010

T-SQL Tuesday #13 – Data Quality and the Business User

This post is part of T-SQL Tuesday #13 – organized by Adam Machanic (Blog | Twitter), and hosted this month by Steve Jones (Blog | Twitter). The topic this month is “What issues have you had in interacting with the business to get your job done?”

Working in business intelligence, I do a lot of work with business users. For the most part, it’s an enjoyable process and a good interchange of information. I learn about the business, and in the process help the users gain new insights about their data and how it can be used to answer their questions. There is one consistent item, though, that I’ve found difficult to communicate to business users, and that’s the state of their data quality.

Data quality has a pretty big impact on the value of a business intelligence solution. Bad data quality = bad answers, and if your business intelligence solution isn’t delivering good answers, it’s not really worth much. But the data quality in a business intelligence solution depends in large part on the data being fed into it. And unfortunately, the data in a lot of source systems is not in good shape.

It’s Not Me, It’s You

It’s really difficult to communicate this to the business users, though. After all, they’ve been running their business for years on this data, and now I’m telling them it’s bad? Why haven’t people been complaining about it all along? There are a few things that I try to mention at this point.

First, the level of data quality required to get good information from a BI system is very different from that required to run an operational system. For operations, there are usually a few key pieces of information that are required. These are kept to a minimum, though, because operationally, the more data quality checks you implement, the more you impede the business process. So you want to have just enough data quality for the system to not fall apart. You don’t really care that the customer entered the wrong zip code with their state (after all, the Post Office will work that out when you send them the bill, right?).

For BI work, though, you are flipping that around. To analyze and get meaning from the data, you need classification of the data, and some of those optional pieces of information that are not so important from an operational perspective start coming into play. Knowing that your states actually align with your zip codes becomes pretty important if you want to display your data on a map.

Also, people probably have been complaining about data quality – they just aren’t complaining to the business users. The DBAs in most companies I’ve worked with are well aware of the issues in the company’s data. But when they complain about it, they complain to their managers or other DBAs. They don’t complain to business users, because, after all, we technology people are just supposed to make it work, right?

Can We Make This Work?

Convincing the business users of these two points can be pretty difficult, though. In the past, I’ve had to extract data, profile it, and then identify problem areas. Even then, going to a business user with a list of the thousands of customers who have zip codes that aren’t valid for their state of residence often didn’t help. Business users can’t easily correlate the impact of those types of data issues on their reporting and analytics until they see it in the context of the actual analysis.
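As a rough illustration of that kind of profiling pass, here is a minimal Python sketch that flags customers whose zip code does not match their state. Everything in it is an assumption for the sake of the example: the customer records, the field names (`id`, `state`, `zip`), and the tiny prefix lookup table are all hypothetical, and a real check would need complete postal reference data rather than four hand-picked prefixes.

```python
from collections import Counter

# Hypothetical, deliberately tiny reference table: 3-digit ZIP prefix -> state.
# A real profiling job would load a complete postal reference data set.
ZIP_PREFIX_TO_STATE = {
    "100": "NY",
    "606": "IL",
    "902": "CA",
    "981": "WA",
}


def find_state_zip_mismatches(customers):
    """Return (customer, expected_state) pairs where the zip implies another state."""
    mismatches = []
    for customer in customers:
        prefix = customer["zip"][:3]
        expected = ZIP_PREFIX_TO_STATE.get(prefix)
        # Prefixes missing from the lookup are skipped, not flagged,
        # because this sample table cannot say anything about them.
        if expected is not None and expected != customer["state"]:
            mismatches.append((customer, expected))
    return mismatches


def summarize_by_state(mismatches):
    """Count mismatched customers per recorded state."""
    return Counter(customer["state"] for customer, _ in mismatches)


# Made-up sample rows standing in for an extract from a source system.
customers = [
    {"id": 1, "state": "NY", "zip": "10001"},
    {"id": 2, "state": "CA", "zip": "98101"},
    {"id": 3, "state": "IL", "zip": "60601"},
    {"id": 4, "state": "WA", "zip": "90210"},
]

mismatches = find_state_zip_mismatches(customers)
for customer, expected in mismatches:
    print(f"customer {customer['id']}: state {customer['state']}, "
          f"but zip {customer['zip']} suggests {expected}")
```

The per-state summary from `summarize_by_state` is closer to what ends up mattering to a business user than the raw list is: it is the number that will look wrong on a map or a regional report.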

So, in the past, I often ended up doing a first pass on the data where I let anything in – quality data or not. Inevitably, as soon as users started looking at the numbers, they’d become much more interested in data quality. Today, tools like PowerPivot can help, because I can quickly build a functional prototype on top of the dirty data, and visually show the impact much more easily.

Once the business users understand the impact of poor-quality data, they’ve always seen the value in either fixing it or putting in the appropriate safeguards to prevent it from entering the BI system in the first place. It can be challenging to communicate, but the faster I could get this point across to the business users, the more successful the projects turned out. One of the key things that I’ve learned from this process over time is that, while it’s difficult to show someone data quality itself, you can clearly show the impact that it has on the solutions. For many business users, this communicates far better than all the talking in the world.