T-SQL Tuesday – Agile BI
http://agilebi.com
A community for sharing ideas about business intelligence development using agile methods.

T-SQL Tuesday #15: Automating XML Updates w/ PowerShell
http://agilebi.com/blog/2011/02/08/t-sql-tuesday-15-automating-xml-updates-w-powershell/
Tue, 08 Feb 2011 23:56:38 +0000

This month’s T-SQL Tuesday is being hosted by Pat Wright (blog | twitter). He selected the topic of automation with T-SQL or PowerShell. As it happened, I’d just had a scenario come up that seemed appropriate for a little PowerShell, so I went with it for this post.

The Varigence toolset is based on a language called BIML (Business Intelligence Markup Language), which is an XML-based language for creating BI solutions. We needed to make some mass updates to some of our sample files, but there were quite a few individual ones to update. I already had an XSLT transform that would update each file individually, so all I needed was a quick way to run it on each file in a directory.

I’m pretty new to PowerShell, but it seemed like a good fit for this. So, after a little trial and error, I came up with the following script:

[sourcecode language="powershell" padlinenumbers="true"]
# Back up the existing .xml files, then run the XSLT transform on each backup
# to produce an updated file with the original name.
$path = "C:\temp\xml\*.xml"

# Rename every existing .xml file to *_backup.xml
Get-ChildItem $path | Rename-Item -NewName { $_.Name -replace '\.xml$', '_backup.xml' }

# The renamed backups still match *.xml, so this picks them up
$files = Get-ChildItem $path

# Load the XSLT transform once and reuse it for every file
$xslt = New-Object System.Xml.Xsl.XslCompiledTransform
$xslt.Load("C:\temp\xml\DataType.xslt")

foreach ($file in $files) {
    $oldFileName = $file.FullName
    $newFileName = $file.FullName -replace '_backup\.xml$', '.xml'
    $xslt.Transform($oldFileName, $newFileName)
}
[/sourcecode]

Basically, the script goes through a directory, renames all the existing .xml files to create backups, and then gets a list of those files. It then loads an XSLT transform (a stylesheet used to rewrite XML documents by inserting, changing, or removing markup). Finally, it loops through the files in the folder and uses the transform to create an updated version of each file under its original filename.

There are probably better ways to do this, but it does exactly what I needed, and it let me learn a little more about PowerShell. It also struck me that this could be very useful to transform XML into a friendlier shape for processing in SSIS.
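
As a rough sketch of that idea, here’s how the same transform call could be wrapped in a small reusable function and pointed at a staging folder before an SSIS package picks the files up. The function name, folder paths, and flatten.xslt stylesheet below are hypothetical placeholders for illustration, not part of the original script.

[sourcecode language="powershell"]
# A minimal sketch: wrap the XSLT call in a function so any stylesheet can be
# used to reshape XML files before SSIS reads them. The paths and flatten.xslt
# stylesheet are hypothetical placeholders.
function Convert-XmlWithXslt {
    param (
        [string]$StylesheetPath,   # XSLT that reshapes the document
        [string]$InputPath,        # source XML file
        [string]$OutputPath        # transformed XML file
    )

    $xslt = New-Object System.Xml.Xsl.XslCompiledTransform
    $xslt.Load($StylesheetPath)
    $xslt.Transform($InputPath, $OutputPath)
}

# Flatten each raw file into a staging folder that an SSIS XML Source reads from.
Get-ChildItem "C:\temp\raw\*.xml" | ForEach-Object {
    Convert-XmlWithXslt -StylesheetPath "C:\temp\flatten.xslt" `
                        -InputPath $_.FullName `
                        -OutputPath (Join-Path "C:\temp\staged" $_.Name)
}
[/sourcecode]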

T-SQL Tuesday: Fix the Glitch or Resolve the Problem
http://agilebi.com/blog/2011/01/11/t-sql-tuesday-fix-the-glitch-or-resolve-the-problem/
Tue, 11 Jan 2011 17:36:07 +0000

One of the interesting things about T-SQL Tuesday is that it’s a 24-hour event. So, being on the east coast of the US, I get the chance to read some of the blog posts for the event on Monday night, before I actually post mine, though I usually write them ahead of time. Last night I saw that Rob Farley (twitter | blog) had posted his T-SQL Tuesday post. Reading it, I came to a horrifying realization – Rob and I think alike! (If you know Rob, you’ll realize that’s a lot scarier than it sounds.) His post and thought process were very similar to mine – though, while I did think about the screen resolution joke, I had the good sense not to mention it.

Seriously, since Rob’s one of the smartest guys I know, I think it’s pretty cool that we had the same approach.

This month’s T-SQL Tuesday is being hosted by Jen McCown (twitter | blog). She chose the topic of resolutions, with the new year in mind. I’ve never been big on the traditional idea of resolutions, and usually focus more on goals that I revisit on a regular basis.

So I started thinking about other ways that I could blog about resolutions, and the idea of resolving problems came to mind pretty quickly. In the roles I’ve filled over the last several years, I was often called in to deal with problems (technical and non-technical). The scope of the problems could vary quite a bit, from simple, 5-minute technical issues (“Why won’t my cube process?”) to business problems requiring full projects (“I need to analyze student performance and determine what factors impact it most heavily.”). One of the trends that I’ve noticed around solving problems is the shift from resolving the problem to simply fixing the glitch.

What’s the difference?

Well, just a second there, professor. We, uh, we fixed the *glitch*. So he won’t be receiving a paycheck anymore, so it’ll just work itself out naturally.
-Office Space

 

When you fix the glitch, you’re looking for the quickest, cheapest, least-effort patch that gets you the immediate result you want. When you resolve a problem, you make sure the root cause is fixed, so that the problem doesn’t resurface in a new form a few months later, and you address its logical ramifications as well.

Of course, fixing the glitch is sometimes appropriate. If you have a server-down situation and 50 people sitting around twiddling their thumbs till it’s back up, you need to get it fixed immediately. However, all too often, it stops there. Rather than investigating the root cause and how to prevent it, the reaction is often “It’s working – now back to business as usual.”

The biggest issue with fixing the glitch is that it sets you up for future failure. Most of the time, the patch fails to prevent issues because it doesn’t address the problem completely. Even if the patch keeps working, it creates something that will have to be worked around in the future. We’ve all dealt with those systems that have so many patches and quick fixes that everyone is scared to make any changes to them, for fear of breaking something else.

Resolving Problems

One of my goals for this year is to avoid quick fixes, and focus on resolving the root problems. That’s sometimes painful in the short term, as you don’t get that immediate satisfaction of resolving a problem 5 minutes into the conversation. The benefit, though, is that you won’t have to keep dealing with that same problem on a recurring basis for the next 5 years, so it’s a price I’m willing to pay.

T-SQL Tuesday #13 – Data Quality and the Business User
http://agilebi.com/blog/2010/12/14/t-sql-tuesday-13data-quality-and-the-business-user/
Tue, 14 Dec 2010 17:00:00 +0000

This post is part of T-SQL Tuesday #13 – organized by Adam Machanic (Blog | Twitter), and hosted this month by Steve Jones (Blog | Twitter). The topic this month is “What issues have you had in interacting with the business to get your job done?”

Working in business intelligence, I do a lot of work with business users. For the most part, it’s an enjoyable process and a good interchange of information. I learn about the business, and in the process help the users gain new insights about their data and how it can be used to answer their questions. There is one consistent item, though, that I’ve found difficult to communicate to business users, and that’s the state of their data quality.

Data quality has a pretty big impact on the value of a business intelligence solution. Bad data quality = bad answers, and if your business intelligence solution isn’t delivering good answers, it’s not really worth much. But the data quality in a business intelligence solution depends in large part on the data being fed into it. And unfortunately, the data in a lot of source systems is not in good shape.

It’s Not Me, It’s You

It’s really difficult to communicate this to the business users, though. After all, they’ve been running their business for years on this data, and now I’m telling them it’s bad? Why haven’t people been complaining about it all along? There are a few things that I try to mention at this point.

First, the level of data quality required to get good information from a BI system is very different from what’s required to run an operational system. For operations, there are usually a few key pieces of information that have to be captured. These are kept to a minimum, though, because operationally, the more data quality checks you implement, the more you impede the business process. So you want just enough data quality for the system not to fall apart. You don’t really care that the customer entered a zip code that doesn’t match their state (after all, the Post Office will work that out when you send them the bill, right?).

For BI work, though, you flip that around. To analyze and get meaning from the data, you need to classify it, and some of those pieces of information that were optional and not so important from an operational perspective start coming into play. Knowing that your states actually align with your zip codes becomes pretty important if you want to display your data on a map.

Also, people probably have been complaining about data quality – they just aren’t complaining to the business users. The DBAs in most companies I’ve worked with are well aware of the issues in the company’s data. But when they complain about it, they complain to their managers or other DBAs. They don’t complain to business users, because, after all, we technology people are just supposed to make it work, right?

Can We Make This Work?

Convincing the business users of these two points can be pretty difficult, though. In the past, I’ve had to extract data, profile it, and then identify problem areas. Even then, going to a business user with a list of the thousands of customers whose zip codes aren’t valid for their state of residence often didn’t help. Business users can’t easily connect those types of data issues to their reporting and analytics until they see them in the context of the actual analysis.
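
As a rough illustration of that kind of profiling pass, here’s a minimal PowerShell sketch that counts customers whose zip code is missing or malformed. The server, database, table, and column names are assumptions for the example, not a real system, and a true state-to-zip cross-check would also need a reference list of valid ranges.

[sourcecode language="powershell"]
# A rough profiling sketch; the connection string, table, and column names are
# placeholders for whatever source system is being profiled.
$connectionString = "Server=localhost;Database=SourceSystem;Integrated Security=True"
$query = @"
SELECT COUNT(*) AS BadZipCount
FROM dbo.Customer
WHERE ZipCode IS NULL
   OR ZipCode NOT LIKE '[0-9][0-9][0-9][0-9][0-9]%'
"@

$connection = New-Object System.Data.SqlClient.SqlConnection $connectionString
$command = New-Object System.Data.SqlClient.SqlCommand $query, $connection

$connection.Open()
$badZipCount = $command.ExecuteScalar()
$connection.Close()

Write-Output "Customers with missing or malformed zip codes: $badZipCount"
[/sourcecode]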

So, in the past, I often ended up doing a first pass on the data where I let anything in – quality data or not. Inevitably, as soon as users started looking at the numbers, they’d become much more interested in data quality. Today, tools like PowerPivot can help, because I can quickly build a functional prototype on top of the dirty data, and visually show the impact much more easily.

Once the business users understood the impact of bad-quality data, they always saw the value in either fixing it or putting the appropriate safeguards in place to prevent it from entering the BI system in the first place. It can be challenging to communicate, but the faster I could get this point across to the business users, the more successful the projects turned out. One of the key things I’ve learned from this process over time is that, while it’s difficult to show someone data quality in the abstract, you can clearly show the impact it has on the solution. For many business users, that communicates far better than all the talking in the world.
