My New Blog

When I originally started this blog, I wanted to cover agile development in BI/DW projects, as well as technical topics about BI. However, it didn’t work out quite that way. I’ve posted a few things here and there about agile development, but for the most part this blog has been focused on technical topics (and most of those about SSIS). So, to avoid any confusion, I’ve started a second blog named BIpartisan (no, it’s not about politics 🙂 ). On BIpartisan, I’ll be focusing a bit more on higher-level topics around agile development and delivering value with BI, and not so much on technical ones. If you are interested in that, please check it out.

I will continue to post here as well on the more technical topics. I’ve got several posts planned, but it’s been tough finding time to put them together. Hopefully, things will settle down a bit and I’ll have more free time for blogging.

Posted in Uncategorized | Comments Off on My New Blog

Managing Scope for Iterative Development

As I mentioned in my previous post, defining an appropriate scope is difficult when you are attempting to do iterative development on BI/DW projects. The problems with this are not unique to BI/DW projects, of course. All projects have scope issues. Most of the same scope management techniques that work well on traditional application development projects also work for BI/DW projects. There are two unique aspects of scope for BI/DW projects, though. One is that there is often significant work required “behind the scenes” to deliver even small pieces of visible end user functionality. The other is that there can be significant “hidden” scope in properly cleansing the data and making it useful from a business perspective. This can be challenging because the end users may have a perception that they are not getting very much functionality, particularly in the first several iterations of the project.

What are some of the hidden aspects of BI/DW projects? ETL and data profiling are two of the most common. I consider these hidden because the end users of a BI application are rarely intimately involved in the ETL process development. They may be involved in the data profiling, but often are not involved in the data cleansing that usually accompanies it. These are time-consuming activities that the users only get to appreciate indirectly, so they often don’t put very much value on them.

How can this be addressed? I’ve found that there are two parts to it. First, you need to do a certain amount of education with the stakeholders on what happens behind the scenes, so that they have a better understanding of the effort that has to be expended. The benefits that they get from this effort need to be explained as well. Telling them that ETL processes are time-consuming to implement isn’t very effective unless you also explain the benefits of well-implemented ETL: cleaner, consistent, conformed data, with appropriate controls to verify that the right data is going into the data warehouse and that bad data is being excluded.

However, education is not enough by itself. The second part of it is to show them that they can get incremental benefits. Again, as pointed out in the previous article, each iteration should deliver something of value to the users. It’s important to do this from the first iteration on, and to continue to do it consistently. One effective way to determine what would be of value for an iteration is to ask the users to decide on one or two questions that they want to be able to answer. The scope of the iteration becomes delivering the information that allows them to answer those questions. But what if the questions are complex, and you don’t feel that they can be addressed in a single iteration? I’ve generally found that if the questions are that complex, you can break them up into smaller questions and negotiate with the stakeholders to deliver them over multiple iterations.

This does require that the development team on the project has an agile mindset, and is focused on meeting the deliverables for the iteration. It also poses a more significant challenge when a project is in an initial phase, and the infrastructure is still being put in place. I’ll discuss this challenge further in my next post.

In conclusion, scope management is important on all projects, not just BI/DW projects. However, perceptions of scope may be more challenging on BI/DW projects, because of the hidden nature of some of the activities. It’s important to communicate the value of these activities to the project stakeholders, and to demonstrate that you can consistently produce incremental deliverables, while still carrying out these valuable activities.

As always, comments and feedback are welcome.

Posted in Uncategorized | Comments Off on Managing Scope for Iterative Development

SQL Server Launch Event

The launch event in Charlotte went very well – we had over 100 people in attendance, and the sessions were packed. We had a lot of good information on the new features in SQL Server 2008, covering the new T-SQL features, server consolidation, dimensional modeling, Analysis Services, and Reporting Services.

If you have follow-up questions from any of the presentations, or are interested in seeing more information on the topics, send me an email and we can see about lining up the presenter for a more in-depth presentation.

Posted in Uncategorized | Comments Off on SQL Server Launch Event

Challenges to an Iterative Approach to Business Intelligence

I’m a fan of agile development. Prior to focusing on business intelligence and data warehousing, I architected and developed client-server and n-tier applications, and I found that agile development techniques delivered better end results. I believe that, in large part, this came about because of the focus on smaller, functional iterations over a more traditional waterfall approach. However, this approach is still not regularly used on BI projects. Since it can have a big impact on the success of your BI initiatives, I’d like to list some of the challenges that prevent adoption of this approach. I’ll go into more detail on each of these in my next few posts, and also cover some of the benefits you can see if you overcome the challenges.

First, though, let me define what I mean by an iteration [1]. An iteration is a short development cycle that takes a specific set of requirements and delivers working, useful functionality. Generally, I think the duration of an iteration should be 2 weeks to a month, but I don’t consider that an absolute rule. More than 2 months for an iteration, though, really impacts the agility of the project, so I prefer to keep them shorter. Delivering working functionality in an iteration doesn’t necessarily mean that it has to go to production. It does mean that it could go to production, if the project stakeholders choose to do that. Delivering useful functionality means that what’s been developed actually meets a stakeholder’s need.

There are a number of challenges that you might encounter in doing iterative development for BI. Fortunately, you can work around these, though it’s not always easy.

  1. Scope
    Many BI/DW initiatives are large and involve multiple systems and departments. Even smaller BI projects often have multiple audiences with differing needs and wants. Managing the scope of the effort appropriately can be challenging in that environment. This means that defining and managing scope is critical to successful iterative development.
  2. Foundation development
    Particularly when BI projects are getting off the ground, they need a lot of foundational setup – environments, software installations, data profiling, and data cleansing. This poses definite problems for the first couple of iterations, especially when you take into account the guideline of delivering working and useful functionality with each iteration. The foundation needs to be built in conjunction with delivering useful functionality.
  3. Existing infrastructure and architecture
    Iterative BI development requires an infrastructure for development that is flexible and adaptive, and it makes it much easier if existing solutions were architected to be easily modified. Sadly, this is not the case in many organizations. Existing process and infrastructure tends to be rigid and not support or encourage rapid development. Existing data warehouses tend to be monolithic applications that are difficult to modify to address changing business needs. And many BI development tools do not provide adequate support for rapidly changing BI applications.
  4. Changing requirements [2]
    Changing requirements is a fact of life in most IT projects, and it’s definitely the case in BI/DW projects. While agile and iterative development can help address changing requirements, it still poses a challenge. As requirements shift, they can have a ripple effect on the BI infrastructure – adding a new field to a report may require changes to the database and ETL processes in addition to the report itself. This can make seemingly inconsequential changes take much longer than expected, and lower the adaptability of the project.
  5. Design [2]
    Due to the scope and complexity of many BI/DW initiatives, there is a tendency to get bogged down in design. This is particularly true when you consider that the database is a key part of BI/DW projects, and many BI developers feel most comfortable having a complete and stable data model before beginning development (which is perfectly reasonable). However, design by itself does not produce working functionality, so an iteration that involves nothing but design doesn’t really meet my definition of iterative.

After looking at all these challenges, you may feel that it’s pointless to try iterative development on BI projects. However, I’m happy to say that all these items can be addressed (granted, some of them are much more easily fixed than others). If you do make some changes, you can get a number of benefits, including the ability to rapidly accommodate change, deliver working functionality more quickly, and facilitate emergent requirements and design. The end result? Happy project stakeholders who are more satisfied with the results of their projects, and less wear and tear on the BI developers. Everybody wins!

Over the next couple of months, I’ll be posting more information about the challenges, and how you can work around them. Please keep reading, and if you have questions or comments, don’t hesitate to get in touch with me.

 

Notes
1. If you are familiar with Scrum, you will notice some similarities between what I describe as an iteration and a sprint. Scrum has influenced my thinking on agile methodologies quite a bit, and I’m a big fan of it. However, because I don’t follow the definitions in Scrum rigidly, I’ve found it better to not use the same terminology to avoid confusion. If I do refer to a sprint, though, I will be referring to the Scrum definition (a 30-day increment, resulting in potentially shippable product).
2. Please don’t take the above to mean that I don’t believe in requirements or design – I feel that both are vital to iterative development. However, the approach that many BI practitioners take to requirements and design does not lend itself to iterative development.

Posted in Uncategorized | Comments Off on Challenges to an Iterative Approach to Business Intelligence

SQL Server 2008 Firestarter Event

There is a SQL Server 2008 Firestarter event at the Charlotte Microsoft campus next Thursday, January 15th. If you want to learn more about the new features in 2008, or just get a chance to network, it’s a great opportunity. We’ll have 5 SQL Server MVPs present, so it will be a good chance to ask some questions as well. If you are interested, please register at this link: http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032400468&Culture=en-US

Posted in Uncategorized | Comments Off on SQL Server 2008 Firestarter Event

Analysis Services Error: The attribute key cannot be found when processing a dimension

I was processing an SSAS database to test some aggregations today, and I noticed some errors and came across some unexpected behavior.


When I was processing the User dimension (among others), I got an error similar to the following:


Errors in the OLAP storage engine: The attribute key cannot be found when processing: Table: ‘dbo_DimUser’, Column: ‘Country’, Value: ‘US’; Table: ‘dbo_DimUser’, Column: ‘Region’, Value: ‘NY’; Table: ‘dbo_DimUser’, Column: ‘City’, Value: ‘Albany ‘. The attribute is ‘City’. Errors in the OLAP storage engine: The attribute key was converted to an unknown member because the attribute key was not found. Attribute User of Dimension: User from Database: OfficeLive, Record: 1234.


Now, if I were processing a cube and saw this, I would immediately suspect a referential integrity issue.  Not so in this case.  So I looked at the missing key value, in this case ‘Albany ‘.  The string had some trailing spaces… but that shouldn’t be a big deal, since the Key property in BIDS was set to perform right trimming.


But with a little sleuthing, I found something interesting:  the error message was <gasp> a lie!  When I looked at the database, it wasn’t a trailing space (which is what appeared in the error message), it was a trailing tab.  The character was automagically converted to a space in the error message.  So the value wasn’t getting trimmed and matching the already existing ‘Albany’ key.  The weird thing was that the record simply failed, because of the tab character in the key.  I would have expected to see two entries for ‘Albany’ (one with some whitespace, if you looked hard enough) in the dimension.  I actually prefer the failure, but it did take me a little by surprise…
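Stepping outside of SSAS for a moment, this kind of hidden whitespace is easy to catch with a quick data-profiling pass before you process the dimension. Here’s a minimal sketch in Python (the helper names are my own invention, not anything from the SSAS toolset) that flags key values whose trailing whitespace is something other than plain spaces:

```python
def reveal_whitespace(value):
    # repr() escapes tabs, newlines, etc., so a '\t' shows up
    # instead of blending in with ordinary spaces
    return repr(value)

def audit_keys(keys):
    """Flag key values whose trailing whitespace includes something
    other than plain spaces (tabs, non-breaking spaces, and so on)."""
    flagged = []
    for key in keys:
        space_trimmed = key.rstrip(" ")  # what a space-only right-trim leaves behind
        if space_trimmed != space_trimmed.rstrip():  # other whitespace remains
            flagged.append(reveal_whitespace(key))
    return flagged
```

Running audit_keys(['Albany', 'Albany ', 'Albany\t']) flags only the tab-terminated value, which is exactly the case that tripped up the dimension processing above.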


Cheers,


David

Posted in Uncategorized | Comments Off on Analysis Services Error: The attribute key cannot be found when processing a dimension

Formatting Numbers in SSRS (and SSAS)

This is a little off of my usual SSIS posts, but I ran across an interesting item. It’s probably well-known to people who work with format strings on a regular basis. However, I’ve used them for a while, and just ran across this the other day, so I thought I’d share it.

I’ve been working with SSRS recently, using an SSAS cube as a data source. The cube had some large numbers in it, in the tens of millions (10,000,000+). It’s not uncommon to translate these large numbers to display as thousands or millions, by moving the decimal place to the left. This saves space on the report, and can make it easier to interpret when dealing with large numbers.

In the past, I’ve usually divided the number by 1,000 to move the decimal place. However, I found out that you can use format strings to accomplish the same thing a little more gracefully. For example, you might have a format string that looks like $#,### to format the number, and an expression on the value of the text box that divides the number by 1,000 (=Fields!MyColumn.Value / 1000). When setting the format string for the textboxes in Reporting Services, you can instead append commas to the format string: each comma at the end of the numeric format tells SSRS to divide the displayed value by 1,000, so $#, displays the number in thousands, and $#,, displays it in millions.
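If it helps to see the scaling rule spelled out, here’s a little Python sketch (purely illustrative; SSRS and .NET do this for you) that emulates the trailing-comma behavior of .NET format strings:

```python
def scale_for_format(value, format_string):
    """Emulate the .NET number-scaling rule: each comma at the end of the
    numeric format (immediately left of the implied decimal point)
    divides the value by 1,000."""
    trailing_commas = len(format_string) - len(format_string.rstrip(","))
    return value / (1000 ** trailing_commas)
```

So scale_for_format(12500000, '$#,') gives 12500.0 (thousands), and scale_for_format(12500000, '$#,,') gives 12.5 (millions).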

The same technique can be used in format strings for measures in SSAS. Again, this is probably common knowledge for a lot of people, but it was new for me, so I thought I’d share it here.

Posted in Uncategorized | Comments Off on Formatting Numbers in SSRS (and SSAS)

New Features for Visual Studio Team System 2008 Database Edition

Today I was creating a new project in Visual Studio Team System 2008 Database Edition, and needed to import a SQL Server 2008 DB.  I went looking for a SQL Server 2008 template for Visual Studio… instead, I found the Microsoft® Visual Studio Team System 2008 Database Edition GDR that was released a few weeks ago.  With this General Distribution Release (GDR), you get support for SQL Server 2008 as well as a number of cool features.  If you’re using VSTS Database Edition, definitely check it out…

Posted in Uncategorized | Comments Off on New Features for Visual Studio Team System 2008 Database Edition

SQL Load Generator Tool

A few weeks ago I spoke at PASS on using the Resource Governor in SQL Server 2008.  For my demo, I created a little application that could fire off multiple queries against SQL Server to simulate different users/applications.  This tool… gasp… creates a load on SQL Server.  People seemed interested in the application, so I’ve decided to release it on CodePlex.  You can find the project here.


For anyone that saw that demo, this tool has gotten a bit of a facelift… I sort of got bored over the holidays, and added a few features that I wished I had when I was using it to demo, as well as just playing around.  It’s nothing fancy… I just put it together one Saturday for my presentation, then gave in to some feeping creaturism a couple of nights.


I originally created this to test settings for the SQL Server 2008 Resource Governor.   You can also use it as a (very) lightweight load testing tool, but no promises on that.  I’m not planning on adding any features to it, or supporting it much, but I thought I’d put it out there.


Here’s what it looks like:


[Screenshot: SQL Load Generator main window]


Here’s a summary of the features:



  • Runs multiple queries against SQL Server.  You can add as many as you like.

  • Each query can run as either a SQL user or a domain user.

  • You can specify an Application name for the connection.

  • You can specify the number of concurrent threads to use for each query.

  • You can start all queries, stop all queries, remove all queries.

  • There is logging for failed queries (you can toggle it on and off… it isn’t precisely thread safe, and can cause crashes when there are lots of failures on multiple threads).

  • You can set all the defaults on a per user basis, and persist them.

  • Each query has a # of Runs and a # of Fails counter.  You can use the ‘Reset Counters’ feature to reset the total counts (not the per query counts).

You can save your settings via the ‘Options’ menu.  You can add default items to the different dropdowns, provide default query settings, change the log locations, etc.  You can also modify the stock connection string… though keep in mind that some of the settings (particularly ‘pooling=false’) will affect the way the application works… namely, the connections to SQL Server won’t be closed.
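If you’re curious how the core of a tool like this fits together, here’s a rough sketch in Python rather than the actual tool’s code: worker threads repeatedly executing a query (stubbed out here as a plain callable, since a real version would use a SQL client library) and updating shared run/fail counters behind a lock.

```python
import threading

class QueryRunner:
    """Minimal sketch of a load generator's core loop: N worker threads
    repeatedly executing a query and keeping run/fail counters."""
    def __init__(self, execute, threads=2, iterations=5):
        self.execute = execute          # callable standing in for a SQL query
        self.threads = threads
        self.iterations = iterations
        self.runs = 0
        self.fails = 0
        self._lock = threading.Lock()   # counters are shared across threads

    def _worker(self):
        for _ in range(self.iterations):
            try:
                self.execute()
                with self._lock:
                    self.runs += 1
            except Exception:
                with self._lock:
                    self.fails += 1      # a real tool would also log the error

    def start(self):
        workers = [threading.Thread(target=self._worker) for _ in range(self.threads)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
```

A real load generator would swap the callable for an actual query over a pooled connection, which is where settings like ‘pooling=false’ start to matter.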


[Screenshot: SQL Load Generator options]


That’s about it… enjoy!


David

Posted in Uncategorized | Comments Off on SQL Load Generator Tool

SSIS Event Handlers

There was a question recently from a fellow MVP, who was wondering whether the event handlers in SSIS are fired synchronously or asynchronously. The answer (which I double-checked with Matt Masson) is that they are synchronous. When events are fired asynchronously, they are queued up, and the system runs them as it gets to them. The main thread of the program may continue executing while the events are firing. However, when they are run synchronously, the main thread is paused while the event is executed. Events are not queued up; they are executed as soon as they are fired.

What does this mean for your SSIS packages? Well, for the most part it reduces potential issues that can be caused by events firing out of sequence. There can also be threading issues with asynchronous events. So generally, using synchronous events makes the programming model a little simpler. It also means that any work done in the event handler will be complete prior to the main thread picking back up. As an example, take the following package, which performs some setup prior to running a data flow. Assuming the setup runs correctly, the data flow should be run.

[Image: package control flow, with a setup task followed by a data flow]

The Perform Setup Work task has an OnPostExecute event handler that verifies that setup was performed properly, and sets the value of a variable to 1 or 0, depending on whether it was successful.

[Image: the OnPostExecute event handler]

The expression on the precedence constraint checks the value of the variable, to determine whether the data flow should be executed.

[Image: the expression on the precedence constraint]

Since the event handler is executed synchronously, you can count on any work that is done in the event handler being complete before execution moves on from that task. That includes the setting of any variables in the package, which makes the above approach possible. If the event handlers were run asynchronously, you would not be able to count on the variable being set before the expression on the precedence constraint was checked.

However, because your main package execution will be paused while event handlers are firing, you’ll want to make sure that any code in the event handlers runs quickly. You can’t use the event handlers to queue up long-running processes, as you can in some languages.
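To make the synchronous contract concrete, here’s a toy Python sketch (not SSIS code, just an analogy) of the difference: a synchronous handler is guaranteed to finish before the caller’s next statement, while an asynchronous one races with it.

```python
import threading

class Task:
    """Analogy for an SSIS task with an OnPostExecute handler that
    sets a package variable checked by a precedence constraint."""
    def __init__(self):
        self.setup_ok = 0            # stands in for the package variable

    def on_post_execute(self):
        self.setup_ok = 1            # handler verifies setup, sets the variable

    def run_sync(self):
        self.on_post_execute()       # synchronous: caller blocks until done
        return self.setup_ok         # guaranteed to see the updated value

    def run_async(self):
        handler = threading.Thread(target=self.on_post_execute)
        handler.start()              # asynchronous: caller keeps going
        observed = self.setup_ok     # may still be 0 here (a race)
        handler.join()
        return observed
```

run_sync always returns 1; run_async may return 0 or 1 depending on thread scheduling, which is exactly why you couldn’t trust the variable in the precedence constraint if SSIS fired events asynchronously.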

Event handlers are a very powerful, but often underutilized, feature in SSIS. Hopefully this additional information will make them a little easier to use.

Posted in Uncategorized | Comments Off on SSIS Event Handlers