Archive for January 2009

What’s Happening on SQL Server When My Package Fails?

We recently encountered an issue in the extract portion of an ETL process. While extracting from one very large table with a NOLOCK hint, we were getting errors indicating that the data was changing while we read it. Since no other processes were supposed to be using the table during the ETL window, this was a source of some confusion. With nothing to indicate what, if anything, was changing the data, we needed a quick way to see what was going on in the database at the point the package failed.
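
For context, the extract query was shaped roughly like the sketch below (the table and column names are made up). Under concurrent modifications, a read-uncommitted scan like this can abort with SQL Server error 601, “Could not continue scan with NOLOCK due to data movement.”

-- Hypothetical extract query. With NOLOCK (read uncommitted), concurrent
-- inserts, updates, or deletes can move rows mid-scan and abort the read
-- with error 601.
SELECT OrderID, CustomerID, OrderDate, Amount
FROM dbo.LargeSourceTable WITH (NOLOCK);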

Normally, when troubleshooting database activity problems, I’d just run sp_who or sp_who2. However, since the problem was occurring between 2 and 3 AM, nobody was terribly excited about getting up to monitor the server, and we didn’t really want to leave Profiler running all night either. So, we set up the package to log the current database activity whenever an error occurred. In the OnError event handler, we simply added a data flow that runs the sp_who stored procedure to get the current activity on the database and saves it to a text file.

This was pretty simple, and I’ve created a sample package to show it. The main control flow has a script task that generates an error, purely for the purpose of firing the OnError event handler.

[Figure: main control flow, with a script task that generates an error to fire the OnError event handler]
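
As a side note, if you’d rather not write any script code just to test the handler, an Execute SQL Task that raises an error works just as well. A minimal sketch (the message text is arbitrary):

-- Raising a severity-16 error fails the task, which in turn fires the
-- package's OnError event handler.
RAISERROR('Intentional failure to test the OnError handler.', 16, 1);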

The event handler first checks that this is the first time it has fired, since we didn’t want to log the activity multiple times. It then runs a data flow and, once that completes, sets a flag to ensure the handler doesn’t fire again.

[Figure: OnError event handler, showing the first-error check, the data flow, and the task that sets the flag]

The data flow runs sp_who. While sp_who2 gives better information, it doesn’t provide metadata in a format SSIS can easily consume; sp_who works without any extra effort, so we went with it for simplicity (a sketch of both options follows the figure below). The output is sent to a text file.

[Figure: data flow sending the sp_who output to a flat file destination]
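
The source component simply uses a SQL command. If you do want sp_who2’s extra detail (CPU time, disk I/O, program name), one common workaround is to capture its output into a table from an Execute SQL Task that runs before the data flow, and then point the data flow at that table. Here’s a sketch; the table name dbo.CurrentActivity is made up, and the column types are approximations, so verify them against sp_who2’s actual output on your server.

-- Data flow source (SQL command access mode). sp_who returns a fixed column
-- set, so SSIS can derive the output metadata from it directly.
EXEC sp_who;

-- Alternative: run this from an Execute SQL Task, then read the table in the
-- data flow. sp_who2 returns the SPID column twice, hence SPID2 at the end.
CREATE TABLE dbo.CurrentActivity (
    SPID        int,
    Status      varchar(255),
    Login       varchar(255),
    HostName    varchar(255),
    BlkBy       varchar(255),
    DBName      varchar(255),
    Command     varchar(255),
    CPUTime     int,
    DiskIO      int,
    LastBatch   varchar(255),
    ProgramName varchar(255),
    SPID2       int,
    REQUESTID   int
);

INSERT INTO dbo.CurrentActivity
EXEC sp_who2;

The staged approach costs an extra task and a table, which is why sp_who’s simplicity won out in our case.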

It was pretty simple to implement, and it served the purpose. By reviewing the logged information, we could tell what other processes were running, and which computers they originated from. If you ever encounter a similar situation, hopefully this can help. The template package is located on my SkyDrive.

Impact of the PerformancePoint Server Changes

If you follow business intelligence news at all, then you probably saw the news from Microsoft last week that PerformancePoint is becoming a component of SharePoint. However, it won’t be all of PerformancePoint – the Plan portion will see one additional service pack (SP3), then development will cease. The Monitor and Analyze portions of the product will become part of the SharePoint Enterprise license.


Reaction has been mixed. Many people see the advantage of including the Monitor and Analyze functionality in SharePoint, as it will open that functionality to a much broader audience. This lines up nicely with the “BI for the masses” vision Microsoft has been working toward for several years, as well as with the more recent marketing message, “People-ready BI”. Given that SharePoint is becoming the place many users go to do their work, it makes sense to put their BI tools in the same location. I think that offering PerformancePoint Services (the new name for the Monitor and Analyze functionality under SharePoint) as part of SharePoint will make it easier to include BI functionality in new applications and lower the barrier to adoption in organizations of all sizes.


The negative reactions center on two things: discontinuing Plan, and not having a full-client story besides Excel. I understand the reactions to discontinuing Plan. Version 1 had some rough edges (OK, a lot of rough edges), but Microsoft has a history of quickly releasing subsequent versions with much better functionality, usually arriving at a very good product by version 3. Breaking this pattern caught a lot of people by surprise. Version 1, while lacking in a few key areas, was definitely usable; some of Microsoft’s customers are using it in production, and a number of partners had made significant investments in it.

Fortunately, while Mariner had done some work with Plan, we had not invested heavily in it. We were more focused on the Monitor and Analyze portions of the product, in part because we recognized that performance management is a specialized discipline, requiring specific skill sets. Just because you can deliver successful solutions on Microsoft technology doesn’t necessarily mean you can deliver successful performance management solutions. I think that was a point of confusion for many partners (the “one stop shop” approach is very popular in the partner community), and it led to Microsoft not having as strong a partner base to support the product as they had hoped. On the other hand, there were some really strong partners in the performance management space who did great things with Plan, and I can certainly empathize with those who made big investments and are now disappointed by the change in direction.


Mauro Cardarelli, a SharePoint MVP, had an interesting post arguing that making PerformancePoint available as part of SharePoint raises the same concern. Competent delivery of SharePoint solutions doesn’t necessarily correlate with competent delivery of BI functionality, and successful delivery of BI solutions doesn’t mean you can deliver good SharePoint solutions. Since this was one of the challenges for Plan, it will be interesting to see how it plays out going forward. In the short term, I’d encourage companies to make sure their vendors either have both sets of skills (and can demonstrate they’ve used them in the same project), or to look for best-of-breed partners who are willing to work together.


The full-client story is a concern. The current direction seems to be for Excel to become the full client for consuming Analysis Services data, with SharePoint as the thin-client interface. I’m definitely in favor of SharePoint as the thin-client interface, but using Excel as the full client leaves a pretty big gap in the story. It used to be that you could recommend the ProClarity desktop client to fill that gap, but since ProClarity is in support-only mode now, that’s no longer a good option. In time, more of ProClarity’s functionality should surface in Excel and SharePoint, but that’s still some time off. And Excel, while improving as an Analysis Services client, is still not on par with a dedicated desktop client built to expose the full functionality of Analysis Services. Hopefully that will improve over the next couple of releases of Excel, but in the meantime it creates opportunities for third parties to fill the gap.


Overall, I think this move will promote broader adoption of the Monitor and Analyze functionality in Microsoft’s customer base, and will strengthen the value proposition for moving to SharePoint Enterprise licenses. It’s a good thing for Microsoft, and good for customers who have already invested in SharePoint. However, it remains to be seen what impact the lack of a planning component and of a strong full-client application in the BI stack will have.


Some other reactions from around the web:


Chris Webb (he also has a set of links featuring other reactions)


Nigel Pendse


Cindi Howson

My New Blog

When I originally started this blog, I wanted to cover agile development in BI/DW projects, as well as technical topics about BI. However, it didn’t work out quite that way. I’ve posted a few things here and there about agile development, but for the most part this blog has been focused on technical topics (most of them about SSIS). So, to avoid any confusion, I’ve started a second blog named BIpartisan (no, it’s not about politics :) ). On BIpartisan, I’ll focus more on higher-level topics around agile development and delivering value with BI, and less on technical ones. If that interests you, please check it out.

I will continue to post here as well on the more technical topics. I’ve got several posts planned, but it’s been tough finding time to put them together. Hopefully, things will settle down a bit and I’ll have more free time for blogging.

Managing Scope for Iterative Development

As I mentioned in my previous post, defining an appropriate scope is difficult when you are attempting iterative development on BI/DW projects. These problems are not unique to BI/DW projects, of course; all projects have scope issues, and most of the scope management techniques that work well on traditional application development projects also work here. There are two unique aspects of scope for BI/DW projects, though. One is that significant work is often required “behind the scenes” to deliver even small pieces of visible end-user functionality. The other is that there can be significant “hidden” scope in properly cleansing the data and making it useful from a business perspective. This can be challenging because end users may perceive that they are not getting very much functionality, particularly in the first several iterations of the project.

What are some of the hidden aspects of BI/DW projects? ETL and data profiling are two of the most common. I consider these hidden because the end users of a BI application are rarely involved in the ETL process development. They may be involved in the data profiling, but often are not involved in the data cleansing that usually accompanies it. These are time-consuming activities that the users only appreciate indirectly, so they often don’t put much value on them.

How can this be addressed? I’ve found there are two parts to it. First, you need to do a certain amount of education with the stakeholders on what happens behind the scenes, so that they better understand the effort that has to be expended. The benefits they get from this effort need to be explained as well. Telling them that ETL processes are time-consuming to implement isn’t very effective unless you also explain the benefits of well-implemented ETL: cleaner, consistent, conformed data, with appropriate controls to verify that the right data is going into the data warehouse and the bad data is being excluded.

However, education is not enough by itself. The second part of it is to show them that they can get incremental benefits. Again, as pointed out in the previous article, each iteration should deliver something of value to the users. It’s important to do this from the first iteration on, and to continue to do it consistently. One effective way to determine what would be of value for an iteration is to ask the users to decide on one or two questions that they want to be able to answer. The scope of the iteration becomes delivering the information that allows them to answer those questions. But what if the questions are complex, and you don’t feel that they can be addressed in a single iteration? I’ve generally found that if the questions are that complex, you can break them up into smaller questions and negotiate with the stakeholders to deliver them over multiple iterations.

This does require that the development team on the project has an agile mindset, and is focused on meeting the deliverables for the iteration. It also poses a more significant challenge when a project is in an initial phase, and the infrastructure is still being put in place. I’ll discuss this challenge further in my next post.

In conclusion, scope management is important on all projects, not just BI/DW projects. However, perceptions of scope may be more challenging on BI/DW projects, because of the hidden nature of some of the activities. It’s important to communicate the value of these activities to the project stakeholders, and to demonstrate that you can consistently produce incremental deliverables while still carrying out these valuable activities.

As always, comments and feedback are welcome.

SQL Server Launch Event

The launch event in Charlotte went very well – we had over 100 people in attendance, and the sessions were packed. We had a lot of good information on the new features in SQL Server 2008, covering the new T-SQL features, server consolidation, dimensional modeling, Analysis Services, and Reporting Services.

If you have follow up questions from any of the presentations, or are interested in seeing more information on the topics, send me an email and we can see about lining the presenter up for a more in-depth presentation.

Challenges to an Iterative Approach to Business Intelligence

I’m a fan of agile development. Before focusing on business intelligence and data warehousing, I architected and developed client-server and n-tier applications, and I found that agile development techniques delivered better end results. I believe this came about in large part because of the focus on smaller, functional iterations rather than a more traditional waterfall approach. However, this approach is still not regularly used on BI projects. Since it can have a big impact on the success of your BI initiatives, I’d like to list some of the challenges that prevent its adoption. I’ll go into more detail on each of these in my next few posts, and also cover some of the benefits you can see if you overcome the challenges.

First, though, let me define what I mean by an iteration [1]. An iteration is a short development cycle that takes a specific set of requirements and delivers working, useful functionality. Generally, I think the duration of an iteration should be 2 weeks to a month, but I don’t consider that an absolute rule. More than 2 months for an iteration, though, really impacts the agility of the project, so I prefer to keep them shorter. Delivering working functionality in an iteration doesn’t necessarily mean that it has to go to production. It does mean that it could go to production, if the project stakeholders choose to do that. Delivering useful functionality means that what’s been developed actually meets a stakeholder’s need.

There are a number of challenges that you might encounter in doing iterative development for BI. Fortunately, you can work around these, though it’s not always easy.

  1. Scope
    Many BI/DW initiatives are large and involve multiple systems and departments. Even smaller BI projects often have multiple audiences with differing needs and wants. Managing the scope of the effort appropriately can be challenging in that environment. This means that defining and managing scope is critical to successful iterative development.
  2. Foundation development
    Particularly when BI projects are getting off the ground, they need a lot of foundational setup – environments, software installations, data profiling, and data cleansing. This poses definite problems for the first couple of iterations, especially when you take into account the guideline of delivering working and useful functionality with each iteration. The foundation needs to be built in conjunction with delivering useful functionality.
  3. Existing infrastructure and architecture
    Iterative BI development requires a development infrastructure that is flexible and adaptive, and it is much easier if existing solutions were architected to be easily modified. Sadly, this is not the case in many organizations. Existing processes and infrastructure tend to be rigid, and neither support nor encourage rapid development. Existing data warehouses tend to be monolithic applications that are difficult to modify to address changing business needs. And many BI development tools do not provide adequate support for rapidly changing BI applications.
  4. Changing requirements [2]
    Changing requirements is a fact of life in most IT projects, and it’s definitely the case in BI/DW projects. While agile and iterative development can help address changing requirements, it still poses a challenge. As requirements shift, they can have a ripple effect on the BI infrastructure – adding a new field to a report may require changes to the database and ETL processes in addition to the report itself. This can make seemingly inconsequential changes take much longer than expected, and lower the adaptability of the project.
  5. Design [2]
    Due to the scope and complexity of many BI/DW initiatives, there is a tendency to get bogged down in design. This is particularly true when you consider that the database is a key part of BI/DW projects, and many BI developers feel most comfortable having a complete and stable data model before beginning development (which is perfectly reasonable). However, design by itself does not produce working functionality, so an iteration that involves nothing but design doesn’t really meet my definition of iterative.

After looking at all these challenges, you may feel that it’s pointless to try iterative development on BI projects. However, I’m happy to say that all these items can be addressed (granted, some of them are much more easily fixed than others). If you do make some changes, you can get a number of benefits, including the ability to rapidly accommodate change, deliver working functionality more quickly, and facilitate emergent requirements and design. The end result? Happy project stakeholders who are more satisfied with the results of their projects, and less wear and tear on the BI developers. Everybody wins!

Over the next couple of months, I’ll be posting more information about the challenges, and how you can work around them. Please keep reading, and if you have questions or comments, don’t hesitate to get in touch with me.

 

Notes
1. If you are familiar with Scrum, you will notice some similarities between what I describe as an iteration and a sprint. Scrum has influenced my thinking on agile methodologies quite a bit, and I’m a big fan of it. However, because I don’t follow the definitions in Scrum rigidly, I’ve found it better not to use the same terminology, to avoid confusion. If I do refer to a sprint, though, I will be referring to the Scrum definition (a 30-day increment, resulting in a potentially shippable product).
2. Please don’t take the above to mean that I don’t believe in requirements or design – I feel that both are vital to iterative development. However, the approach that many BI practitioners take to requirements and design does not lend itself to iterative development.

SQL Server 2008 Firestarter Event

There is a SQL Server 2008 Firestarter event at the Charlotte Microsoft campus next Thursday, January 15th. If you want to learn more about the new features in 2008, or just want a chance to network, it’s a great opportunity. We’ll have five SQL Server MVPs presenting, so it will be a good chance to ask questions as well. If you are interested, please register at this link: http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032400468&Culture=en-US