Updating Custom Assembly References in SSIS

Just a quick tip related to a recent problem I had with a custom task I developed for SSIS. I had to recompile it recently, and in the process managed to change the public key associated with the assembly. Since SSIS uses strong names (which include the public key) to reference custom tasks, this caused a bit of a problem. None of the packages that used the custom task would execute properly, and I really didn’t want to go through 20+ packages and delete the task, then add it back and try to remember how it was set up in each package. Fortunately, fixing this was much simpler. I opened one of the packages in the XML view, located the original strong name (which looks something like this: “Mariner.Pamlico.ETL.ControlFlowTasks.CustomTask, Mariner.Pamlico.ETL.ControlFlowTasks, Version=1.0.0.0, Culture=neutral, PublicKeyToken=73ea249dd43ab42e”), and did a search and replace on all the packages to update the reference with the new public key. I could then execute all my packages without any problems, and they all retained their original settings.
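
If you have a lot of packages to touch, one way to script the search and replace is a small console app along these lines. This is just a sketch: the folder path and the new PublicKeyToken are placeholders, and it keeps a backup copy of each package before rewriting it (see the warning below).

    using System;
    using System.IO;

    class FixAssemblyReferences
    {
        static void Main()
        {
            // The old strong name, exactly as it appears in the package XML.
            const string oldName = "Mariner.Pamlico.ETL.ControlFlowTasks.CustomTask, Mariner.Pamlico.ETL.ControlFlowTasks, Version=1.0.0.0, Culture=neutral, PublicKeyToken=73ea249dd43ab42e";
            // The new strong name. The PublicKeyToken here is a placeholder for your new key.
            const string newName = "Mariner.Pamlico.ETL.ControlFlowTasks.CustomTask, Mariner.Pamlico.ETL.ControlFlowTasks, Version=1.0.0.0, Culture=neutral, PublicKeyToken=0123456789abcdef";

            // Placeholder path. Point this at the folder containing your packages.
            string packageFolder = @"C:\Projects\Packages";

            foreach (string path in Directory.GetFiles(packageFolder, "*.dtsx"))
            {
                string xml = File.ReadAllText(path);
                if (xml.Contains(oldName))
                {
                    // Keep a backup before touching the package XML.
                    File.Copy(path, path + ".bak", true);
                    File.WriteAllText(path, xml.Replace(oldName, newName));
                    Console.WriteLine("Updated " + path);
                }
            }
        }
    }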

Like I said, just a quick tip if you ever have to update multiple packages to reference a new version of a custom task or component. A few words of warning – before updating the XML of a package directly, it’s always a good idea to make a backup. Also, this works as long as you haven’t changed the public interface for the task. If you have removed properties, it’s likely to cause new and interesting errors when you attempt to open the package in BIDS.


SSWUG Business Intelligence Virtual Conference and SDS

It’s been a busy first quarter, but I should have a little more time available to blog now. 

I’ve just completed shooting my sessions for the SSWUG Business Intelligence Virtual Conference, which has been one of the things occupying my time recently. I had a few changes in my lineup of presentations. With the recent announcements about changes to SQL Data Services (SDS), there was really no reason to present the sessions on using SSIS and SSRS against SDS. If you haven’t seen the changes, you can review them here, but as a quick overview, Microsoft is eliminating the ACE model and the requirement to access SDS through SOAP or REST. Instead, they will offer traditional TDS access to SDS, meaning that accessing a database in the cloud will be a very similar experience to accessing an on-premises SQL Server. In fact, many tools that required custom extensions to use SDS under the old model, like SSRS and SSIS, should work out of the box with the new model. So, there really wasn’t much point in presenting how to do something that won’t be necessary any longer.

So, I ended up still doing the SDS 101 session, which gives an overview of the changes to the service, and some reasons why you might want to look into this for new applications. I also did two other sessions. “Doing More (ETL) With Less (Effort) by Automating SSIS” focuses on how to build packages programmatically, covering some of the simpler options for accomplishing this, like EzAPI. “Processing Flat Files with SSIS” runs through many of the common issues developers encounter when using SSIS against flat files, and shows a number of ways to work around those problems. If you have trouble working with flat files, this would be a good session to attend.

You can register for the conference here, and if you use the code SPVJWESP09 when you register, you should get $10 off the registration fee.


Scale Up or Scale Out for SQL Server Data Warehouses

Historically, scale up has been the model for Microsoft data warehouses. Running a large, multi-terabyte data warehouse meant buying a lot of hardware for a single server, and hoping that it would be enough once the warehouse was fully loaded and in use. If the hardware wasn’t sized properly, you could be looking at big costs for purchasing a new server, with more capacity for memory, disk, and CPUs.

Over the past several months, though, there have been a number of announcements in the SQL Server space that change that. We now have the option of scaling our warehouses up or out. Project “Madison”, which is the integration of the massively parallel processing (MPP) technologies from the DATAllegro acquisition, promises to allow SQL Server 2008 to scale out to hundreds of terabytes in the warehouse by distributing processing among multiple commodity servers. Even though it’s not been officially released yet, I’ve seen several demos of the functionality, and it looks promising. The advantage of this approach is that as you need additional capacity, you simply add additional servers.

On the scale up front, last week Microsoft announced “SQL Server Fast Track Data Warehouse”, which is a set of reference architectures for symmetric multiprocessing (SMP) data warehousing. These are single-server configurations that are optimized for data warehousing workloads, and have been tested and validated. These take much of the guesswork out of sizing your data warehouse server. However, you still have to provide good estimates of query volume and size to use the reference architectures effectively.

So now the question becomes, should you target a scale up or scale out approach for your data warehouse? One of the deciding factors is going to be your data volume. The Fast Track reference architectures are currently targeted towards 4 to 32 terabyte warehouses. Given current hardware restrictions, that’s the practical limit for a single server. However, as the hardware continues to get better, that number is expected to go up. “Madison”, on the other hand, can scale well past 32 terabytes. So if your current data needs are greater than 32 terabytes, I’d be looking closely at “Madison”.

What if your current needs are less than 32 terabytes, but you expect to grow past that point over the next couple of years? Well, fortunately, the Fast Track reference architectures are designed to offer an easy transition to “Madison”, when your needs grow to that point. And if you expect your data volumes to stay below the 32 terabyte mark, then the Fast Track reference architectures certainly offer a greater degree of confidence that you are getting the appropriate configuration for your warehouse.

It’s always nice to have options, and improving the scaling abilities of SQL Server should certainly help Microsoft in the large data warehouse marketplace. However, the roadmap for how this might apply to the Analysis Services component of SQL Server hasn’t really been directly addressed yet. It would seem logical to offer the same sort of solutions in that space. It will be interesting to see which direction Microsoft takes on that.


Check for SQL Agent Job Completion

Recently, a user on the MSDN forums asked how to check an Agent job from SSIS to see if it’s completed. Once the job completed, additional steps needed to be performed (archiving flat files, for example), but those steps couldn’t run until the job finished. Fortunately, this isn’t too difficult to accomplish in SSIS.

I put together the following sample package to illustrate this. Basically, the package starts a job, then goes into a loop to see if the job has completed.

[Screenshot: sample package control flow]

The Start Job Execute SQL Task uses the sp_start_job stored procedure to start an Agent job. sp_start_job is asynchronous; that is, it does not wait for the job to complete, but returns right away. This sample is just running a job that pauses execution for several seconds before completing.

[Screenshot: Start Job Execute SQL Task, calling sp_start_job]

The For Loop container is configured to look at the @executionStatus variable. The initial expression (InitExpression) sets the value of the variable to 1 at the beginning of the loop. The loop will continue executing until the EvalExpression is false, in this case when @executionStatus is no longer equal to 1.

[Screenshot: For Loop container configuration]

The first step inside the loop is to delay for a second. This allows the job to get started initially, and prevents the package from bombarding the server with requests. The delay is implemented using a WAITFOR statement in the Execute SQL Task.

[Screenshot: Execute SQL Task with the WAITFOR delay]

Finally, another Execute SQL Task is used to check the current status of the job. This is done by calling sp_help_job, which returns, among other things, the current_execution_status column. A value of 1 indicates that the job is still executing, so when the value is no longer 1, it’s time to exit the loop. The current_execution_status value is assigned to the @executionStatus variable, and the loop repeats as long as it is 1.

[Screenshot: Execute SQL Task that checks the job status via sp_help_job]

This is the Result Set page, where the column is assigned to the variable.

[Screenshot: the Result Set page of the Execute SQL Task]

As I mentioned, this is fairly straightforward to implement. The one wrinkle I found is that you do need some amount of delay between calling sp_start_job and checking the execution status. If you do it immediately, it’s likely that the job hasn’t started yet, and the loop will exit immediately. On my system, introducing the 1 second delay gave the job time to actually get started. Also, you’ll need to alter the Agent job being called to match your system.
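
If you want to see the whole pattern in one place, here is a minimal C# sketch of the equivalent logic outside SSIS: start the job, wait a second, then poll until current_execution_status is no longer 1. The connection string and job name are placeholders, the stored procedure calls mirror what the Execute SQL Tasks run, and Thread.Sleep stands in for the WAITFOR delay.

    using System;
    using System.Data.SqlClient;
    using System.Threading;

    class WaitForAgentJob
    {
        static void Main()
        {
            // Placeholders: adjust the connection string and job name for your server.
            string connectionString = "Data Source=localhost;Initial Catalog=msdb;Integrated Security=SSPI;";
            string jobName = "MyTestJob";

            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();

                // Start the job. sp_start_job is asynchronous and returns right away.
                using (var start = new SqlCommand("EXEC msdb.dbo.sp_start_job @job_name = @name;", connection))
                {
                    start.Parameters.AddWithValue("@name", jobName);
                    start.ExecuteNonQuery();
                }

                int executionStatus = 1;
                while (executionStatus == 1)
                {
                    // Give the job time to start, and avoid hammering the server.
                    Thread.Sleep(1000);

                    // The first result set from sp_help_job for a named job includes
                    // current_execution_status; a value of 1 means it is still executing.
                    using (var check = new SqlCommand("EXEC msdb.dbo.sp_help_job @job_name = @name;", connection))
                    {
                        check.Parameters.AddWithValue("@name", jobName);
                        using (var reader = check.ExecuteReader())
                        {
                            if (reader.Read())
                            {
                                executionStatus = Convert.ToInt32(reader["current_execution_status"]);
                            }
                        }
                    }
                }

                Console.WriteLine("Job finished; last execution status was " + executionStatus);
            }
        }
    }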

The sample package is on my SkyDrive. Let me know if you have questions or feedback.


Presenting at the Spring SSWUG Business Intelligence Virtual Conference

I’m going to be doing three sessions at the upcoming SSWUG BI Virtual Conference. If you are interested in getting an introduction to SQL Data Services (SDS), I’ll be doing an introductory session on it, and then covering how it can be used from SSIS and SSRS. SDS is pretty interesting, but quite different from standard SQL Server.

There’s a great lineup of speakers and sessions for the conference – you can check them out here. If you want to sign up, you can use the code SPVJWESP09 for $10 off the registration fee.


Developing a Foundation While Doing Iterative Development

In my initial post about doing iterative development for BI/DW projects, I mentioned that one of the challenges was developing a solid foundation while doing iterative development, especially in the first few iterations. If you are starting from scratch on a new BI initiative, there is often a lot of work to do in getting environments established, developing processes for moving data and code between environments, and exploring and validating the source data to be used. Unfortunately, most of this work does not result in deliverables that an end user would consider valuable. Since, as part of each iteration, you want to have the possibility of delivering working functionality to the stakeholders, this can present a problem.

Since most end users consider working functionality to be something that they can see in a nice user interface, you need to look at ways to minimize the development time required to present data to the end user. Some of the common time-consuming tasks in the first couple of iterations are:

  • Establishing the environments
  • Exploring and validating the source data
  • Developing ETL processes to move and cleanse the data

There are really no quick workarounds to setting up the environments. In fact, I’ve usually found that taking shortcuts on the environments leads to much bigger problems down the road. However, what can be effective is to minimize the number of environments that you deal with in each iteration. While theoretically you should be able to deploy to production in the first iteration of a project, it’s rare that this is actually needed. So instead of creating a development, QA, and production environment, consider only establishing the development and QA environments. I do think that having at least two environments is important, so that you can begin validating your deployment procedures.

Exploring and validating the source data is definitely important. In the first couple of iterations, though, it’s often necessary to limit the scope of what you explore. For example, a project I was involved in recently had some very serious issues with data quality. The source database did not enforce referential integrity, so a large percentage of the data in the source was not related to the rest of the data correctly. Rather than derailing the current iteration to completely research and resolve the data quality issues, the project team and the stakeholders made the decision to only import data that was related correctly. This enabled the project team to still present a set of working reports to the stakeholders at the end of the iteration, rather than not being able to demonstrate any working functionality. The subsequent iterations were adjusted to better reflect the level of data quality.

ETL processes can be time-consuming to develop, particularly if the organization does not already have a framework in place for the ETL processes. In the first couple of iterations, an alternative approach is to load data in a more manual fashion, using SQL scripts or direct manipulation to get an initial set of data populated. This has a couple of benefits. One, it allows the time for building a full ETL process to be spread across multiple iterations. Two, it allows the end users to get a look at the data (in a friendly user interface, if you follow the advice above) and validate that it is correct.

A key part of developing the foundation, while still providing value in each iteration, is accepting that you can’t do it all in a single iteration. The foundation development will have to be spread across multiple iterations, and it is acceptable to build some scaffolding code in order to deliver functionality in each iteration. Clearly, you want to minimize the amount of code that can’t be reused. Generally, I’ve found that with the Microsoft toolset for BI, it’s pretty easy to build incrementally, with minimal rework. However, even if your tools don’t support this as well, in my experience the downsides of having some code that gets replaced in a later iteration are far outweighed by the benefits of being able to demonstrate tangible progress to the stakeholders in each iteration of the project.


Code Snippets for SSIS 2008

If you develop in Visual Studio using C# or VB, then you are probably familiar with code snippets. They are little chunks of code that you can insert into your application to save yourself some typing. For example, if you are writing a C# application, and you type “if” and hit TAB twice, the following will be inserted into your code:

[Screenshot: the expanded if snippet]
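
In text form, that expansion looks roughly like this, with the condition selected so you can type over it:

    if (true)
    {

    }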

You can enter the value to be evaluated, and then type in the rest of your code. I like code snippets because they save me from typing the same things over and over again. However, in SSIS 2005 scripting, code snippets weren’t available, due to the VSA environment used by the script objects. Now that we have a real scripting environment in SSIS 2008, code snippets can be used, although there are a few things you need to know.

Two of the most common activities I do in SSIS scripts are reading from and writing to variables. So, I put together a few code snippets that do this. Here’s what the snippet for reading a value from a variable looks like:

[Screenshot: the SSISReadOneVariable snippet]
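
In text form, the code a snippet like this inserts in a Script Task amounts to something along these lines; User::MyVariable is a placeholder for your own variable name, and the write version is shown as well:

    // Read a value from an SSIS variable in a Script Task.
    // "User::MyVariable" is a placeholder; the variable must be listed in the
    // task's ReadOnlyVariables (to read) or ReadWriteVariables (to write).
    object myValue = Dts.Variables["User::MyVariable"].Value;

    // Write a value back to the variable.
    Dts.Variables["User::MyVariable"].Value = "new value";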

To create a snippet, you define the code snippet in an XML format, and save it to a file with a .snippet extension. I’m not going to post the code for the snippet here, as it is fairly verbose, but you can look at the .snippet file in any text editor. Once you have the file, you need to put it in a specific location for the Visual Studio environment to recognize it. This is where it can get a little confusing. Normally, snippets go into the “C:\Users\<user>\Documents\Visual Studio 2008\Code Snippets\<language>\My Code Snippets” folder. However, this is the location used for the full Visual Studio environment. Since the script editors in SSIS use the Visual Studio Tools for Applications environments, they get their own folder. In fact, there are separate folders for the Script Task and the Script Component. Code snippets for the Script Task go into the “C:\Users\<user>\Documents\Integration Services Script Task\Code Snippets\<language>\My Code Snippets” folder, and snippets for the Script Component go into the “C:\Users\<user>\Documents\Integration Services Script Component\Code Snippets\<language>\My Code Snippets” folder.

This actually works out well, because the code for the snippets is slightly different between the Script Task and Script Component. This is due to the object references being slightly different between the two.

I’ve provided two code snippets in C# for reading from and writing to variables in the Script Task, and the same two snippets altered to work in the Script Component. To use them, you can simply copy them to the above folders, and then type “SSISReadOneVariable” or “SSISWriteOneVariable” as a shortcut, followed by two tabs to insert them. You can change the shortcut in the snippet file if you’d like to use something else. It also wouldn’t be difficult to convert them to VB, if that’s your preference. The snippets are located on my SkyDrive. If you have ideas for other useful snippets, post a comment. Enjoy!


Getting Business Value from BI Today

Today’s economy and business environment pose problems for many IT departments. Budgets are tight, companies are looking for ways to reduce headcount, and strategic investments are not very popular right now. Instead, companies are focusing on projects that have immediate ROI, primarily in the form of cutting costs. This has a particular impact on BI projects, since they often have longer time frames and, in many cases, don’t have clearly defined ROI. The effects of this are becoming evident. A number of consulting firms in the southeast US have cut back on the number of BI consultants they employ, and a number of companies in the region are cutting back on planned BI expenditures.

Since I believe very strongly that BI is a valuable investment, I hate to see this, but it is understandable. If you are proposing a large data warehouse project that will take 12 months or more before any value is delivered to the users, it’s not a very attractive investment right now. So, given the current environment, what can you do to ensure that your BI projects don’t end up on the chopping block?

Solve Existing Problems

Focus on solving existing problems, rather than trying to anticipate future needs. Don’t get me wrong, sometimes you do need to invest in creating the BI infrastructure to handle future needs. However, in the current environment, business decision makers are much more interested in hearing about the problems you can solve today, not the problems that they may encounter in a few years (if the business is still around).

Identify ROI

Identify the ROI up front. Too many BI projects get started on the vague promise of “When we have all this data together, we’ll be able to do great things!”. That’s not enough to justify expenditures today. Instead, you need a clear understanding of the problem, the cost of solving it, and the cost of not solving it. In a cost cutting environment, the latter point is often effective in communicating why the project needs to be done.

Incremental Wins

Look for incremental wins. As I’ve commented before, many BI initiatives take on a massive amount of scope, and run for many months before producing anything of value. An incremental approach allows you to demonstrate value to the stakeholders rapidly, and those incremental demonstrations of your ability to solve the problem often result in additional funding coming your way.

Do More With What You Have

Now that PerformancePoint Services is being rolled into SharePoint, there is an opportunity to provide additional BI with minimal or no investment in software, particularly if your company already uses SharePoint. By combining the capabilities of PerformancePoint with the workflow capabilities in SharePoint, you can provide some very interesting solutions in optimizing business processes, and making sure that the users of the process have the appropriate information at their fingertips.

By focusing on quick wins that have immediate ROI, BI teams can still provide a lot of value to the business. Demonstrating this value is the key to keeping your BI initiatives alive in a down economy.


What’s Happening on SQL Server When My Package Fails?

We encountered an issue recently in an ETL process during the extract portion. When extracting from one very large table, using a NOLOCK hint, we were getting errors indicating that the data was changing while we were extracting it. Since no other processes were supposed to be using the table during the same time window as the ETL, this was a source of some confusion. Because there was nothing indicating what, if anything, was changing the data, we needed a quick way to see what was going on in the database at the point the package failed.

Normally, when troubleshooting database activity problems, I’d just run sp_who or sp_who2. However, since the problem was occurring between 2 and 3 am, nobody was terribly excited about getting up to monitor the server. We didn’t really want to leave Profiler running all night either. So, we set up the package to log the current database activity whenever an error occurred. Basically, in the OnError event handler, we just added a data flow. The data flow runs the sp_who stored procedure to get the current activity on the database, and saves it to a text file.
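
If you want to prototype the same idea outside of a package, a minimal C# sketch of the logic looks something like this; the connection string and output path are placeholders.

    using System;
    using System.Data.SqlClient;
    using System.IO;

    class LogCurrentActivity
    {
        static void Main()
        {
            // Placeholders: point these at your own server and log location.
            string connectionString = "Data Source=localhost;Initial Catalog=master;Integrated Security=SSPI;";
            string outputPath = @"C:\Logs\sp_who_output.txt";

            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("EXEC sp_who;", connection))
            using (var writer = new StreamWriter(outputPath))
            {
                connection.Open();
                using (var reader = command.ExecuteReader())
                {
                    // Header row with the column names returned by sp_who.
                    var columns = new string[reader.FieldCount];
                    for (int i = 0; i < reader.FieldCount; i++)
                    {
                        columns[i] = reader.GetName(i);
                    }
                    writer.WriteLine(string.Join("\t", columns));

                    // One line per session, tab-delimited.
                    while (reader.Read())
                    {
                        var values = new object[reader.FieldCount];
                        reader.GetValues(values);
                        writer.WriteLine(string.Join("\t", Array.ConvertAll(values, v => Convert.ToString(v))));
                    }
                }
            }
        }
    }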

This was pretty simple, and I’ve created a sample package to show it. The main control flow has a script task that generates an error, purely for the purpose of firing the OnError event handler.

[Screenshot: main control flow with the error-generating Script Task]

The event handler does a check to make sure this is the first time it’s fired, since we didn’t want to log the activity multiple times. It then runs a data flow, and once it completes, sets a flag to ensure it doesn’t fire again.

[Screenshot: OnError event handler]

The data flow runs sp_who. While sp_who2 gives better information, it doesn’t provide metadata in an easily consumable format for SSIS. sp_who, on the other hand, works right off, so we went with it for simplicity. The output is sent to a text file.

[Screenshot: data flow sending sp_who output to a text file]

It was pretty simple to implement, and it served the purpose. By reviewing the information logged, we could tell what other processes were running, and which computers they originated from. If you ever encounter a similar situation, hopefully this can help. The template package is located on my SkyDrive.


Impact of the PerformancePoint Server Changes

If you follow business intelligence news at all, then you probably saw the news from Microsoft last week that PerformancePoint is becoming a component of SharePoint. However, it won’t be all of PerformancePoint – the Plan portion will see one additional service pack (SP3), then development will cease. The Monitor and Analyze portions of the product will become part of the SharePoint Enterprise license.


Reaction has been mixed. Generally, many people see the advantage in including the Monitor and Analyze functionality in SharePoint, as it will open that functionality to a much broader audience. This lines up nicely with Microsoft’s “BI for the masses” vision that they have been working toward for several years. It also lines up with the more recent marketing message, “People-ready BI”. Seeing that SharePoint is becoming the place that many users go to do their work, it makes sense to incorporate their BI tools in the same location. I think that offering PerformancePoint Services (the new name for the Monitor and Analyze functionality under SharePoint) as part of SharePoint will make it easier to include BI functionality in new applications and lower the barrier to adoption of this functionality in organizations of all sizes.


The negative reactions are primarily around two things: discontinuing Plan, and not having a full-client story (besides Excel). I understand the reactions around discontinuing Plan. Version 1 had some rough edges (OK, a lot of rough edges), but Microsoft has a history of quickly releasing subsequent versions with much better functionality, and usually having a very good product by version 3. Breaking this pattern caught a lot of people by surprise. Version 1, while lacking in a few key areas, was definitely usable. Some of Microsoft’s customers are using it in production, and even more partners had made significant investments in it. Fortunately, while Mariner had done some work with it, we had not invested heavily in it. We were more focused on the Monitor and Analyze portions of the product. In part, this was because we recognized that performance management is a specialized discipline, requiring some specific skill sets. Just because you can deliver successful solutions on Microsoft technology doesn’t necessarily mean that you can deliver successful performance management solutions. I think that was a point of confusion for many partners (the “one stop shop” approach is very popular in the partner community), and that led to Microsoft not having as strong a partner base to support the product as they had hoped. On the other hand, there were some really strong partners in the performance management space who did some great things with Plan, and I can certainly empathize with those that made big investments and are now disappointed by the change in direction.


Mauro Cardarelli, a SharePoint MVP, had an interesting post on his concern that making PerformancePoint available as part of SharePoint raises a similar issue. Competent delivery of SharePoint solutions doesn’t necessarily correlate to competent delivery of BI functionality, and successful delivery of BI solutions doesn’t mean that you can deliver good SharePoint solutions. Since this was one of the challenges for Plan, it will be interesting to see how it plays out going forward. In the short term, I’d encourage companies to be sure that their vendors either have both sets of skills (and can demonstrate that they’ve used them in the same project), or look for best-of-breed partners who are willing to work together.


The full-client story is a concern. The current direction seems to be for Excel to become the full client for consuming Analysis Services data, and for SharePoint to become the thin client interface. I’m definitely in favor of SharePoint as the thin-client interface, but using Excel as the full client leaves a pretty big gap in the story. It used to be that you could recommend ProClarity desktop to fill that gap, but since ProClarity is in support-only mode now, that’s not a good option. In time, more of the functionality of ProClarity should surface in Excel and SharePoint, but that’s still some time off. And Excel, while improving as an Analysis Services client, is still not on par with a dedicated desktop client built to expose the full functionality of Analysis Services. Hopefully that will improve over the next couple of releases of Excel, but in the meantime it creates opportunities for third parties to fill the gap.


Overall, I think this move will promote broader adoption of the Monitor and Analyze functionality in Microsoft’s customer base, and will strengthen the value proposition for moving to SharePoint Enterprise licenses. It’s a good thing for Microsoft, and good for customers who have already invested in SharePoint. However, it remains to be seen what impact not having a planning component or a strong full-client application in the BI stack will have.


Some other reactions from around the web:


  • Chris Webb (he also has a set of links featuring other reactions)
  • Nigel Pendse
  • Cindi Howson
