Archive for February 2009

Check for SQL Agent Job Completion

Recently, a user on the MSDN forums asked how to check an Agent job from SSIS to see whether it has completed. Once the job completes, additional steps need to be performed, for example, archiving flat files, but those steps can’t run until the job finishes. Fortunately, this isn’t too difficult to accomplish in SSIS.

I put together the following sample package to illustrate this. Basically, the package starts a job, then goes into a loop to see if the job has completed.

[Image: control flow of the sample package]

The Start Job Execute SQL Task uses the sp_start_job stored procedure to start an Agent job. sp_start_job is asynchronous, that is, it does not wait for the job to complete, but returns right away. This sample just runs a job that pauses execution for several seconds before completing.

[Image: Start Job Execute SQL Task]
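For reference, the statement in this task is just a call to sp_start_job, something like the following (the job name here is only an example; substitute your own):

    -- Start the Agent job; sp_start_job returns immediately rather
    -- than waiting for the job to finish
    EXEC msdb.dbo.sp_start_job @job_name = N'SSIS Test Job';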

The For Loop container is configured to look at the @executionStatus variable. The initial expression (InitExpression) sets the value of the variable to 1 at the beginning of the loop. The loop will continue executing until the EvalExpression is false, in this case when @executionStatus is no longer equal to 1.

[Image: For Loop container configuration]
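In terms of properties, the loop is configured along these lines, assuming an Int32 package variable named executionStatus (the AssignExpression is left empty, since the Execute SQL Task inside the loop updates the variable):

    InitExpression:  @executionStatus = 1
    EvalExpression:  @executionStatus == 1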

The first step inside the loop is to delay for a second. This allows the job to get started initially, and prevents the package from bombarding the server with requests. The delay is implemented using a WAITFOR statement in the Execute SQL Task.

[Image: delay Execute SQL Task]
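The statement in the delay task is just a WAITFOR:

    -- Pause for one second before checking the job status again
    WAITFOR DELAY '00:00:01';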

Finally, another Execute SQL Task is used to check the current status of the job. This is done by calling sp_help_job, which returns, among other things, a current_execution_status column indicating what the job is currently doing. A value of 1 means the job is still executing, so when the value is no longer 1, it’s time to exit the loop. The task assigns current_execution_status to the @executionStatus variable, and the loop repeats as long as that value is 1.

[Image: Check Job Status Execute SQL Task]
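The status check is a call like the one below; setting @job_aspect to JOB keeps the output to a single result set, which makes it easy to capture with the task’s single-row result set (again, the job name is just an example):

    -- current_execution_status: 1 = Executing, 4 = Idle
    -- (other values cover waiting, between retries, and so on)
    EXEC msdb.dbo.sp_help_job
        @job_name = N'SSIS Test Job',
        @job_aspect = N'JOB';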

This is the Result Set page, where the column is assigned to the variable.

[Image: Result Set page of the Check Job Status task]

As I mentioned, this is fairly straightforward to implement. The one wrinkle I found is that you do need some amount of delay between calling sp_start_job and checking the execution status. If you do it immediately, it’s likely that the job hasn’t started yet, and the loop will exit immediately. On my system, introducing the 1 second delay gave the job time to actually get started. Also, you’ll need to alter the Agent job being called to match your system.

The sample package is on my SkyDrive. Let me know if you have questions or feedback.

Presenting at the Spring SSWUG Business Intelligence Virtual Conference

I’m going to be doing 3 sessions at the upcoming SSWUG BI Virtual Conference. If you are interested in SQL Data Services (SDS), I’ll be doing an introductory session on it, and then covering how it can be used from SSIS and SSRS. SDS is pretty interesting, but quite different from standard SQL Server.

There’s a great lineup of speakers and sessions for the conference – you can check them out here. If you want to sign up, you can use the code SPVJWESP09 for $10 off the registration fee.

Developing a Foundation While Doing Iterative Development

In my initial post about doing iterative development for BI/DW projects, I mentioned that one of the challenges was developing a solid foundation while doing iterative development, especially in the first few iterations. If you are starting from scratch on a new BI initiative, there is often a lot of work to do in getting environments established, developing processes for moving data and code between environments, and exploring and validating the source data to be used. Unfortunately, most of this work does not result in deliverables that an end user would consider valuable. Since, as part of each iteration, you want to have the possibility of delivering working functionality to the stakeholders, this can present a problem.

Since most end users consider working functionality to be something that they can see in a nice user interface, you need to look at ways to minimize the development time required to present data to the end user. Some of the common time-consuming tasks in the first couple of iterations are:

  • Establishing the environments
  • Exploring and validating the source data
  • Developing ETL processes to move and cleanse the data

There are really no quick workarounds to setting up the environments. In fact, I’ve usually found that taking shortcuts on the environments leads to much bigger problems down the road. However, what can be effective is to minimize the number of environments that you deal with in each iteration. While theoretically you should be able to deploy to production in the first iteration of a project, it’s rare that this is actually needed. So instead of creating a development, QA, and production environment, consider only establishing the development and QA environments. I do think that having at least two environments is important, so that you can begin validating your deployment procedures.

Exploring and validating the source data is definitely important. In the first couple of iterations, though, it’s often necessary to limit what you explore. For example, a project I was involved in recently had some very serious issues with data quality. The source database did not enforce referential integrity, so a large percentage of the data in the source was not related to the rest of the data correctly. Rather than derailing the current iteration to completely research and resolve the data quality issues, the project team and the stakeholders made the decision to only import data that was related correctly. This enabled the project team to still present a set of working reports to the stakeholders at the end of the iteration, rather than having nothing to demonstrate. The subsequent iterations were adjusted to better reflect the level of data quality.

ETL processes can be time-consuming to develop, particularly if the organization does not already have a framework in place for the ETL processes. In the first couple of iterations, an alternative approach is to load data in a more manual fashion, using SQL scripts or direct manipulation to get an initial set of data populated. This has a couple of benefits. One, it allows the time for building a full ETL process to be spread across multiple iterations. Two, it allows the end users to get a look at the data (in a friendly user interface, if you follow the advice above) and validate that it is correct.
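As a purely illustrative sketch (the table and database names here are made up), an early-iteration load might be nothing more than a script like this, run by hand:

    -- One-off load script used before the real ETL process exists
    INSERT INTO dw.DimCustomer (CustomerBusinessKey, CustomerName)
    SELECT c.CustomerID, c.CustomerName
    FROM SourceDB.dbo.Customer AS c
    WHERE c.CustomerName IS NOT NULL;  -- crude inline cleansing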

A key part of developing the foundation, while still providing value in each iteration, is accepting that you can’t do it all in a single iteration. The foundation development will have to be spread across multiple iterations, and it is acceptable to build some scaffolding code in order to deliver functionality in each iteration. Clearly, you want to minimize the amount of code that can’t be reused. Generally, I’ve found that with the Microsoft toolset for BI, it’s pretty easy to build incrementally, with minimal rework. However, even if your tools don’t support this as well, in my experience the downsides of having some code that gets replaced in a later iteration are far outweighed by the benefits of being able to demonstrate tangible progress to the stakeholders in each iteration of the project.

Code Snippets for SSIS 2008

If you develop in Visual Studio using C# or VB, then you are probably familiar with code snippets. They are little chunks of code that you can insert into your application to save yourself some typing. For example, if you are writing a C# application, and you type “if” and hit TAB twice, the following will be inserted into your code:

[Image: expanded if snippet]
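If you haven’t seen it, the default C# if snippet expands to something like this, with the condition selected so you can type over it:

    if (true)
    {

    }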

You can enter the value to be evaluated, and then type in the rest of your code. I like code snippets because they save me from typing the same things over and over again. However, in SSIS 2005 scripting, code snippets weren’t available, due to the VSA environment used by the script objects. Now that we have a real scripting environment in SSIS 2008, code snippets can be used, although there are a few things you need to know.

Two of the most common activities I do in SSIS scripts are reading from and writing to variables. So, I put together a few code snippets that do this. Here’s what the snippet for reading a value from a variable looks like:

[Image: snippet definition for reading a value from a variable]
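The end result of expanding the snippets is just the one-liners you’d otherwise type by hand. In a Script Task, reading and writing variables looks something like this (the variable names are only examples):

    // Read a value from a variable (listed in the task's ReadOnlyVariables)
    string myValue = Dts.Variables["User::MyVariable"].Value.ToString();

    // Write a value to a variable (listed in the task's ReadWriteVariables)
    Dts.Variables["User::MyOtherVariable"].Value = "some value";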

To create a snippet, you define the code snippet in an XML format, and save it to a file with a .snippet extension. I’m not going to post the code for the snippet here, as it is fairly verbose, but you can look at the .snippet file in any text editor. Once you have the file, you need to put it in a specific location for the Visual Studio environment to recognize it. This is where it can get a little confusing. Normally, snippets go into the “C:\Users\<user>\Documents\Visual Studio 2008\Code Snippets\<language>\My Code Snippets” folder. However, that is the location used by the full Visual Studio environment. Since the script editors in SSIS use the Visual Studio Tools for Applications (VSTA) environment, they get their own folders; in fact, there are separate folders for the Script Task and the Script Component:

  • Script Task snippets: “C:\Users\<user>\Documents\Integration Services Script Task\Code Snippets\<language>\My Code Snippets”
  • Script Component snippets: “C:\Users\<user>\Documents\Integration Services Script Component\Code Snippets\<language>\My Code Snippets”

This actually works out well, because the code for the snippets is slightly different between the Script Task and Script Component. This is due to the object references being slightly different between the two.
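As a rough illustration of the difference (the variable names are again just placeholders):

    // Script Task: variables are reached through the Dts object
    string fromTask = Dts.Variables["User::MyVariable"].Value.ToString();

    // Script Component: variables listed in ReadOnlyVariables are
    // exposed as typed properties on the Variables collection
    string fromComponent = Variables.MyVariable.ToString();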

I’ve provided two C# code snippets for reading from and writing to variables in the Script Task, and the same two snippets altered to work in the Script Component. To use them, copy them to the folders above, then type “SSISReadOneVariable” or “SSISWriteOneVariable” as a shortcut, followed by two tabs to insert them. You can change the shortcut in the snippet file if you’d like to use something else. It also wouldn’t be difficult to convert them to VB, if that’s your preference. The snippets are located on my SkyDrive. If you have ideas for other useful snippets, post a comment. Enjoy!

Getting Business Value from BI Today

Today’s economy and business environment pose problems for many IT departments. Budgets are tight, companies are looking for ways to reduce headcount, and strategic investments are not very popular right now. Instead, companies are focusing on projects that have immediate ROI, primarily in the form of cutting costs. This has a particular impact on BI projects, since they often have longer time frames and, in many cases, don’t have clearly defined ROI. The effects of this are becoming evident. A number of consulting firms in the southeast US have cut back on the number of BI consultants they employ, and a number of companies in the region are cutting back on planned BI expenditures.

Since I believe very strongly that BI is a valuable investment, I hate to see this, but it is understandable. A large data warehouse project that won’t deliver any value to the users for 12 months or more is not a very attractive investment right now. So, given the current environment, what can you do to ensure that your BI projects don’t end up on the chopping block?

Solve Existing Problems

Focus on solving existing problems, rather than trying to anticipate future needs. Don’t get me wrong, sometimes you do need to invest in creating the BI infrastructure to handle future needs. However, in the current environment, business decision makers are much more interested in hearing about the problems you can solve today, not the problems that they may encounter in a few years (if the business is still around).

Identify ROI

Identify the ROI up front. Too many BI projects get started on the vague promise of “When we have all this data together, we’ll be able to do great things!” That’s not enough to justify expenditures today. Instead, you need a clear understanding of the problem, the cost of solving it, and the cost of not solving it. In a cost-cutting environment, the latter point is often effective in communicating why the project needs to be done.

Incremental Wins

Look for incremental wins. As I’ve commented before, many BI initiatives take on a massive amount of scope, and run for many months before producing anything of value. An incremental approach allows you to demonstrate value to the stakeholders rapidly, and those incremental demonstrations of your ability to solve the problem often result in additional funding coming your way.

Do More With What You Have

Now that PerformancePoint Services is being rolled into SharePoint, there is an opportunity to provide additional BI with minimal or no investment in software, particularly if your company already uses SharePoint. By combining the capabilities of PerformancePoint with the workflow capabilities in SharePoint, you can provide some very interesting solutions in optimizing business processes, and making sure that the users of the process have the appropriate information at their fingertips.

By focusing on quick wins that have immediate ROI, BI teams can still provide a lot of value to the business. Demonstrating this value is the key to keeping your BI initiatives alive in a down economy.