Presenting at Midlands PASS

Posted on July 17, 2008 by jwelch

I’m presenting at the Midlands PASS chapter in Columbia, SC tomorrow night, July 17th. I’ll be presenting on SSIS configurations and some common approaches for implementing them. Configurations are one of the best features about SSIS, but also one of the most confusing. If you’re in the area, and would like to learn more, please stop by.

Posted in Uncategorized | Comments Off

Continuing a Loop After an Error

Posted on June 29, 2008 by jwelch

A common question on the SSIS forums is how to continue a loop after an error occurs. For example, you might have a For Each Loop container that iterates through a set of files in a folder. If one file is invalid, you want to continue processing the remaining files. Fortunately, there is a simple pattern that allows this to be accomplished.

I created a package to illustrate this. Basically, the package implements the scenario above. It consists of a For Each Loop to iterate through all the files in a folder. There is a data flow task that processes each input file, and runs it through a row count (just for illustration purposes). The Script task is there to illustrate an error handling task could be attached to the data flow, but it’s empty in the example package. It would need to be replaced with the appropriate task to handle a failure, such as a File System task to move the file to another folder.

The key to handling the errors is the Sequence container, or more specifically, how the Sequence container is set up. The OnError event handler for the Sequence container has been enabled. You can do this by selecting the Sequence container, then clicking the Event Handlers tab. Select the OnError event in the drop down, then click the link in the designer to create an empty event handler. It’s not necessary to add anything to event handler.

Once it has been created, go to the Variables window, and select the option to show System variables. Select the Propagate variable, and set the value to FALSE. This prevents any errors occurring on tasks inside the Sequence container from being raised to the For Each loop.

This is all that really needs to be done, but if you run the package with only this set, you’ll notice that the Data Flow task and the Sequence container both report an error. To prevent the Sequence Container from reporting an error, set the MaximumErrorCount property of the container to 0.

Note that you do not have to use a sequence container to accomplish this. You could make the same settings on individual tasks inside the For Each loop. However, the Sequence container gives you a common place to make the settings, and you can have as many tasks inside it as you’d like.

The example package is posted on my Skydrive. The zip file contains a folder call Import. The package expects this folder to be located in C:Temp. It contains 4 text files for the package to process. The 2nd file contains an error (one of the rows has fewer columns than it is supposed to), so that you can see the error behavior.

Posted in Uncategorized | Comments Off

Configuration Changes in SSIS 2008

Posted on June 20, 2008 by jwelch

In a previous post, I detailed some of the challenges that I had encountered when trying to modify the connection string used for SQL Server configurations from the command line. In SSIS 2005, command line options were applied after the configurations were loaded (with the exception of parent package variables). Effectively, this meant that you could not change the location a configuration pointed to from the command line.

This has been changed in SSIS 2008. Configurations are now loaded twice, once when the package is originally loaded, and then again after any command line options are applied. The order of events is:

The package file is loaded.
The configurations specified in the package at design time are applied in the order specified in the package (except for Parent Package Variables).
Command line values are applied.
The configurations specified in the package at design time are reloaded in the order specified in the package (except for Parent Package Variables). The configurations will use any updated values from the command line when they are reloaded, so it is possible to change the connection string for a SQL Server configuration.
Parent Package Variable Configurations are applied.
The package is run.

This is a nice improvement, since you can now set up a package with SQL Server configurations, and point the package to the correct database are runtime by simply using the /CONN switch of DTEXEC. However, it still isn’t perfect . Since the design time configuration is applied twice, a value that you apply using /SET on the command line can be overridden by a preexisting configuration.

If you’d like to see this behavior, I’ve provided a simple package. The package has a single variable (“TestVar”) defined. It has an XML configuration enabled that sets the value of TestVar to “Dev”. There is a single Script Task that fires an information event that contains the value of the variable.

public void Main()

            bool fireAgain = false;

            Variables vars = null;

            Dts.VariableDispenser.LockOneForRead("TestVar", ref vars);

            Dts.Events.FireInformation(0, "", "Value is: " + vars["TestVar"].Value, "", 0, ref fireAgain);

            vars.Unlock();

            Dts.TaskResult = (int)ScriptResults.Success;

If I run this package using this command:

dtexec /file Configuration_Overwrite.dtsx /SET Package.Variables[User::TestVar].Properties[Value];Prod /REP EWI

I’d expect to see the value “Prod” reported in the output. Instead, “Dev” is displayed. If you disable the package configurations and re-run it using the same command, it does report “Prod”.

Overall, I prefer the new behavior, but it does introduce a new issue to be aware of. In general, you need to be careful to not /SET a value that is also specified in a configuration, as it will be overwritten. I’ve filed a Connect suggestion for the addition of /SetBefore and /SetAfter switches to DTEXEC, so that you can explicitly define when you want the /SET applied. If you think it’s a good idea, please vote for it.

The sample package is on my SkyDrive. You may need to update the location of the XML configuration, but it has no other dependencies.

Thanks to Cho Yeung on the SSIS team for clarifying the load order when the package is executed.

Posted in Uncategorized | Comments Off

Mariner Wins Microsoft Performance Management Partner of the Year

Posted on June 19, 2008 by jwelch

This is really cool – Mariner (my employer) was awarded the Microsoft Performance Management Partner of the Year award this year, based on a solution that we created for the Charlotte-Mecklenburg Schools (CMS) system. When I was first told about the award, I asked, “It’s for the Southeast region, right?”. The answer: “No, think bigger.” “The US? Very cool!” “No, bigger than that.” Yes, it’s the Worldwide Performance Management Partner of the Year – selected from over 2000 entries.

So what is this award winning solution? It’s a performance management system that allows school systems to track how well they are progressing on their long term goals for improving the learning environments in schools. This was an interesting solution to work on – most of my experience has been in applying business intelligence to more traditional business solutions. While there are some similarities, the core focus is very different. Success for a school system isn’t judged on profits, it’s based on how well the children are educated. Since I have one child already in the CMS system, and another one going in soon, there was much more of a personal impact from seeing what they were doing, and their approach to meeting their goals.

It was also interesting from a technology standpoint. The technology for the solution involves most of Microsoft’s BI stack – SQL Server for the data repository, SSIS for the data integration, SSAS for analysis, SSRS for some reporting, and Sharepoint + PerformancePoint Server for the front-end display of the analytic information. It’s pretty impressive seeing how Microsoft’s BI story has really come together over the last couple of years, and actually getting the chance to put all these pieces in action with each other was great (and challenging in a few spots 🙂 ). The team that did the development work did a great job with it, and it really shows in the end product.

Posted in Uncategorized | Comments Off

Updated Information on 64-bit Considerations for SSIS

Posted on June 12, 2008 by jwelch

Douglas Laudenschlager has posted some good information about the considerations you need to make if you are running SSIS on a 64-bit box. This is additional information to what is currently in Books Online, and it’s well worth looking at. Considering that a lot of developers are developing on x86 machines, and deploying to x64 servers, it’s even more valuable. I currently do all my development on a x64 machine, but since the development environment (BIDS) is 32-bit only, some of these items still have impact.

64-bit Considerations for SQL Server Integration Services – Douglas Laudenschlager – Dougbert.com

Posted in Uncategorized | Comments Off

Calculating a Running Total Part 2

Posted on June 12, 2008 by jwelch

To finish up the topic that I started in my previous post, I wanted to post a more complex example of running total calculations. This is based off a request on the MSDN forums, for a running total calculation that would “slide” along as additional rows were processed. In the poster’s example, the running total needed to reflect the total for the last 10 weeks.

To show how to accomplish this, I’m going to update the same package that was used in the previous example.

The script component has the same RunningTotal column defined on the output:

The primary difference is in the script itself. Since I want to keep a list of the previous 10 records, I am using a Queue object, which is part of the System.Collections.Generic namespace. This object lets you easily remove the oldest item from the queue, so it works very well for this example. In the script, if the queue has more than 10 items, the oldest item is removed prior to adding a new one. The running total is produced by iterating through all the items in the collection, and storing the result in the RunningTotal output column.

Public Class ScriptMain

    Inherits UserComponent

    Dim runningTotalCollection As Queue(Of Integer) = New Queue(Of Integer)

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        If runningTotalCollection.Count >= 10 Then

            runningTotalCollection.Dequeue()

        End If

        runningTotalCollection.Enqueue(Row.OrderQty)

        Dim x As Integer

        Dim sumX As Integer

        For Each x In runningTotalCollection

            sumX += x

        Next

        Row.RunningTotal = sumX

    End Sub

End Class

This is another example of the flexibility of the Script component. It also helps highlight that by using the Script component, you can store the values from previous rows, and then reference them later. The example package is posted on my Skydrive.

Posted in Uncategorized | Comments Off

Calculating a Running Total In SSIS

Posted on June 9, 2008 by jwelch

Running totals are a common calculation for reporting, and are occasionally needed in ETL processes to populate aggregate tables. One way to accomplish this in SSIS, if you are retrieving data from a relational database, is to do the running total in the SELECT statement. There are a number of resources on the Internet that illustrate how to accomplish this. However, this may not work for you if you are retrieving information from a flat file. Also, the logic to implement a running total may not be straightforward in SQL, particularly if you need sliding windows for the totals (I’ll explain this later).

Fortunately, there are other ways to accomplish this in SSIS. A very flexible method is to use a script component, as this will let you customize the logic however you prefer. As an example, I’ll take a look at a couple of examples from AdventureWorks.

First, the more straightforward example. The AdventureWorks database has a Sales.SalesOrderDetail table that lists the itemized purchases for each order. Note – I’m only filtering the results for example purposes. The actual code in the package works with the whole table.

If I was to calculate a running total for the OrderQty for each SalesOrderID above, I’d expect to see:

SalesOrderID	ProductID	OrderQty	RunningTotal
43685	765	3	3
43685	763	1	4
43685	754	1	5
43685	725	1	6
43686	758	3	3
43686	762	1	4
43686	770	1	5
43687	768	1	1
43687	765	2	3

Notice that the running total resets each time the SalesOrderID changes. To implement a script component that will calculate the running total, I’ve created a new script component and added an output column to it to hold the running total.

Inside the script component, I’ve added a variable to store the value of the control number (SalesOrderID), and another variable to capture the running total. When the control number changes, the running total is reset.

Public Class ScriptMain

    Inherits UserComponent

    Dim runningTotal As Integer

    Dim previousValue As Integer

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        If Not previousValue = Row.SalesOrderID Then

            runningTotal = 0

            previousValue = Row.SalesOrderID

        End If

        runningTotal += Row.OrderQty

        Row.RunningTotal = runningTotal

    End Sub

End Class

This is an easy method to calculate running totals in the data flow. One thing to note is that the data must be sorted by the control number (SalesOrderID in the example) in order for this to work properly.

This is in an example package on my SkyDrive here. I will follow up this post with another sample showing a more complex "running total" problem and how it can be solved in SSIS as well.

Posted in Uncategorized | Comments Off

SSIS Configurations At Design Time

Posted on June 4, 2008 by jwelch

There’s an interesting “behavior” that occurs with configurations at design time. The value of the configuration will be saved in your package as the new stored value. I’d seen this before, but never really thought about it, until a poster on the forums questioned why it was happening. To understand it, you need to know how the package is processed when it is loaded in the design environment.

If you have configurations enabled in the package, then the configuration is applied to the package when it is initially loaded. When a configuration is applied, it updates the target property of the configuration. For example, you might have a package that has the Description property set to “No Configuration”. If you add a configuration that sets the Description property of your package to “Has Configuration”, when the package is loaded, the value specified in the configuration (“Has Configuration”) will overwrite the value currently stored in the Description property (“No Configuration”).

If you then close the package without saving, the change to the property is not saved, and it will reflect the original value (“No Configuration”). If, however, you save the package (and remember, running it in the IDE saves it automatically), the new value (“Has Configuration”) will be saved as the value in the package file.

Normally, this does not cause any issues. If you execute a package in production that has a configuration, it is not saved after execution, so the values of the configurations is not save in the package file. However, it can impact you in the development environment. This is one of the reasons that I recommend setting up configurations even in your development environment. If you do that, then you know that regardless of the value the package was saved with, it will have the value of your local configuration each time it is loaded in the IDE. This is also a good reason never to update a package directly on your production server. When you open it in the IDE on the production server, it will retrieve the production configurations. If you save or run it, it will be saved with the production configuration values. This can cause a number of headaches in troubleshooting problems down the road, so I recommend that you don’t do it.

Posted in Uncategorized | Comments Off

How to implement cascading parameters in a SQL Services Reporting Services MDX Report

Posted on May 18, 2008 by ddarden

Background

Last week, I needed to prototype a report against an Analysis Services 2005 cube for a client. The report wasn’t too complicated… I just had the requirement to allow a user to drill down a hierarchy by selecting a member at one level, then displaying all the children for that member, etc. This is really easy to do in SSRS… but I wouldn’t go so far as to call it super intuitive. It had been awhile since I wrote a MDX report, and it took me a few minutes to remember just how to do it… so I thought I’d document the steps, for my own benefit if no one else’s.

The sample requires the Adventure Works AS DB. You can download the sample here.

Scenario

So, I want to create a report that returns the Order Count by Product. The users need to select a particular Category, and then Subcategory for the products they want to see. The report also needs to return summary information for both the Category and Subcategory level, and it needs to contain the Category and Subcategory on the report. This can be a pretty standard business case for reports where the potential data set is to large to report on.

Underlying Data

In order for this to work, I need a hierarchy to drill down. Here, I’m going to use the Product Categories hierarchy of the Product dimension in the Adventure Works DW.

The Report

Next, I’m going to create a data set to use in my report. I’m going to go ahead and switch to the MDX view for the data set. Now, I’m going to go ahead and create a few parameters.

Now, I’m going to go to the Report Layout tab, right-click off the report, and choose to view my parameters. Yup, they’re there. Now, I’m going to go back to the Data tab, and look at my list of available data sets…

Cool! BIDS just created two new data sets for me, based on the parameters I created. Note that you have to view your Report Parameters on the Layout tab for these data sets to be created (you just have to open the dialog, you don’t have to do anything). This seems to be the part that I always forget.

Let’s look at the ParameterCategory data set first. The MDX that BIDS created looks like this:

   1: WITH MEMBER [Measures].[ParameterCaption] AS '[Product].[Product Categories].CURRENTMEMBER.MEMBER_CAPTION'

   2: MEMBER [Measures].[ParameterValue] AS '[Product].[Product Categories].CURRENTMEMBER.UNIQUENAME'

   3: MEMBER [Measures].[ParameterLevel] AS '[Product].[Product Categories].CURRENTMEMBER.LEVEL.ORDINAL'

   4: SELECT

   5:     {

   6:         [Measures].[ParameterCaption]

   7:         , [Measures].[ParameterValue]

   8:         , [Measures].[ParameterLevel]

   9:     } ON COLUMNS ,

  10:     [Product].[Product Categories].ALLMEMBERS ON ROWS

  11: FROM

  12:     [Adventure Works]

But when we run that, it returns everything in the hierarchy… including that nasty ‘All Products’, which we don’t want our users selecting. So, we’ll make a slight modification, and change ‘[Product].[Product Categories].ALLMEMBERS’ to ‘[Product].[Product Categories].Children’. Now, we get just the Categories that we want our users to select.

Now, we want to look at the ParameterSubCategory dataset. The MDX looks like this:

   1: WITH MEMBER [Measures].[ParameterCaption] AS '[Product].[Product Categories].CURRENTMEMBER.MEMBER_CAPTION'

   2: MEMBER [Measures].[ParameterValue] AS '[Product].[Product Categories].CURRENTMEMBER.UNIQUENAME'

   3: MEMBER [Measures].[ParameterLevel] AS '[Product].[Product Categories].CURRENTMEMBER.LEVEL.ORDINAL'

   4: SELECT

   5:     {

   6:         [Measures].[ParameterCaption]

   7:         , [Measures].[ParameterValue]

   8:         , [Measures].[ParameterLevel]

   9:     } ON COLUMNS ,

  10:     [Product].[Product Categories].ALLMEMBERS ON ROWS

  11: FROM

  12:     [Adventure Works]

Again, not precisely what we want… plus, we want this one to be parameter driven.

First, we need to add the ParameterCategory (re-use the original Parameter name to keep an extra data set from being created later) to the report:

Then, we change ‘[Product].[Product Categories].ALLMEMBERS’ to ‘STRTOMEMBER(@ParameterCategory).Children’. Now, our MDX looks like this:

   1: WITH MEMBER [Measures].[ParameterCaption] AS '[Product].[Product Categories].CURRENTMEMBER.MEMBER_CAPTION'

   2: MEMBER [Measures].[ParameterValue] AS '[Product].[Product Categories].CURRENTMEMBER.UNIQUENAME'

   3: MEMBER [Measures].[ParameterLevel] AS '[Product].[Product Categories].CURRENTMEMBER.LEVEL.ORDINAL'

   4: SELECT

   5:     {

   6:         [Measures].[ParameterCaption]

   7:         , [Measures].[ParameterValue]

   8:         , [Measures].[ParameterLevel]

   9:     } ON COLUMNS ,

  10:     STRTOMEMBER(@ParameterCategory).Children ON ROWS

  11: FROM

  12:     [Adventure Works]

and it will only return the Children of the selected category. You could also create your MDX using an expression, and directly use the the ‘Parameters!ParameterCategory.Value’ embedded in a quoted string. There is a performance hit for using anything like STRTOMEMBER, but in this case it isn’t worth the trouble to me to do it that way.

Almost done. Except that we haven’t gotten around to writing the actual query that drives the report… so we go back to our main data set.

Now, when we create our query, we might be tempted to return the children of the Subcategory… but this won’t get us summary information at the Category and Subcategory levels. So we’re going to write some MDX using the Hierarchize function to make sure our data set contains the selection at the Category level, at the Subcategory level, and all of the Products underneath the selected subcategory:

   1: SELECT

   2:     [Measures].[Order Count] ON 0,

   3:     HIERARCHIZE(

   4:         {

   5:             STRTOMEMBER(@ParameterCategory),

   6:             STRTOMEMBER(@ParameterSubCategory),

   7:             STRTOMEMBER(@ParameterSubCategory).Children

   8:         }

   9:     ) ON 1

  10: FROM

  11:     [Adventure Works]

Now I’m going to go back to the Layout tab, right-click off the report, pull up my Report Parameters, and change the Default value to Null (so the user will always have to make a selection… though you can make this part as clever as you’d like).

In your report, you may need to use some expressions, filters or other logic to format your results, suppress rows (i.e., the Category will have a Null for both Subcategory and Product… how you want to handle this is up to you). For this sample, I just wrote a little VB

   1: Dim _Indent As String = "  "

2:

   3: Public Function FormatProduct(ByVal Category As String, ByVal SubCategory As String, ByVal Product As String) As String

   4:     Dim ret As String = String.Empty

5:

   6:     If Not String.IsNullOrEmpty(Product) Then

   7:         ret = _Indent & _Indent & Product

   8:     Else If String.IsNullOrEmpty(Product) And Not String.IsNullOrEmpty(SubCategory) Then

   9:         ret = _Indent & SubCategory

  10:     Else

  11:         ret = Category

  12:     End If

13:

  14:     Return ret

  15: End Function

And used an expression to format my Product column in the report.

Summary

That’s about it. We now have a simple report with some cascading parameters.

Posted in Uncategorized | Tagged MDX, SSAS, SSRS | Comments Off

See if a File is Locked

Posted on May 13, 2008 by jwelch

A question that comes up occasionally on the forums is how to determine if a file is being written to by another process. It can cause issues if a file is partially written when a source component starts extracting data from it.

There is a way to check this through a Script Task. The script below attempts to lock the file for writing. If the file is be written to be another process, the attempt to lock it will fail. If the file is available, the attempt will succeed.

Public Sub Main()

    Dim vars As Variables

    Dim file As System.IO.FileStream

Try

        file = System.IO.File.Open(Dts.Connections("FlatFile").ConnectionString, System.IO.FileMode.Open, System.IO.FileAccess.Write)

    Catch ex As Exception

        Dts.VariableDispenser.LockOneForRead("FileLocked", vars)

        vars("FileLocked").Value = True

        vars.Unlock()

    Finally

        If Not file Is Nothing Then

            file.Close()

        End If

    End Try

    Dts.TaskResult = Dts.Results.Success

End Sub

The package needs to have a connection manager that points to the file you want to check (named "FlatFile" in the script above), and a variable ("FileLocked") that is set to True if the file is locked. The variable should be set to False by default. The variable can be used in a precedence constraint to control whether other tasks execute.

Posted in Uncategorized | Comments Off

Agile BI

Presenting at Midlands PASS

Continuing a Loop After an Error

Configuration Changes in SSIS 2008

Mariner Wins Microsoft Performance Management Partner of the Year

Updated Information on 64-bit Considerations for SSIS

Calculating a Running Total Part 2

Calculating a Running Total In SSIS

SSIS Configurations At Design Time

How to implement cascading parameters in a SQL Services Reporting Services MDX Report

Background

Scenario

Underlying Data

The Report

Summary

See if a File is Locked

Recent Posts

Recent Comments

Archives

Categories

Meta