Archive for December 2007

SSIS Slow Down Performance Script Component

I’ve been meaning to post this for a while, but real life keeps intruding. I did a presentation on SSIS performance for the Charlotte SQL Server User Group at the end of November. It went well, though I was trying to squeeze a 90 minute presentation into 60 minutes, so I had to skim over some of the good stuff. You can find the presentation here (it’s the first link, at least till we have another meeting).

One item that I did need for the demos in the presentation was the ability to slow down the data flow. Normally, SSIS processes so fast that you can’t really see what’s happening. I found a few examples of script components that introduced a delay every time a buffer was processed. Unfortunately, that wasn’t flexible enough for me, as part of my demo was adjusting the number of rows per buffer. I really wanted something that would introduce a delay every X rows, while being easy to configure without having to edit the script each time. I ended up writing the following code:

Public Class ScriptMain
    Inherits UserComponent

    ' Running count of rows seen by this component
    Private RowCount As Integer
    ' Delay length (in milliseconds) and row interval, read from package variables
    Private rowDelay As Integer
    Private everyNRows As Integer

    Public Overrides Sub PreExecute()
        MyBase.PreExecute()
        ' Lock and read the package variables once, before any rows are processed
        Dim vars As Microsoft.SqlServer.Dts.Runtime.Wrapper.IDTSVariables90
        Me.VariableDispenser.LockForRead("EveryNRows")
        Me.VariableDispenser.LockForRead("MSTimeDelay")
        Me.VariableDispenser.GetVariables(vars)
        rowDelay = CType(vars("MSTimeDelay").Value, Integer)
        everyNRows = CType(vars("EveryNRows").Value, Integer)
        vars.Unlock()
    End Sub

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        RowCount += 1

        ' Pause for rowDelay milliseconds after every everyNRows rows
        If RowCount Mod everyNRows = 0 Then
            System.Threading.Thread.Sleep(rowDelay)
        End If
    End Sub
End Class

This relies on two integer variables being declared in the package. EveryNRows specifies how many rows should be processed before introducing a delay. MSTimeDelay sets how long the delay should be, in milliseconds.
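Because the script reads package variables, the throttling can also be adjusted from the command line without opening the package. A minimal sketch using the dtexec /SET switch, assuming the package is saved as SlowDownDemo.dtsx (the file name and the two values here are just placeholders):

```
dtexec /FILE SlowDownDemo.dtsx ^
  /SET \Package.Variables[User::EveryNRows].Properties[Value];10000 ^
  /SET \Package.Variables[User::MSTimeDelay].Properties[Value];100
```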

One thing to be aware of: in the above script, I am reading the package variables in the PreExecute method and storing them in instance variables. Locking and unlocking the variables is time consuming, and will drastically slow things down if you do it in the ProcessInputRow method, since that method runs once per row.

Null Date Bug Fixed in SSIS

A few months back I reported a minor but annoying bug in the way NULL dates are handled in SSIS. It’s documented here, but basically, if you use an Execute SQL Task to set a DateTime variable to NULL, the value is 11/30/1999. However, a NULL date in SSIS returns 12/30/1899.

To explain a bit more, the SSIS package variables can’t hold NULL in the same way that a database column can. The variable always has to have a value in it. You can set a variable to NULL, but this just sets a default value in the variable. To see an example of this, create a new variable of type DateTime, set the EvaluateAsExpression property to TRUE, and put this expression in it:

NULL(DT_DATE)

The resulting value will be 12/30/1899. It isn’t uncommon to translate NULL values to a known, but unlikely to be used, value. However, since the Execute SQL Task is setting the NULL value to 11/30/1999, and the NULL value in SSIS is 12/30/1899, you can’t easily compare the value of a DateTime variable to NULL(DT_DATE) to see if the variable is “NULL”.
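Until the fix arrives, one way to cope is to treat both sentinel dates as “NULL” when testing the variable. A sketch in the SSIS expression language, assuming a DateTime variable named User::LoadDate (the variable name is just an example):

```
@[User::LoadDate] == NULL(DT_DATE) || @[User::LoadDate] == (DT_DATE)"1999-11-30"
```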

Fortunately, with this fix, this should be a little easier. Unfortunately, I don’t know exactly when we will see it released.

Fun with SSIS Configurations

As part of my standard approach (and the one used by Mariner), I use configurations in SSIS quite heavily. My preference is to use the SQL Server configuration type, since it keeps all the configurations in one place. XML configurations tend to result in a lot of extra files to manage. With SQL Server-based configurations, though, you need to provide the initial connection string to the SQL Server. We typically use a system environment variable to control that initial connection to the configuration table (either directly, using a specific configuration to control the connection as Rafael has shown in the past, or using the indirect approach, as highlighted by Jamie). We use a separate connection manager for the SQL Server configuration connection, usually named Configuration (I like self-explanatory names).

One of our current clients has their development and QA databases as separate SQL Server instances on the same server. There are a number of reasons why I don’t care for that approach, but that’s another post. We also needed to use this same server as the host for both the development and QA versions of the SSIS packages. This posed a problem, as both versions of the package would pick up the same environment variable, so we would only be able to run for one environment at a time.

My initial thought was to use the /CONN switch of the DTEXEC utility to override the Configuration connection. Unfortunately, this didn’t work. As best as I can tell, based on some information here and my own testing, the order of events at package load time is:

  1. Configurations are applied, based on what is set in the package at design time.
  2. Any /ConfigFile, /Connection, and /Set values are applied.
  3. Package validation is run.
  4. The package is executed.

Basically, what this boils down to is that there is no way to override a SQL Server configuration from the command line. Even setting the Configuration connection manager from an XML configuration on the command line doesn’t work, because by the time it is applied, the design-time configuration has already been set. Setting the connection string doesn’t force the configurations to be reapplied.

There are a few workarounds for this. You could apply an XML configuration at runtime to override all the configured values in the package, but that defeats the purpose of using SQL Server configurations. We ended up with a different approach that still lets us use the SQL Server configurations. By setting the environment variable up as a user variable, we can override it based on the user account that the packages are run under. This has a few caveats as well. For one, you need a separate account for each unique environment (not necessarily a bad thing). Another is that you can’t use the Integration Services job step type to run the package. Evidently, it doesn’t create a full user session, which is needed to access user environment variables. You can work around it by using the CmdExec job step type.
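Setting this up means creating the configuration environment variable as a user variable under each environment’s account. A sketch, assuming the variable is named SSIS_CONFIG (use whatever name your packages’ configuration actually points at) and run while logged in as, say, the QA service account; setx without the /M switch writes a user-level variable:

```
setx SSIS_CONFIG "Data Source=SERVER\QA;Initial Catalog=SSIS_Config;Integrated Security=SSPI;"
```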

If you find yourself needing to support multiple environments on the same server, using configurations can be difficult. Hopefully, this information will be helpful for determining the best approach.