Archive for April 2008

Renaming the Script Project in SSIS 2008

I found something interesting in SSIS 2008 today, and thought I’d share it. I was working with David Darden on a package that included a script task. In 2008, the Visual Studio Tools for Applications (VSTA) environment is used to edit script tasks. It exposes a few more details about the script task, including the project name. By default, it creates a rather long project name, as you can see in the screen shot below. (Yes, that is C#. It’s my preferred language, so expect to see more of it here now that it’s available in 2008). The namespace is based on the project name, so you get the same long value for it.

image

While poking around in the script task’s expressions editor, we found the ScriptProjectName property. This property does not show up in the Properties window (F4), only in the expressions editor. By setting this, you can update the project name in the script. For example, setting this property to "NewScriptProjectName", as shown here:

image

results in this showing up in the script project:

image 

There are a few caveats to this. One, if you do this after initially editing the script, the project name will be updated, but the namespace will remain the same. Two, you must save the package after setting the expression, in order for the expression to be applied. If you add the expression, then edit the script without saving, it will not use the new project name.

So, if you want to name your script project with a friendly name, do the following:

  1. Add the Script Task.
  2. Add an expression to set the ScriptProjectName property to your desired value.
  3. Save the package.
  4. Now you can use the Edit Script button to edit the script project with the friendly name.

I do want to point out that the name of the project has no real impact on functionality. The script will not run any differently if you do this. It’s purely for aesthetic purposes. But if the long, random looking project names bother you, it’s nice to know you can fix them. There is also a MSDN forum post here that details the steps to edit the namespace after the script has been edited for the first time. It’s a little easier to set it this way, though, so that it is handled up front.

David Darden’s Blog

One of my colleagues, David Darden, has started blogging on www.agilebi.com. He’s been working with the Microsoft BI stack for several years now, and has a lot of great knowledge to share. His first post is on using the PerformancePoint SDK to create scorecards. It’s a good read, and if you are interested in extending the built-in PerformancePoint functionality, I highly recommend taking a look.

http://agilebi.com/cs/blogs/ddarden/archive/2008/04/26/creating-a-scorecard-transform-in-performancepoint-server-monitor.aspx

Common Expressions In the Data Flow

I use the Derived Column transform in the Data Flow a lot. In fact, I can’t remember the last time I wrote a production package that didn’t include at least one Derived Column transform in it. Over time, I’ve built up a lot of common expressions that I keep using, as well as some that I stored "just in case". I thought it would be good to post a few of them here, primarily because it makes it easier for me to find them, but hopefully others will find these useful as well.

References to columns ([StringColumn], for example) need to be replaced with the appropriate column name in the data flow. Values in italics need to be updated with the appropriate value (literal, column, or variable). Also, some of the date expressions may not work if your regional settings are not US date formats.

Description Expression
Count the occurrences of a specific character in a string LEN([StringColumn]) – LEN(REPLACE([StringColumn],"Character to count",""))
Cast a date stored in a string column as YYMMDD to DT_DBTIMESTAMP (DT_DBTIMESTAMP)(SUBSTRING([StringColumn],3,2) + "-" + SUBSTRING([StringColumn],5,2) + "-" + SUBSTRING([StringColumn],1,2))
Cast a date stored in a string column as YYYYMMDD to DT_DBTIMESTAMP (DT_DBTIMESTAMP)(SUBSTRING([StringColumn],1,4) + "-" + SUBSTRING([StringColumn],5,2) + "-" + SUBSTRING([StringColumn],7,2))
Cast a date stored in a string column as DDMMMYYYY to DT_DBTIMESTAMP (DT_DBTIMESTAMP)(SUBSTRING([StringColumn],3,3) + "-" + SUBSTRING([StringColumn],1,2) + "-" + SUBSTRING([StringColumn],6,4))
Pad a string to the specified length [StringColumn] + REPLICATE(" ", 10 – LEN([StringColumn]))
Check a date to see if it in a valid range and replace with NULL if it is not [DateColumn] < (DT_DBTIMESTAMP)"1753/01/01" ? NULL(DT_DBTIMESTAMP) : [DateColumn > (DT_DBTIMESTAMP)"9999/12/31" ? NULL(DT_DBTIMESTAMP) : [DateColumn]
Convert seconds to minutes and seconds (DT_WSTR,20)([NumberOfSeconds] / 60) + ":" + (DT_WSTR,20)([NumberOfSeconds] % 60)
Combine date and time strings to DT_DBTIMESTAMP (DT_DBTIMESTAMP)("2008-03-25" + " " + "10:20:00")
Return True if Date is a Weekend DATEPART("Weekday",[DateColumn]) == 7 ? (DT_BOOL)"True" : DATEPART("Weekday",[DateColumn]) == 1 ? (DT_BOOL)"True" : (DT_BOOL)"False"
Cast a string to a numeric with a range check and return 0 if out of range [StringColumn] > "9" ? 0 : [StringColumn] < "0" ? 0 : ((DT_NUMERIC,1,0)[StringColumn])

Sending an Alert if a Package Runs Too Long

A question that has come up in a few different situations recently is how to send someone an alert if your package is taking too long to execute. For example, you might have a package that loads a large number rows from a source system to a destination on a nightly basis. There are a number of other packages that depend on this one to complete before they can begin running. The initial package normally takes 15 minutes to run, but on occasion, the source system can be under heavy load, so the package can run 45 minutes to an hour. In this scenario, you’d like to alert people that the load will be delayed.

If you are using an scheduling tool, you should first check to see whether it has this capability. Ideally, you want this monitoring process outside of the package itself. If it is internal to the package, then there is potential for the alerts not to be sent, if the DTEXEC process has really gotten hung up. Some scheduling tools have the capability to monitor the duration of the execution. However, SQL Agent does not have this, so you may need to implement this in the package itself. Another reason you may need to do it in the package is to monitor the duration of a specific set of tasks, but not the entire package.

Fortunately, this isn’t too difficult to implement in SSIS. All it takes is a little scripting. I’ve set up a sample package that illustrates how to do this. The main package looks like this:

image

The Long Running Task script task simulates a long running task by running a specific number of seconds (the value of the TaskDelay variable below). The Mark Task Successfully Completed sets the TaskSuccessful variable to True, so that the package knows when the long running task completed. The Monitor Duration script task is the one that actually does the checking on execution time, based on the value in the TaskDuration variable. If the execution time has taken more seconds than the value of TaskDuration specifies, the script exits. There is a precedence constraint that checks that the value of the TaskSuccessful variable is still False, then the Send an Alert task is run. In the sample, the Send an Alert task is a script task, but in a real package it would likely be a Send Mail task.

image

As stated above, the real work happens in the script. Basically, it performs a check once a second to see if either:

  1. The TaskSuccessful variable is True, meaning that the long running task completed successfully, or
  2. The difference in seconds between the current time and the time when the script started running is greater than the value in the TaskDuration variable.

If either case is true, the script exits. If both cases are false, the script sleeps for a second, then checks again. Putting the thread to sleep helps make sure that CPU isn’t tied up checking the conditions too often.

Public Sub Main()
    Dim vars As Variables
    Dim maxDuration As Integer
    Dim startTime As DateTime = DateTime.Now
    Dim continueWaiting As Boolean = True
    Dim taskCompleted As Boolean = False
 
    Dts.VariableDispenser.LockOneForRead("TaskDuration", vars)
    maxDuration = System.Convert.ToInt32(vars("TaskDuration").Value)
    vars.Unlock()
 
    Do While continueWaiting
        System.Threading.Thread.Sleep(1000)
 
 
        Dts.VariableDispenser.LockOneForRead("TaskSuccessful", vars)
        taskCompleted = System.Convert.ToBoolean(vars("TaskSuccessful").Value)
        vars.Unlock()
 
        If taskCompleted Or _
            (DateDiff(DateInterval.Second, startTime, DateTime.Now) > maxDuration) Then
            continueWaiting = False
        End If
    Loop
 
    Dts.TaskResult = Dts.Results.Success
End Sub

 

There are few things to be aware of with this approach. It’s not foolproof by any means. Any process that ties up 100% of the CPU is likely going to make this fail. Also, this depends on parallel execution of the long running task and the monitoring task. If you set the MaxConcurrentExecutables property in your package to 1, this will not work. It needs to be run on a machine with enough power to start the task at roughly the same time. In my experience, this hasn’t been a problem with the hardware that I work with, but it is a potential failure point.

The sample file is posted on my Skydrive, as usual.