Sending an Alert if a Package Runs Too Long

A question that has come up in a few different situations recently is how to send someone an alert if your package is taking too long to execute. For example, you might have a package that loads a large number rows from a source system to a destination on a nightly basis. There are a number of other packages that depend on this one to complete before they can begin running. The initial package normally takes 15 minutes to run, but on occasion, the source system can be under heavy load, so the package can run 45 minutes to an hour. In this scenario, you’d like to alert people that the load will be delayed.

If you are using an scheduling tool, you should first check to see whether it has this capability. Ideally, you want this monitoring process outside of the package itself. If it is internal to the package, then there is potential for the alerts not to be sent, if the DTEXEC process has really gotten hung up. Some scheduling tools have the capability to monitor the duration of the execution. However, SQL Agent does not have this, so you may need to implement this in the package itself. Another reason you may need to do it in the package is to monitor the duration of a specific set of tasks, but not the entire package.

Fortunately, this isn’t too difficult to implement in SSIS. All it takes is a little scripting. I’ve set up a sample package that illustrates how to do this. The main package looks like this:

image

The Long Running Task script task simulates a long running task by running a specific number of seconds (the value of the TaskDelay variable below). The Mark Task Successfully Completed sets the TaskSuccessful variable to True, so that the package knows when the long running task completed. The Monitor Duration script task is the one that actually does the checking on execution time, based on the value in the TaskDuration variable. If the execution time has taken more seconds than the value of TaskDuration specifies, the script exits. There is a precedence constraint that checks that the value of the TaskSuccessful variable is still False, then the Send an Alert task is run. In the sample, the Send an Alert task is a script task, but in a real package it would likely be a Send Mail task.

image

As stated above, the real work happens in the script. Basically, it performs a check once a second to see if either:

  1. The TaskSuccessful variable is True, meaning that the long running task completed successfully, or
  2. The difference in seconds between the current time and the time when the script started running is greater than the value in the TaskDuration variable.

If either case is true, the script exits. If both cases are false, the script sleeps for a second, then checks again. Putting the thread to sleep helps make sure that CPU isn’t tied up checking the conditions too often.

Public Sub Main()
    Dim vars As Variables
    Dim maxDuration As Integer
    Dim startTime As DateTime = DateTime.Now
    Dim continueWaiting As Boolean = True
    Dim taskCompleted As Boolean = False
 
    Dts.VariableDispenser.LockOneForRead("TaskDuration", vars)
    maxDuration = System.Convert.ToInt32(vars("TaskDuration").Value)
    vars.Unlock()
 
    Do While continueWaiting
        System.Threading.Thread.Sleep(1000)
 
 
        Dts.VariableDispenser.LockOneForRead("TaskSuccessful", vars)
        taskCompleted = System.Convert.ToBoolean(vars("TaskSuccessful").Value)
        vars.Unlock()
 
        If taskCompleted Or _
            (DateDiff(DateInterval.Second, startTime, DateTime.Now) > maxDuration) Then
            continueWaiting = False
        End If
    Loop
 
    Dts.TaskResult = Dts.Results.Success
End Sub

 

There are few things to be aware of with this approach. It’s not foolproof by any means. Any process that ties up 100% of the CPU is likely going to make this fail. Also, this depends on parallel execution of the long running task and the monitoring task. If you set the MaxConcurrentExecutables property in your package to 1, this will not work. It needs to be run on a machine with enough power to start the task at roughly the same time. In my experience, this hasn’t been a problem with the hardware that I work with, but it is a potential failure point.

The sample file is posted on my Skydrive, as usual.

Leave a Reply