Posts tagged ‘Script’

Sending an Alert if a Package Runs Too Long

A question that has come up in a few different situations recently is how to send someone an alert if your package is taking too long to execute. For example, you might have a package that loads a large number of rows from a source system to a destination on a nightly basis. There are a number of other packages that depend on this one to complete before they can begin running. The initial package normally takes 15 minutes to run, but on occasion, the source system can be under heavy load, so the package can run 45 minutes to an hour. In this scenario, you’d like to alert people that the load will be delayed.

If you are using a scheduling tool, you should first check to see whether it has this capability. Ideally, you want this monitoring process outside of the package itself. If it is internal to the package, there is potential for the alerts not to be sent if the DTEXEC process has really gotten hung up. Some scheduling tools have the capability to monitor the duration of the execution. However, SQL Agent does not have this, so you may need to implement this in the package itself. Another reason you may need to do it in the package is to monitor the duration of a specific set of tasks, but not the entire package.

Fortunately, this isn’t too difficult to implement in SSIS. All it takes is a little scripting. I’ve set up a sample package that illustrates how to do this. The main package looks like this:

image

The Long Running Task script task simulates a long-running task by running for a specific number of seconds (the value of the TaskDelay variable below). The Mark Task Successfully Completed task sets the TaskSuccessful variable to True, so that the package knows when the long-running task has completed. The Monitor Duration script task is the one that actually does the checking on execution time, based on the value in the TaskDuration variable. If the long-running task completes, or the elapsed time exceeds the number of seconds specified in TaskDuration, the script exits. A precedence constraint then checks whether the TaskSuccessful variable is still False; if it is, the Send an Alert task runs. In the sample, the Send an Alert task is a script task, but in a real package it would likely be a Send Mail task.

image

As stated above, the real work happens in the script. Basically, it performs a check once a second to see if either:

  1. The TaskSuccessful variable is True, meaning that the long running task completed successfully, or
  2. The difference in seconds between the current time and the time when the script started running is greater than the value in the TaskDuration variable.

If either case is true, the script exits. If both cases are false, the script sleeps for a second, then checks again. Putting the thread to sleep helps make sure that CPU isn’t tied up checking the conditions too often.

Public Sub Main()
    Dim vars As Variables = Nothing
    Dim maxDuration As Integer
    Dim startTime As DateTime = DateTime.Now
    Dim continueWaiting As Boolean = True
    Dim taskCompleted As Boolean = False

    ' Read the maximum allowed duration (in seconds) once, up front.
    Dts.VariableDispenser.LockOneForRead("TaskDuration", vars)
    maxDuration = System.Convert.ToInt32(vars("TaskDuration").Value)
    vars.Unlock()

    Do While continueWaiting
        ' Sleep between checks so the loop does not tie up the CPU.
        System.Threading.Thread.Sleep(1000)

        ' Hold the lock only long enough to read the flag, so the other task can still set it.
        Dts.VariableDispenser.LockOneForRead("TaskSuccessful", vars)
        taskCompleted = System.Convert.ToBoolean(vars("TaskSuccessful").Value)
        vars.Unlock()

        ' Stop waiting once the task has completed or the time limit has passed.
        If taskCompleted OrElse _
            (DateDiff(DateInterval.Second, startTime, DateTime.Now) > maxDuration) Then
            continueWaiting = False
        End If
    Loop

    Dts.TaskResult = Dts.Results.Success
End Sub
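
The same check-then-sleep logic can be tried outside of SSIS. The following is only a minimal sketch, not part of the sample package: the module name, the WaitForCompletion function and its parameters are all hypothetical, and it assumes a Stopwatch is an acceptable substitute for comparing DateTime.Now values (a Stopwatch is not affected if the system clock is adjusted while the loop is running).

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Threading

Module MonitorSketch

    ' Polls isDone until it returns True or maxSeconds have elapsed.
    ' Returns True if the work finished in time, False if the limit was exceeded.
    ' (Hypothetical helper; in the package, isDone would read the TaskSuccessful variable.)
    Public Function WaitForCompletion(ByVal isDone As Func(Of Boolean), _
                                      ByVal maxSeconds As Integer, _
                                      ByVal pollMilliseconds As Integer) As Boolean
        Dim timer As Stopwatch = Stopwatch.StartNew()

        Do
            If isDone() Then Return True
            If timer.Elapsed.TotalSeconds > maxSeconds Then Return False

            ' Sleep between checks so the loop does not tie up the CPU.
            Thread.Sleep(pollMilliseconds)
        Loop
    End Function

    Sub Main()
        ' Simulate work that never completes, with a one second limit.
        Dim finishedInTime As Boolean = WaitForCompletion(Function() False, 1, 100)
        Console.WriteLine("Finished in time: " & finishedInTime.ToString())
    End Sub

End Module
```

Returning a Boolean keeps the decision about whether to alert with the caller, which mirrors the way the package leaves that decision to the precedence constraint.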

 

There are a few things to be aware of with this approach. It’s not foolproof by any means. Any process that ties up 100% of the CPU is likely going to make this fail. Also, this depends on parallel execution of the long-running task and the monitoring task. If you set the MaxConcurrentExecutables property in your package to 1, this will not work. It needs to be run on a machine with enough power to start both tasks at roughly the same time. In my experience, this hasn’t been a problem with the hardware that I work with, but it is a potential failure point.

The sample file is posted on my SkyDrive, as usual.

Importing Files Using SSIS

A topic that has come up a few times recently is the idea of loading a set of files into a database using SSIS. One of the data flow components in SSIS is the Import Column transform, which allows you to load files into a binary column. There is a great video by Brian Knight that shows how to use it, and I recommend viewing that (12/01/2013 update: evidently, the video this post originally linked to is no longer available). If you are looking for a quick overview of the component, and a different approach for loading the list of filenames to import, then read on.

 

The Import Column transform works in the data flow, and imports one file for each row that is passed through it. It expects a column that contains the file name to import as an input. It outputs a column of type DT_TEXT, DT_NTEXT, or DT_IMAGE, that contains the file contents.

 

I’ve included a sample package with this post that uses the Import Column transform. It has a single data flow that uses a script component to get the list of files to import.

 


 

The package has two connection managers. One points to a SQL Server database where the files will be stored. The other is a File connection manager that points to a folder. This is the folder that we want to import the files from.

 


 

The script component was created as a Source. A single output column of type DT_WSTR was added to contain the filenames.

 


 

On the connection managers page, the File connection manager is specified so that it can be accessed from the script.

 


 

The script uses the Directory class from the System.IO namespace. By calling the GetFiles method, the code can iterate through all of the files in the directory, and output one row for each file.

 

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.IO

Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub CreateNewOutputRows()
        Dim fileName As String

        For Each fileName In Directory.GetFiles(Me.Connections.ImportFilesDir.AcquireConnection(Nothing).ToString())
            Output0Buffer.AddRow()
            Output0Buffer.Filename = fileName
        Next

        Output0Buffer.SetEndOfRowset()
    End Sub

End Class
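
The script above imports every file in the folder. If only some of the files should be loaded, the GetFiles overload that takes a search pattern can be used instead. The following is a rough sketch that runs outside of SSIS; the module name, the ListFiles helper, the "*.pdf" pattern and the temporary folder are all assumptions made for illustration, and none of them come from the sample package.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module FileListSketch

    ' Returns the full paths of the files in folderPath that match searchPattern,
    ' sorted by name. (Hypothetical helper, not part of the sample package.)
    Public Function ListFiles(ByVal folderPath As String, ByVal searchPattern As String) As List(Of String)
        Dim files As New List(Of String)(Directory.GetFiles(folderPath, searchPattern))
        files.Sort(StringComparer.OrdinalIgnoreCase)
        Return files
    End Function

    Sub Main()
        ' Build a throwaway folder so the sketch can run anywhere.
        Dim workFolder As String = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())
        Directory.CreateDirectory(workFolder)
        File.WriteAllText(Path.Combine(workFolder, "a.pdf"), "x")
        File.WriteAllText(Path.Combine(workFolder, "b.txt"), "x")

        ' Only a.pdf matches the pattern.
        For Each fileName As String In ListFiles(workFolder, "*.pdf")
            Console.WriteLine(fileName)
        Next

        Directory.Delete(workFolder, True)
    End Sub

End Module
```

In the script component, the loop body would add a row to Output0Buffer for each name instead of writing it to the console.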

 

The next component is the Import Column transform. Configuring it can be a little difficult, and I found the Books Online documentation a little unclear on what had to be done. On the Input Columns tab, the column that contains the filename (including path) to import needs to be selected.

 


 

On the Input and Output Properties tab, a new column was added to hold the binary contents of the file. When adding this column, make a note of the LineageID value, as it needs to be used in the next step.

 


 

After adding the output column, the input column (the one that contains the filename, not the file contents) needs to be selected. The LineageID from the previous step needs to be put into the FileDataColumnID property. This tells the component which column to populate with the file contents.

 


 

The OLE DB Destination is fairly straightforward, as it just maps the columns from the data flow to the database.

 

Hopefully this helps if you are working with the Import Column transform. The samples are located on my SkyDrive.

Checking for the Existence of a File

The File System Task in SSIS doesn’t support checking to see if a file exists. You can work around this easily with a script task. Create a new script task and add an Imports statement referencing the System.IO namespace to the top of the script.

Imports System.IO

Then add the following to the Main method:

If File.Exists(Dts.Connections("ConnMgrA").AcquireConnection(Nothing).ToString()) Then
    Dts.TaskResult = Dts.Results.Success
Else
    Dts.TaskResult = Dts.Results.Failure
End If

This script checks the file referenced by the ConnMgrA connection manager. If it exists, the script task returns Success, meaning execution will follow the Success constraint from the Script Task. If the file does not exist, the task will fail, and the Failure constraint will be used. You could also set a variable with the results, and use that in an expression on a precedence constraint.


If, instead of using a connection manager, you want to get the file name from a variable, you can replace the If statement with the following:

If File.Exists(ReadVariable("FileNameVariable").ToString()) Then

The variable locking is occurring in the ReadVariable method. To see the definition for it, please refer to Daniel Read’s blog post here. This is a good practice to follow when working with variables in Script Tasks.


There is a Connect posting here requesting that the File System Task be enhanced to support checking for a file’s existence. If you’d like to see this in a future version, please vote for the issue.

Address Columns Generically In a Script Component

When writing script components in SSIS, it’s often useful to be able to process the columns without knowing exactly what they are. For example, you may want to loop through all the columns to check for a condition, like NULL values in the columns. Or you may want to take a set of columns and concatenate them into a single string for output, or output each one as an XML element. In any of these cases, you don’t necessarily care about the individual column details.


One option is to use Reflection. This is fairly easy to set up, but is not the fastest performing option. To use Reflection, add this to the top of the Script class:

Imports System.Reflection

In the ProcessInputRow, you can use the following code:

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Dim column As IDTSInputColumn90
    Dim rowType As Type = Row.GetType()
    Dim columnValue As PropertyInfo

    For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
        columnValue = rowType.GetProperty(column.Name)
        If columnValue.GetValue(Row, Nothing).ToString() = "January" Then
            columnValue.SetValue(Row, String.Empty, Nothing)
        End If
    Next
End Sub


This example is checking the value of each column to see if it is “January”, and if it is, writing an empty string back to the column. Be aware that this option is not the best performing option. It’s OK for small data sets, but I would not use it for large ones. You can get better performance from the Reflection option by caching the PropertyInfo objects for each column in an instance variable, but if you are really interested in performance, keep reading.
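
The caching idea mentioned above can be sketched as follows. This is only an illustration that runs outside of SSIS: SampleRow is a hypothetical stand-in for the generated Input0Buffer class, and the module and method names are made up. The point is that the PropertyInfo objects are looked up once and then reused for every row, rather than calling GetProperty for each column of each row.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Reflection

' Hypothetical stand-in for the generated Input0Buffer class.
Public Class SampleRow
    Public Property MonthName As String
    Public Property City As String
End Class

Module CachedReflectionSketch

    ' Filled on first use, then reused for every subsequent row.
    Private cachedProperties As List(Of PropertyInfo)

    Public Sub BlankOutJanuary(ByVal row As SampleRow)
        If cachedProperties Is Nothing Then
            cachedProperties = New List(Of PropertyInfo)(GetType(SampleRow).GetProperties())
        End If

        For Each prop As PropertyInfo In cachedProperties
            Dim value As Object = prop.GetValue(row, Nothing)

            ' Guard against empty columns before comparing.
            If value IsNot Nothing AndAlso value.ToString() = "January" Then
                prop.SetValue(row, String.Empty, Nothing)
            End If
        Next
    End Sub

    Sub Main()
        Dim row As New SampleRow()
        row.MonthName = "January"
        row.City = "Boston"

        BlankOutJanuary(row)
        Console.WriteLine("[" & row.MonthName & "] [" & row.City & "]")
    End Sub

End Module
```

In a script component, the cache would live in an instance variable of ScriptMain and could be filled in PreExecute or on the first call to ProcessInputRow.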


The second option is a bit more complex, but performs better. It is based on some code originally written by MSDN Forum member jaegd (original post here). It’s been simplified a good bit, but that introduces some limitations. As written, this isn’t very type safe, so be sure to add code to check the data type if you are not working with strings.

Public Class ScriptMain
    Inherits UserComponent
    Private inputBuffer As PipelineBuffer

    Public Overrides Sub ProcessInput(ByVal InputID As Integer, ByVal Buffer As Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer)
        inputBuffer = Buffer
        MyBase.ProcessInput(InputID, Buffer)
    End Sub

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        Dim counter As Integer = 0

        For counter = 0 To inputBuffer.ColumnCount - 1
            If inputBuffer.Item(counter).ToString() = "January" Then
                inputBuffer.Item(counter) = String.Empty
            End If
        Next
    End Sub
End Class


In the ProcessInput method, the Buffer object is cached, as we’ll need it to reference the columns by their index. In the ProcessInputRow method, we can use the Item property and the column index to read from or write to the column.


That’s two options for working with columns generically in the Script component. As mentioned above, use the second option for performance.



As usual, I’ve posted a sample with both methods on my SkyDrive.

Dynamically Pivoting Rows to Columns

Previously, I posted about dynamically un-pivoting columns to rows. In this post, we’ll look at the reverse scenario. Suppose that you have a list of sales orders, in this format:

Sales Order Number   Product
SO-001               Hammer
SO-001               Table Saw
SO-001               Cordless Screwdriver
SO-002               Table Saw
SO-002               Wrench
SO-003               Flashlight

If there is a (reasonable) maximum number of products, you can create the maximum number of columns needed, and use the Pivot transform. However, if the number of products per order varies significantly, you might want to use something a little more flexible. This can be accomplished by storing the values in a single column as XML, or as a delimited set of values.


For this example, we’ll take the input described above and transform it to this:

Sales Order Number   Product
SO-001               Hammer|Table Saw|Cordless Screwdriver
SO-002               Table Saw|Wrench
SO-003               Flashlight

The data flow for this is straightforward – a Flat File Source, a Script component, and a Multicast (so that a data viewer can be added to see the results of the script). The Script component is where the bulk of the work is done.


image 


The script transform is configured with a single asynchronous output, by setting Output 0’s SynchronousInputID property to “None”. Two columns are added to Output 0, one for the Sales Order Number, and one for the list of products.


image


The script itself is listed below. Essentially, the script checks each row to see if the sales order number has changed from the previous row. If it is the same, it appends the product to a list. If the sales order number is different, the currently stored values for sales order number and the product list are sent to the asynchronous output.


There is some special handling for the first and last rows. On the first row, we don’t want to output anything, as we need to check the next row first. When the input buffer is finished, we need to send the last value for sales order number and product list, and then call SetEndOfRowset on the output buffer. To make sure we’ve processed all the rows, we need to check EndOfRowset on the input buffer. This should be checked in the Input0_ProcessInput routine, as it will never be equal to true in the Input0_ProcessInputRow method.


 



Public Class ScriptMain
    Inherits UserComponent

    Private orderNum As String
    Private productList As String
    Private blnFirstRow As Boolean = True

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        If Row.SalesOrderNumber = orderNum Then
            'append this item to the list
            productList = productList & "|" & Row.Product
        Else
            If blnFirstRow Then
                blnFirstRow = False
            Else
                'output the current values
                Output0Buffer.AddRow()
                Output0Buffer.SalesOrderNumber = orderNum
                Output0Buffer.ProductList = productList
            End If

            'reset the control variables
            orderNum = Row.SalesOrderNumber
            productList = Row.Product
        End If
    End Sub

    Public Overrides Sub Input0_ProcessInput(ByVal Buffer As Input0Buffer)
        MyBase.Input0_ProcessInput(Buffer)
        If Buffer.EndOfRowset Then
            Output0Buffer.AddRow()
            Output0Buffer.SalesOrderNumber = orderNum
            Output0Buffer.ProductList = productList
            Output0Buffer.SetEndOfRowset()
        End If
    End Sub
End Class
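
To experiment with the grouping logic without building a package, the same idea can be written as a plain function. This is a sketch under a few assumptions: the module and function names are hypothetical, the rows are passed in as key/value pairs rather than through a pipeline buffer, and (as in the script) the input is assumed to be sorted by sales order number. Unlike the script, it emits nothing when the input is empty.

```vbnet
Imports System
Imports System.Collections.Generic

Module PivotSketch

    ' Collapses rows sorted by order number into one entry per order,
    ' with the products joined by "|". (Hypothetical helper for illustration.)
    Public Function PivotToDelimited(ByVal rows As IEnumerable(Of KeyValuePair(Of String, String))) _
        As List(Of KeyValuePair(Of String, String))

        Dim results As New List(Of KeyValuePair(Of String, String))
        Dim orderNum As String = Nothing
        Dim products As New List(Of String)

        For Each row As KeyValuePair(Of String, String) In rows
            ' When the order number changes, output the order collected so far.
            If row.Key <> orderNum AndAlso products.Count > 0 Then
                results.Add(New KeyValuePair(Of String, String)(orderNum, String.Join("|", products.ToArray())))
                products.Clear()
            End If

            orderNum = row.Key
            products.Add(row.Value)
        Next

        ' Output the final order, which has no following row to trigger it.
        If products.Count > 0 Then
            results.Add(New KeyValuePair(Of String, String)(orderNum, String.Join("|", products.ToArray())))
        End If

        Return results
    End Function

    Sub Main()
        Dim rows As New List(Of KeyValuePair(Of String, String))
        rows.Add(New KeyValuePair(Of String, String)("SO-001", "Hammer"))
        rows.Add(New KeyValuePair(Of String, String)("SO-001", "Table Saw"))
        rows.Add(New KeyValuePair(Of String, String)("SO-001", "Cordless Screwdriver"))
        rows.Add(New KeyValuePair(Of String, String)("SO-002", "Table Saw"))
        rows.Add(New KeyValuePair(Of String, String)("SO-002", "Wrench"))
        rows.Add(New KeyValuePair(Of String, String)("SO-003", "Flashlight"))

        For Each order As KeyValuePair(Of String, String) In PivotToDelimited(rows)
            Console.WriteLine(order.Key & " " & order.Value)
        Next
    End Sub

End Module
```

Collecting the products in a list and joining them at the end avoids repeated string concatenation, which may matter for orders with many products.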

This can be useful if you need to pivot a varying number of values into a single column. The same technique can also be used anywhere you want to process a set of rows and change the output based on previous row values. If you’d like to take a more in-depth look, the sample files are on my SkyDrive.



 


 

Using the WMI Data Reader in SSIS

A recent post on the MSDN forums asked about using the WMI Data Reader Task to retrieve information and send it to a SQL Server destination. I wasn’t very familiar with the task, so I decided to try it out. I wasn’t expecting much, given some of my experiences with the WMI Event Watcher (which I have found fairly difficult to get working consistently). However, it was very easy to configure and use.


WMI (Windows Management Instrumentation) allows you to get information about your hardware, software, and network. I’m not going to cover it in depth (I’m not anywhere close to being knowledgeable about it), but if you want more information, there is an article from Microsoft on using the WMI Query Language here (http://www.microsoft.com/technet/scriptcenter/resources/begin/ss1206.mspx).


As a sample, I am querying the Win32_Service class to get a list of the services on my computer, and the start mode for each of them. The query for this is:

SELECT NAME, STARTMODE FROM Win32_Service

The control flow contains two tasks, the WMI Data Reader Task and a data flow to write the results out. For sample purposes, I am only sending it to a Multicast, but it could be sent to any destination.

image           image

The WMI Data Reader Task is configured with the query in the WqlQuerySource, and the OutputType is set to “Data table”. The results are being put into a variable (User::WMIVar) of type Object.


image


The task stores the result of the WMI Query in the specified variable as an ADO.NET DataTable object. This variable is used in the data flow, within a Script Source component. The Script component is configured with two output columns, one for the service name and one for the StartupType. Both columns are using the DT_WSTR(50) data type. The WMIVar is passed in as a Read Only Variable. 


image        image


The Script Source shreds the DataTable into the data flow.

Public Overrides Sub CreateNewOutputRows()
    Dim dataTable As System.Data.DataTable
    Dim dataRow As System.Data.DataRow

    dataTable = CType(Me.Variables.WMIVar, System.Data.DataTable)

    For Each dataRow In dataTable.Rows
        Output0Buffer.AddRow()

        Output0Buffer.Name = dataRow.Item("NAME").ToString()
        Output0Buffer.StartupType = dataRow.Item("STARTMODE").ToString()
    Next

    Output0Buffer.SetEndOfRowset()
End Sub


That’s all that was necessary to use the results from the WMI Data Reader task. It was pretty straightforward to configure, and it runs quickly as well.