Posts tagged ‘Script’

Checking for the Existence of a File

The File System Task in SSIS doesn’t support checking to see if a file exists. You can work around this easily with a script task. Create a new script task and add a Imports statement referencing the System.IO namespace to the top of the script.

Imports System.IO

Then add the following to the Main method:

If File.Exists(Dts.Connections(“ConnMgrA”).AcquireConnection(Nothing).ToString()) Then
Dts.TaskResult = Dts.Results.Success
Else
Dts.TaskResult = Dts.Results.Failure
End If

This script checks the file referenced by the ConnMgrA connection manager. If it exists, the script task returns Success, meaning execution will follow the Success constraint from the Script Task. If the file does not exist, the task will fail, and the Error constraint will be used. You could also set a variable with the results, and use that in an expression on a precedence constraint.


If, instead of using a connection manager, you want to get the file name from a variable, you can replace the If statement with the following:

If File.Exists(ReadVariable(“FileNameVariable”).ToString()) Then

The variable locking is occurring in the ReadVariable method. To see the definition for it, please refer to Daniel Read’s blog post here. This is a good practice to follow when working with variables in Script Tasks.


There is a Connect posting here requesting that the File System Task be enhanced to support checking for a file’s existence. If you’d like to see this in a future version, please vote for issue.

Address Columns Generically In a Script Component

When writing script components in SSIS, it’s often useful to be able to process the columns without knowing exactly what they are. For example, you may want to loop through all the columns to check for a conditional, like NULL values in the columns. Or you may want to take a set of columns and concatenate them into a single string for output, or output each one as an XML element. In any of these cases, you don’t necessarily care about the individual column details.


One option is to use Reflection. This is fairly easy to set up, but is not the fastest performing option. To use Reflection, add this to the top of the Script class:

Imports System.Reflection

In the ProcessInputRow, you can use the following code:

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim column As IDTSInputColumn90
Dim rowType As Type = Row.GetType()
Dim columnValue As PropertyInfo

For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
columnValue = rowType.GetProperty(column.Name)
If columnValue.GetValue(Row, Nothing).ToString() = “January” Then
columnValue.SetValue(Row, String.Empty, Nothing)
End If
Next
End Sub


This example is checking the value of each column to see if it is “January”, and if it is, writing an empty string back to the column. Be aware that this option is not the best performing option. It’s OK for small data sets, but I would not use it for large ones. You can get better performance from the Reflection option by caching the PropertyInfo objects for each column in an instance variable, but if you are really interested in performance, keep reading.


This option is a bit more complex, but performs better. This is based off some code originally written by MSDN Forum member jaegd (original post here). It’s been simplified a good bit, but that introduces some limitations. As written, this isn’t very type safe, so be sure to add code to check the data type if you are not working with strings.

Public Class ScriptMain
Inherits UserComponent
Private inputBuffer As PipelineBuffer

Public Overrides Sub ProcessInput(ByVal InputID As Integer, ByVal Buffer As Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer)
inputBuffer = Buffer
MyBase.ProcessInput(InputID, Buffer)
End Sub

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim counter As Integer = 0

For counter = 0 To inputBuffer.ColumnCount – 1
If inputBuffer.Item(counter).ToString() = “January” Then
inputBuffer.Item(counter) = String.Empty
End If
Next
End Sub
End Class


In the ProcessInput method, the Buffer object is cached, as we’ll need it to reference the columns by their index. In the ProcessInputRow method, we can use the Item property and the column index to read from or write to the column.


That’s two options for working with columns generically in the Script component. As mentioned above, use the second option for performance.



As usual, I’ve posted a sample with both methods on my SkyDrive.

Dynamically Pivoting Rows to Columns

Previously, I posted about dynamically un-pivoting columns to rows. In this post, we’ll look at the reverse scenario. Suppose that you have a list of sales orders, in this format:
























Sales Order Number Product
SO-001 Hammer
SO-001 Table Saw
SO-001 Cordless Screwdriver
SO-002 Table Saw
SO-002 Wrench
SO-003 Flashlight

If there is a (reasonable) maximum number of products, you can create the maximum number of columns needed, and use the Pivot transform. However, if the number of products per order vary significantly, you might want to use something a little more flexible. This can be accomplished by storing the values in a single column as XML, or as a delimited set of values. 


For this example, we’ll take the input described above and transform it to this:















Sales Order Number Product
SO-001 Hammer|Table Saw|Cordless Screwdriver
SO-002 Table Saw|Wrench
SO-003 Flashlight

The data flow for this is straightforward – a Flat File Source, a Script component, and a Multicast (so that a data viewer can be added to see the results of the script). The Script component is where the bulk of the work is done.


image 


The script transform is configured with a single asynchronous output, by setting Output 0′s SynchronousInputID property to “None”. Two columns are added to Output 0, one for the Sales Order Number, and one for the list of products.


image


The script itself is listed below. Essentially, the script checks each row to see if the sales order number has changed from the previous row. If it is the same, it appends the product to a list. If the sales order number is different, the currently stored values for sales order number and the product list are sent to the asynchronous output.


There is some special handling for the first and last rows. On the first row, we don’t want to output anything, as we need to check the next row first. When the input buffer is finished, we need send the last value for sales order number and product list, and then call the SetEndOfRowSet on the output buffer. To make sure we’ve processed all the rows, we need to check EndOfRowSet on the input buffer. This should be checked in the Input0_ProcessInput routine, as it will never be equal to true in the Input0_ProcessInputRow method.


 



   1: Public Class ScriptMain
   2:     Inherits UserComponent
   3:  
   4:     Private orderNum As String
   5:     Private productList As String
   6:     Private blnFirstRow As Boolean = True
   7:  
   8:     Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
   9:  
  10:         If Row.SalesOrderNumber = orderNum Then
  11:             ‘append this item to the list
  12:             productList = productList + “|” + Row.Product
  13:         Else
  14:             If blnFirstRow Then
  15:                 blnFirstRow = False
  16:             Else
  17:                 ‘output the current values
  18:                 Output0Buffer.AddRow()
  19:                 Output0Buffer.SalesOrderNumber = orderNum
  20:                 Output0Buffer.ProductList = productList
  21:             End If
  22:  
  23:             ‘reset the control variables
  24:             orderNum = Row.SalesOrderNumber
  25:             productList = Row.Product
  26:         End If
  27:     End Sub
  28:  
  29:     Public Overrides Sub Input0_ProcessInput(ByVal Buffer As Input0Buffer)
  30:         MyBase.Input0_ProcessInput(Buffer)
  31:         If Buffer.EndOfRowset Then
  32:             Output0Buffer.AddRow()
  33:             Output0Buffer.SalesOrderNumber = orderNum
  34:             Output0Buffer.ProductList = productList
  35:             Output0Buffer.SetEndOfRowset()
  36:         End If
  37:     End Sub
  38: End Class

This can be useful if you need to pivot a varied number of columns. The same techniques can also be used anywhere you want to process a set of rows and change the output based on previous row values. If you’d like to take a more in-depth look, the sample files are on my SkyDrive:



 


 

Using the WMI Data Reader in SSIS

A recent post on the MSDN forums was asking about using the WMI Data Reader Task to retrieve information and send it to a SQL Server destination. I wasn’t very familiar with the task, so I decided to try it out. I wasn’t expecting much, given some of my experiences with the WMI Event Watcher (which I have found fairly difficult to get working consistently). However, it was very easier to configure and use.


WMI (Windows Management Instrumentation) allows you to get information about your hardware, software, and  network. I’m not going to cover it in depth (I’m not anywhere close to being knowledgeable about it), but if you want more information, there is an article from Microsoft on using the WMI Query Language here (http://www.microsoft.com/technet/scriptcenter/resources/begin/ss1206.mspx).


As a sample, I am querying the WIn32_Service class to get a list of services running on my computer, and the start mode for each of them. The query for this is:

SELECT NAME, STARTMODE FROM Win32_Service

The control flow contains two tasks, the WMI Data Reader Task and a data flow to write the results out. For sample purposes, I am only sending it to a Multicast, but it could be sent to any destination.

image           image

The WMI Data Reader Task is configured with the query in the WqlQuerySource, and the OutputType is set to “Data table”. The results are being put into a variable (User::WMIVar) of type Object.


image


The task stores the result of the WMI Query in the specified variable as an ADO.NET DataTable object. This variable is used in the data flow, within a Script Source component. The Script component is configured with two output columns, one for the service name and one for the StartupType. Both columns are using the DT_WSTR(50) data type. The WMIVar is passed in as a Read Only Variable. 


image        image


The Script Source shreds the DataTable into the data flow.

    Public Overrides Sub CreateNewOutputRows()
Dim dataTable As System.Data.DataTable
Dim dataRow As System.Data.DataRow

dataTable = CType(Me.Variables.WMIVar, Data.DataTable)

For Each dataRow In dataTable.Rows
Output0Buffer.AddRow()

Output0Buffer.Name = dataRow.Item(“NAME”).ToString()
Output0Buffer.StartupType = dataRow.Item(“STARTMODE”).ToString()
Next

Output0Buffer.SetEndOfRowset()
End Sub


That’s all that was necessary to use the results from the WMI Data Reader task. It was pretty straightforward to configure, and it runs quickly as well.