Posts tagged ‘XML’

Using the XML Task to Run an XPath Query

I came across a question about this the other day, and thought I’d share the answer here, as there doesn’t seem to be much information out there about it.  The questioner wanted to know how to read a portion of an XML document into a variable. Fortunately, the XML Task allows you to do this by running an XPath query. To configure the task, follow these steps:

  1. Set the operation type to XPATH.
  2. Specify a source for the XML (file, variable, or directly input).
  3. Set the OperationResult to save the results in a variable.
  4. Set the SecondOperandType to Direct Input.
  5. Specify the XPath query in the SecondOperand.
  6. Set the PutResultInOneNode option to False.
  7. Set the XPathOperation to Node list.

image

If, after configuring this, the variable is empty, it’s likely that the XPath isn’t correct. This can be caused by a number of things. A good way to test your XPath is this web page: http://www.whitebeam.org/library/guide/TechNotes/xpathtestbed.rhtm It allows you to upload your XML file, and then run your XPath query against it.

Another common problem occurs when the XML file uses namespaces. If it does, you can try using XPath like this: //*[local-name()='Setup' and  namespace-uri()='http://test.org']

Hopefully this helps, if you are working with XPath in SSIS.

XML Transformations Part 2

Previously, I posted about using a script destination component to output an XML file. A post on the MSDN forums got me thinking about a slight tweak to the previous post that would result in a little more flexibility. The poster wanted to create an XML string containing his row values to save into a column of a database table. So instead of script destination, what was needed here was a script transform to create a new column.

The data flow is straightforward – a source, script transform, and a OLEDB Destination.

DataFlow

In the script component, each column from the input file is selected. I added a single output column (XML) to hold the results of the script task.

ScriptInput

The script used is below:

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.Reflection

Public Class ScriptMain
    Inherits UserComponent

    Private Const rowElement As String = "Row"

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        Dim column As IDTSInputColumn90
        Dim rowType As Type = Row.GetType()
        Dim columnValue As PropertyInfo

        Dim newXML As String

        newXML = (FormatElement(rowElement))
        For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
            columnValue = rowType.GetProperty(column.Name)
            newXML += ((FormatElement(column.Name) + columnValue.GetValue(Row, Nothing).ToString() + FormatElement(column.Name, True)))
        Next
        newXML += (FormatElement(rowElement, True))

        Row.XML = newXML
    End Sub

    Private Function FormatElement(ByVal elementName As String) As String
        Return FormatElement(elementName, False)
    End Function

    Private Function FormatElement(ByVal elementName As String, ByVal closingTag As Boolean) As String
        Dim returnValue As String
        If closingTag Then
            returnValue = "</"
        Else
            returnValue = "<"
        End If
        returnValue += elementName + ">"

        Return returnValue
    End Function

End Class

The script is very similar to the one from the previous article, though it is a bit simpler. The XML values are prepared by iterating through the Input Collection metadata, and using the Reflection classes to read the Row object dynamically. The resulting value is put into the XML column on the output.

The OLEDB Destination simply maps the XML column to an XML column on a database table.

This is a good alternate approach to the previous one, particularly if you need to send the XML to a database instead of a flat file.

XML Destination Script Component

I often hear comments along the lines of “Why doesn’t SSIS have an XML destination?”. Unfortunately, I don’t know the answer to that, but I can show you how to implement your own XML destination fairly easily. Thanks to the magic of the script component, it doesn’t take much work at all. This was also a good opportunity for me to use a destination script component, as I haven’t had much need for this in my work.


One caveat to this is that I have kept the XML format very simple and very generic. You can customize the script to handle more complex scenarios, but I’d imagine there would be a lot of work in producing a component that can handle all the different ways of formatting data in XML. Maybe that’s why MS didn’t include it out of the box :)


As an example, I have a very simple package with a single data flow. The data flow has a flat file source, and a script destination component.


XML Dest


The flat file source is standard, nothing interesting there. The work is all done in the script. When adding the script component, make sure to specify that it is a destination component.


When configuring the script component destination, I chose all the input columns that I wanted to include in the XML (in this case, all of the columns from my flat file).


DestScriptColumns


There is no need to alter anything on the Inputs and Outputs page. Since this is a destination component, no output is necessary. On the Connection Managers page, I added a reference to a connection manager of type ”File”, and named it Destination.


DestScriptCM


This is the complete script for the component. I’ll explain it section by section below.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.IO
Imports System.Reflection

Public Class ScriptMain
Inherits UserComponent

Private targetFile As String
Private xmlWriter As StreamWriter
Private rootElement As String = “Root”
Private rowElement As String = “Row”

Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
targetFile = CType(Me.Connections.Destination.AcquireConnection(Nothing), String)
End Sub

Public Overrides Sub PreExecute()
xmlWriter = New StreamWriter(targetFile, False)
xmlWriter.WriteLine(FormatElement(rootElement))
End Sub

Public Overrides Sub PostExecute()
xmlWriter.WriteLine(FormatElement(rootElement, True))
xmlWriter.Close()
End Sub

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim column As IDTSInputColumn90
Dim rowType As Type = Row.GetType()
Dim columnValue As PropertyInfo

With xmlWriter
.Write(FormatElement(rowElement))
For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
columnValue = rowType.GetProperty(column.Name)
.Write(FormatElement(column.Name) + columnValue.GetValue(Row, Nothing).ToString() + FormatElement(column.Name, True))
Next
.WriteLine(FormatElement(rowElement, True))
End With
End Sub

Private Function FormatElement(ByVal elementName As String) As String
Return FormatElement(elementName, False)
End Function

Private Function FormatElement(ByVal elementName As String, ByVal closingTag As Boolean) As String
Dim returnValue As String
If closingTag Then
returnValue = “</”
Else
returnValue = “<”
End If
returnValue += elementName + “>”

Return returnValue
End Function

End Class


I added the System.IO and System.Reflection to the Imports as I am using objects from both namespaces. There are 4 class level variables defined:

    Private targetFile As String
Private xmlWriter As StreamWriter
Private rootElement As String = “Root”
Private rowElement As String = “Row”

xmlWriter is a StreamWriter, the .NET Framework object used to create the XML file. For more information on this, please see the MSDN documentation. rootElement is the value to enclose the entire XML document with, and rowElement defines what to enclose each individual row in. By changing the values in these variables, you affect what the final XML output looks like.


targetFile holds the path and file name for the destination file. It is set in the AcquireConnections method. Interesting note: when you are working with a file connection manager, calling AcquireConnection simply returns the path and file name of the file. Why not just use a variable or set it explicitly? Doing it this way makes the script behave more like a standard SSIS destination, and makes it clearer what you’d need to change to put the file in a new destination. As a plus, if you set an expression or configuration on the ConnectionString property of the file connection manager, the script will use it automatically.

    Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
targetFile = CType(Me.Connections.Destination.AcquireConnection(Nothing), String)
End Sub

In the PreExecute method, the code creates a new instance of the StreamWriter, and writes the opening tag of the XML file.

    Public Overrides Sub PreExecute()
xmlWriter = New StreamWriter(targetFile, False)
xmlWriter.WriteLine(FormatElement(rootElement))
End Sub

The ProcessInputRow method is where most of the work occurs. I’m using the System.Reflection and the ComponentMetaData objects to process the Row object dynamically. Essentially, this code determines what input columns are available at runtime, based on what was selected in the Script component’s property pages. It then writes each of those columns to the file, surrounding it with an XML tag based on the column name.

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim column As IDTSInputColumn90
Dim rowType As Type = Row.GetType()
Dim columnValue As PropertyInfo

With xmlWriter
.Write(FormatElement(rowElement))
For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
columnValue = rowType.GetProperty(column.Name)
.Write(FormatElement(column.Name) + columnValue.GetValue(Row, Nothing).ToString() + FormatElement(column.Name, True))
Next
.WriteLine(FormatElement(rowElement, True))
End With
End Sub


 Finally, in the PostExecute method, the closing tag is written, and the file is closed.

    Public Overrides Sub PostExecute()
xmlWriter.WriteLine(FormatElement(rootElement, True))
xmlWriter.Close()
End Sub

 With the script component in place, if I read in a text file with these values:


NAME;VALUE;DATE
A;1;1/1/2000
A;2;1/2/2000
A;3;1/3/2000
A;4;1/4/2000
A;5;1/5/2000

I get an output that looks like this:

<Root>
<Row><NAME>A</NAME><VALUE>1</VALUE><DATE>1/1/2000 12:00:00 AM</DATE></Row>
<Row><NAME>A</NAME><VALUE>2</VALUE><DATE>1/2/2000 12:00:00 AM</DATE></Row>
<Row><NAME>A</NAME><VALUE>3</VALUE><DATE>1/3/2000 12:00:00 AM</DATE></Row>
<Row><NAME>A</NAME><VALUE>4</VALUE><DATE>1/4/2000 12:00:00 AM</DATE></Row>
<Row><NAME>A</NAME><VALUE>5</VALUE><DATE>1/5/2000 12:00:00 AM</DATE></Row>
</Root>

The nicest thing about this script, in my opinion, is that it does not have to be modified if the input columns change. The use of the System.Reflection classes allows the code to process the Row object at runtime. This does come at the expense of performance, though, so I wouldn’t recommend using this in high volume scenarios without some tweaks.


 


 


 


 

Dynamically Unpivoting Columns to Rows

There’s been a few questions on the forums recently about taking column values and pivoting them to rows. For example, I might have an input file with 5 columns: a key column and 4 category columns.


Key; C1; C2; C3; C4
1;1;2;3;4
2;1;2;3;4
3;1;2;3;4
4;1;2;3;4


and I’d like an output that has 2 columns (the key and the category value):


1;1
1;2
1;3
1;4
2;1
2;2
2;3


Since my input has a set number of columns, I can use the Unpivot Transform to get the desired result.



But what if I have a variable number of columns to pivot? In that case the input file might look like this:


1;1;2;3
2;1;2;3;4;5
3;1
4;1;2;3;4;5;6;7;8
5;1;2;3;4;5;6;7;8;9
6;1;2;3


The first column still indicates the key, but there are a variable number of categories. In Handling Flat Files with Varying Numbers of Columns, I showed how to handle the varying number of columns by treating each row as a single column. This post is going to expand on that technique and show how to incorporate a pivot into the script.


The connection manager and the flat file source should be set up the same way as in the previous post, so I won’t cover that again. The difference is in the script component. Since I want to output more rows than I’m getting as inputs, I’m setting the script up with an asynchronous output. That is done by setting the SynchronousInputID property to None. Two columns were added to the output, one for the key, and one for the category value.



This is the code inside the script task:


Dim Values() As String
Dim i As Integer

Values = Row.Column0.Split(CChar(“;”))

‘Array is zero based – but zero index is the key value,
‘so start with 1
For i = 1 To Values.GetUpperBound(0)
    Output0Buffer.AddRow()
    Output0Buffer.Key = CInt(Values(0))
    Output0Buffer.Value = CInt(Values(i))
Next

The Split function is used to break the string up into individual columns in an array. The code loops through the array, starting with the second column, since the first one represents the key. For each iteration through the loop, a row is added to the output buffer, and the key and and value are added to the row.


That’s pretty much all there is to it. This technique can be used to handle flat files and columns that contain multiple items (in XML, or as a delimited list).