XML Destination Script Component
I often hear comments along the lines of “Why doesn’t SSIS have an XML destination?”. Unfortunately, I don’t know the answer to that, but I can show you how to implement your own XML destination fairly easily. Thanks to the magic of the script component, it doesn’t take much work at all. This was also a good opportunity for me to use a destination script component, as I haven’t had much need for this in my work.
One caveat to this is that I have kept the XML format very simple and very generic. You can customize the script to handle more complex scenarios, but I’d imagine there would be a lot of work in producing a component that can handle all the different ways of formatting data in XML. Maybe that’s why MS didn’t include it out of the box
As an example, I have a very simple package with a single data flow. The data flow has a flat file source, and a script destination component.
The flat file source is standard, nothing interesting there. The work is all done in the script. When adding the script component, make sure to specify that it is a destination component.
When configuring the script component destination, I chose all the input columns that I wanted to include in the XML (in this case, all of the columns from my flat file).
There is no need to alter anything on the Inputs and Outputs page. Since this is a destination component, no output is necessary. On the Connection Managers page, I added a reference to a connection manager of type ”File”, and named it Destination.
This is the complete script for the component. I’ll explain it section by section below.
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.IO
Imports System.ReflectionPublic Class ScriptMain
Inherits UserComponentPrivate targetFile As String
Private xmlWriter As StreamWriter
Private rootElement As String = “Root”
Private rowElement As String = “Row”Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
targetFile = CType(Me.Connections.Destination.AcquireConnection(Nothing), String)
End SubPublic Overrides Sub PreExecute()
xmlWriter = New StreamWriter(targetFile, False)
xmlWriter.WriteLine(FormatElement(rootElement))
End SubPublic Overrides Sub PostExecute()
xmlWriter.WriteLine(FormatElement(rootElement, True))
xmlWriter.Close()
End SubPublic Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim column As IDTSInputColumn90
Dim rowType As Type = Row.GetType()
Dim columnValue As PropertyInfoWith xmlWriter
.Write(FormatElement(rowElement))
For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
columnValue = rowType.GetProperty(column.Name)
.Write(FormatElement(column.Name) + columnValue.GetValue(Row, Nothing).ToString() + FormatElement(column.Name, True))
Next
.WriteLine(FormatElement(rowElement, True))
End With
End SubPrivate Function FormatElement(ByVal elementName As String) As String
Return FormatElement(elementName, False)
End FunctionPrivate Function FormatElement(ByVal elementName As String, ByVal closingTag As Boolean) As String
Dim returnValue As String
If closingTag Then
returnValue = “</”
Else
returnValue = “<”
End If
returnValue += elementName + “>”Return returnValue
End FunctionEnd Class
I added the System.IO and System.Reflection to the Imports as I am using objects from both namespaces. There are 4 class level variables defined:
Private targetFile As String
Private xmlWriter As StreamWriter
Private rootElement As String = “Root”
Private rowElement As String = “Row”
xmlWriter is a StreamWriter, the .NET Framework object used to create the XML file. For more information on this, please see the MSDN documentation. rootElement is the value to enclose the entire XML document with, and rowElement defines what to enclose each individual row in. By changing the values in these variables, you affect what the final XML output looks like.
targetFile holds the path and file name for the destination file. It is set in the AcquireConnections method. Interesting note: when you are working with a file connection manager, calling AcquireConnection simply returns the path and file name of the file. Why not just use a variable or set it explicitly? Doing it this way makes the script behave more like a standard SSIS destination, and makes it clearer what you’d need to change to put the file in a new destination. As a plus, if you set an expression or configuration on the ConnectionString property of the file connection manager, the script will use it automatically.
Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
targetFile = CType(Me.Connections.Destination.AcquireConnection(Nothing), String)
End Sub
In the PreExecute method, the code creates a new instance of the StreamWriter, and writes the opening tag of the XML file.
Public Overrides Sub PreExecute()
xmlWriter = New StreamWriter(targetFile, False)
xmlWriter.WriteLine(FormatElement(rootElement))
End Sub
The ProcessInputRow method is where most of the work occurs. I’m using the System.Reflection and the ComponentMetaData objects to process the Row object dynamically. Essentially, this code determines what input columns are available at runtime, based on what was selected in the Script component’s property pages. It then writes each of those columns to the file, surrounding it with an XML tag based on the column name.
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim column As IDTSInputColumn90
Dim rowType As Type = Row.GetType()
Dim columnValue As PropertyInfoWith xmlWriter
.Write(FormatElement(rowElement))
For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
columnValue = rowType.GetProperty(column.Name)
.Write(FormatElement(column.Name) + columnValue.GetValue(Row, Nothing).ToString() + FormatElement(column.Name, True))
Next
.WriteLine(FormatElement(rowElement, True))
End With
End Sub
Finally, in the PostExecute method, the closing tag is written, and the file is closed.
Public Overrides Sub PostExecute()
xmlWriter.WriteLine(FormatElement(rootElement, True))
xmlWriter.Close()
End Sub
With the script component in place, if I read in a text file with these values:
NAME;VALUE;DATE
A;1;1/1/2000
A;2;1/2/2000
A;3;1/3/2000
A;4;1/4/2000
A;5;1/5/2000
I get an output that looks like this:
<Root>
<Row><NAME>A</NAME><VALUE>1</VALUE><DATE>1/1/2000 12:00:00 AM</DATE></Row>
<Row><NAME>A</NAME><VALUE>2</VALUE><DATE>1/2/2000 12:00:00 AM</DATE></Row>
<Row><NAME>A</NAME><VALUE>3</VALUE><DATE>1/3/2000 12:00:00 AM</DATE></Row>
<Row><NAME>A</NAME><VALUE>4</VALUE><DATE>1/4/2000 12:00:00 AM</DATE></Row>
<Row><NAME>A</NAME><VALUE>5</VALUE><DATE>1/5/2000 12:00:00 AM</DATE></Row>
</Root>
The nicest thing about this script, in my opinion, is that it does not have to be modified if the input columns change. The use of the System.Reflection classes allows the code to process the Row object at runtime. This does come at the expense of performance, though, so I wouldn’t recommend using this in high volume scenarios without some tweaks.
Great stuff John. I’ve written something similar before but I’ve never used Reflection to do it. Very cool indeed.
-Jamie
I was looking for a way to create a generic script, and figured that I could use the metadata to make it a bit more dynamic. However, I haven’t done a lot of testing on it, so I’m not sure if there are any hidden problems.
Great stuff! Just what I need!
I got it working for a file with a few columns. But I recieve a .Net error when running it with 45 columns (711 chars record length).
The error is within this part op de code: “columnValue.GetValue(Row, Nothing).ToString()” When I remove that part is ‘works’ without errors. To bad you cann’t debug. Any suggestions?
Compleet Error:
at XmlScriptComponent.ScriptMain.Input0_ProcessInputRow(Input0Buffer Row) in dts://Scripts/ScriptComponent_e23d01095c0747ac9e08095ca648fb1d/ScriptMain:line 45
at XmlScriptComponent.UserComponent.Input0_ProcessInput(Input0Buffer Buffer) in dts://Scripts/ScriptComponent_e23d01095c0747ac9e08095ca648fb1d/ComponentWrapper:line 29
at XmlScriptComponent.UserComponent.ProcessInput(Int32 InputID, PipelineBuffer Buffer) in dts://Scripts/ScriptComponent_e23d01095c0747ac9e08095ca648fb1d/ComponentWrapper:line 21
at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.ProcessInput(Int32 inputID, PipelineBuffer buffer)
Is the column value NULL? That might cause a problem.
> Is the column value NULL? That might cause a problem.
I thought that was the problem too… so I filled a textfile with 10 rows with 711 times an ‘X’, but that didn’t fix the problem. I also added a Dataviewer between the source and the script destination and saw that all columns (all are dt_str) where filled.
After that I wrote a try catch around the “columnValue.GetValue(Row, Nothing).ToString()”:
Try
.Write(FormatElement(column.Name) + columnValue.GetValue(Row, Nothing).ToString() + FormatElement(column.Name, True))
Catch ex As Exception
.Write(FormatElement(column.Name) + ex.Message.ToString() + FormatElement(column.Name, True))
End Try
The result is that some columns have the correct value and others show “Object reference not set to an instance of an object.” as value:
XXX
Object reference not set to an instance of an object.
Object reference not set to an instance of an object.
X
After that I changed the data in de textfile and saw that the same rows failed. But all rows have the same definition. The only difference is the length of the column. I’m totally flabbergasted…
I received the “Object reference not set to an instance of an object.” error also.
I was unable to get the script to run until I changed the columnName variable to a String Type.
Hi Again,
I rewrote your script a little bit en used the ‘ProcessInput’ instead of ‘Input0_ProcessInputRow’ and now it works fine again. Don’t know what went wrong. Thanks for the help. Let me know if you want the source code.
Yes, I’d like to see what you ended up with. I was working with some similar code recently, and getting some very unexpected results. I’m wondering if we weren’t seeing a related problem.
‘the modified code:
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.IO
Imports System.Reflection
Public Class ScriptMain
Inherits UserComponent
Private targetFile As String
Private xmlWriter As StreamWriter
Private rootElement As String = “Root”
Private rowElement As String = “Row”
Private columns As Integer()
Private columnames As String()
Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
targetFile = CType(Me.Connections.Destination.AcquireConnection(Nothing), String)
End Sub
Public Overrides Sub PreExecute()
xmlWriter = New StreamWriter(targetFile, False)
xmlWriter.WriteLine(FormatElement(rootElement))
Dim input As IDTSInput90 = ComponentMetaData.InputCollection(0)
ReDim columns(input.InputColumnCollection.Count)
columns = Me.GetColumnIndexes(input.ID)
Dim column As IDTSInputColumn90
ReDim columnames(input.InputColumnCollection.Count)
Dim counter As Integer
counter = 0
For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
columnames(counter) = column.Name
counter = counter + 1
Next
End Sub
Public Overrides Sub PostExecute()
xmlWriter.WriteLine(FormatElement(rootElement, True))
xmlWriter.Close()
End Sub
‘Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
‘ Dim column As IDTSInputColumn90
‘ Dim rowType As Type = Row.GetType()
‘ Dim columnValue As PropertyInfo
‘ With xmlWriter
‘ .Write(FormatElement(rowElement))
‘ For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
‘ columnValue = rowType.GetProperty(column.Name)
‘ Try
‘ .Write(FormatElement(column.Name) + columnValue.GetValue(Row, Nothing).ToString() + FormatElement(column.Name, True))
‘ Catch ex As Exception
‘ .Write(FormatElement(column.Name) + “” + FormatElement(column.Name, True))
‘ End Try
‘ Next
‘ .WriteLine(FormatElement(rowElement, True))
‘ End With
‘End Sub
Public Overrides Sub ProcessInput(ByVal InputID As Integer, ByVal Buffer As Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer)
While Buffer.NextRow()
‘Dim values As New System.Text.StringBuilder
xmlWriter.Write(FormatElement(rowElement))
Dim counter As Integer
counter = 0
For Each index As Integer In columns
Dim value As Object = Buffer(index)
xmlWriter.Write(FormatElement(columnames(counter).ToString()) + value.ToString() + FormatElement(columnames(counter).ToString(), True))
counter = counter + 1
Next
xmlWriter.WriteLine(FormatElement(rowElement, True))
End While
End Sub
Private Function FormatElement(ByVal elementName As String) As String
Return FormatElement(elementName, False)
End Function
Private Function FormatElement(ByVal elementName As String, ByVal closingTag As Boolean) As String
Dim returnValue As String
If closingTag Then
returnValue = “”
Return returnValue
End Function
End Class
This line still causes Problems when NULL Values are processed:
xmlWriter.Write(FormatElement(columnames(counter).ToString()) + value.ToString() + FormatElement(columnames(counter).ToString(), True))
I used an IF Clause to check this case and write “” instead of value.ToString().
But this is a great script, thank you!
I used a similar approach when I needed to output an XML file this time last year. However, I did reflection on the rowType class so I could detect and skip null values. I also used the overloaded WriteValue method so values like dates are written out in the ISO 8601 format (2000-01-01T00:00:00 instead of 1/1/2000 12:00:00 AM).
Dim writer As XmlTextWriter
Public Overrides Sub PreExecute()
writer = New XmlTextWriter(“data.xml”, Nothing)
writer.Formatting = Formatting.Indented
writer.WriteStartDocument()
writer.WriteStartElement(“Rows”)
End Sub
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
writer.WriteStartElement(“Row”)
For Each prop As PropertyInfo In rowType.GetProperties()
If Not prop.Name.EndsWith(“IsNull”) Then
If CBool(rowType.GetProperty(prop.Name & _
“_IsNull”).GetValue(Row, Nothing)) = False Then
writer.WriteStartElement(prop.Name)
writer.WriteValue(prop.GetValue(Row, Nothing))
writer.WriteEndElement()
End If
End If
Next
writer.WriteEndElement()
End Sub
Public Overrides Sub PostExecute()
writer.WriteEndElement()
writer.WriteEndDocument()
writer.Close()
End Sub
[...] [...]
Thanks for this solution. Very helpful as always
For anyone using SSIS 2008, the IDTSInputColumn90 needs to be changed to IDTSInputColumn100 .
SSIS removes any “.” and “_” in your field names when it creates the objects in the script component, so “Some_Field” becomes “SomeField”. You need to handle this when getting the columnvalue object:
columnValue = rowType.GetProperty(column.Name.Replace(“_”, “”).Replace(“.”, “”))
Hopefully this saves someone else some time!
Sean
Thanks for the handy script. It really saved me time and reduced my stress level over the suspiciously absent XML Destination.
I am newbie to SSIS and C#
when I am trying to execute it it is giving me following warnings, Can any one please guide me please how to solve this issue and make it working
Warning 1 Return type of function ‘Destination’ is not CLS-compliant. C:\Documents and Settings\umemon01\Local Settings\Temp\SSIS\SC_dba966fa656444d3b5d08a765ea50f03\ComponentWrapper.vb 50 30 SC_dba966fa656444d3b5d08a765ea50f03
Warning 2 ‘ScriptMain’ is not CLS-compliant because it derives from ‘UserComponent’, which is not CLS-compliant. C:\Documents and Settings\umemon01\Local Settings\Temp\SSIS\SC_dba966fa656444d3b5d08a765ea50f03\main.vb 8 14 SC_dba966fa656444d3b5d08a765ea50f03
Warning 3 Type of parameter ‘Row’ is not CLS-compliant. C:\Documents and Settings\umemon01\Local Settings\Temp\SSIS\SC_dba966fa656444d3b5d08a765ea50f03\main.vb 25 55 SC_dba966fa656444d3b5d08a765ea50f03
The script example in the post is VB – you’ll need to change it to work with C#.