How to Export BLOB data to Files in SSIS

Overview

This topic will cover how to export data from BLOB fields in SQL Server and save those files out to the filesystem. This is accomplished within VB.Net code, but could just as easily be ported over to C#.

DBServer: SQL Server 2008 SP2
Development Environment: Visual Studio 2008

Package Setup

  1. Add an Execute SQL Task that retrieves the BLOB information from the Database
    • ResultSet: Full result set
    • Parameter Mapping: Add any necessary parameters for your stored procedure
    • Result Set
      • Result Name: 0 (zero)
      • Variable Name: Doesn’t matter, so long as the variable is of type System.Object
  2. Connect the Execute SQL Task to a For Each Loop Container (FELC)
    1. Collection: Foreach ADO Enumerator
    2. ADO object source variable: Result Set Variable Name from Step 1
    3. Enumeration mode: Rows in the first table
    4. Variable Mapping: Map out the columns from the Result Set to Package Variables (Index is 0-based)
      1. NOTE: Be sure to setup the BLOB output variable as a System.Object
  3. Add a Script Task to output the BLOB information to the file system

Script Code (VB.Net)


Public Sub SaveMemoryStream(ByVal buffer2 As Array, ByVal FileName As String, ByVal ms As MemoryStream)
        Dim outStream As FileStream

        outStream = File.OpenWrite(FileName)
        outStream.Write(buffer2, 0, ms.Position)
        outStream.Flush()
        outStream.Close()
    End Sub

    Public Sub Main()
        Dim Folder_Path As String
        Dim File_Name As String
        Dim s_File_Name As String
        Dim buffer() As Byte
        Dim oStream As System.IO.MemoryStream = Nothing
        Dim oFileStream As System.IO.FileStream = Nothing
        Dim buffer2() As Byte

        Folder_Path = Dts.Variables("Working_Directory").Value.ToString() + "\"
        File_Name = Dts.Variables("File_Path").Value.ToString()

        s_File_Name = Folder_Path & File_Name

        Try
            buffer = CType(Dts.Variables("Blob").Value, Byte())
            oStream = New System.IO.MemoryStream(buffer)
            oStream.Write(buffer, 0, buffer.Length)

            buffer2 = oStream.ToArray()
            SaveMemoryStream(buffer2, s_File_Name, oStream)

            'Close the stream
            oStream.Flush()

            Dts.TaskResult = ScriptResults.Success
        Catch ex As Exception
            Dts.Events.FireError(0, "Create Temp Image", ex.Message, "", 0)
            Dts.TaskResult = ScriptResults.Failure

        Finally
            If Not oStream Is Nothing Then
                oStream.Close()
            End If

            oStream = Nothing

            If Not oFileStream Is Nothing Then
                oFileStream.Close()
            End If

            oFileStream = Nothing
        End Try
        '
        Dts.TaskResult = ScriptResults.Success
    End Sub

How to Extract SAP Data in SSIS

This tutorial will step you through installing the Microsoft SAP ADO.Net connector and extracting data from SAP, all via SSIS. But first, some background information.

Overview

This week brought me the unique challenge of extracting data from SAP within SSIS. My boss had already done some work on researching components, and the official word was that we were going to use a product from Theobald Software — Extract IS. After downloading a trial version and working on getting connected to SAP (which took waaaay longer than it should have, namely because of sparse documentation), I was disappointed to realize that the component was not returning all of the records within my dataset. No matter what date range I specified for my table extract, I still received only 2,476 records — FAR fewer than the multi-millions in the table.

That brings us to where we are today. While troubleshooting my problems with the Theobald solution, I ran across the Microsoft SAP ADO.Net connector. I figured it wouldn’t hurt to see if I had the same issues with this connector; that way, I could rule out if it was a Theobald issue or not. Turns out that I was able to successfully connect to SAP and extract my data!!

Now, onto the important stuff: How to make it work!

Making It Work

There were many moving parts involved in getting this working, from installing SAP Adapters on my local machine to installing custom RFC Functions on the SAP box. We will go over all of that in detail in this post.

Configuring Local Environment

We need to start with installing components on the local machine first, since we will use some of the output later on for installing necessary items in the SAP environment.

Installing Necessary Software

  1. Visual C++ 2005 SP1 Redistributable: If you do not already have it installed, grab the download from here. Accept defaults for installation
  2. WCF LOB Adapter SDK Install: Grab the install files from here, picking the appropriate download file for your system architecture (x86 vs x64). Under Choose Setup Type, select Custom. In the resulting window, ensure that Runtime and Samples are both selected to install to local disk. Accept remaining defaults.
  3. BizTalk Adapter Pack install: Grab the install files from here, picking the appropriate download file for your system architecture (x86 vs x64). Under Choose Setup Type, select Custom. In the resulting window navigate to Base Adapters –> Microsoft BixTalk Adapter 3.0 for mySAP Business Suite, and ensure it is selected to install to local disk. Also, navigate to ADO Providers –> .Net Framework Data Provider for mySAP Business Suite, and ensure it is selected to install to local disk. Accept remaining defaults.

You should now have an ADO.NET SAP connector available for your use in SSIS:

You will need to secure the following information from your SAP Administrator in order to connect to SAP from SSIS:

  • Application Server Host
  • System Number
  • Client
  • Language
  • Username
  • Password

All the options under Messaging Server and Destination can be left blank in the SAP ADO.Net connection manager configuration page. Finally, click Test Connection to ensure that everything is good to go.

Configuring SAP Environment

The details of the SAP Environment configuration are beyond the scope of this blog post [especially since someone else installed the custom RFC function and set permissions, not me =)]. However, in the attached PPT Presentation, you can find detailed instructions for SAP configuration, from slides 28-39. Also, reference the attached help file for more detailed information regarding SAP configuration and setup.

Usage

Now that we have both pieces of the puzzle working together, it’s time to get some data! Fire up BIDS, and create a new Integrations Services Project. In that new project, add a Data Flow Task to the Control Flow. Within the Data Flow Task, drag over ADO NET Source from your toolbox. Having already configured the SAP Connection Manager above, select it as the ADO.NET connection manager from the drop down box. Under Data access mode, select  SQL command. Then, proceed to type in your SQL Command.

NOTE: SQL Syntax for SAP is ever so slightly different from normal T-SQL syntax. Have a look here for some important information. Pay particular attention to dates, if you are using those as part of your WHERE clause.

Click OK to exit the ADO.NET Source Editor. Next, connect up an OLE DB Destination to dump your data; open the destination and map the columns appropriately. If you have not already done so, create the table into which you will dump your data.

Hit F5 on the keyboard and let the good times roll!!

Conclusion

In this post, you learned how to install the SAP ADO.Net Adapters, configure the necessary RFC Functions on the SAP side, and extract data from a SAP System. I’ve uploaded the files referenced in this post to my SkyDrive. If you have any questions, please feel free to leave a comment!

XML as Source in SSIS — Part 2

Now that our data is cleansed, onto loading the data into our table! The cleanest, most sane approach that I have found to parsing through XML within a SSIS script task is by utilizing Linq to XML. More information can be found on this here. In order to use Linq to XML within your script task, you need to add the following references:

using System.Xml;
using System.Linq;
using System.Xml.Linq;

The flow of the package basically as follows: For each file in the input folder, read the contents into a StreamReader, pick out the pieces of XML that we care about, and repeat.

The XML files that I’m working with have three levels. For illustrations’ sake, we will call them Grandfather, Father, and Child. The Grandfather (Route) has multiple children (Trips), who then have multiple children (Stop(s)).

Assign Values to Variables

Now that we have added the necessary references to our script task, the next task is to code the task.

foreach (var routes in (from route in StreamReader(outfilename, "ExtRoute")
                   select new
                   {
                       //Route-level attributes
                       RouteKey = (string)route.Attributes("RouteKey").FirstOrDefault(),
                       ActualStartTime = (string)route.Attributes("start").FirstOrDefault(),
                       ActualStopTime = (string)route.Attributes("end").FirstOrDefault(),
                       PlannedStartTime = (string)route.Attributes("DepartDate").FirstOrDefault(),
                       PlannedRunTime = (string)route.Attributes("PlannedElapsedTime").FirstOrDefault(),
                       ActualDriverName = (string)route.Attributes("driver").FirstOrDefault(),
                       ActualDriverID = (string)route.Attributes("driverid").FirstOrDefault(),
                       ActualTruckID = (string)route.Attributes("truck").FirstOrDefault(),
                       RouteUnPlanned = (string)route.Attributes("unplanned").FirstOrDefault(),
                       RoutingDate = (string)route.Attributes("RoutingDate").FirstOrDefault(),
                       Route = (string)route.Attributes("id").FirstOrDefault(),
                       Branch = (string)route.Attributes("group").FirstOrDefault(),
                       Trips = (from trip in route.Elements()
                        select new
                        {
                           //Trip-level attributes
                           TripKey = (string)trip.Attributes("TripKey").FirstOrDefault(),

                           //Stop-level attributes
                           Stops = (from stop in trip.Elements()
                            select new
                            {
                                LocationType = (string)stop.Attributes("LocationType").FirstOrDefault(),
                                Longitude = (string)stop.Attributes("Longitude").FirstOrDefault(),
                                Latitude = (string)stop.Attributes("Latitude").FirstOrDefault(),
                                CustomerName = (string)stop.Attributes("locationName").FirstOrDefault(),
                                CustomerNumber = (string)stop.Attributes("LocationRefNumber").FirstOrDefault(),
                                ActualSequence = (string)stop.Attributes("actualSequence").FirstOrDefault(),
                                GeoStop = (string)stop.Attributes("geoStop").FirstOrDefault(),
                                ArrivalTime = (string)stop.Attributes("arrivalTime").FirstOrDefault(),
                                DepartureTime = (string)stop.Attributes("departureTime").FirstOrDefault(),
                                StopDuration = (string)stop.Attributes("stopDuration").FirstOrDefault(),
                                ActualDistance = (string)stop.Attributes("ActualDistance").FirstOrDefault(),
                                DeltaDistance = (string)stop.Attributes("DeltaDistance").FirstOrDefault(),
                                DeltaTimeDriven = (string)stop.Attributes("DeltaTimeDriven").FirstOrDefault(),
                                WorkTime = (string)stop.Attributes("WorkTime").FirstOrDefault()
                            })
                       })
                   }
                 ))

The code above uses the IEnumerable C# function to quickly and efficiently query only the XML nodes that we care about. The following code is used for this purpose:

{
                    foreach (var trip in routes.Trips)
                        foreach (var stop in trip.Stops)
                        {
                            DefaultBuffer.AddRow();
                            DefaultBuffer.TripKey = trip.TripKey;
                            DefaultBuffer.RouteKey = routes.RouteKey;
                            DefaultBuffer.CustomerName = stop.CustomerName;
                            DefaultBuffer.CustomerNumber = stop.CustomerNumber;
                           //etc...

                }

Be sure to notice the beginning and ending curly brackets. Don’t forget them — they’re important!!

Conclusion

I hope that these two articles have given you some insight into a quick and efficient way to import XML data into a SQL database. Any thoughts, suggestions, or criticisms are welcomed!

XML as Source in SSIS

XML is becoming increasingly popular in the workplace. Many systems are using it as a way to transmit data efficiently across the network. While XML certainly has its place, consuming it and deconstructing it can be a real headache.

Unfortunately, the native XML Source components in SSIS are lousy for such a purpose as this. Before I get into the details, I’ll provide a high-level overview.

I was provided roughly 1,000 XML files that I had to import into a SQL database. Sounded easy enough. However, the XML files had no in-line schema, and I was not provided a stand-alone schema so I had to derive one, again using the XML Source components. This would seem fine; however, the schema makes a “best-guess” with relation to the XML file that is presented. It guesses on the datatypes; so in one file, you could have an attribute such as filevalue=”1″ and the schema parsing says that that attribute’s datatype is a Bit. However, the next XML file has filevalue=”2432″. In this case, the import fails because 2432 is obviously not a Bit.

Removing Unwanted Data

Any and all users will tell you “My data is clean!”. But reality often proves different. Eventually, as the project morphed, these individual XML files were rolled into one BIG XML file, formatting, metadata and all. As such, I needed a way to strip all of the superfluous junk out of the file on the fly. Enter — C# StringExtension! I just add this as a script task inside my For Each Loop Container to strip out the junk from each file as it’s processed.

public static class StringExtension
    {
        private static Dictionary<string, string> _replacements = new Dictionary<string, string>();
        static StringExtension()
        {
            _replacements["<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"] = "";
            _replacements["<parmExtRoute>"] = "";
            _replacements["</parmExtRoute>"] = "";
            _replacements["<ReportCacheID>"] = "";
            _replacements["</ReportCacheID>"] = "";
            // etc...

        }
        public static string clean(this string s)
        {
            foreach (string to_replace in _replacements.Keys)
            {
                s = s.Replace(to_replace, _replacements[to_replace]);
            }
            return s;
        }
    }
        public void Main()
        {
            string filename = Dts.Variables["_FileName"].Value.ToString();
            string outfilename = Dts.Variables["_FileName"].Value.ToString();
            try
            {
                StreamReader sr = new StreamReader(filename);

                string content = sr.ReadToEnd();
                content = content.clean();
                sr.Close();

                StreamWriter sw = new StreamWriter(outfilename);
                sw.WriteLine("<Root>");
                sw.WriteLine(content);
                sw.WriteLine("</Root>");

                sw.Close();
                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception e)
            {
                Console.WriteLine("{0} Exception caught.", e);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }
    }

All that needs to be modified is the key dictionary pairs, adding additional pairs for whatever needs to be parsed out. Then, simply call the string extension on any string variable that needs to be cleansed, as I’ve done above. I’ve also taken the liberty to surround the entire XML fragment with a singular <root></root> node, so that everything will be happy.

Now, onto importing the data into our table!

Create XML Fragment in SQL Server

The XML datatype, first introduced in SQL Server 2005, can be very  handy when utilized properly. It is an industry standard, and with it, it is easy to import/export data across different servers, as well as different systems entirely. This tutorial will guide you through creating an XML fragment on the fly in SQL Server. Additional resources, as well as reference materials can be found at the bottom of this post. These instructions are valid for both SQL Server 2005 and SQL Server 2008.

First off, we need some sample data:

DECLARE @Students TABLE (
 FirstName nvarchar(50),
 LastName nvarchar(50),
 DisplayName nvarchar(100)
)
INSERT INTO @Students
SELECT 'Jim' as FirstName, 'Bob' as LastName, 'Jim Bob' as DisplayName
UNION
SELECT 'John', 'Doe', 'John Doe'
UNION
SELECT 'Jane', 'Doe', 'Jane Doe'
UNION
SELECT 'Yuri', 'Tao', 'Yuri Tao'

Now that we have our test data, we can attempt to create an XML string for each distinct row. Ultimately, we want our table to end up like the sample below:

XML

We could try our first option:

select * from @Students FOR XML RAW

However, this option returns an XML fragment for the entire dataset – not what we’re looking for here. We need an individual XML fragment for each row.

After searching around on the net and a few forums, I was finally able to get an answer:

SELECT *
FROM @Students s
CROSS APPLY
(
 SELECT
 (
 SELECT *
 FROM @Students t
 WHERE t.DisplayName = s.DisplayName
 FOR XML RAW
 ) x1
)x

The CROSS APPLY function basically creates a Cartesian product. In this case, it iterates over each row in our table and produces the desired XML result. However, if you were to run this query as-is on our sample data, you would notice that the XML output, while formatted as XML, isn’t of an XML datatype.

To fix this, simply convert the column, like so:

SELECT s.*, CONVERT(XML,x.x1) as ErrorData
FROM @Students s
CROSS APPLY
(
 SELECT
 (
 SELECT *
 FROM @Students t
 WHERE t.DisplayName = s.DisplayName
 FOR XML RAW
 ) x1
)x

That’s all there is to it! We now have an XML fragment for each row of data in our sample. As always, test the example for performance bottlenecks. I’ve utilized this 50,000 records and it returns in about 1 minute. Not too bad, given that the records are very wide (40-50 columns).

Later, I will write a post detailing querying and shredding XML data into a relational dataset – good stuff indeed!!

Additional Resources: