Posts tagged ‘XML’

XML as Source in SSIS — Part 2

Now that our data is cleansed, onto loading the data into our table! The cleanest, most sane approach that I have found to parsing through XML within a SSIS script task is by utilizing Linq to XML. More information can be found on this here. In order to use Linq to XML within your script task, you need to add the following references:

using System.Xml;
using System.Linq;
using System.Xml.Linq;

The flow of the package basically as follows: For each file in the input folder, read the contents into a StreamReader, pick out the pieces of XML that we care about, and repeat.

The XML files that I’m working with have three levels. For illustrations’ sake, we will call them Grandfather, Father, and Child. The Grandfather (Route) has multiple children (Trips), who then have multiple children (Stop(s)).

Assign Values to Variables

Now that we have added the necessary references to our script task, the next task is to code the task.

foreach (var routes in (from route in StreamReader(outfilename, "ExtRoute")
                   select new
                   {
                       //Route-level attributes
                       RouteKey = (string)route.Attributes("RouteKey").FirstOrDefault(),
                       ActualStartTime = (string)route.Attributes("start").FirstOrDefault(),
                       ActualStopTime = (string)route.Attributes("end").FirstOrDefault(),
                       PlannedStartTime = (string)route.Attributes("DepartDate").FirstOrDefault(),
                       PlannedRunTime = (string)route.Attributes("PlannedElapsedTime").FirstOrDefault(),
                       ActualDriverName = (string)route.Attributes("driver").FirstOrDefault(),
                       ActualDriverID = (string)route.Attributes("driverid").FirstOrDefault(),
                       ActualTruckID = (string)route.Attributes("truck").FirstOrDefault(),
                       RouteUnPlanned = (string)route.Attributes("unplanned").FirstOrDefault(),
                       RoutingDate = (string)route.Attributes("RoutingDate").FirstOrDefault(),
                       Route = (string)route.Attributes("id").FirstOrDefault(),
                       Branch = (string)route.Attributes("group").FirstOrDefault(),
                       Trips = (from trip in route.Elements()
                        select new
                        {
                           //Trip-level attributes
                           TripKey = (string)trip.Attributes("TripKey").FirstOrDefault(),

                           //Stop-level attributes
                           Stops = (from stop in trip.Elements()
                            select new
                            {
                                LocationType = (string)stop.Attributes("LocationType").FirstOrDefault(),
                                Longitude = (string)stop.Attributes("Longitude").FirstOrDefault(),
                                Latitude = (string)stop.Attributes("Latitude").FirstOrDefault(),
                                CustomerName = (string)stop.Attributes("locationName").FirstOrDefault(),
                                CustomerNumber = (string)stop.Attributes("LocationRefNumber").FirstOrDefault(),
                                ActualSequence = (string)stop.Attributes("actualSequence").FirstOrDefault(),
                                GeoStop = (string)stop.Attributes("geoStop").FirstOrDefault(),
                                ArrivalTime = (string)stop.Attributes("arrivalTime").FirstOrDefault(),
                                DepartureTime = (string)stop.Attributes("departureTime").FirstOrDefault(),
                                StopDuration = (string)stop.Attributes("stopDuration").FirstOrDefault(),
                                ActualDistance = (string)stop.Attributes("ActualDistance").FirstOrDefault(),
                                DeltaDistance = (string)stop.Attributes("DeltaDistance").FirstOrDefault(),
                                DeltaTimeDriven = (string)stop.Attributes("DeltaTimeDriven").FirstOrDefault(),
                                WorkTime = (string)stop.Attributes("WorkTime").FirstOrDefault()
                            })
                       })
                   }
                 ))

The code above uses the IEnumerable C# function to quickly and efficiently query only the XML nodes that we care about. The following code is used for this purpose:

{
                    foreach (var trip in routes.Trips)
                        foreach (var stop in trip.Stops)
                        {
                            DefaultBuffer.AddRow();
                            DefaultBuffer.TripKey = trip.TripKey;
                            DefaultBuffer.RouteKey = routes.RouteKey;
                            DefaultBuffer.CustomerName = stop.CustomerName;
                            DefaultBuffer.CustomerNumber = stop.CustomerNumber;
                           //etc...

                }

Be sure to notice the beginning and ending curly brackets. Don’t forget them — they’re important!!

Conclusion

I hope that these two articles have given you some insight into a quick and efficient way to import XML data into a SQL database. Any thoughts, suggestions, or criticisms are welcomed!

XML as Source in SSIS

XML is becoming increasingly popular in the workplace. Many systems are using it as a way to transmit data efficiently across the network. While XML certainly has its place, consuming it and deconstructing it can be a real headache.

Unfortunately, the native XML Source components in SSIS are lousy for such a purpose as this. Before I get into the details, I’ll provide a high-level overview.

I was provided roughly 1,000 XML files that I had to import into a SQL database. Sounded easy enough. However, the XML files had no in-line schema, and I was not provided a stand-alone schema so I had to derive one, again using the XML Source components. This would seem fine; however, the schema makes a “best-guess” with relation to the XML file that is presented. It guesses on the datatypes; so in one file, you could have an attribute such as filevalue=”1″ and the schema parsing says that that attribute’s datatype is a Bit. However, the next XML file has filevalue=”2432″. In this case, the import fails because 2432 is obviously not a Bit.

Removing Unwanted Data

Any and all users will tell you “My data is clean!”. But reality often proves different. Eventually, as the project morphed, these individual XML files were rolled into one BIG XML file, formatting, metadata and all. As such, I needed a way to strip all of the superfluous junk out of the file on the fly. Enter — C# StringExtension! I just add this as a script task inside my For Each Loop Container to strip out the junk from each file as it’s processed.

public static class StringExtension
    {
        private static Dictionary<string, string> _replacements = new Dictionary<string, string>();
        static StringExtension()
        {
            _replacements["<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"] = "";
            _replacements["<parmExtRoute>"] = "";
            _replacements["</parmExtRoute>"] = "";
            _replacements["<ReportCacheID>"] = "";
            _replacements["</ReportCacheID>"] = "";
            // etc...

        }
        public static string clean(this string s)
        {
            foreach (string to_replace in _replacements.Keys)
            {
                s = s.Replace(to_replace, _replacements[to_replace]);
            }
            return s;
        }
    }
        public void Main()
        {
            string filename = Dts.Variables["_FileName"].Value.ToString();
            string outfilename = Dts.Variables["_FileName"].Value.ToString();
            try
            {
                StreamReader sr = new StreamReader(filename);

                string content = sr.ReadToEnd();
                content = content.clean();
                sr.Close();

                StreamWriter sw = new StreamWriter(outfilename);
                sw.WriteLine("<Root>");
                sw.WriteLine(content);
                sw.WriteLine("</Root>");

                sw.Close();
                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception e)
            {
                Console.WriteLine("{0} Exception caught.", e);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }
    }

All that needs to be modified is the key dictionary pairs, adding additional pairs for whatever needs to be parsed out. Then, simply call the string extension on any string variable that needs to be cleansed, as I’ve done above. I’ve also taken the liberty to surround the entire XML fragment with a singular <root></root> node, so that everything will be happy.

Now, onto importing the data into our table!

Create XML Fragment in SQL Server

The XML datatype, first introduced in SQL Server 2005, can be very  handy when utilized properly. It is an industry standard, and with it, it is easy to import/export data across different servers, as well as different systems entirely. This tutorial will guide you through creating an XML fragment on the fly in SQL Server. Additional resources, as well as reference materials can be found at the bottom of this post. These instructions are valid for both SQL Server 2005 and SQL Server 2008.

First off, we need some sample data:

DECLARE @Students TABLE (
 FirstName nvarchar(50),
 LastName nvarchar(50),
 DisplayName nvarchar(100)
)
INSERT INTO @Students
SELECT 'Jim' as FirstName, 'Bob' as LastName, 'Jim Bob' as DisplayName
UNION
SELECT 'John', 'Doe', 'John Doe'
UNION
SELECT 'Jane', 'Doe', 'Jane Doe'
UNION
SELECT 'Yuri', 'Tao', 'Yuri Tao'

Now that we have our test data, we can attempt to create an XML string for each distinct row. Ultimately, we want our table to end up like the sample below:

XML

We could try our first option:

select * from @Students FOR XML RAW

However, this option returns an XML fragment for the entire dataset – not what we’re looking for here. We need an individual XML fragment for each row.

After searching around on the net and a few forums, I was finally able to get an answer:

SELECT *
FROM @Students s
CROSS APPLY
(
 SELECT
 (
 SELECT *
 FROM @Students t
 WHERE t.DisplayName = s.DisplayName
 FOR XML RAW
 ) x1
)x

The CROSS APPLY function basically creates a Cartesian product. In this case, it iterates over each row in our table and produces the desired XML result. However, if you were to run this query as-is on our sample data, you would notice that the XML output, while formatted as XML, isn’t of an XML datatype.

To fix this, simply convert the column, like so:

SELECT s.*, CONVERT(XML,x.x1) as ErrorData
FROM @Students s
CROSS APPLY
(
 SELECT
 (
 SELECT *
 FROM @Students t
 WHERE t.DisplayName = s.DisplayName
 FOR XML RAW
 ) x1
)x

That’s all there is to it! We now have an XML fragment for each row of data in our sample. As always, test the example for performance bottlenecks. I’ve utilized this 50,000 records and it returns in about 1 minute. Not too bad, given that the records are very wide (40-50 columns).

Later, I will write a post detailing querying and shredding XML data into a relational dataset – good stuff indeed!!

Additional Resources:

Delete Multiple Connection Managers SSIS

Have you ever wanted to save yourself some time by copying a package, renaming it, and then modifying its contents to suit your needs? What if the package that you are copying from has 60 connection managers, and your new package only needs 1? In SSIS, there is no ability to do a “select all” on the connection managers that you don’t need and delete them. So, we could go hand by hand and delete every single one by right-clicking it and clicking Delete.

But where’s the fun in that?! :) I asked one of my co-workers about this scenario, and he suggested editing the XML of the package directly, which was a GREAT idea! (I wish I would have thought of that myself)

If you wish to delete these connections en masse, follow the instructions below.

1) Close your package if you currently have it open.

2) Right-click on your package and click View Code. This opens your package in XML mode.

3) Look for XML tags such as the one below (these blocks of code are the ones that you will want to delete):

<pre><DTS:ConnectionManager>

 

All you need to do now is to delete the block of code associated with the Connection Managers that you wish to delete.

That’s it! Now you don’t have to worry about wasting time trying to delete multiple Connection Managers one-by-one.