Now that our data is cleansed, onto loading the data into our table! The cleanest, most sane approach that I have found to parsing through XML within a SSIS script task is by utilizing Linq to XML. More information can be found on this here. In order to use Linq to XML within your script task, you need to add the following references:
[sourcecode language=”csharp”]
using System.Xml;
using System.Linq;
using System.Xml.Linq;
[/sourcecode]
The flow of the package basically as follows: For each file in the input folder, read the contents into a StreamReader, pick out the pieces of XML that we care about, and repeat.
The XML files that I’m working with have three levels. For illustrations’ sake, we will call them Grandfather, Father, and Child. The Grandfather (Route) has multiple children (Trips), who then have multiple children (Stop(s)).
Assign Values to Variables
Now that we have added the necessary references to our script task, the next task is to code the task.
[sourcecode language=”csharp”]
foreach (var routes in (from route in StreamReader(outfilename, "ExtRoute")
select new
{
//Route-level attributes
RouteKey = (string)route.Attributes("RouteKey").FirstOrDefault(),
ActualStartTime = (string)route.Attributes("start").FirstOrDefault(),
ActualStopTime = (string)route.Attributes("end").FirstOrDefault(),
PlannedStartTime = (string)route.Attributes("DepartDate").FirstOrDefault(),
PlannedRunTime = (string)route.Attributes("PlannedElapsedTime").FirstOrDefault(),
ActualDriverName = (string)route.Attributes("driver").FirstOrDefault(),
ActualDriverID = (string)route.Attributes("driverid").FirstOrDefault(),
ActualTruckID = (string)route.Attributes("truck").FirstOrDefault(),
RouteUnPlanned = (string)route.Attributes("unplanned").FirstOrDefault(),
RoutingDate = (string)route.Attributes("RoutingDate").FirstOrDefault(),
Route = (string)route.Attributes("id").FirstOrDefault(),
Branch = (string)route.Attributes("group").FirstOrDefault(),
Trips = (from trip in route.Elements()
select new
{
//Trip-level attributes
TripKey = (string)trip.Attributes("TripKey").FirstOrDefault(),
//Stop-level attributes
Stops = (from stop in trip.Elements()
select new
{
LocationType = (string)stop.Attributes("LocationType").FirstOrDefault(),
Longitude = (string)stop.Attributes("Longitude").FirstOrDefault(),
Latitude = (string)stop.Attributes("Latitude").FirstOrDefault(),
CustomerName = (string)stop.Attributes("locationName").FirstOrDefault(),
CustomerNumber = (string)stop.Attributes("LocationRefNumber").FirstOrDefault(),
ActualSequence = (string)stop.Attributes("actualSequence").FirstOrDefault(),
GeoStop = (string)stop.Attributes("geoStop").FirstOrDefault(),
ArrivalTime = (string)stop.Attributes("arrivalTime").FirstOrDefault(),
DepartureTime = (string)stop.Attributes("departureTime").FirstOrDefault(),
StopDuration = (string)stop.Attributes("stopDuration").FirstOrDefault(),
ActualDistance = (string)stop.Attributes("ActualDistance").FirstOrDefault(),
DeltaDistance = (string)stop.Attributes("DeltaDistance").FirstOrDefault(),
DeltaTimeDriven = (string)stop.Attributes("DeltaTimeDriven").FirstOrDefault(),
WorkTime = (string)stop.Attributes("WorkTime").FirstOrDefault()
})
})
}
))
[/sourcecode]
The code above uses the IEnumerable C# function to quickly and efficiently query only the XML nodes that we care about. The following code is used for this purpose:
[sourcecode language=”csharp”]
{
foreach (var trip in routes.Trips)
foreach (var stop in trip.Stops)
{
DefaultBuffer.AddRow();
DefaultBuffer.TripKey = trip.TripKey;
DefaultBuffer.RouteKey = routes.RouteKey;
DefaultBuffer.CustomerName = stop.CustomerName;
DefaultBuffer.CustomerNumber = stop.CustomerNumber;
//etc…
}
[/sourcecode]
Be sure to notice the beginning and ending curly brackets. Don’t forget them — they’re important!!
Conclusion
I hope that these two articles have given you some insight into a quick and efficient way to import XML data into a SQL database. Any thoughts, suggestions, or criticisms are welcomed!