Posts tagged ‘SSIS’

How to Extract SAP Data in SSIS

This tutorial will step you through installing the Microsoft SAP ADO.Net connector and extracting data from SAP, all via SSIS. But first, some background information.

Overview

This week brought me the unique challenge of extracting data from SAP within SSIS. My boss had already done some work on researching components, and the official word was that we were going to use a product from Theobald Software — Extract IS. After downloading a trial version and working on getting connected to SAP (which took waaaay longer than it should have, namely because of sparse documentation), I was disappointed to realize that the component was not returning all of the records within my dataset. No matter what date range I specified for my table extract, I still received only 2,476 records — FAR fewer than the multi-millions in the table.

That brings us to where we are today. While troubleshooting my problems with the Theobald solution, I ran across the Microsoft SAP ADO.Net connector. I figured it wouldn’t hurt to see if I had the same issues with this connector; that way, I could rule out if it was a Theobald issue or not. Turns out that I was able to successfully connect to SAP and extract my data!!

Now, onto the important stuff: How to make it work!

Making It Work

There were many moving parts involved in getting this working, from installing SAP Adapters on my local machine to installing custom RFC Functions on the SAP box. We will go over all of that in detail in this post.

Configuring Local Environment

We need to start with installing components on the local machine first, since we will use some of the output later on for installing necessary items in the SAP environment.

Installing Necessary Software

  1. Visual C++ 2005 SP1 Redistributable: If you do not already have it installed, grab the download from here. Accept defaults for installation
  2. WCF LOB Adapter SDK Install: Grab the install files from here, picking the appropriate download file for your system architecture (x86 vs x64). Under Choose Setup Type, select Custom. In the resulting window, ensure that Runtime and Samples are both selected to install to local disk. Accept remaining defaults.
  3. BizTalk Adapter Pack install: Grab the install files from here, picking the appropriate download file for your system architecture (x86 vs x64). Under Choose Setup Type, select Custom. In the resulting window navigate to Base Adapters –> Microsoft BixTalk Adapter 3.0 for mySAP Business Suite, and ensure it is selected to install to local disk. Also, navigate to ADO Providers –> .Net Framework Data Provider for mySAP Business Suite, and ensure it is selected to install to local disk. Accept remaining defaults.

You should now have an ADO.NET SAP connector available for your use in SSIS:

You will need to secure the following information from your SAP Administrator in order to connect to SAP from SSIS:

  • Application Server Host
  • System Number
  • Client
  • Language
  • Username
  • Password

All the options under Messaging Server and Destination can be left blank in the SAP ADO.Net connection manager configuration page. Finally, click Test Connection to ensure that everything is good to go.

Configuring SAP Environment

The details of the SAP Environment configuration are beyond the scope of this blog post [especially since someone else installed the custom RFC function and set permissions, not me =)]. However, in the attached PPT Presentation, you can find detailed instructions for SAP configuration, from slides 28-39. Also, reference the attached help file for more detailed information regarding SAP configuration and setup.

Usage

Now that we have both pieces of the puzzle working together, it’s time to get some data! Fire up BIDS, and create a new Integrations Services Project. In that new project, add a Data Flow Task to the Control Flow. Within the Data Flow Task, drag over ADO NET Source from your toolbox. Having already configured the SAP Connection Manager above, select it as the ADO.NET connection manager from the drop down box. Under Data access mode, select  SQL command. Then, proceed to type in your SQL Command.

NOTE: SQL Syntax for SAP is ever so slightly different from normal T-SQL syntax. Have a look here for some important information. Pay particular attention to dates, if you are using those as part of your WHERE clause.

Click OK to exit the ADO.NET Source Editor. Next, connect up an OLE DB Destination to dump your data; open the destination and map the columns appropriately. If you have not already done so, create the table into which you will dump your data.

Hit F5 on the keyboard and let the good times roll!!

Conclusion

In this post, you learned how to install the SAP ADO.Net Adapters, configure the necessary RFC Functions on the SAP side, and extract data from a SAP System. I’ve uploaded the files referenced in this post to my SkyDrive. If you have any questions, please feel free to leave a comment!

XML as Source in SSIS — Part 2

Now that our data is cleansed, onto loading the data into our table! The cleanest, most sane approach that I have found to parsing through XML within a SSIS script task is by utilizing Linq to XML. More information can be found on this here. In order to use Linq to XML within your script task, you need to add the following references:

using System.Xml;
using System.Linq;
using System.Xml.Linq;

The flow of the package basically as follows: For each file in the input folder, read the contents into a StreamReader, pick out the pieces of XML that we care about, and repeat.

The XML files that I’m working with have three levels. For illustrations’ sake, we will call them Grandfather, Father, and Child. The Grandfather (Route) has multiple children (Trips), who then have multiple children (Stop(s)).

Assign Values to Variables

Now that we have added the necessary references to our script task, the next task is to code the task.

foreach (var routes in (from route in StreamReader(outfilename, "ExtRoute")
                   select new
                   {
                       //Route-level attributes
                       RouteKey = (string)route.Attributes("RouteKey").FirstOrDefault(),
                       ActualStartTime = (string)route.Attributes("start").FirstOrDefault(),
                       ActualStopTime = (string)route.Attributes("end").FirstOrDefault(),
                       PlannedStartTime = (string)route.Attributes("DepartDate").FirstOrDefault(),
                       PlannedRunTime = (string)route.Attributes("PlannedElapsedTime").FirstOrDefault(),
                       ActualDriverName = (string)route.Attributes("driver").FirstOrDefault(),
                       ActualDriverID = (string)route.Attributes("driverid").FirstOrDefault(),
                       ActualTruckID = (string)route.Attributes("truck").FirstOrDefault(),
                       RouteUnPlanned = (string)route.Attributes("unplanned").FirstOrDefault(),
                       RoutingDate = (string)route.Attributes("RoutingDate").FirstOrDefault(),
                       Route = (string)route.Attributes("id").FirstOrDefault(),
                       Branch = (string)route.Attributes("group").FirstOrDefault(),
                       Trips = (from trip in route.Elements()
                        select new
                        {
                           //Trip-level attributes
                           TripKey = (string)trip.Attributes("TripKey").FirstOrDefault(),

                           //Stop-level attributes
                           Stops = (from stop in trip.Elements()
                            select new
                            {
                                LocationType = (string)stop.Attributes("LocationType").FirstOrDefault(),
                                Longitude = (string)stop.Attributes("Longitude").FirstOrDefault(),
                                Latitude = (string)stop.Attributes("Latitude").FirstOrDefault(),
                                CustomerName = (string)stop.Attributes("locationName").FirstOrDefault(),
                                CustomerNumber = (string)stop.Attributes("LocationRefNumber").FirstOrDefault(),
                                ActualSequence = (string)stop.Attributes("actualSequence").FirstOrDefault(),
                                GeoStop = (string)stop.Attributes("geoStop").FirstOrDefault(),
                                ArrivalTime = (string)stop.Attributes("arrivalTime").FirstOrDefault(),
                                DepartureTime = (string)stop.Attributes("departureTime").FirstOrDefault(),
                                StopDuration = (string)stop.Attributes("stopDuration").FirstOrDefault(),
                                ActualDistance = (string)stop.Attributes("ActualDistance").FirstOrDefault(),
                                DeltaDistance = (string)stop.Attributes("DeltaDistance").FirstOrDefault(),
                                DeltaTimeDriven = (string)stop.Attributes("DeltaTimeDriven").FirstOrDefault(),
                                WorkTime = (string)stop.Attributes("WorkTime").FirstOrDefault()
                            })
                       })
                   }
                 ))

The code above uses the IEnumerable C# function to quickly and efficiently query only the XML nodes that we care about. The following code is used for this purpose:

{
                    foreach (var trip in routes.Trips)
                        foreach (var stop in trip.Stops)
                        {
                            DefaultBuffer.AddRow();
                            DefaultBuffer.TripKey = trip.TripKey;
                            DefaultBuffer.RouteKey = routes.RouteKey;
                            DefaultBuffer.CustomerName = stop.CustomerName;
                            DefaultBuffer.CustomerNumber = stop.CustomerNumber;
                           //etc...

                }

Be sure to notice the beginning and ending curly brackets. Don’t forget them — they’re important!!

Conclusion

I hope that these two articles have given you some insight into a quick and efficient way to import XML data into a SQL database. Any thoughts, suggestions, or criticisms are welcomed!

XML as Source in SSIS

XML is becoming increasingly popular in the workplace. Many systems are using it as a way to transmit data efficiently across the network. While XML certainly has its place, consuming it and deconstructing it can be a real headache.

Unfortunately, the native XML Source components in SSIS are lousy for such a purpose as this. Before I get into the details, I’ll provide a high-level overview.

I was provided roughly 1,000 XML files that I had to import into a SQL database. Sounded easy enough. However, the XML files had no in-line schema, and I was not provided a stand-alone schema so I had to derive one, again using the XML Source components. This would seem fine; however, the schema makes a “best-guess” with relation to the XML file that is presented. It guesses on the datatypes; so in one file, you could have an attribute such as filevalue=”1″ and the schema parsing says that that attribute’s datatype is a Bit. However, the next XML file has filevalue=”2432″. In this case, the import fails because 2432 is obviously not a Bit.

Removing Unwanted Data

Any and all users will tell you “My data is clean!”. But reality often proves different. Eventually, as the project morphed, these individual XML files were rolled into one BIG XML file, formatting, metadata and all. As such, I needed a way to strip all of the superfluous junk out of the file on the fly. Enter — C# StringExtension! I just add this as a script task inside my For Each Loop Container to strip out the junk from each file as it’s processed.

public static class StringExtension
    {
        private static Dictionary<string, string> _replacements = new Dictionary<string, string>();
        static StringExtension()
        {
            _replacements["<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"] = "";
            _replacements["<parmExtRoute>"] = "";
            _replacements["</parmExtRoute>"] = "";
            _replacements["<ReportCacheID>"] = "";
            _replacements["</ReportCacheID>"] = "";
            // etc...

        }
        public static string clean(this string s)
        {
            foreach (string to_replace in _replacements.Keys)
            {
                s = s.Replace(to_replace, _replacements[to_replace]);
            }
            return s;
        }
    }
        public void Main()
        {
            string filename = Dts.Variables["_FileName"].Value.ToString();
            string outfilename = Dts.Variables["_FileName"].Value.ToString();
            try
            {
                StreamReader sr = new StreamReader(filename);

                string content = sr.ReadToEnd();
                content = content.clean();
                sr.Close();

                StreamWriter sw = new StreamWriter(outfilename);
                sw.WriteLine("<Root>");
                sw.WriteLine(content);
                sw.WriteLine("</Root>");

                sw.Close();
                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception e)
            {
                Console.WriteLine("{0} Exception caught.", e);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }
    }

All that needs to be modified is the key dictionary pairs, adding additional pairs for whatever needs to be parsed out. Then, simply call the string extension on any string variable that needs to be cleansed, as I’ve done above. I’ve also taken the liberty to surround the entire XML fragment with a singular <root></root> node, so that everything will be happy.

Now, onto importing the data into our table!

Delete Multiple Connection Managers SSIS

Have you ever wanted to save yourself some time by copying a package, renaming it, and then modifying its contents to suit your needs? What if the package that you are copying from has 60 connection managers, and your new package only needs 1? In SSIS, there is no ability to do a “select all” on the connection managers that you don’t need and delete them. So, we could go hand by hand and delete every single one by right-clicking it and clicking Delete.

But where’s the fun in that?! :) I asked one of my co-workers about this scenario, and he suggested editing the XML of the package directly, which was a GREAT idea! (I wish I would have thought of that myself)

If you wish to delete these connections en masse, follow the instructions below.

1) Close your package if you currently have it open.

2) Right-click on your package and click View Code. This opens your package in XML mode.

3) Look for XML tags such as the one below (these blocks of code are the ones that you will want to delete):

<pre><DTS:ConnectionManager>

 

All you need to do now is to delete the block of code associated with the Connection Managers that you wish to delete.

That’s it! Now you don’t have to worry about wasting time trying to delete multiple Connection Managers one-by-one.