How to Extract SAP Data in SSIS

This tutorial will step you through installing the Microsoft SAP ADO.Net connector and extracting data from SAP, all via SSIS. But first, some background information.

Overview

This week brought me the unique challenge of extracting data from SAP within SSIS. My boss had already done some research on components, and the official word was that we were going to use a product from Theobald Software — Extract IS. After downloading a trial version and working on getting connected to SAP (which took far longer than it should have, largely because of sparse documentation), I was disappointed to find that the component was not returning all of the records in my dataset. No matter what date range I specified for my table extract, I still received only 2,476 records — FAR fewer than the multi-millions in the table.

That brings us to where we are today. While troubleshooting my problems with the Theobald solution, I ran across the Microsoft SAP ADO.Net connector. I figured it wouldn’t hurt to see if I had the same issues with this connector; that way, I could rule out whether or not it was a Theobald issue. It turns out I was able to successfully connect to SAP and extract all of my data!

Now, onto the important stuff: How to make it work!

Making It Work

There were many moving parts involved in getting this working, from installing SAP Adapters on my local machine to installing custom RFC Functions on the SAP box. We will go over all of that in detail in this post.

Configuring Local Environment

We need to start with installing components on the local machine first, since we will use some of the output later on for installing necessary items in the SAP environment.

Installing Necessary Software

  1. Visual C++ 2005 SP1 Redistributable: If you do not already have it installed, grab the download from here. Accept the defaults during installation.
  2. WCF LOB Adapter SDK Install: Grab the install files from here, picking the appropriate download file for your system architecture (x86 vs x64). Under Choose Setup Type, select Custom. In the resulting window, ensure that Runtime and Samples are both selected to install to local disk. Accept remaining defaults.
  3. BizTalk Adapter Pack install: Grab the install files from here, picking the appropriate download file for your system architecture (x86 vs x64). Under Choose Setup Type, select Custom. In the resulting window, navigate to Base Adapters –> Microsoft BizTalk Adapter 3.0 for mySAP Business Suite, and ensure it is selected to install to local disk. Also, navigate to ADO Providers –> .Net Framework Data Provider for mySAP Business Suite, and ensure it is selected to install to local disk. Accept remaining defaults.

You should now have an ADO.NET SAP connector available for use in SSIS.

You will need to secure the following information from your SAP Administrator in order to connect to SAP from SSIS:

  • Application Server Host
  • System Number
  • Client
  • Language
  • Username
  • Password

All the options under Messaging Server and Destination can be left blank in the SAP ADO.Net connection manager configuration page. Finally, click Test Connection to ensure that everything is good to go.

Configuring SAP Environment

The details of the SAP environment configuration are beyond the scope of this blog post [especially since someone else installed the custom RFC function and set permissions, not me =)]. However, the attached PowerPoint presentation contains detailed instructions for SAP configuration on slides 28-39. Also, reference the attached help file for more detailed information regarding SAP configuration and setup.

Usage

Now that we have both pieces of the puzzle working together, it’s time to get some data! Fire up BIDS and create a new Integration Services project. In that project, add a Data Flow Task to the Control Flow. Within the Data Flow Task, drag an ADO NET Source over from the toolbox. Having already configured the SAP connection manager above, select it as the ADO.NET connection manager from the drop-down box. Under Data access mode, select SQL command, then type in your SQL command.

NOTE: SQL syntax for SAP is ever so slightly different from normal T-SQL syntax. Have a look here for some important information, and pay particular attention to dates if you are using them as part of your WHERE clause.
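To make the date caveat concrete, here is a hedged sketch of what such a command might look like. VBAK/VBELN/ERDAT/ERNAM are standard SAP sales document header fields used purely for illustration; substitute your own tables, and check the link above for the operators the provider actually supports.

```sql
-- A sketch, not verified against your system: ERDAT (creation date) is an
-- SAP DATS column, stored as a CHAR(8) string in YYYYMMDD format, so date
-- filters use string literals rather than T-SQL date literals.
SELECT VBELN, ERDAT, ERNAM
FROM VBAK
WHERE ERDAT >= '20110101' AND ERDAT <= '20110131'
```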

Click OK to exit the ADO.NET Source Editor. Next, connect up an OLE DB Destination to dump your data; open the destination and map the columns appropriately. If you have not already done so, create the table into which you will dump your data.

Hit F5 on the keyboard and let the good times roll!!

Conclusion

In this post, you learned how to install the SAP ADO.Net adapters, configure the necessary RFC functions on the SAP side, and extract data from an SAP system. I’ve uploaded the files referenced in this post to my SkyDrive. If you have any questions, please feel free to leave a comment!

Posted in Uncategorized | Tagged , | Comments Off

Submissions for PASS Summit 2011

I’ve presented at the PASS Summit several times over the past few years, and I’ve really enjoyed it every time. The Summit is a great event for learning from some of the best in the SQL Server community, and you get a chance to spend time with them at the various events going on around the Summit. This year, the community is getting the opportunity to vote for its preferred sessions, which I think is a great way to get people involved in the selection process. The deadline is May 20th according to the webpage (though another page says the 30th – probably better to vote early than to miss the deadline).

This year, I’ve submitted several sessions.

Do More (ETL) with Less (Effort) – Automating SSIS

SSIS is a great tool for transferring data from one data source to another, and for implementing complex ETL processes. However, for simple, straightforward data transfer tasks or packages that adhere to a pattern, creating SSIS packages by hand can be time-consuming and repetitious. By attending this session, you’ll learn how to automate package creation in SSIS, including the dynamic generation of data flows. We’ll cover some of the free and open source tools available for this, and discuss “roll your own” options.

Do You Know the Data Flow?

The Data Flow task is one of the most powerful and most complex tools available in SSIS. Whether you are brand new to SSIS, or you’ve been using it for a while, it’s likely you’ve had some questions about the Data Flow. Why are some components so much slower than others? Why can’t I store a value (like a row count) in one component, and use it in another component later in the Data Flow? And why does it always seem to be the part of my package that fails when I run it against real data? Well, you’re not alone. During this session, we’ll answer these questions (and many others) by learning how the Data Flow operates internally. We’ll cover the Data Flow from the basic (what’s a component?) to the advanced (how can I determine how many threads my Data Flow is using?). After attending this session, you’ll know a lot more about getting the most out of Data Flows in SSIS.

Handling Advanced Data Warehouse Scenarios in SSIS

So you’ve used SSIS to populate a simple star schema data mart, and everybody’s happy. But now you have new requirements that require more advanced data warehouse approaches, like late arriving dimensions, bridge tables, parent child dimensions, and Type 3 or Type 6 slowly changing dimensions (SCD). How do you handle those in a scalable, efficient way in SSIS? This session will present some common patterns for handling these scenarios. You’ll learn when to use each advanced approach and the pros and cons associated with each pattern. You will learn how to implement these patterns in SSIS, and how to tune them for high performance.

Tuning Analysis Services Processing Performance

You’ve got your Analysis Services cube created, and deployed in production. However, you notice that every night, the cube is taking longer and longer to process, and users are starting to complain about their data not being ready when they arrive in the morning. If you’ve found yourself in this situation, or want to avoid being in it in the first place, come to this session. We’ll cover how to benchmark processing performance, track down bottlenecks, and how to tune things to get the best performance for processing your cube.

Other Sessions I’d Like to See At the Summit

This is not an exhaustive list, by any means – so if you aren’t on it, please don’t take it personally. These are the ones that happened to catch my eye as I looked through the list: some because they had an interesting technical focus, some because I know the speaker will bring an interesting perspective on the topic, and some because… well, just because I can.

Dynamic Business Rules Processing Using SSIS – Tim Mitchell

Techniques for Automating T-SQL Unit Tests and User Acceptance Testing – Scott Currie

Advanced Analysis Services Development – David Darden

Advanced SSAS Security – Chris Webb

Analysis Services Power Tools – Darren Gosbell

DAX Deep Dive – Marco Russo, Alberto Ferrari

Posted in PASS | Tagged , | Comments Off

Biml Functionality in BIDS Helper

If you follow along with the BIDS Helper project on CodePlex, you may have noticed that we released a new beta version on April 20th. The new build includes some bug fixes, and two new features.

Duplicate Role (SSAS)

The Duplicate Role feature allows you to create a copy of an existing SSAS role, along with all of its settings and permissions. Performing a regular copy and paste on a role in SSAS will only copy the role membership, not the permissions. The new feature copies the role itself, along with the permissions associated with the original role. I’ve found this pretty useful over the past few months.

Biml Package Generator (SSIS)

The Biml Package Generator feature is one I’m pretty excited about, as it leverages some of the things I’ve been working on at Varigence over the past year. This feature enables you to use BI Markup Language (Biml) to generate SSIS packages. This is nice for a couple of reasons. First, Biml is an XML-based language with a readable syntax, and for a lot of packages, I find it easier to just type the Biml to create the package rather than using the designers. Second, the Biml compiler includes the ability to script the generation of Biml code, in a similar way to how you can use ASP.NET to script the production of HTML. Using that, I can generate large numbers of packages very quickly.
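For a flavor of that scripting, here’s a minimal sketch (the package names are made up for illustration): C# code nuggets inside <# #> delimiters emit Biml, much like ASP.NET code blocks emit HTML.

```xml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <!-- The code nugget below emits one <Package> element per name. -->
        <# foreach (var name in new[] { "LoadCustomers", "LoadOrders" }) { #>
        <Package Name="<#= name #>" ConstraintMode="Linear" />
        <# } #>
    </Packages>
</Biml>
```

Compiling this produces two packages, LoadCustomers and LoadOrders, from a single source file.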

I’m starting a series of posts that will illustrate how to use the Biml functionality in BIDS Helper. This post will serve as an index page to locate these, and they will also be posted to the BIDS Helper documentation and the Varigence blog.

Part 1: Creating a Basic Package Using Biml
Part 2: Creating Tables using Biml and BimlScript
Part 3: Copy Data Dynamically with BimlScript
Part 4: Controlling the Control Flow with Biml
Part 5: Defining the Data Flow in Biml


As always, if you have questions, let me know in the comments.

Posted in SSIS | Comments Off

Creating a Basic Package Using Biml

This article is going to walk through the process of creating a simple package using Biml and the Biml Package Generator feature in BIDS Helper. To start out, you need to install the latest beta of BIDS Helper from CodePlex. Once that is set up, create a new Integration Services project in BIDS, and right-click on the project node in the Solution Explorer. There’s a new item in this menu – Add New Biml File.


Clicking Add New Biml File will add a new file to the Miscellaneous folder in the solution, named BimlScript.biml. (The name is generated automatically, so it may be BimlScript1.biml, etc.) You can right-click on the file and choose Rename to give it a more specific name. For this example, rename the file “BasicPackage.biml”.

Double-clicking on the file will open the XML editor inside of BIDS. The editor supports Intellisense for Biml, so typing an opening tag (“<”) will give you a list of valid options for tags you can use. (If you aren’t seeing the Intellisense, please check this link for troubleshooting steps.)


For this example, copy and paste the following Biml into the document. Since the code below includes the document root tags (<Biml>), you’ll want to make sure you replace the entire contents of the Biml file.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <Connection Name="AdventureWorks" ConnectionString="Server=.;Initial Catalog=AdventureWorks;Integrated Security=SSPI;Provider=SQLNCLI10"/>
    </Connections>
    <Packages>
        <Package Name="Biml Sample" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
            <Tasks>
                <Dataflow Name="Extract Table List">
                    <Transformations>
                        <OleDbSource Name="Get Table List" ConnectionName="AdventureWorks">
                            <DirectInput>SELECT * FROM sys.tables</DirectInput>
                        </OleDbSource>
                        <Multicast Name="Multicast"/>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
</Biml>

The first section (<Connections>) of this Biml defines an OleDbConnection that points to the AdventureWorks database. The next section (inside the <Packages> tag) defines a single package that contains a Dataflow task (the <Dataflow> tag). The Dataflow task contains two components: an OleDb Source and a Multicast transformation.

The next step is to take this definition of a package, and actually generate the package from it. To do this, right-click on the Biml file, and choose Expand Biml File from the context menu.


A new package will be added to the SSIS Packages folder, named Biml Sample.dtsx. If you review the generated package, you’ll see that it matches up to what was defined in the Biml code.


That’s a quick introduction to the Biml functionality in BIDS Helper. In the next article, we’ll set the stage for some more advanced (read: more interesting) uses of Biml, including some scripting.

Posted in SSIS | Tagged , , , | Comments Off

Presenting at Carolina Code Camp 2011

I’m going to be presenting a couple of sessions at the Carolina Code Camp this weekend. The Carolina Code Camp is always a good time – well organized, educational, and fun. I’m looking forward to it.

Do More (ETL) with Less (Effort) – Automating SSIS
SSIS is a great tool for transferring data from one data source to another, and for implementing complex ETL processes. However, for simple, straightforward data transfer tasks or packages that adhere to a pattern, creating SSIS packages by hand can be time-consuming and repetitious. By attending this session, you’ll learn how to automate package creation in SSIS, including the dynamic generation of data flows. We’ll cover some of the free and open source tools available for this, and discuss “roll your own” options.

 

Do You Know the Data Flow?
The Data Flow task is one of the most powerful and most complex tools available in SSIS. Whether you are brand new to SSIS, or you’ve been using it for a while, it’s likely you’ve had some questions about the Data Flow. Why are some components so much slower than others? Why can’t I store a value (like a row count) in one component, and use it in another component later in the Data Flow? And why does it always seem to be the part of my package that fails when I run it against real data? Well, you’re not alone. During this session, we’ll answer these questions (and many others) by learning how the Data Flow operates internally. After attending this session, you’ll know a lot more about getting the most out of Data Flows in SSIS.

Posted in Events | Tagged | Comments Off

XML as Source in SSIS — Part 2

Now that our data is cleansed, it’s time to load the data into our table! The cleanest, most sane approach I have found for parsing XML within an SSIS script task is to use LINQ to XML; more information can be found here. In order to use LINQ to XML within your script task, you need to add a reference to the System.Xml.Linq assembly and the following using directives:

using System.Xml;
using System.Linq;
using System.Xml.Linq;

The flow of the package is basically as follows: for each file in the input folder, read the contents, pick out the pieces of XML that we care about, and repeat.

The XML files that I’m working with have three levels. For illustration’s sake, we will call them Grandfather, Father, and Child. The Grandfather (Route) has multiple children (Trips), which in turn have multiple children (Stops).
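One note before the code: the query in the next section calls a StreamReader(fileName, elementName) method. That is not System.IO.StreamReader; it’s a custom iterator that streams each matching element out of the file as an XElement, so the whole document never has to be loaded into memory at once. Here is a sketch of that helper, based on the standard XmlReader/XNode.ReadFrom streaming pattern (the method name is chosen to match the query below; in a script task you would drop the method straight into the ScriptMain class):

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

static class XmlStreaming
{
    // Yields each element with the given name (e.g. "ExtRoute") as an
    // XElement, without loading the entire file.
    public static IEnumerable<XElement> StreamReader(string fileName, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(fileName))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadFrom consumes the element and advances the reader,
                    // so no extra Read() is needed on this branch.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```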

Assign Values to Variables

Now that we have added the necessary references to our script task, the next step is to write the query that picks the XML apart:

foreach (var routes in (from route in StreamReader(outfilename, "ExtRoute")
                   select new
                   {
                       //Route-level attributes
                       RouteKey = (string)route.Attributes("RouteKey").FirstOrDefault(),
                       ActualStartTime = (string)route.Attributes("start").FirstOrDefault(),
                       ActualStopTime = (string)route.Attributes("end").FirstOrDefault(),
                       PlannedStartTime = (string)route.Attributes("DepartDate").FirstOrDefault(),
                       PlannedRunTime = (string)route.Attributes("PlannedElapsedTime").FirstOrDefault(),
                       ActualDriverName = (string)route.Attributes("driver").FirstOrDefault(),
                       ActualDriverID = (string)route.Attributes("driverid").FirstOrDefault(),
                       ActualTruckID = (string)route.Attributes("truck").FirstOrDefault(),
                       RouteUnPlanned = (string)route.Attributes("unplanned").FirstOrDefault(),
                       RoutingDate = (string)route.Attributes("RoutingDate").FirstOrDefault(),
                       Route = (string)route.Attributes("id").FirstOrDefault(),
                       Branch = (string)route.Attributes("group").FirstOrDefault(),
                       Trips = (from trip in route.Elements()
                        select new
                        {
                           //Trip-level attributes
                           TripKey = (string)trip.Attributes("TripKey").FirstOrDefault(),

                           //Stop-level attributes
                           Stops = (from stop in trip.Elements()
                            select new
                            {
                                LocationType = (string)stop.Attributes("LocationType").FirstOrDefault(),
                                Longitude = (string)stop.Attributes("Longitude").FirstOrDefault(),
                                Latitude = (string)stop.Attributes("Latitude").FirstOrDefault(),
                                CustomerName = (string)stop.Attributes("locationName").FirstOrDefault(),
                                CustomerNumber = (string)stop.Attributes("LocationRefNumber").FirstOrDefault(),
                                ActualSequence = (string)stop.Attributes("actualSequence").FirstOrDefault(),
                                GeoStop = (string)stop.Attributes("geoStop").FirstOrDefault(),
                                ArrivalTime = (string)stop.Attributes("arrivalTime").FirstOrDefault(),
                                DepartureTime = (string)stop.Attributes("departureTime").FirstOrDefault(),
                                StopDuration = (string)stop.Attributes("stopDuration").FirstOrDefault(),
                                ActualDistance = (string)stop.Attributes("ActualDistance").FirstOrDefault(),
                                DeltaDistance = (string)stop.Attributes("DeltaDistance").FirstOrDefault(),
                                DeltaTimeDriven = (string)stop.Attributes("DeltaTimeDriven").FirstOrDefault(),
                                WorkTime = (string)stop.Attributes("WorkTime").FirstOrDefault()
                            })
                       })
                   }
                 ))

The query above builds an IEnumerable of anonymous objects, and LINQ’s deferred execution means only the XML nodes we care about are ever touched. The following code then loops through the results and writes each stop out to the Data Flow buffer:

{
    foreach (var trip in routes.Trips)
    {
        foreach (var stop in trip.Stops)
        {
            DefaultBuffer.AddRow();
            DefaultBuffer.TripKey = trip.TripKey;
            DefaultBuffer.RouteKey = routes.RouteKey;
            DefaultBuffer.CustomerName = stop.CustomerName;
            DefaultBuffer.CustomerNumber = stop.CustomerNumber;
            // etc...
        }
    }
}

Be sure to notice the beginning and ending curly brackets. Don’t forget them — they’re important!!

Conclusion

I hope that these two articles have given you some insight into a quick and efficient way to import XML data into a SQL database. Any thoughts, suggestions, or criticisms are welcomed!

Posted in Uncategorized | Tagged , | Comments Off

XML as Source in SSIS

XML is becoming increasingly popular in the workplace. Many systems are using it as a way to transmit data efficiently across the network. While XML certainly has its place, consuming it and deconstructing it can be a real headache.

Unfortunately, the native XML Source component in SSIS is lousy for this purpose. Before I get into the details, I’ll provide a high-level overview.

I was provided roughly 1,000 XML files that I had to import into a SQL database. Sounded easy enough. However, the XML files had no in-line schema, and I was not provided a stand-alone schema, so I had to derive one, again using the XML Source component. That would seem fine; however, the derived schema makes a "best guess" at the datatypes based on the file it is given. So in one file, you could have an attribute such as filevalue="1", and the schema parsing decides that the attribute’s datatype is a Bit. But the next XML file has filevalue="2432", and the import fails, because 2432 is obviously not a Bit.

Removing Unwanted Data

Any and all users will tell you “My data is clean!”. But reality often proves different. Eventually, as the project morphed, these individual XML files were rolled into one BIG XML file, formatting, metadata and all. As such, I needed a way to strip all of the superfluous junk out of the file on the fly. Enter — C# StringExtension! I just add this as a script task inside my For Each Loop Container to strip out the junk from each file as it’s processed.

public static class StringExtension
    {
        private static Dictionary<string, string> _replacements = new Dictionary<string, string>();
        static StringExtension()
        {
            // Note: the embedded quotes in the XML declaration must be escaped.
            _replacements["<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"] = "";
            _replacements["<parmExtRoute>"] = "";
            _replacements["</parmExtRoute>"] = "";
            _replacements["<ReportCacheID>"] = "";
            _replacements["</ReportCacheID>"] = "";
            // etc...

        }
        public static string clean(this string s)
        {
            foreach (string to_replace in _replacements.Keys)
            {
                s = s.Replace(to_replace, _replacements[to_replace]);
            }
            return s;
        }
    }
        // Main() below belongs to the SSIS script task's ScriptMain class
        // (it also needs using System.IO for StreamReader/StreamWriter):
        public void Main()
        {
            string filename = Dts.Variables["_FileName"].Value.ToString();
            string outfilename = Dts.Variables["_FileName"].Value.ToString();
            try
            {
                StreamReader sr = new StreamReader(filename);

                string content = sr.ReadToEnd();
                content = content.clean();
                sr.Close();

                StreamWriter sw = new StreamWriter(outfilename);
                sw.WriteLine("<Root>");
                sw.WriteLine(content);
                sw.WriteLine("</Root>");

                sw.Close();
                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception e)
            {
                Console.WriteLine("{0} Exception caught.", e);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }
    }

All that needs to be modified is the set of dictionary key/value pairs; add additional pairs for whatever needs to be stripped out. Then, simply call the clean() extension on any string variable that needs to be cleansed, as I’ve done above. I’ve also taken the liberty of surrounding the entire XML fragment with a single <Root></Root> node, so that everything will be happy.

Now, onto importing the data into our table!

Posted in Uncategorized | Tagged , | Comments Off

Tuning SSAS Processing Performance

Thanks to all those that attended either the webcast of this that I did for the PASS BI virtual chapter, or the presentation at SQLSaturday #74 in Jacksonville this weekend. I really appreciate all the great feedback that I’ve received from the attendees at both events.

I’ve attached the slides from the presentation, along with a few definition files: one for a Performance Monitor data collector set, and two trace templates, one for SSAS and one for SQL Server. Feel free to customize these files as necessary for your own tuning efforts. Also, while the trace templates can be used in Profiler, the best approach is to generate scripts from them and run the traces directly on the server – lower overhead and less impact on performance.

The file is located on my SkyDrive.

Posted in SQL Saturday, SSAS | Tagged , | Comments Off

The Mentoring Experiment Applications

Andy Warren (blog | twitter) and Steve Jones (blog | twitter) have started another great initiative in the SQL Server community – The Mentoring Experiment. If you haven’t heard about it yet, I highly recommend reading up on it. This post is mainly a reminder that the deadline for applications is April 30th – so if you are interested and haven’t submitted one yet, hurry up!

Posted in Mentoring | Tagged | Comments Off

How to Generate Insert Scripts for Existing Data

Let’s say you have a bunch of data stored in a table and you need to generate an insert script for each record. Maybe you need to save the scripts in source control or maybe you need the scripts to initialize the same table in another environment. What would you do?

Traditional Methods

Several ideas come to mind, all painfully tedious:

  • You could dynamically create the insert statements with a stored procedure

If you do a quick search you’ll find several stored procedures that will generate insert scripts. This isn’t a bad way to go in terms of effort but I would definitely scrutinize the procedure before using it. And as you’ll see in a bit, there’s an even better approach.

  • You could dynamically create the insert statements with a SQL query

Following this approach your SQL query would look something like this:

SELECT 'INSERT INTO dbo.MyTable (MyCol1, MyCol2) VALUES (''' + MyCol1 + ''', ''' + MyCol2 + ''')' AS InsertStatement FROM dbo.MyTable

If your table is very wide then the script can be very complex to write and difficult to read and maintain.
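For instance, once you account for quoting character values and handling NULLs, even a hypothetical two-column table gets noisy (a sketch assuming MyCol1 is character data and MyCol2 is numeric):

```sql
SELECT 'INSERT INTO dbo.MyTable (MyCol1, MyCol2) VALUES ('
       -- Double up embedded single quotes, and emit the literal NULL
       -- when the column value is NULL.
       + ISNULL('''' + REPLACE(MyCol1, '''', '''''') + '''', 'NULL') + ', '
       + ISNULL(CAST(MyCol2 AS VARCHAR(20)), 'NULL')
       + ')' AS InsertStatement
FROM dbo.MyTable
```

Multiply that by thirty columns and the maintenance argument makes itself.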

  • You could dynamically create the insert statements with Excel

This is a two-step approach: first you select all data from the table, then copy it into Excel. You then add columns before and after each value so you can create an expression for the insert statement. For example, cell A1 would contain something like this:

="INSERT INTO dbo.MyTable (MyCol1, MyCol2) VALUES ("

Using the fill handle, you drag the expression down for each row and repeat for each of the other columns (remember, each value you insert needs to be separated by a comma, and character data must also be enclosed in single quotes). Once you copy your script out of Excel, you will notice that the values in each column are separated by tabs. If you are inserting character data, you will need to either update your expression to wrap the values in the RTRIM() and LTRIM() functions, or manually remove the tabs yourself. This can get very ugly very quickly.

  • You could manually type the insert statements

When all else fails, you can always start to write the script yourself. Depending on how many rows of data you have, and the width of the table, this would require a tremendous amount of effort.

  • You could convince someone else to generate the scripts for you

À la Tom Sawyer and the fence, your mileage may vary with this approach.

A Better Way

Luckily, there is a far easier approach where SQL Server does all the work. Unfortunately, if you’re running SQL Server 2005 or earlier, you’re still limited to the approaches mentioned above; this well-hidden gem is only available in SQL Server 2008 and later.

Starting in SQL Server Management Studio, navigate to the database where your data lives in the Object Explorer. Right-click on the database name, go to Tasks, and select Generate Scripts.

The SQL Server Scripts Wizard will appear; click Next to begin.

Select the database where your data lives. In this example I am choosing AdventureWorksDW2008.

In the Choose Script Options step, there are a number of properties that you can tweak. This is where the magic happens! In order to generate insert scripts, you need to set the “Script Data” property, located in the “Table/View Options” group, to True (by default it’s set to False).

Select the type of objects you want scripted. Since we want to generate insert scripts, the Tables object type must be selected.

Select the tables for which you want to generate insert scripts. Here I’ve chosen DimAccount.

Select your output option.

Review your settings. Notice that “Script Data” is set to True.

Let the wizard spin…

And voila! An insert statement for each record (along with the table definition) has been generated!

So Now What?

I *highly* recommend that you familiarize yourself with all of the options in the Choose Script Options screen, as it contains numerous ways to customize your scripts that could end up saving you a lot of time!

Posted in SQL Server, SSMS | Tagged , , | Comments Off