Archive for May 2011

Copy Data Dynamically with BimlScript

This post is part 3 of a series on using Biml in BIDS Helper. This post builds on some of the information and the sample from the previous posts.

BimlScript enables some interesting scenarios for generating large numbers of SSIS packages automatically. This can come in handy when you need to copy most or all of the data in one database to a different one. You could use something like the Transfer SQL Server Objects task, but it has a few problems. You could roll your own solution, but that might mean a fair amount of custom scripting. Or you could use the Import / Export Wizard. In all of these cases, though, you don’t have complete control over how the packages are produced. You could create all the packages by hand, which does give you full control, but then you are stuck doing a lot of repetitive work in SSIS.

BimlScript provides an alternative that lets you fully control the output, while automating the rote work of producing lots of packages that use the same pattern. Let’s take a look at a sample of this, using the scenario above (copying the data from one database to another).

<#@ template language="C#" hostspecific="true" #>
<#@ import namespace="System.Data" #>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
      <Connections>
            <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;"/>
            <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;"/>
      </Connections>
      <Packages>
            <#
                // Build the list of schema-qualified table names to copy.
                string metadataConnectionString = "Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;";
                DataTable tables = ExternalDataAccess.GetDataTable(metadataConnectionString,
                    "SELECT '[' + s.name + '].[' + t.name + ']' FROM sys.tables t INNER JOIN sys.schemas s ON t.schema_id = s.schema_id");

                // Emit one package per table.
                foreach (DataRow row in tables.Rows)
                { #>
            <Package Name="Extract <#=row[0]#>" ConstraintMode="Linear" AutoCreateConfigurationsType="None">
                  <Tasks>
                        <Dataflow Name="Copy Data">
                              <Transformations>
                                    <OleDbSource Name="Retrieve Data" ConnectionName="Source">
                                          <DirectInput>SELECT * FROM <#=row[0]#></DirectInput>
                                    </OleDbSource>
                                    <OleDbDestination Name="Insert Data" ConnectionName="Target">
                                          <ExternalTableOutput Table="<#=row[0]#>"/>
                                    </OleDbDestination>
                              </Transformations>
                        </Dataflow>
                  </Tasks>
            </Package>
                <# } #>
      </Packages>
</Biml>

This script is set up to copy all the data in the AdventureWorksDW2008R2 database to a second database named Target (very inventive, I know). One note – the script is not creating the tables in the target database. We could actually automate that portion as well, but it’s beyond the scope of this post. To ensure you are set up properly to run this script, you should create an exact structural copy of your source database under a different name. You can use the Generate Scripts Wizard to do this. Just script the entire database, and then update the generated script to use a different database name (don’t forget to change the USE statement to the new name).

The script will produce a package per table, with a simple data flow that copies all the data using an OLE DB Source and OLE DB Destination. The script leverages the metadata already contained in the database, in the sys.tables view, to drive the loop that creates the packages.
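If you only want a subset of the tables, you can adjust the metadata query that drives the loop. Here’s one possible variation (a sketch only; the excluded schema name is made up for illustration) that skips Microsoft-shipped tables and a schema you want to leave out:

DataTable tables = ExternalDataAccess.GetDataTable(metadataConnectionString,
    "SELECT '[' + s.name + '].[' + t.name + ']' " +
    "FROM sys.tables t INNER JOIN sys.schemas s ON t.schema_id = s.schema_id " +
    "WHERE t.is_ms_shipped = 0 AND s.name <> 'staging'"); // 'staging' is a hypothetical schema to exclude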

What if you don’t want to select all the rows from each table? Instead, perhaps you want to specify a WHERE clause to use to filter some of the tables. To handle this, we can create a table in the target database that holds our WHERE information.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;"/>
    </Connections>
    <Tables>
        <Table Name="WhereClause" ConnectionName="Target">
            <Columns>
                <Column Name="TableName" DataType="String" Length="255"/>
                <Column Name="WhereSql" DataType="String" Length="4000"/>
            </Columns>
        </Table>
    </Tables>
</Biml>

You can use the steps shown in Part 2 of this series to create this table in the Target database. Once it’s been created, populate it with some data. Note that since the script identifies tables by their schema-qualified names, the TableName values need to be schema-qualified as well. Below is sample data for this table that will work with AdventureWorksDW2008R2; it filters both fact tables down to only the sales with an amount of 1000 or more.

TableName                   WhereSql
[dbo].[FactInternetSales]   WHERE [SalesAmount] >= 1000
[dbo].[FactResellerSales]   WHERE [SalesAmount] >= 1000
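If you’d rather not enter these rows by hand, you can seed the table from Biml as well. Here’s a minimal sketch that wraps the INSERTs in an ExecuteSQL task (the package and task names are my own; the values match the sample data above):

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;"/>
    </Connections>
    <Packages>
        <Package Name="Seed WhereClause" ConstraintMode="Linear" AutoCreateConfigurationsType="None">
            <Tasks>
                <!-- Inserts the sample filter rows into the WhereClause table. -->
                <ExecuteSQL Name="Insert Where Clauses" ConnectionName="Target">
                    <DirectInput>
                        INSERT INTO WhereClause (TableName, WhereSql)
                        VALUES ('[dbo].[FactInternetSales]', 'WHERE [SalesAmount] >= 1000'),
                               ('[dbo].[FactResellerSales]', 'WHERE [SalesAmount] >= 1000');
                    </DirectInput>
                </ExecuteSQL>
            </Tasks>
        </Package>
    </Packages>
</Biml>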

Now we need to alter the script to use the new information in this table. At the beginning of the block of script after the <Packages> element, add the following code:

string targetConnectionString = "Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;";
DataTable whereClauses = ExternalDataAccess.GetDataTable(targetConnectionString, "SELECT TableName, WhereSql FROM WhereClause");

This retrieves the WHERE clauses from the WhereClause table, and stores them in the whereClauses variable.

Next, replace the <DirectInput> line in the OleDbSource with this:

<#
  var dataRow = whereClauses.Select(string.Format("TableName = '{0}'", row[0]));
  string whereSql = dataRow.Length == 0 ? string.Empty : dataRow[0][1].ToString();
  string sql = string.Format("SELECT * FROM {0} {1}", row[0], whereSql);
#>
<DirectInput><#=sql#></DirectInput>

This code determines whether the whereClauses DataTable has a row for the current table. If it does, the stored WHERE clause is appended to the end of the SELECT statement.
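One small defensive tweak you might consider (my own addition, not part of the original script): DataTable.Select treats its argument as a filter expression, so if a table name could ever contain a single quote, escape it before building the expression:

<#
  // Escape any single quotes in the table name so the filter expression stays valid.
  string safeName = row[0].ToString().Replace("'", "''");
  var dataRow = whereClauses.Select(string.Format("TableName = '{0}'", safeName));
#>

With or without that tweak, the complete, final script looks like this: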

<#@ template language="C#" hostspecific="true" #>
<#@ import namespace="System.Data" #>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
      <Connections>
            <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;"/>
            <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;"/>
      </Connections>
      <Packages>
            <#
                // Retrieve the optional WHERE clauses stored in the target database.
                string targetConnectionString = "Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;";
                DataTable whereClauses = ExternalDataAccess.GetDataTable(targetConnectionString, "SELECT TableName, WhereSql FROM WhereClause");

                // Build the list of schema-qualified table names to copy.
                string metadataConnectionString = "Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;";
                DataTable tables = ExternalDataAccess.GetDataTable(metadataConnectionString,
                    "SELECT '[' + s.name + '].[' + t.name + ']' FROM sys.tables t INNER JOIN sys.schemas s ON t.schema_id = s.schema_id");

                // Emit one package per table.
                foreach (DataRow row in tables.Rows)
                { #>
            <Package Name="Extract <#=row[0]#>" ConstraintMode="Linear" AutoCreateConfigurationsType="None">
                  <Tasks>
                        <Dataflow Name="Copy Data">
                              <Transformations>
                                    <OleDbSource Name="Retrieve Data" ConnectionName="Source">
                                        <#
                                            var dataRow = whereClauses.Select(string.Format("TableName = '{0}'", row[0]));
                                            string whereSql = dataRow.Length == 0 ? string.Empty : dataRow[0][1].ToString();
                                            string sql = string.Format("SELECT * FROM {0} {1}", row[0], whereSql);
                                        #>
                                          <DirectInput><#=sql#></DirectInput>
                                    </OleDbSource>
                                    <OleDbDestination Name="Insert Data" ConnectionName="Target">
                                          <ExternalTableOutput Table="<#=row[0]#>"/>
                                    </OleDbDestination>
                              </Transformations>
                        </Dataflow>
                  </Tasks>
            </Package>
                <# } #>
      </Packages>
</Biml>

You can see the results of this script by right-clicking on the Biml file and choosing Expand Biml File. It may take a minute or two to process, but when it finishes, you should see a package for each table in your source database. The data flows will copy the data from Source to Target, and any WHERE clauses you add to the WhereClause table will be applied.
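One natural extension (my own sketch, not something the original script does) is to clear each target table before it is loaded. Because the packages use ConstraintMode="Linear", tasks are chained in the order they appear, so an ExecuteSQL task placed just before the Dataflow inside the <Tasks> element will run first:

<ExecuteSQL Name="Clear <#=row[0]#>" ConnectionName="Target">
    <DirectInput>TRUNCATE TABLE <#=row[0]#></DirectInput>
</ExecuteSQL>

Keep in mind that TRUNCATE TABLE fails against tables referenced by foreign keys, so you may need a DELETE FROM statement for those.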

There’s a lot more that could be done with this script (automating the creation of the tables in the destination, for example), but it’s already a good illustration of what BimlScript can do. Instead of spending your time writing tens or hundreds of repetitive packages, automate the pattern with BimlScript.

Creating Tables using Biml and BimlScript

This post is part 2 of a series on using Biml in BIDS Helper. It builds on some of the information and the sample from the previous post.

When I’m creating samples for SSIS, I often find it necessary to create supporting tables to go along with the package sample. One of the things I like about Biml is that you can define both your tables and packages in the language. Here’s an example of defining an OrderHeader and OrderDetail table in Biml:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="DbConnection" ConnectionString="Server=.;Initial Catalog=Sandbox;Provider=SQLNCLI10.1;Integrated Security=SSPI;"/>
    </Connections>
    <Tables>
        <Table Name="OrderHeader" ConnectionName="DbConnection">
            <Columns>
                <Column Name="OrderId" DataType="Int32" IdentityIncrement="1" IdentitySeed="1"/>
                <Column Name="SalesDate" DataType="DateTime"/>
                <Column Name="CustomerName" DataType="String" Length="50"/>
            </Columns>
            <Keys>
                <PrimaryKey Name="OrderHeaderPK">
                    <Columns>
                        <Column ColumnName="OrderId"/>
                    </Columns>
                </PrimaryKey>
            </Keys>
        </Table>
        <Table Name="OrderDetail" ConnectionName="DbConnection">
            <Columns>
                <Column Name="OrderDetailId" DataType="Int32" IdentityIncrement="1" IdentitySeed="1"/>
                <TableReference Name="OrderId" TableName="OrderHeader"/>
                <Column Name="ProductName" DataType="String" Length="50"/>
                <Column Name="Qty" DataType="Int16"/>
                <Column Name="UnitPrice" DataType="Currency"/>
            </Columns>
            <Keys>
                <PrimaryKey Name="OrderDetailPK">
                    <Columns>
                        <Column ColumnName="OrderDetailId"/>
                    </Columns>
                </PrimaryKey>
            </Keys>
        </Table>
    </Tables>
</Biml>

Tables are defined in a <Table> tag. They can have columns defined, as well as keys, and even indexes (not shown in the example above). Notice that the OrderId column doesn’t have a DataType attribute. Many of the attributes in Biml have default values, and data type is one of them. If it’s not specified, the column data type will default to Int32. The primary key for the table is defined with a <PrimaryKey> element.

The OrderDetail table includes a <TableReference> column. TableReference columns are a special class of column that defines a foreign key reference to another table; this one references back to the OrderHeader table. It’s not shown here, but you can also use a MultipleColumnTableReference if your foreign key needs to span multiple columns.

Great – now you have your tables defined in Biml, but how do you make use of that? If only there were some way to run this against your database to create the tables… Well, fortunately, there is – by using BimlScript. BimlScript is a scripting layer that automates the production of Biml (similar in concept to the way ASP.NET produces HTML). To set this up, you need to add two Biml files to your project – one to hold the table definitions above, and one to hold the BimlScript.
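To make the idea concrete before we use it, here’s a toy example (not part of this walkthrough; the connection names and databases in it are made up). Code nuggets (<# #>) control the flow of the script, while expression nuggets (<#= #>) emit values into the Biml output:

<#@ template language="C#" hostspecific="True" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <# // Emit three connections instead of typing them out by hand.
           for (int i = 1; i <= 3; i++) { #>
        <OleDbConnection Name="Connection<#=i#>" ConnectionString="Server=.;Initial Catalog=Db<#=i#>;Provider=SQLNCLI10;Integrated Security=SSPI;"/>
        <# } #>
    </Connections>
</Biml>

Expanding this would produce three connections named Connection1 through Connection3, each pointing at its own database.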

First, add a new Biml file to the SSIS project (see Part 1 if you need a refresher on this). Copy the Biml above to this file, and rename the file to TableDefinitions.biml.

Second, add an additional Biml file. Name this one CreateTables.biml.

Open the CreateTables.biml file, and replace the contents with the following code:

<#@ template language="C#" hostspecific="True" #>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="Create Tables" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
            <Tasks>
                <# foreach(var table in RootNode.Tables) {#>
                <ExecuteSQL Name="Create <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
                    <DirectInput>
                        <#=table.GetTableSql()#>
                    </DirectInput>
                </ExecuteSQL>
                <# } #>
                <# foreach(var table in RootNode.Dimensions) {#>
                <ExecuteSQL Name="Create <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
                    <DirectInput>
                        <#=table.GetTableSql()#>
                    </DirectInput>
                </ExecuteSQL>
                <# } #>
                <# foreach(var table in RootNode.Facts) {#>
                <ExecuteSQL Name="Create <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
                    <DirectInput>
                        <#=table.GetTableSql()#>
                    </DirectInput>
                </ExecuteSQL>
                <# } #>
            </Tasks>
        </Package>
    </Packages>
</Biml>

This file has a header at the beginning that indicates the script will use C#. The next section defines a package named “Create Tables”. The section inside the Tasks element is the interesting part:

<# foreach(var table in RootNode.Tables) {#>
<ExecuteSQL Name="Create <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
    <DirectInput>
        <#=table.GetTableSql()#>
    </DirectInput>
</ExecuteSQL>
<# } #>

This code iterates over the tables that are part of the current model. For each table it finds, it creates an ExecuteSQL task, and embeds the SQL to create the table in the package. The code is repeated to iterate over Dimensions and Facts, which are special classes of tables.

Notice that there are no tables defined in the BimlScript file. The BimlScript can’t operate against objects defined in the same file, which is why we created the TableDefinitions.biml file separately. To produce the package, multi-select both TableDefinitions.biml and CreateTables.biml, right-click, and choose Expand Biml File.

This will produce a new SSIS package in the project named Create Tables.dtsx. It contains two Execute SQL tasks, one for each table.

Each task includes the appropriate SQL to create the tables. As an example, here’s the OrderHeader SQL from the Execute SQL task.

SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO

-------------------------------------------------------------------
IF EXISTS (SELECT * from sys.objects WHERE object_id = OBJECT_ID(N'[OrderHeader]') AND type IN (N'U'))
DROP TABLE [OrderHeader]
GO

CREATE TABLE [OrderHeader]
(
-- Columns Definition
 [OrderId] int IDENTITY(1,1) NOT NULL
, [SalesDate] datetime NOT NULL
, [CustomerName] nvarchar(50) NOT NULL

-- Constraints
,CONSTRAINT [OrderHeaderPK] PRIMARY KEY CLUSTERED
(
  [OrderId] Asc) WITH(PAD_INDEX = OFF,IGNORE_DUP_KEY = OFF) ON [PRIMARY]

)
ON [PRIMARY]
WITH (DATA_COMPRESSION = NONE)
GO

-------------------------------------------------------------------

Note that the tables are ordered in the package in the same order they are defined in the Biml file. If you have tables with dependencies, make sure to order them correctly.

In the next post, we’ll look at some ways to copy data dynamically using BimlScript.

Submissions for PASS Summit 2011

I’ve presented at the PASS Summit several times over the past few years, and I’ve really enjoyed it every time. The Summit is a great event for learning from some of the best in the SQL Server community, and you get a chance to spend time with them at the various events going on during the week. This year, the community is getting the opportunity to vote for their preferred sessions, which I think is a great way to get people involved in the selection process. The deadline is May 20th according to the webpage (though another page says the 30th – probably better to vote early than to miss the deadline).

This year, I’ve submitted several sessions.

Do More (ETL) with Less (Effort) – Automating SSIS

SSIS is a great tool for transferring data from one data source to another, and for implementing complex ETL processes. However, for simple, straightforward data transfer tasks or packages that adhere to a pattern, creating SSIS packages by hand can be time-consuming and repetitious. By attending this session, you’ll learn how to automate package creation in SSIS, including the dynamic generation of data flows. We’ll cover some of the free and open source tools available for this, and discuss “roll your own” options.

Do You Know the Data Flow?

The Data Flow task is one of the most powerful and most complex tools available in SSIS. Whether you are brand new to SSIS, or you’ve been using it for a while, it’s likely you’ve had some questions about the Data Flow. Why are some components so much slower than others? Why can’t I store a value (like a row count) in one component, and use it in another component later in the Data Flow? And why does it always seem to be the part of my package that fails when I run it against real data? Well, you’re not alone. During this session, we’ll answer these questions (and many others) by learning how the Data Flow operates internally. We’ll cover the Data Flow from the basic (what’s a component?) to the advanced (how can I determine how many threads my Data Flow is using?). After attending this session, you’ll know a lot more about getting the most out of Data Flows in SSIS.

Handling Advanced Data Warehouse Scenarios in SSIS

So you’ve used SSIS to populate a simple star schema data mart, and everybody’s happy. But now you have new requirements that require more advanced data warehouse approaches, like late arriving dimensions, bridge tables, parent child dimensions, and Type 3 or Type 6 slowly changing dimensions (SCD). How do you handle those in a scalable, efficient way in SSIS? This session will present some common patterns for handling these scenarios. You’ll learn when to use each advanced approach and the pros and cons associated with each pattern. You will learn how to implement these patterns in SSIS, and how to tune them for high performance.

Tuning Analysis Services Processing Performance

You’ve got your Analysis Services cube created, and deployed in production. However, you notice that every night, the cube is taking longer and longer to process, and users are starting to complain about their data not being ready when they arrive in the morning. If you’ve found yourself in this situation, or want to avoid being in it in the first place, come to this session. We’ll cover how to benchmark processing performance, track down bottlenecks, and how to tune things to get the best performance for processing your cube.

Other Sessions I’d Like to See At the Summit

This is not an exhaustive list, by any means – so if you aren’t on the list, please don’t take it personally. These are the ones that happened to catch my eye as I looked through the list. Some because they had an interesting technical focus, some because I know the speaker will bring an interesting perspective on the topic, and some because…, well, just because I can.

Dynamic Business Rules Processing Using SSIS – Tim Mitchell

Techniques for Automating T-SQL Unit Tests and User Acceptance Testing – Scott Currie

Advanced Analysis Services Development – David Darden

Advanced SSAS Security – Chris Webb

Analysis Services Power Tools – Darren Gosbell

DAX Deep Dive – Marco Russo, Alberto Ferrari

Biml Functionality in BIDS Helper

If you follow along with the BIDS Helper project on CodePlex, you may have noticed that we released a new beta version on April 20th. The new build includes some bug fixes, and two new features.

Duplicate Role (SSAS)

The Duplicate Role feature allows you to create a copy of an existing SSAS role, along with all of its settings and permissions. Performing a regular copy and paste on a role in SSAS will only copy the role membership, not the permissions. The new feature copies the role itself, along with the permissions associated with the original role. I’ve found this pretty useful over the past few months.

Biml Package Generator (SSIS)

The Biml Package Generator feature is one I’m pretty excited about, as it leverages some of the things I’ve been working on at Varigence over the past year. This feature enables you to use the Business Intelligence Markup Language (Biml) to generate SSIS packages. This is nice for a couple of reasons. First, Biml is an XML-based language with a readable syntax, and for a lot of packages, I find it easier to just type the Biml to create the package rather than using the designers. Second, the Biml compiler includes the ability to script the generation of Biml code, in a similar way to how you can use ASP.NET to script the production of HTML. Using that, I can generate large numbers of packages very quickly.

I’m starting a series of posts that will illustrate how to use the Biml functionality in BIDS Helper. This post will serve as an index page for the series, and the posts will also appear in the BIDS Helper documentation and on the Varigence blog.

Part 1: Creating a Basic Package Using Biml
Part 2: Creating Tables using Biml and BimlScript
Part 3: Copy Data Dynamically with BimlScript
Part 4: Controlling the Control Flow with Biml
Part 5: Defining the Data Flow in Biml


As always, if you have questions, let me know in the comments.

Creating a Basic Package Using Biml

This article is going to walk through the process of creating a simple package using Biml and the Biml Package Generator feature in BIDS Helper. To start out, you need to install the latest beta of BIDS Helper from CodePlex. Once that is set up, create a new Integration Services project in BIDS, and right-click on the project node in the Solution Explorer. There’s a new item in this menu – Add New Biml File.

Clicking Add New Biml File will add a new file named BimlScript.biml to the Miscellaneous folder in the solution. (The name is automatically generated, so it may be BimlScript1.biml, etc.) You can right-click on the file and choose Rename to give it a more specific name. For this example, rename the file “BasicPackage.biml”.

Double-clicking on the file will open the XML editor inside of BIDS. The editor supports Intellisense for Biml, so typing an opening tag (“<”) will give you a list of valid options for tags you can use. (If you aren’t seeing the Intellisense, please check this link for troubleshooting steps.)

For this example, copy and paste the following Biml into the document. Since the code below includes the document root tags (<Biml>), you’ll want to make sure you replace the entire contents of the Biml file.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <Connection Name="AdventureWorks" ConnectionString="Server=.;Initial Catalog=AdventureWorks;Integrated Security=SSPI;Provider=SQLNCLI10"/>
    </Connections>
    <Packages>
        <Package Name="Biml Sample" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
            <Tasks>
                <Dataflow Name="Extract Table List">
                    <Transformations>
                        <OleDbSource Name="Get Table List" ConnectionName="AdventureWorks">
                            <DirectInput>SELECT * FROM sys.tables</DirectInput>
                        </OleDbSource>
                        <Multicast Name="Multicast"/>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
</Biml>

The first section (<Connections>) of this Biml defines an OLE DB connection that points to the AdventureWorks database. The next section (inside the <Packages> tag) defines a single package that contains a Dataflow task (the <Dataflow> tag). The Dataflow task contains two components, an OLE DB Source and a Multicast transformation.
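In this sample, the Multicast simply gives the data flow a valid endpoint. If you wanted to land the rows somewhere instead, you could swap it for a destination along these lines (a sketch only; it assumes a [dbo].[TableList] table with a matching structure already exists in AdventureWorks):

<OleDbDestination Name="Save Table List" ConnectionName="AdventureWorks">
    <ExternalTableOutput Table="[dbo].[TableList]"/>
</OleDbDestination>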

The next step is to take this definition of a package, and actually generate the package from it. To do this, right-click on the Biml file, and choose Expand Biml File from the context menu.

A new package will be added to the SSIS Packages folder, named Biml Sample.dtsx. If you review the generated package, you’ll see that it matches up to what was defined in the Biml code.

That’s a quick introduction to the Biml functionality in BIDS Helper. In the next article, we’ll set the stage for some more advanced (read: more interesting) uses of Biml, including some scripting.

Presenting at Carolina Code Camp 2011

I’m going to be presenting a couple of sessions at the Carolina Code Camp this weekend. The Carolina Code Camp is always a good time – well organized, educational, and fun. I’m looking forward to it.

Do More (ETL) with Less (Effort) – Automating SSIS
SSIS is a great tool for transferring data from one data source to another, and for implementing complex ETL processes. However, for simple, straightforward data transfer tasks or packages that adhere to a pattern, creating SSIS packages by hand can be time-consuming and repetitious. By attending this session, you’ll learn how to automate package creation in SSIS, including the dynamic generation of data flows. We’ll cover some of the free and open source tools available for this, and discuss “roll your own” options.

Do You Know the Data Flow?
The Data Flow task is one of the most powerful and most complex tools available in SSIS. Whether you are brand new to SSIS, or you’ve been using it for a while, it’s likely you’ve had some questions about the Data Flow. Why are some components so much slower than others? Why can’t I store a value (like a row count) in one component, and use it in another component later in the Data Flow? And why does it always seem to be the part of my package that fails when I run it against real data? Well, you’re not alone. During this session, we’ll answer these questions (and many others) by learning how the Data Flow operates internally. After attending this session, you’ll know a lot more about getting the most out of Data Flows in SSIS.

Tuning SSAS Processing Performance

Thanks to all those who attended either the webcast of this presentation that I did for the PASS BI virtual chapter, or the session at SQLSaturday #74 in Jacksonville this weekend. I really appreciate all the great feedback I’ve received from the attendees at both events.

I’ve attached the slides from the presentation. I’ve also attached a few definition files: one for a Performance Monitor data collector set, and two trace templates, one for SSAS and one for SQL Server. Feel free to customize these files as necessary for your own tuning efforts. Also, while the trace templates can be used in Profiler, the best approach is to generate scripts from them and run the traces directly on the server, which gives you lower overhead and less impact on performance.

The file is located on my SkyDrive.