Automatically Using the Latest Version of an SSRS Template

The ability to add Templates to SSRS (such as documented here and here and here) is a neat feature.  Basically, you just create an RDL and put it in the “C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\ProjectItems\ReportProject” directory… now you can create a new report starting from the template by selecting it from the “New Items” dialog.  This is great when you don’t want to start from scratch on every report.

The only thing I’ve found to be a friction point is that every time someone updates the template (which you’re storing somewhere in source control), you need to copy it over to that folder.  That’s no big deal, but when you have a lot of developers who all have to do that frequently, it’s a pain.

The easy way to solve this is with symlinks.  You can’t use a shortcut, but you can create a soft symlink to point back at your template in source control.  That way you will always create a report off of the latest version in your repository.  (Yeah, you still have to keep that up to date… no getting around that.)

[sourcecode language="bash" padlinenumbers="true" wraplines="true"]
mklink "C:Program Files (x86)Microsoft Visual Studio 9.0Common7IDEPrivateAssembliesProjectItemsReportProjectTemplatePortrait.rdl"  "C:SourceControlframeworksSsrsFrameworkReportingTemplatesTemplatePortrait.rdl"

mklink "C:Program Files (x86)Microsoft Visual Studio 9.0Common7IDEPrivateAssembliesProjectItemsReportProjectTemplateLandscape.rdl" "C:SourceControlframeworksSsrsFrameworkReportingTemplatesTemplateLandscape.rdl"
[/sourcecode]
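
If you keep several templates in the same source-controlled folder, a small loop saves retyping the command for each file.  This is just a sketch with hypothetical paths – run it from an elevated command prompt, since mklink requires administrator rights:

[sourcecode language="bash" wraplines="true"]
REM In a .cmd script (use %f instead of %%f at an interactive prompt); paths are hypothetical
for %%f in ("C:\SourceControl\frameworks\SsrsFramework\Reporting\Templates\*.rdl") do mklink "C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\ProjectItems\ReportProject\%%~nxf" "%%f"
[/sourcecode]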

That’s all there is to it!


Using SSRS queries with parameters with Netezza

Our company has started using SQL Server Reporting Services 2008 R2 to handle most of our reporting against Netezza.  Overall, this works pretty well using the Netezza 6.0 OLE DB driver.  There are a few things to be aware of, though.

I found a number of people were using expressions to construct a SQL statement to send for the data set.  This works well, but it’s a little bit of a pain to both write and maintain.  Luckily, we can use parameters (with just a few things to be aware of).  That means we can write SQL that looks like this:

[sourcecode language="sql" padlinenumbers="true"]
SELECT
     date_key
    ,date
    ,good_day
    ,day_of_week
FROM
    date_sample
WHERE
    1=1
    AND day_of_week IN (@day_of_week_multi)
    AND date IN (@date)
    AND date_key =  @date_key
    AND good_day = @good_day 
ORDER BY
    date_key
;
[/sourcecode]

You can use parameters in the normal way with Netezza, but there are a few details to be aware of.  You just embed the parameter in the query like

[sourcecode language="sql" padlinenumbers="true"]
WHERE cola = @param
[/sourcecode]

and create a corresponding parameter in your dataset.  The OLE DB driver handles surrounding the parameter with quotes when required (strings and datetimes), and it automatically escapes any single quotes in the string.
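
For instance (an illustrative value, not from the sample), if a string parameter contains O’Brien, the driver effectively issues:

[sourcecode language="sql"]
-- Quotes are added and the embedded single quote is doubled by the driver
WHERE cola = 'O''Brien'
[/sourcecode]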

Parameters are ordinal, so you need to create a parameter (in order) for each value you want to send.  The parameters in your query will be replaced in order, based on the parameters you specified in the dataset.  No big deal, but something to be aware of.  You can use either a named parameter (such as ‘@myparam’) or a question mark (‘?’).
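
As a quick sketch of the two styles (hypothetical dataset), these behave identically – either way, the first dataset parameter feeds the first placeholder, the second feeds the second, and so on:

[sourcecode language="sql"]
-- Named style: the names are only for readability; matching is by position
WHERE date_key = @date_key AND good_day = @good_day

-- Question-mark style: identical behavior
WHERE date_key = ? AND good_day = ?
[/sourcecode]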

Here’s the rub… because the driver always escapes single quotes, there’s a little bit of an issue when you want to use multi-value parameters.  Basically, the normal method of using SQL like

[sourcecode language="sql"]
WHERE col_a IN (@multi_value_param)
[/sourcecode]

and an expression like this for a parameter

[sourcecode language="vb"]
=JOIN(Parameters!multi_param.Value, "','")
[/sourcecode]

doesn’t work, because the single ticks will always be escaped (so you’ll see a value like 'val1'',''val2'',''val3' instead of 'val1','val2','val3' in the resulting SQL).  No problem… we just need a different approach.  We’ll use something in the WHERE clause like

[sourcecode language="sql"]
STRPOS(@multi_param, '|' || col_a || '|') > 0
[/sourcecode]

with an expression like

[sourcecode language="vb"]
="|" + JOIN(Parameters!multi_param.Value, "|") + "|"
[/sourcecode]

This brackets each value with a delimiter character such as “|”, so the STRPOS test only matches whole values.  Remember to add the delimiter to the start and end of the joined string too, so the first and last values can match.  You may need to use a different character, or handle escaping, depending on your data.  You can always use multiple characters as the delimiter if you’d like (such as “~~~”).  This is a little hacky, but not too bad.
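
To make that concrete, if the user picks Monday and Tuesday (illustrative values), the expression yields |Monday|Tuesday|, and the predicate behaves like:

[sourcecode language="sql"]
-- Matches only whole values, since each value is bracketed by delimiters
STRPOS('|Monday|Tuesday|', '|' || day_of_week || '|') > 0
[/sourcecode]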

I tested this on a 1.5 billion row table against a similar query that uses an IN clause.  The two methods were essentially equal in performance, though that might degrade for an extremely large number of parameter values.  Both provide exact matches against the value, and they return equivalent results.

But… what about using a multi-select on a date?  Unfortunately, that gets slightly more complicated, but not too bad.  All you need to do is write a little code such as

[sourcecode language="vb" padlinenumbers="true"]
    ''' <summary>
    ''' Convert an array of Dates into a string.
    ''' </summary>
    ''' <param name="inputArray">Array of dates to convert.</param>
    ''' <param name="delimiter">Delimiter to place between each item in the array.</param>
    ''' <param name="formatString">Format string for the date.</param>
    ''' <returns>String of formatted datetimes separated by a delimiter.</returns>
    ''' <remarks></remarks>
    Public Function JoinDate(ByVal inputArray() As Object, ByVal delimiter As String, ByVal formatString As String) As String
        Dim output As String = String.Empty

        Dim i As Integer
        For i = 0 To inputArray.Length - 1
            output += (CDate(inputArray(i))).ToString(formatString) + delimiter
        Next i

        ' Trim the trailing delimiter off the string
        If output.Length > 0 Then
            output = output.Substring(0, output.Length - delimiter.Length)
        End If

        Return output

    End Function
[/sourcecode]

And use an expression like

[sourcecode language="vb"]
="|" + Code.JoinDate(Parameters!date.Value, "|", "yyyy-MM-dd") + "|"
[/sourcecode]

And you’re good to go!  This will convert a multi-select DateTime variable (which is an array of DateTime objects) into a string containing dates formatted as you request.

So for this example, where we are constraining on a multi-value string parameter, a multi-value date parameter, plus a few ordinary single-value parameters, we end up with

[sourcecode language="sql"]
SELECT
     date_key
    ,date
    ,good_day
    ,day_of_week
FROM
    date_sample
WHERE
    1=1
    AND STRPOS(@day_of_week_multi, '|' || day_of_week || '|') > 0
    AND STRPOS(@date, '|' || date || '|') > 0
    AND date_key =  @date_key
    AND good_day = @good_day 
ORDER BY
    date_key
;
[/sourcecode]
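
In the dataset, the parameters then need to be listed in the same order they appear in the query.  As a sketch (using the parameter names from the query above):

[sourcecode language="vb"]
' Dataset parameter order must match the query:
' 1. @day_of_week_multi  ="|" + JOIN(Parameters!day_of_week_multi.Value, "|") + "|"
' 2. @date               ="|" + Code.JoinDate(Parameters!date.Value, "|", "yyyy-MM-dd") + "|"
' 3. @date_key           =Parameters!date_key.Value
' 4. @good_day           =Parameters!good_day.Value
[/sourcecode]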

Here’s a sample that shows how to use integer, datetime, and string values (with some examples of multi-value parameters thrown in).  Just use the SQL script to build and populate the test table, set the data source appropriately, and play around with it.
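
The original download isn’t reproduced here, but a minimal version of the test table might look like this (the column types are assumptions based on the queries above):

[sourcecode language="sql"]
-- Hypothetical reconstruction of the date_sample test table
CREATE TABLE date_sample
(
     date_key     INTEGER
    ,date         DATE
    ,good_day     BOOLEAN
    ,day_of_week  VARCHAR(9)
);

INSERT INTO date_sample VALUES (20110101, '2011-01-01', true, 'Saturday');
INSERT INTO date_sample VALUES (20110102, '2011-01-02', false, 'Sunday');
INSERT INTO date_sample VALUES (20110103, '2011-01-03', true, 'Monday');
[/sourcecode]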


Delivering a Pre-Con at SQLSaturday #89 (#sqlsat89)

I’ll be delivering an all-day deep dive into using SSIS for data warehouse ETL processes the day before SQLSaturday #89, on Friday, September 16th. We’ll be taking an in-depth tour of implementing data warehouse extract, transform, and load processes with SSIS, with plenty of demonstrations and sample code. If you’ve ever wondered about how to handle data errors during your ETL, how to handle updates to large fact tables, or how to load a dimension table that combines type 1, 2, and 3 attributes, then come to this pre-con. We’ll cover all of that, plus a lot more. We have a reduced rate on the pre-con until July 1st, so now’s a great time to register.

Pre-cons like this are some of the most cost effective training you can get – plenty of time to both cover a topic from end to end, and to dive into the real implementation details that are often missing from shorter presentations because of the time constraints. I hope to see you there!

Data Warehousing with SSIS Deep Dive

Want to learn more about implementing data warehouse ETL with SQL Server Integration Services? Attend this full day seminar, and we’ll cover using SSIS for data warehousing in-depth. You’ll learn everything you need to know to populate your data warehouse with data. We’ll cover how to develop a common framework for your packages, automate the creation of rote packages for staging data, implement common patterns for handling various types of dimensions and fact tables, and how to instrument your packages to identify and recover from failures when loading data. We’ll be using the AdventureWorks databases for the examples, so bring along a laptop configured with SQL Server 2005 or later, and the AdventureWorks sample databases installed. We’ll also cover how the upcoming Denali release of SQL Server affects what we discuss in this seminar.

  • Laying out a framework for your ETL
    • Logging
    • Restartability and Recoverability
    • Auditing
  • Handling Dimensions
    • SCD Type 1
    • SCD Type 2
    • Advanced Dimension Types
  • Handling Facts
    • Transactional
    • Periodic Snapshot
    • Accumulating Snapshot
    • Advanced Fact Patterns
  • Errors
    • Handling Processing Errors
    • Handling Data Errors
    • Recovering from Errors
  • Best Practices for Managing Your ETL

Presenting at SQLSaturday #82 (#sqlsat82)

I’ll be presenting two sessions at SQLSaturday #82 this weekend in Indianapolis.

If you’ve seen my posts about using Biml to generate SSIS packages, and you’d like to learn more, then please check out “Do More (ETL) with Less (Effort) – Automating SSIS”. I’ll talk about the Biml support in BIDS Helper, as well as other approaches for creating SSIS packages without all the manual effort. In my other presentation, “Tuning Analysis Services Processing Performance”, we’ll look at some of the common performance problems people encounter with Analysis Services, and how to resolve them. We’ll also cover a process for doing the tuning.

I really enjoy presenting at SQLSaturday events, and I’m looking forward to this one. I hope to see you there.

 

Do More (ETL) with Less (Effort) – Automating SSIS

SSIS is a great tool for transferring data from one data source to another, and for implementing complex ETL processes. However, for simple, straightforward data transfer tasks or packages that adhere to a pattern, creating SSIS packages by hand can be time-consuming and repetitious. By attending this session, you’ll learn how to automate package creation in SSIS, including the dynamic generation of data flows. We’ll cover some of the free and open source tools available for this, and discuss “roll your own” options.

Tuning Analysis Services Processing Performance

You’ve got your Analysis Services cube created, and deployed in production. However, you notice that every night, the cube is taking longer and longer to process, and users are starting to complain about their data not being ready when they arrive in the morning. If you’ve found yourself in this situation, or want to avoid being in it in the first place, come to this session. We’ll cover how to benchmark processing performance, track down bottlenecks, and how to tune things to get the best performance for processing your cube.


Defining the Data Flow in Biml

This post is part 5 of a series on using Biml in BIDS Helper. This post builds on some of the information and the sample from the previous posts.

In the previous post in the series, I talked about controlling the order of execution in the control flow. In this post, the focus will be on the dataflow, and controlling how the data in the pipeline flows from one component to the next. This post uses a new table as the target of the data flow, so you may want to review Part 2: Creating Tables using Biml and BimlScript to see how to create the table locally. The Biml below describes the table. You can create it in the database of your choice – I used a database named Target.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;"/>
        <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;"/>
    </Connections>
    <Tables>
        <Table Name="DimAccount_Test" ConnectionName="Target">
            <Columns>
                <Column Name="AccountKey" />
                <Column Name="ParentAccountKey" IsNullable="true" />
                <Column Name="AccountCodeAlternateKey" IsNullable="true" />
                <Column Name="ParentAccountCodeAlternateKey" IsNullable="true" />
                <Column Name="AccountDescription" DataType="String" Length="50" IsNullable="true" />
                <Column Name="AccountType" DataType="String" Length="50" IsNullable="true" />
                <Column Name="Operator" DataType="String" Length="50" IsNullable="true" />
                <Column Name="CustomMembers" DataType="String" Length="300" IsNullable="true" />
                <Column Name="ValueType" DataType="String" Length="50" IsNullable="true" />
                <Column Name="CustomMemberOptions" DataType="String" Length="200" IsNullable="true" />
            </Columns>
        </Table>
    </Tables>
</Biml>

With the table created, we can move on to the interesting part – transforming the data. In a simple, straightforward data flow, the Biml compiler will do most of the work for you. Take this data flow as an example:

<Dataflow Name="Dataflow 1">
    <Transformations>
        <OleDbSource Name="Source" ConnectionName="Source">
            <DirectInput>SELECT * FROM dbo.DimAccount</DirectInput>
        </OleDbSource>
        <OleDbDestination Name="Target" ConnectionName="Target">
            <ExternalTableOutput Table="dbo.DimAccount_Test"/>
        </OleDbDestination>
    </Transformations>
</Dataflow>

In this case, you don’t have to specify any data paths. The Biml compiler will infer that the OleDbSource’s output should be connected to the input of the OleDbDestination.

[Screenshot: the generated data flow, with Source connected directly to Target]

The compiler is able to do this by using default outputs. In Biml, most components have a default output defined. In the absence of other information, the compiler will automatically connect the default output of a transformation to the input of the next component defined in the Biml. So, if we use a slightly more complex data flow, like this:

<Dataflow Name="Dataflow 2">
    <Transformations>
        <OleDbSource Name="Source" ConnectionName="Source">
            <DirectInput>SELECT * FROM dbo.DimAccount</DirectInput>
        </OleDbSource>
        <Lookup Name="Check For Existing" OleDbConnectionName="Target" NoMatchBehavior="RedirectRowsToNoMatchOutput">
            <DirectInput>SELECT AccountKey FROM dbo.DimAccount</DirectInput>
            <Inputs>
                <Column SourceColumn="AccountKey" TargetColumn="AccountKey"/>
            </Inputs>
        </Lookup>
        <ConditionalSplit Name="Test ID Range">
            <OutputPaths>
                <OutputPath Name="High ID">
                    <Expression>AccountKey >= 100</Expression>
                </OutputPath>
            </OutputPaths>
        </ConditionalSplit>
        <OleDbDestination Name="Target" ConnectionName="Target">
            <ExternalTableOutput Table="dbo.DimAccount_Test"/>
        </OleDbDestination>
    </Transformations>
</Dataflow>

We end up with a data flow that still automatically connects data paths between components.  In this case, though, it’s probably not doing exactly what we want, since it’s just connecting the default outputs.  The Conditional Split (“Test ID Range”) in this example is connected by its default output, but we want to use the “High ID” output to filter out IDs less than 100.  In the case of the Lookup (“Check For Existing”), the default output being used is the “Match” output, but we only want the non-matched records, so that only new rows are inserted.

[Screenshot: the data flow with default outputs connecting Source, Check For Existing, Test ID Range, and Target]

I explicitly chose the option in BIDS to display the paths’ source names for this screenshot – by default, they aren’t displayed in the generated package.  You can change the setting in BIDS by selecting the path, opening the Properties tool window, and changing the PathAnnotation property to SourceName.

So how would we change the Biml to get the desired results? If we add an InputPath element to the appropriate components, we can control which output is tied to the component’s input. In this case, we need to add explicit InputPath instructions to the Conditional Split (that will reference the Lookup’s NoMatch output) and to the OleDbDestination (which will reference the ConditionalSplit’s High ID output).

<Dataflow Name="Dataflow 3">
    <Transformations>
        <OleDbSource Name="Source" ConnectionName="Source">
            <DirectInput>SELECT * FROM dbo.DimAccount</DirectInput>
        </OleDbSource>
        <Lookup Name="Check For Existing" OleDbConnectionName="Target" NoMatchBehavior="RedirectRowsToNoMatchOutput">
            <DirectInput>SELECT AccountKey FROM dbo.DimAccount</DirectInput>
            <Inputs>
                <Column SourceColumn="AccountKey" TargetColumn="AccountKey"/>
            </Inputs>
        </Lookup>
        <ConditionalSplit Name="Test ID Range">
            <InputPath OutputPathName="Check For Existing.NoMatch"/>
            <OutputPaths>
                <OutputPath Name="High ID">
                    <Expression>AccountKey >= 100</Expression>
                </OutputPath>
            </OutputPaths>
        </ConditionalSplit>
        <OleDbDestination Name="Target" ConnectionName="Target">
            <InputPath OutputPathName="Test ID Range.High ID"/>
            <ExternalTableOutput Table="dbo.DimAccount_Test"/>
        </OleDbDestination>
    </Transformations>
</Dataflow>

This gives you the following data flow.

[Screenshot: the data flow using the Lookup’s NoMatch output and the Conditional Split’s High ID output]

That’s a few examples of controlling the data paths in a data flow.  There are a few other bits of information that are important to know:

  • Most components have a default output named “Output”, and a second output named “Error” for the error output (if the component supports errors).
  • The Multicast component has no default output, so you always need to explicitly define the data path mapping from it to the next component (see the sketch after this list).
  • The Union All, Merge, and Merge Join components need to be explicitly mapped, since they support multiple inputs.
  • The Slowly Changing Dimension (SCD) transformation has multiple outputs. The “New” output is the default. There are also outputs named “Unchanged”, “FixedAttribute”, “ChangingAttribute”, “HistoricalAttribute”, and “InferredMember”.
  • The Percentage Sampling and Row Sampling transformations have two outputs, named “Selected” (the default) and “Unselected”.
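
For example, here’s a sketch of wiring a Multicast explicitly, following the same InputPath/OutputPath conventions shown above (the component names and the DimAccount_Archive table are hypothetical):

<Multicast Name="Split Rows">
    <OutputPaths>
        <OutputPath Name="To Target"/>
        <OutputPath Name="To Archive"/>
    </OutputPaths>
</Multicast>
<OleDbDestination Name="Archive" ConnectionName="Target">
    <InputPath OutputPathName="Split Rows.To Archive"/>
    <ExternalTableOutput Table="dbo.DimAccount_Archive"/>
</OleDbDestination>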

The sample Biml for this post is on my SkyDrive. Please download it and try it out with the latest release of BIDS Helper.


Controlling the Control Flow in Biml

This post is part 4 of a series on using Biml in BIDS Helper. This post builds on some of the information and the sample from the previous posts.

So far, we’ve looked at some relatively simple packages, in terms of their flow. In this post, we’re going to look at how to handle more complex control flow in Biml.

One feature of Biml is the ConstraintMode property that’s part of packages and containers. This property controls how precedence constraints are generated in the control flow. In the simple case, if you want all tasks to be run in sequence, you can set the ConstraintMode to Linear. This causes the package to be produced with all tasks connected sequentially by Success precedence constraints, in the order they were specified in the Biml. So, the following Biml:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="Control Flow Sample 1" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
            <Tasks>
                <Dataflow Name="Task 1"/>
                <Dataflow Name="Task 2"/>
                <Dataflow Name="Task 3"/>
            </Tasks>
        </Package>
    </Packages>
</Biml>

results in a package that looks like this:

[Screenshot: Task 1, Task 2, and Task 3 chained by Success precedence constraints]

This type of linear flow is great in some situations, but sometimes you need more control. In those cases, you can change the ConstraintMode to Parallel. The generated package will not have any automatically created precedence constraints, so it will look like this:

[Screenshot: the same three tasks with no precedence constraints]

Once a container is in Parallel constraint mode, you can start adding explicit precedence constraints. Let’s use an example to highlight this. Imagine you have a package that needs to run three data flows. (I’m using data flows for the example because they are simple to read in Biml, and I want the focus to be on the constraints, not the tasks.) I want one data flow to execute first – this data flow will be named “Me First”. If “Me First” succeeds, the data flow named “Me Next (Success)” should execute. The third data flow, named “I’m Last (Always)”, should always be executed, regardless of the success or failure of the other two tasks. So my package should look like this:

[Screenshot: Me First connected to Me Next (Success) by a Success constraint, with both connected to I’m Last (Always)]

So how do we get this output from Biml?  We can use the PrecedenceConstraints collection on each task.  At its simplest, you just add an Input to the collection, and reference the output of the task that should execute prior to this one.  In Biml, all tasks have a built-in output named Output.  You can reference it using TaskName.Output (“Me First.Output” in the example below).  This will create a regular Success constraint between the tasks.

<Dataflow Name="Me Next (Success)">
    <PrecedenceConstraints>
        <Inputs>
            <Input OutputPathName="Me First.Output"/>
        </Inputs>
    </PrecedenceConstraints>
</Dataflow>

For the next set of constraints, we set the LogicalType property to Or, since either of the first two tasks should cause the third task to run.  We also need to explicitly set the evaluation value on these, using the EvaluationValue property.

<Dataflow Name="I'm Last (Always)">
    <PrecedenceConstraints LogicalType="Or">
        <Inputs>
            <Input OutputPathName="Me First.Output" EvaluationValue="Failure"/>
            <Input OutputPathName="Me Next (Success).Output" EvaluationValue="Completion"/>
        </Inputs>
    </PrecedenceConstraints>
</Dataflow>

You can also add expression constraints to the Inputs, to control whether tasks run based on the result of an expression.  You use the EvaluationOperation and Expression properties to configure that.

<Package Name="Control Flow Sample 3" AutoCreateConfigurationsType="None" ConstraintMode="Parallel">
    <Variables>
        <Variable Name="Continue" DataType="Int32">0</Variable>
    </Variables>
    <Tasks>
        <Dataflow Name="Task 1"/>
        <Dataflow Name="Task 2">
            <PrecedenceConstraints>
                <Inputs>
                    <Input OutputPathName="Task 1.Output" EvaluationOperation="Expression" Expression="@Continue==1"/>
                </Inputs>
            </PrecedenceConstraints>
        </Dataflow>
    </Tasks>
</Package>

That Biml results in a package that looks like this.

[Screenshot: Task 1 connected to Task 2 by an expression constraint]

That’s how to control the precedence constraints. I’ve uploaded the Biml from this post to my SkyDrive here, so you can download and experiment with this yourself. In the next post, we’ll look at controlling the data paths in a data flow.


BIDS Helper 1.5 Is Available!

The new version of BIDS Helper has now been released. If you haven’t used it before, now’s a great time to get started with it. BIDS Helper has been adding much-needed functionality to the BIDS environment for over 4 years, with the first release happening back in May 2007.

I posted last month about the beta – thanks to everyone that downloaded it and provided feedback. The major new features are:

Duplicate Roles

This feature allows you to copy a role with all of the associated settings and permissions. This is implemented as a new menu item on the right click menu for a role.

Biml Package Generator

The Biml Package Generator provides the ability to create packages from Business Intelligence Markup Language (Biml). You can find some tutorials on using this functionality here, and a short introduction video here.

There were also several issues resolved, and other improvements. The Enable/Disable features dialog has been improved to provide additional information and help links. The new version of the SSIS Expression Editor is also included, and there are improvements to the variable move feature, the Expression list, and the Printer Friendly Dimension Usage feature.

I’m pretty excited about this release, since I’ve been working on the Biml language and technology since joining Varigence. I think it’s great that we’re able to provide some of this functionality for free to the community through BIDS Helper. I’m particularly interested in seeing what interesting uses the community comes up with for it. As I mentioned here, I think it will provide a nice way to package up complete, working samples that someone can just copy and paste from a web page, and use locally. On top of that, it will help automate a lot of rote tasks in SSIS, as I’ll continue to highlight in my series on using Biml in BIDS Helper.

We’ve already gotten some great feedback on the new version, along with some suggestions for continued improvements. If there’s functionality you’d like to see added to BIDS Helper, please suggest it at the Issue Tracker page – you can also use that to track new feature suggestions.


Configuring Visual Studio to Play Nicely with BimlScript

Visual Studio has a great XML editor. Unfortunately, it doesn’t understand BimlScript. This can cause a few headaches when using the Biml feature in BIDS Helper, particularly when you copy and paste BimlScript code into the editor. Take this code as an example:

<#@ template language="C#" hostspecific="true"#>
      
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
      <Packages>
          <# for(var i = 0; i < 5; i++) {#>
            <Package Name="Package <#=i#>" ConstraintMode="Linear"/>
          <# } #>
      </Packages>
</Biml>

If you copy and paste this code into the XML editor with the default settings, you’ll see something like this:

[Screenshot: the pasted BimlScript, with an extraneous ="" after the template directive and squiggles under the script]

Notice that Visual Studio helpfully added an ="" after the word “template” in the first line. That would be nice, except that the first line is not XML – it’s actually BimlScript. The word “template” in this case is a directive, and shouldn’t have ="" after it. You have to delete the ="" for this BimlScript to work.

You’ll also notice the red and blue squiggles underneath some of the code. This is also caused by the XML editor not knowing how to parse this information. You can ignore the squiggles, as they won’t impact the execution of the BimlScript.

Fortunately, you can make the experience better. In Visual Studio, go to the Tools menu, and select Options. In the resulting dialog, expand the Text Editor item on the left, and then select XML.

[Screenshot: the Visual Studio Options dialog, with Text Editor > XML selected]

Expand the XML item, and choose the Formatting item. In the right panel, in the Auto Reformat section, uncheck “On paste from clipboard” and “On completion of end tag”. Click OK to save your changes.

[Screenshot: the XML Formatting options, with the Auto Reformat boxes unchecked]

Unfortunately, this doesn’t get rid of the squiggles, but it will stop Visual Studio from incorrectly reformatting your BimlScript. If you copy and paste the above sample after making these changes, you won’t get the extraneous ="" values.

[Screenshot: the pasted BimlScript, unmodified this time]

Hopefully this makes the BimlScript experience in Visual Studio a little more enjoyable.


Copy Data Dynamically with BimlScript

This post is part 3 of a series on using Biml in BIDS Helper. This post builds on some of the information and the sample from the previous posts.

BimlScript enables some interesting scenarios for generating large numbers of SSIS packages automatically. This can come in handy when you need to copy most or all of the data in one database to a different one. In this case, you could use something like the Transfer SQL Server Objects task, but it has a few problems. You can roll your own, but that might mean a fair amount of custom scripting. Or you could use the Import / Export Wizard. But in all these cases, you don’t have complete control of how the packages are produced. You could create all the packages by hand, which does give you full control, but then you are stuck doing a lot of repetitive work in SSIS.

BimlScript provides an alternative that lets you fully control the output, while automating the rote work of producing lots of packages that use the same pattern. Let’s take a look at a sample of this, using the scenario above (copying the data from one database to another).

<#@ template language="C#" hostspecific="true"#>
<#@ import namespace="System.Data" #>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
      <Connections>
            <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;"/>
            <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;"/>
      </Connections>
      <Packages>
            <#
                string metadataConnectionString = "Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;";
                DataTable tables = ExternalDataAccess.GetDataTable(metadataConnectionString,
                    "SELECT '[' + s.name + '].[' + t.name + ']' FROM sys.tables t INNER JOIN sys.schemas s on t.schema_id = s.schema_id");
                foreach (DataRow row in tables.Rows)
                { #>
            <Package Name="Extract <#=row[0]#>" ConstraintMode="Linear" AutoCreateConfigurationsType="None">
                  <Tasks>
                        <Dataflow Name="Copy Data">
                              <Transformations>
                                    <OleDbSource Name="Retrieve Data" ConnectionName="Source">
                                          <DirectInput>SELECT * FROM <#=row[0]#></DirectInput>
                                    </OleDbSource>
                                    <OleDbDestination Name="Insert Data" ConnectionName="Target">
                                          <ExternalTableOutput Table="<#=row[0]#>"/>
                                    </OleDbDestination>
                              </Transformations>
                        </Dataflow>
                  </Tasks>
            </Package>
                <# } #>
      </Packages>
</Biml>

This script is set up to copy all the data in the AdventureWorksDW2008R2 database to a second database named Target (very inventive, I know). One note – the script is not creating the tables in the target database. We could actually automate that portion as well, but it’s beyond the scope of this post. To ensure you are set up properly to run this script, you should create an exact structural copy of your source database under a different name. You can use the Generate Scripts Wizard to do this. Just script the entire database, and then update the generated script to use a different database name (don’t forget to change the USE statement to the new name).

The script will produce a package per table, with a simple data flow that copies all the data using an OLE DB Source and OLE DB Destination. The script leverages the metadata already contained in the database, in the sys.tables view, to drive the loop that creates the packages.

What if you don’t want to select all the rows from each table? Instead, perhaps you want to specify a WHERE clause to use to filter some of the tables. To handle this, we can create a table in the target database that holds our WHERE information.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;"/>
    </Connections>
    <Tables>
        <Table Name="WhereClause" ConnectionName="Target">
            <Columns>
                <Column Name="TableName" DataType="String" Length="255"/>
                <Column Name="WhereSql" DataType="String" Length="4000"/>
            </Columns>
        </Table>
    </Tables>
</Biml>

You can use the steps shown in Part 2 of this series to create this table in the Target database. Once it’s been created, populate it with some data. Note that since the script uses schema-qualified table names, the TableName values need to be schema-qualified as well. There’s an example of data for this table that will work with AdventureWorksDW2008R2 below. This will filter the rows down to only sales where the amount is at least 1000.

TableName                   WhereSql
[dbo].[FactInternetSales]   WHERE [SalesAmount] >= 1000
[dbo].[FactResellerSales]   WHERE [SalesAmount] >= 1000

Now we need to alter the script to use the new information in this table. At the beginning of the block of script after the <Packages> element, add the following code:

string targetConnectionString = "Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;";
DataTable whereClauses = ExternalDataAccess.GetDataTable(targetConnectionString, "SELECT TableName, WhereSql FROM WhereClause");

This retrieves the WHERE clauses from the WhereClause table, and stores them in the whereClauses variable.

Next, replace the <DirectInput> line in the OleDbSource with this:

<#
  var dataRow = whereClauses.Select(string.Format("TableName = '{0}'", row[0]));
  string whereSql = dataRow.Length == 0 ? string.Empty : dataRow[0][1].ToString();
  string sql = string.Format("SELECT * FROM {0} {1}", row[0], whereSql);
#>
<DirectInput><#=sql#></DirectInput>

This code determines whether the whereClauses DataTable has a row for the current table. If it does, it appends the corresponding WHERE clause to the end of the SELECT statement. The complete, final script looks like this:

<#@ template language="C#" hostspecific="true"#>
<#@ import namespace="System.Data" #>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
      <Connections>
            <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;"/>
            <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;"/>
      </Connections>
      <Packages>
            <#
                string targetConnectionString = "Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;";
                DataTable whereClauses = ExternalDataAccess.GetDataTable(targetConnectionString, "SELECT TableName, WhereSql FROM WhereClause");

                string metadataConnectionString = "Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;";
                DataTable tables = ExternalDataAccess.GetDataTable(metadataConnectionString,
                    "SELECT '[' + s.name + '].[' + t.name + ']' FROM sys.tables t INNER JOIN sys.schemas s on t.schema_id = s.schema_id");
                foreach (DataRow row in tables.Rows)
                { #>
            <Package Name="Extract <#=row[0]#>" ConstraintMode="Linear" AutoCreateConfigurationsType="None">
                  <Tasks>
                        <Dataflow Name="Copy Data">
                              <Transformations>
                                    <OleDbSource Name="Retrieve Data" ConnectionName="Source">
                                        <#
                                            var dataRow = whereClauses.Select(string.Format("TableName = '{0}'", row[0]));
                                            string whereSql = dataRow.Length == 0 ? string.Empty : dataRow[0][1].ToString();
                                            string sql = string.Format("SELECT * FROM {0} {1}", row[0], whereSql);
                                        #>
                                          <DirectInput><#=sql#></DirectInput>
                                    </OleDbSource>
                                    <OleDbDestination Name="Insert Data" ConnectionName="Target">
                                          <ExternalTableOutput Table="<#=row[0]#>"/>
                                    </OleDbDestination>
                              </Transformations>
                        </Dataflow>
                  </Tasks>
            </Package>
                <# } #>
      </Packages>
</Biml>

You can see the results of this script by right-clicking on the Biml file, and choosing Expand. It may take a minute or two to process, but when it finishes, you should see a package for each table in your source database. The data flows will copy the data from Source to Target, and any WHERE clauses you add to the WhereClause table will be used.

There’s a lot more that could be done with this script (automating the recreation of the tables in the destination, or deleting existing data, for example), but it’s still a good example of what BimlScript can do. Instead of spending your time writing 10s or 100s of repetitive packages, automate it with BimlScript.


Creating Tables using Biml and BimlScript

This post is part 2 of a series on using Biml in BIDS Helper. This post builds on some of the information and the sample from the previous posts.

When I’m creating samples for SSIS, I often find it necessary to create supporting tables to go along with the package sample. One of the things I like about Biml is that you can define both your tables and packages in the language. Here’s an example of defining an OrderHeader and OrderDetail table in Biml:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="DbConnection" ConnectionString="Server=.;Initial Catalog=Sandbox;Provider=SQLNCLI10.1;Integrated Security=SSPI;"/>
    </Connections>
    <Tables>
        <Table Name="OrderHeader" ConnectionName="DbConnection">
            <Columns>
                <Column Name="OrderId" DataType="Int32" IdentityIncrement="1" IdentitySeed="1"/>
                <Column Name="SalesDate" DataType="DateTime"/>
                <Column Name="CustomerName" DataType="String" Length="50"/>
            </Columns>
            <Keys>
                <PrimaryKey Name="OrderHeaderPK">
                    <Columns>
                        <Column ColumnName="OrderId"/>
                    </Columns>
                </PrimaryKey>
            </Keys>
        </Table>
        <Table Name="OrderDetail" ConnectionName="DbConnection">
            <Columns>
                <Column Name="OrderDetailId" DataType="Int32" IdentityIncrement="1" IdentitySeed="1"/>
                <TableReference Name="OrderId" TableName="OrderHeader"/>
                <Column Name="ProductName" DataType="String" Length="50"/>
                <Column Name="Qty" DataType="Int16"/>
                <Column Name="UnitPrice" DataType="Currency"/>
            </Columns>
            <Keys>
                <PrimaryKey Name="OrderDetailPK">
                    <Columns>
                        <Column ColumnName="OrderDetailId"/>
                    </Columns>
                </PrimaryKey>
            </Keys>
        </Table>
    </Tables>
</Biml>

Tables are defined in a <Table> tag. They can have columns defined, as well as keys, and even indexes (not shown in the example above). Notice that the OrderId column doesn’t have a DataType attribute. Many of the attributes in Biml have default values, and data type is one of them. If it’s not specified, the column data type will default to Int32. The primary key for the table is defined with a <PrimaryKey> element.
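
Indexes aren’t shown above, but here’s a rough sketch of what one can look like – treat the exact element names as an assumption on my part, since it simply mirrors the <PrimaryKey> pattern:

<Table Name="OrderHeader" ConnectionName="DbConnection">
    <!-- columns and keys as above -->
    <Indexes>
        <Index Name="OrderHeaderCustomerIX">
            <Columns>
                <Column ColumnName="CustomerName"/>
            </Columns>
        </Index>
    </Indexes>
</Table>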

The OrderDetail table includes a <TableReference> column. TableReference columns are a special class of column that defines a foreign key reference to another table. This one is referencing back to the OrderHeader table. It’s not shown above, but you can also use a MultipleColumnTableReference if your foreign key needs to span multiple columns.
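
A multi-column reference looks roughly like the sketch below – the shared group name is what ties the columns together into a single foreign key (the column names are hypothetical, and the attribute names are from memory, so verify them against the Biml documentation):

<MultipleColumnTableReference Name="OrderId" TableName="OrderHeader" MultipleColumnTableReferenceGroupName="OrderHeaderFK"/>
<MultipleColumnTableReference Name="OrderCompanyId" TableName="OrderHeader" MultipleColumnTableReferenceGroupName="OrderHeaderFK"/>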

Great – now you have your tables defined in Biml, but how do you make use of that? If only there were some way to run this against your database to create the tables… Well, fortunately, there is – by using BimlScript. BimlScript is a scripting layer that automates the production of Biml (similar in concept to the way ASP.NET produces HTML). To set this up, you need to add two Biml files to your project – one to hold the table definitions above, and one to hold the BimlScript.

First, add a new Biml file to the SSIS project (see Part 1 if you need a refresher on this). Copy the Biml above to this file, and rename the file to TableDefinitions.biml.

[Screenshot: TableDefinitions.biml in the SSIS project]

Second, add an additional Biml file. Name this one CreateTables.biml.

[Screenshot: CreateTables.biml added to the project]

Open the CreateTables.biml file, and replace the contents with the following code:

<#@ template language="C#" hostspecific="True" #>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="Create Tables" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
            <Tasks>
                <# foreach(var table in RootNode.Tables) {#>
                <ExecuteSQL Name="Create <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
                    <DirectInput>
                        <#=table.GetTableSql()#>    
                    </DirectInput>
                </ExecuteSQL>
                <# } #>
                <# foreach(var table in RootNode.Dimensions) {#>
                <ExecuteSQL Name="Create <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
                    <DirectInput>
                        <#=table.GetTableSql()#>    
                    </DirectInput>
                </ExecuteSQL>
                <# } #>
                <# foreach(var table in RootNode.Facts) {#>
                <ExecuteSQL Name="Create <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
                    <DirectInput>
                        <#=table.GetTableSql()#>    
                    </DirectInput>
                </ExecuteSQL>
                <# } #>
            </Tasks>
        </Package>
    </Packages>
</Biml>

This file has a header at the beginning that indicates the script will use C#. The next section defines a package named “Create Tables”. The section inside the Tasks element is the interesting part:

<# foreach(var table in RootNode.Tables) {#>
<ExecuteSQL Name="Create <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
    <DirectInput>
        <#=table.GetTableSql()#>    
    </DirectInput>
</ExecuteSQL>
<# } #>

This code iterates over the tables that are part of the current model. For each table it finds, it creates an ExecuteSQL task, and embeds the SQL to create the table in the package. The code is repeated to iterate over Dimensions and Facts, which are special classes of tables.

Notice that there are no tables defined in the BimlScript file. The BimlScript can’t operate against objects defined in the same file, which is why we created the TableDefinitions.biml file separately. To produce the package, multi-select both TableDefinitions.biml and CreateTables.biml, right-click, and choose Expand Biml File.

[Screenshot: the Expand Biml File option on the right-click menu]

This will produce a new SSIS package in the project named Create Tables.dtsx. It contains two Execute SQL tasks, one for each table.

[Screenshot: the generated Create Tables.dtsx package with two Execute SQL tasks]

Each task includes the appropriate SQL to create the tables. As an example, here’s the OrderHeader SQL from the Execute SQL task.

SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO

-------------------------------------------------------------------
IF EXISTS (SELECT * from sys.objects WHERE object_id = OBJECT_ID(N'[OrderHeader]') AND type IN (N'U'))
DROP TABLE [OrderHeader]
GO

CREATE TABLE [OrderHeader]
(
-- Columns Definition
 [OrderId] int IDENTITY(1,1) NOT NULL
, [SalesDate] datetime NOT NULL
, [CustomerName] nvarchar(50) NOT NULL

-- Constraints
,CONSTRAINT [OrderHeaderPK] PRIMARY KEY CLUSTERED
(
  [OrderId] Asc) WITH(PAD_INDEX = OFF,IGNORE_DUP_KEY = OFF) ON [PRIMARY]

)
ON [PRIMARY]
WITH (DATA_COMPRESSION = NONE)
GO

-------------------------------------------------------------------

Note that the tables are ordered in the package in the same order they are defined in the Biml file. If you have tables with dependencies, make sure to order them correctly.

In the next post, we’ll look at some ways to copy data dynamically using BimlScript.
