Archive for June 2011

Delivering a Pre-Con at SQLSaturday #89 (#sqlsat89)

I’ll be delivering an all-day deep dive into using SSIS for data warehouse ETL processes the day before SQLSaturday #89, on Friday, September 16th. We’ll be taking an in-depth tour of implementing data warehouse extract, transform, and load processes with SSIS, with plenty of demonstrations and sample code. If you’ve ever wondered about how to handle data errors during your ETL, how to handle updates to large fact tables, or how to load a dimension table that combines type 1, 2, and 3 attributes, then come to this pre-con. We’ll cover all of that, plus a lot more. We have a reduced rate on the pre-con until July 1st, so now’s a great time to register.

Pre-cons like this are some of the most cost-effective training you can get – there’s plenty of time both to cover a topic from end to end and to dive into the real implementation details that are often missing from shorter presentations because of time constraints. I hope to see you there!

Data Warehousing with SSIS Deep Dive

Want to learn more about implementing data warehouse ETL with SQL Server Integration Services? Attend this full-day seminar, and we’ll cover using SSIS for data warehousing in depth. You’ll learn everything you need to know to populate your data warehouse with data. We’ll cover how to develop a common framework for your packages, automate the creation of rote packages for staging data, implement common patterns for handling various types of dimensions and fact tables, and instrument your packages to identify and recover from failures when loading data. We’ll be using the AdventureWorks databases for the examples, so bring along a laptop with SQL Server 2005 or later and the AdventureWorks sample databases installed. We’ll also cover how the upcoming Denali release of SQL Server affects the topics discussed in this seminar.

  • Laying out a framework for your ETL
    • Logging
    • Restartability and Recoverability
    • Auditing
  • Handling Dimensions
    • SCD Type 1
    • SCD Type 2
    • Advanced Dimension Types
  • Handling Facts
    • Transactional
    • Periodic Snapshot
    • Accumulating Snapshot
    • Advanced Fact Patterns
  • Errors
    • Handling Processing Errors
    • Handling Data Errors
    • Recovering from Errors
  • Best Practices for Managing Your ETL

Presenting at SQLSaturday #82 (#sqlsat82)

I’ll be presenting two sessions at SQLSaturday #82 this weekend in Indianapolis.

If you’ve seen my posts about using Biml to generate SSIS packages, and you’d like to learn more, then please check out “Do More (ETL) with Less (Effort) – Automating SSIS”. I’ll talk about the Biml support in BIDS Helper, as well as other approaches for creating SSIS packages without all the manual effort. In my other presentation, “Tuning Analysis Services Processing Performance”, we’ll look at some of the common performance problems people encounter with Analysis Services, and how to resolve them. We’ll also cover a process for doing the tuning.

I really enjoy presenting at SQLSaturday events, and I’m looking forward to this one. I hope to see you there.

 

Do More (ETL) with Less (Effort) – Automating SSIS

SSIS is a great tool for transferring data from one data source to another, and for implementing complex ETL processes. However, for simple, straightforward data transfer tasks or packages that adhere to a pattern, creating SSIS packages by hand can be time-consuming and repetitious. By attending this session, you’ll learn how to automate package creation in SSIS, including the dynamic generation of data flows. We’ll cover some of the free and open source tools available for this, and discuss “roll your own” options.

Tuning Analysis Services Processing Performance

You’ve got your Analysis Services cube created and deployed in production. However, you notice that every night the cube is taking longer and longer to process, and users are starting to complain about their data not being ready when they arrive in the morning. If you’ve found yourself in this situation, or want to avoid being in it in the first place, come to this session. We’ll cover how to benchmark processing performance, track down bottlenecks, and tune things to get the best performance when processing your cube.

Defining the Data Flow in Biml

This post is part 5 of a series on using Biml in BIDS Helper. This post builds on some of the information and the sample from the previous posts.

In the previous post in the series, I talked about controlling the order of execution in the control flow. In this post, the focus will be on the data flow, and controlling how the data in the pipeline flows from one component to the next. This post uses a new table as the target of the data flow, so you may want to review Part 2: Creating Tables using Biml and BimlScript to see how to create the table locally. The Biml below describes the table. You can create it in the database of your choice – I used a database named Target.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;"/>
        <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;"/>
    </Connections>
    <Tables>
        <Table Name="DimAccount_Test" ConnectionName="Target">
            <Columns>
                <Column Name="AccountKey" />
                <Column Name="ParentAccountKey" IsNullable="true" />
                <Column Name="AccountCodeAlternateKey" IsNullable="true" />
                <Column Name="ParentAccountCodeAlternateKey" IsNullable="true" />
                <Column Name="AccountDescription" DataType="String" Length="50" IsNullable="true" />
                <Column Name="AccountType" DataType="String" Length="50" IsNullable="true" />
                <Column Name="Operator" DataType="String" Length="50" IsNullable="true" />
                <Column Name="CustomMembers" DataType="String" Length="300" IsNullable="true" />
                <Column Name="ValueType" DataType="String" Length="50" IsNullable="true" />
                <Column Name="CustomMemberOptions" DataType="String" Length="200" IsNullable="true" />
            </Columns>
        </Table>
    </Tables>
</Biml>

With the table created, we can move on to the interesting part – transforming the data. In a simple, straightforward data flow, the Biml compiler will do most of the work for you. Take this data flow as an example:

<Dataflow Name="Dataflow 1">
    <Transformations>
        <OleDbSource Name="Source" ConnectionName="Source">
            <DirectInput>SELECT * FROM dbo.DimAccount</DirectInput>
        </OleDbSource>
        <OleDbDestination Name="Target" ConnectionName="Target">
            <ExternalTableOutput Table="dbo.DimAccount_Test"/>
        </OleDbDestination>
    </Transformations>
</Dataflow>

In this case, you don’t have to specify any data paths. The Biml compiler will infer that the OleDbSource’s output should be connected to the input of the OleDbDestination.

[Image: the generated data flow, with the Source output connected directly to the Target destination]

The compiler is able to do this by using default outputs. In Biml, most components have a default output defined. In the absence of other information, the compiler will automatically connect the default output of a transformation to the input of the next component defined in the Biml. So, if we use a slightly more complex data flow, like this:

<Dataflow Name="Dataflow 2">
    <Transformations>
        <OleDbSource Name="Source" ConnectionName="Source">
            <DirectInput>SELECT * FROM dbo.DimAccount</DirectInput>
        </OleDbSource>
        <Lookup Name="Check For Existing" OleDbConnectionName="Target" NoMatchBehavior="RedirectRowsToNoMatchOutput">
            <DirectInput>SELECT AccountKey FROM dbo.DimAccount</DirectInput>
            <Inputs>
                <Column SourceColumn="AccountKey" TargetColumn="AccountKey"/>
            </Inputs>
        </Lookup>
        <ConditionalSplit Name="Test ID Range">
            <OutputPaths>
                <OutputPath Name="High ID">
                    <Expression>AccountKey >= 100</Expression>
                </OutputPath>
            </OutputPaths>
        </ConditionalSplit>
        <OleDbDestination Name="Target" ConnectionName="Target">
            <ExternalTableOutput Table="dbo.DimAccount_Test"/>
        </OleDbDestination>
    </Transformations>
</Dataflow>

We end up with a data flow that still automatically connects data paths between components. In this case, though, it’s probably not doing exactly what we want, since it’s just connecting the default outputs. The Conditional Split (“Test ID Range”) in this example is connected by its default output, but we want to use the “High ID” output to filter out IDs less than 100. In the case of the Lookup (“Check For Existing”), the default output being used is the “Match” output, but we only want the non-matched records, so that only new rows are inserted.

[Image: the generated Dataflow 2, with the Lookup’s Match output connected to the Conditional Split and the split’s default output connected to the Target destination]

I explicitly chose the option in BIDS to display the path source names for this screenshot – by default, they aren’t displayed in the generated package. You can change the setting in BIDS by selecting the path, opening the Properties tool window, and changing the PathAnnotation property to SourceName.

So how would we change the Biml to get the desired results? If we add an InputPath element to the appropriate components, we can control which output is tied to the component’s input. In this case, we need to add explicit InputPath instructions to the Conditional Split (which will reference the Lookup’s NoMatch output) and to the OleDbDestination (which will reference the Conditional Split’s High ID output).

<Dataflow Name="Dataflow 3">
    <Transformations>
        <OleDbSource Name="Source" ConnectionName="Source">
            <DirectInput>SELECT * FROM dbo.DimAccount</DirectInput>
        </OleDbSource>
        <Lookup Name="Check For Existing" OleDbConnectionName="Target" NoMatchBehavior="RedirectRowsToNoMatchOutput">
            <DirectInput>SELECT AccountKey FROM dbo.DimAccount</DirectInput>
            <Inputs>
                <Column SourceColumn="AccountKey" TargetColumn="AccountKey"/>
            </Inputs>
        </Lookup>
        <ConditionalSplit Name="Test ID Range">
            <InputPath OutputPathName="Check For Existing.NoMatch"/>
            <OutputPaths>
                <OutputPath Name="High ID">
                    <Expression>AccountKey >= 100</Expression>
                </OutputPath>
            </OutputPaths>
        </ConditionalSplit>
        <OleDbDestination Name="Target" ConnectionName="Target">
            <InputPath OutputPathName="Test ID Range.High ID"/>
            <ExternalTableOutput Table="dbo.DimAccount_Test"/>
        </OleDbDestination>
    </Transformations>
</Dataflow>

This gives you the following data flow.

[Image: the generated Dataflow 3, with the Lookup’s NoMatch output connected to the Conditional Split and the High ID output connected to the Target destination]

Those are a few examples of controlling the data paths in a data flow. There are a few other details about data paths that are important to know:

  • Most components have a default output named “Output”, and a second output named “Error” for the error output (if the component supports errors) – the sketch after this list shows one way to reference an error output explicitly.
  • The Multicast component has no default output, so you always need to explicitly define the data path mapping from it to the next component.
  • The Union All, Merge, and Merge Join components need to be explicitly mapped, since they support multiple inputs.
  • The Slowly Changing Dimension (SCD) transformation has multiple outputs. The “New” output is the default. There are also outputs named “Unchanged”, “FixedAttribute”, “ChangingAttribute”, “HistoricalAttribute”, and “InferredMember”.
  • The Percentage Sampling and Row Sampling transformations have two outputs named “Selected” (the default) and “Unselected”.
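
As a quick illustration of the first bullet, a downstream component can pick up a component’s error rows by referencing the “Error” output by name, the same way the NoMatch and High ID outputs were referenced earlier. The snippet below is only a minimal sketch – the “Error Rows” destination and the dbo.DimAccount_Errors table are hypothetical, and it assumes the Lookup’s error disposition is configured to redirect error rows.

<OleDbDestination Name="Error Rows" ConnectionName="Target">
    <!-- Hypothetical destination: lands the Lookup's error rows in an illustrative table -->
    <InputPath OutputPathName="Check For Existing.Error"/>
    <ExternalTableOutput Table="dbo.DimAccount_Errors"/>
</OleDbDestination>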

The sample Biml for this post is on my SkyDrive. Please download it and try it out with the latest release of BIDS Helper.

Controlling the Control Flow in Biml

This post is part 4 of a series on using Biml in BIDS Helper. This post builds on some of the information and the sample from the previous posts.

So far, we’ve looked at some relatively simple packages, in terms of their flow. In this post, we’re going to look at how to handle more complex control flow in Biml.

One feature of Biml is the ConstraintMode property that’s part of packages and containers. This property controls how precedence constraints are generated in the control flow. In the simple case, if you want all tasks to be run in sequence, you can set the ConstraintMode to Linear. This causes the package to be produced with all tasks connected sequentially by Success precedence constraints, in the order they were specified in the Biml. So, the following Biml:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="Control Flow Sample 1" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
            <Tasks>
                <Dataflow Name="Task 1"/>
                <Dataflow Name="Task 2"/>
                <Dataflow Name="Task 3"/>
            </Tasks>
        </Package>
    </Packages>
</Biml>

results in a package that looks like this:

[Image: the generated package, with Task 1, Task 2, and Task 3 connected sequentially by Success precedence constraints]

This type of linear flow is great in some situations, but sometimes you need more control. In those cases, you can change the ConstraintMode to Parallel. The generated package will not have any automatically created precedence constraints, so it will look like this:

[Image: the generated package, with the three data flow tasks and no precedence constraints between them]
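
For reference, the only Biml change from the first sample is the ConstraintMode attribute – something along these lines (the package name here is just illustrative):

<Package Name="Control Flow Sample 2" AutoCreateConfigurationsType="None" ConstraintMode="Parallel">
    <Tasks>
        <Dataflow Name="Task 1"/>
        <Dataflow Name="Task 2"/>
        <Dataflow Name="Task 3"/>
    </Tasks>
</Package>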

Once a container is in Parallel constraint mode, you can start adding explicit precedence constraints. Let’s use an example to highlight this. Imagine you have a package that needs to run three data flows. (I’m using data flows for the example because they are simple to read in Biml, and I want the focus to be on the constraints, not the tasks.) I want one data flow to execute first – this data flow will be named “Me First”. If “Me First” succeeds, the data flow named “Me Next (Success)” should execute. The third data flow, named “I’m Last (Always)”, should always be executed, regardless of the success or failure of the other two tasks. So my package should look like this:

[Image: the desired package, with “Me First” connected to “Me Next (Success)” on success, and both tasks connected to “I’m Last (Always)”]

So how do we get this output from Biml? We can use the PrecedenceConstraints collection on each task. At its simplest, you just add an Input to the collection, and reference the output of the task that should execute prior to this one. In Biml, all tasks have a built-in output named Output. You can reference it using TaskName.Output (“Me First.Output” in the example below). This will create a regular Success constraint between the tasks.

<Dataflow Name="Me Next (Success)">
    <PrecedenceConstraints>
        <Inputs>
            <Input OutputPathName="Me First.Output"/>
        </Inputs>
    </PrecedenceConstraints>
</Dataflow>

For the next set of constraints, we want to use the Or logical type (set through the LogicalType property), since either of the preceding tasks should cause the third task to run. We also need to explicitly set the evaluation value on these constraints, using the EvaluationValue property.

<Dataflow Name="I'm Last (Always)">
    <PrecedenceConstraints LogicalType="Or">
        <Inputs>
            <Input OutputPathName="Me First.Output" EvaluationValue="Failure"/>
            <Input OutputPathName="Me Next (Success).Output" EvaluationValue="Completion"/>
        </Inputs>
    </PrecedenceConstraints>
</Dataflow>

You can also add expression constraints to the Inputs, to control whether tasks run based on the result of an expression. You use the EvaluationOperation and Expression properties to configure that.

<Package Name="Control Flow Sample 3" AutoCreateConfigurationsType="None" ConstraintMode="Parallel">
    <Variables>
        <Variable Name="Continue" DataType="Int32">0</Variable>
    </Variables>
    <Tasks>
        <Dataflow Name="Task 1"/>
        <Dataflow Name="Task 2">
            <PrecedenceConstraints>
                <Inputs>
                    <Input OutputPathName="Task 1.Output" EvaluationOperation="Expression" Expression="@Continue==1"/>
                </Inputs>
            </PrecedenceConstraints>
        </Dataflow>
    </Tasks>
</Package>

That Biml results in a package that looks like this.

[Image: the generated package, with Task 1 connected to Task 2 by an expression-based precedence constraint]

That’s how to control the precedence constraints. I’ve uploaded the Biml from this post to my SkyDrive here, so you can download and experiment with this yourself. In the next post, we’ll look at controlling the data paths in a data flow.

BIDS Helper 1.5 Is Available!

The new version of BIDS Helper has now been released. If you haven’t used it before, now’s a great time to get started with it. BIDS Helper has been adding much-needed functionality to the BIDS environment for over 4 years, with the first release happening back in May 2007.

I posted last month about the beta – thanks to everyone that downloaded it and provided feedback. The major new features are:

Duplicate Roles

This feature allows you to copy a role with all of the associated settings and permissions. It is implemented as a new item on the right-click menu for a role.

Biml Package Generator

The Biml Package Generator provides the ability to create packages from Business Intelligence Markup Language (Biml). You can find some tutorials on using this functionality here, and a short introduction video here.

There were also several issues resolved, and other improvements. The Enable/Disable features dialog has been improved to provide additional information and help links. The new version of the SSIS Expression Editor is also included, and there are improvements to the variable move feature, the Expression list, and the Printer Friendly Dimension Usage feature.

I’m pretty excited about this release, since I’ve been working on the Biml language and technology since joining Varigence. I think it’s great that we’re able to provide some of this functionality for free to the community through BIDS Helper. I’m particularly interested in seeing what uses the community comes up with for it. As I mentioned here, I think it will provide a nice way to package up complete, working samples that someone can just copy and paste from a web page, and use locally. On top of that, it will help automate a lot of rote tasks in SSIS, as I’ll continue to highlight in my series on using Biml in BIDS Helper.

We’ve already gotten some great feedback on the new version, along with some suggestions for continued improvements. If there’s functionality you’d like to see added to BIDS Helper, please suggest it at the Issue Tracker page – you can also use that to track new feature suggestions.

Configuring Visual Studio to Play Nicely with BimlScript

Visual Studio has a great XML editor. Unfortunately, it doesn’t understand BimlScript. This can cause a few headaches when using the Biml feature in BIDS Helper, particularly when you copy and paste BimlScript code into the editor. Take this code as an example:

<#@ template language="C#" hostspecific="true"#>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
      <Packages>
          <# for(var i = 0; i < 5; i++) {#>
            <Package Name="Package <#=i#>" ConstraintMode="Linear"/>
          <# } #>
      </Packages>
</Biml>

If you copy and paste this code into the XML editor with the default settings, you’ll see something like this:

[Image: the pasted code in the Visual Studio XML editor, showing the added ="" and the squiggly underlines]

Notice that Visual Studio helpfully added an ="" after the word “template” in the first line. That would be nice, except that the first line is not XML – it’s actually BimlScript. The word “template” in this case is a directive, and shouldn’t have ="" after it. You have to delete the ="" for this BimlScript to work.

You’ll also notice the red and blue squiggles underneath some of the code. This is also caused by the XML editor not knowing how to parse this information. You can ignore the squiggles, as they won’t impact the execution of the BimlScript.

Fortunately, you can make the experience better. In Visual Studio, go to the Tools menu, and select Options. In the resulting dialog, expand the Text Editor item on the left, and then select XML.

[Image: the Visual Studio Options dialog, with Text Editor > XML selected]

Expand the XML item, and choose the Formatting item. In the right panel, in the Auto Reformat section, uncheck “On paste from clipboard” and “On completion of end tag”. Click OK to save your changes.

[Image: the XML Formatting options page, with the Auto Reformat checkboxes unchecked]

Unfortunately, this doesn’t get rid of the squiggles, but it will stop Visual Studio from incorrectly reformatting your BimlScript. If you copy and paste the above sample after making these changes, you won’t get the extraneous ="" values.

[Image: the pasted code after the settings change, without the extraneous =""]

Hopefully this makes the BimlScript experience in Visual Studio a little more enjoyable.