BIDSHelper – Agile BI http://agilebi.com A community for sharing ideas about business intelligence development using agile methods. Wed, 15 Jun 2011 16:36:28 +0000 en-US hourly 1 https://wordpress.org/?v=6.2.2 Defining the Data Flow in Biml http://agilebi.com/blog/2011/06/15/defining-the-data-flow-in-biml/ Wed, 15 Jun 2011 16:36:28 +0000 http://6.1504 Continue reading ]]> This post is part 5 of a series on using Biml in BIDS Helper. This post builds on some of the information and the sample from the previous posts.

In the previous post in the series, I talked about controlling the order of execution in the control flow. In this post, the focus will be on the dataflow, and controlling how the data in the pipeline flows from one component to the next. This post uses a new table as the target of the data flow, so you may want to review Part 2: Creating Tables using Biml and BimlScript to see how to create the table locally. The Biml below describes the table. You can create it in the database of your choice – I used a database named Target.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;"/>
        <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;"/>
    </Connections>
    <Tables>
        <Table Name="DimAccount_Test" ConnectionName="Target">
            <Columns>
                <Column Name="AccountKey" />
                <Column Name="ParentAccountKey" IsNullable="true" />
                <Column Name="AccountCodeAlternateKey" IsNullable="true" />
                <Column Name="ParentAccountCodeAlternateKey" IsNullable="true" />
                <Column Name="AccountDescription" DataType="String" Length="50" IsNullable="true" />
                <Column Name="AccountType" DataType="String" Length="50" IsNullable="true" />
                <Column Name="Operator" DataType="String" Length="50" IsNullable="true" />
                <Column Name="CustomMembers" DataType="String" Length="300" IsNullable="true" />
                <Column Name="ValueType" DataType="String" Length="50" IsNullable="true" />
                <Column Name="CustomMemberOptions" DataType="String" Length="200" IsNullable="true" />
            </Columns>
        </Table>
    </Tables>
</Biml>

With the table created, we can move on to the interesting part – transforming the data. In a simple, straightforward data flow, the Biml compiler will do most of the work for you. Take this data flow as an example:

<Dataflow Name="Dataflow 1">
    <Transformations>
        <OleDbSource Name="Source" ConnectionName="Source">
            <DirectInput>SELECT * FROM dbo.DimAccount</DirectInput>
        </OleDbSource>
        <OleDbDestination Name="Target" ConnectionName="Target">
            <ExternalTableOutput Table="dbo.DimAccount_Test"/>
        </OleDbDestination>
    </Transformations>
</Dataflow>

In this case, you don’t have to specify any data paths. The Biml compiler will infer that the OleDbSource’s output should be connected to the input of the OleDbDestination.

image

The compiler is able to do this by using default outputs. In Biml, most components have a default output defined. In the absence of other information, the compiler will automatically connect the default output of a transformation to the input of the next component defined in the Biml. So, if we use a slightly more complex data flow, like this:

<Dataflow Name="Dataflow 2">
    <Transformations>
        <OleDbSource Name="Source" ConnectionName="Source">
            <DirectInput>SELECT * FROM dbo.DimAccount</DirectInput>
        </OleDbSource>
        <Lookup Name="Check For Existing" OleDbConnectionName="Target" NoMatchBehavior="RedirectRowsToNoMatchOutput">
            <DirectInput>SELECT AccountKey FROM dbo.DimAccount</DirectInput>
            <Inputs>
                <Column SourceColumn="AccountKey" TargetColumn="AccountKey"/>
            </Inputs>
        </Lookup>
        <ConditionalSplit Name="Test ID Range">
            <OutputPaths>
                <OutputPath Name="High ID">
                    <Expression>AccountKey >= 100</Expression>
                </OutputPath>
            </OutputPaths>
        </ConditionalSplit>
        <OleDbDestination Name="Target" ConnectionName="Target">
            <ExternalTableOutput Table="dbo.DimAccount_Test"/>
        </OleDbDestination>
    </Transformations>
</Dataflow>

We end up with a data flow that still automatically connects data paths between components. In this case, though, it’s probably not doing exactly what we want, since it’s just connecting the default outputs. The Conditional Split (“Test ID Range”) in this example is connected by the default output, but we want to use the”High ID” output to filter out IDs less than 100. In the case of the Lookup (“Check For Existing”), the default output being used is the “Match” output, but we only want the non-matched records, so that only new rows are inserted.

image

I explicitly choose the option BIDS to display the path Source Names for this screenshot – by default, they aren’t displayed in the generated package. You can change the setting in BIDS by selecting the path, opening the Properties tool window, and changing the PathAnnotation property to SourceName.

So how would we change the Biml to get the desired results? If we add an InputPath element to the appropriate components, we can control which output is tied to the component’s input. In this case, we need to add explicit InputPath instructions to the Conditional Split (that will reference the Lookup’s NoMatch output) and to the OleDbDestination (which will reference the ConditionalSplit’s High ID output).

<Dataflow Name="Dataflow 3">
    <Transformations>
        <OleDbSource Name="Source" ConnectionName="Source">
            <DirectInput>SELECT * FROM dbo.DimAccount</DirectInput>
        </OleDbSource>
        <Lookup Name="Check For Existing" OleDbConnectionName="Target" NoMatchBehavior="RedirectRowsToNoMatchOutput">
            <DirectInput>SELECT AccountKey FROM dbo.DimAccount</DirectInput>
            <Inputs>
                <Column SourceColumn="AccountKey" TargetColumn="AccountKey"/>
            </Inputs>
        </Lookup>
        <ConditionalSplit Name="Test ID Range">
            <InputPath OutputPathName="Check For Existing.NoMatch"/>
            <OutputPaths>
                <OutputPath Name="High ID">
                    <Expression>AccountKey >= 100</Expression>
                </OutputPath>
            </OutputPaths>
        </ConditionalSplit>
        <OleDbDestination Name="Target" ConnectionName="Target">
            <InputPath OutputPathName="Test ID Range.High ID"/>
            <ExternalTableOutput Table="dbo.DimAccount_Test"/>
        </OleDbDestination>
    </Transformations>
</Dataflow>

This gives you the following data flow.

image

That’s a few examples of controlling the data paths in a data flow. There are a few other bits of information that are important to know about data paths in the data flow.

  • Most components have a default output named “Output”, and a second output named “Error” for the error output (if the component supports errors).
  • The Multicast component has no default output, so you always need to explicitly define the data path mapping from it to the next component.
  • The Union All, Merge, and Merge Join components need to be explicitly mapped, since they support multiple inputs.
  • The Slowly Changing Dimension (SCD) transformation has multiple outputs. The “New” output is the default. There are also outputs named “Unchanged”, “FixedAttribute”, “ChangingAttribute”, “HistoricalAttribute”, and “InferredMember”.
  • The Percentage Sampling and Row Sampling transformations have two output named “Selected” (the default) and “Unselected”.

The sample Biml for this post is on my SkyDrive. Please download it and try it out with the latest release of BIDS Helper.

]]>
Controlling the Control Flow in Biml http://agilebi.com/blog/2011/06/13/controlling-the-control-flow-in-biml/ Mon, 13 Jun 2011 19:01:23 +0000 http://6.1495 Continue reading ]]> This post is part 4 of a series on using Biml in BIDS Helper. This post builds on some of the information and the sample from the previous posts.

So far, we’ve looked at some relatively simple packages, in terms of their flow. In this post, we’re going to look at how to handle more complex control flow in Biml.

One feature of Biml is the ConstraintMode property that’s part of packages and containers. This property controls how precedence constraints are generated in the control flow. In the simple case, if you want all tasks to be run in sequence, you can set the ConstraintMode to Linear. This causes the package to be produced with all tasks connected sequentially by Success precedence constraints, in the order they were specified in the Biml. So, the following Biml:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="Control Flow Sample 1" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
            <Tasks>
                <Dataflow Name="Task 1"/>
                <Dataflow Name="Task 2"/>
                <Dataflow Name="Task 3"/>
            </Tasks>
        </Package>
    </Packages>
</Biml>

results in a package that looks like this:

image

This type of linear flow is great in some situations, but sometimes you need more control. In those cases, you can change the ConstraintMode to Parallel. The generated package will not have any automatically created precedence constraints, so it will look like this:

image

Once a container is in Parallel constraint mode, you can start adding explicit precedence constraints. Let’s use an example to highlight this. Imagine you have a package that needs to run three data flows. (I’m using data flows for the example because they are simple to read in Biml, and I want the focus to be on the constraints, not the tasks.) I want one data flow to execute first – this data flow will be named “Me First”. If “Me First” succeeds, the data flow named “Me Next (Success)” should execute. The third data flow, named “I’m Last (Always)” should always be executed, regardless of the success or failure of the other two tasks. So my package should look like this:

image

So how do we get this output from Biml? We can use the PrecedenceConstraints collection on each task. At it’s simplest, you just add an Input to the collection, and reference the output of the task that should execute prior to this one. In Biml, all tasks have a built-in output named Output. You can reference it using TaskName.Output (“Me First.Output” in the example below). This will create a regular, Success constraint between the tasks.

<Dataflow Name="Me Next (Success)">
    <PrecedenceConstraints>
        <Inputs>
            <Input OutputPathName="Me First.Output"/>
        </Inputs>
    </PrecedenceConstraints>
</Dataflow>

For the next set of constraints, we want to use the OR logical type, using the LogicalType property, for the constraints, since either of them should cause the third task to run. We also need to explicitly set the evaluation value on these, using the EvaluationValue property.

<Dataflow Name="I'm Last (Always)">
    <PrecedenceConstraints LogicalType="Or">
        <Inputs>
            <Input OutputPathName="Me First.Output" EvaluationValue="Failure"/>
            <Input OutputPathName="Me Next (Success).Output" EvaluationValue="Completion"/>
        </Inputs>
    </PrecedenceConstraints>
</Dataflow>

You can also add expression constraints to the Inputs, to control whether tasks run based on the results on an expression. You use the EvaluationOperation and Expression properties to configure that.

<Package Name="Control Flow Sample 3" AutoCreateConfigurationsType="None" ConstraintMode="Parallel">
    <Variables>
        <Variable Name="Continue" DataType="Int32">0</Variable>
    </Variables>
    <Tasks>
        <Dataflow Name="Task 1"/>
        <Dataflow Name="Task 2">
            <PrecedenceConstraints>
                <Inputs>
                    <Input OutputPathName="Task 1.Output" EvaluationOperation="Expression" Expression="@Continue==1"/>
                </Inputs>
            </PrecedenceConstraints>
        </Dataflow>
    </Tasks>
</Package>

That Biml results in a package that looks like this.

image

That’s how to control the precedence constraints. I’ve uploaded the Biml from this post to my SkyDrive here, so you can download and experiment with this yourself. In the next post, we’ll look at controlling the data paths in a data flow.

]]>
BIDS Helper 1.5 Is Available! http://agilebi.com/blog/2011/06/09/bids-helper-1-5-available/ Thu, 09 Jun 2011 14:54:05 +0000 http://6.1486 Continue reading ]]> The new version of BIDS Helper has now been released. If you haven’t used it before, now’s a great time to get started with it. BIDS Helper has been adding much needed functionality functionality to the BIDS environment for over 4 years, with the first release happening back in May, 2007.

I posted last month about the beta – thanks to everyone that downloaded it and provided feedback. The major new features are:

Duplicate Roles

This feature allows you to copy a role with all of the associated settings and permissions. This is implemented as a new menu item on the right click menu for a role.

Biml Package Generator

The Biml Package Generator provides the ability to create packages from Business Intelligence Markup Language (Biml). You can find some tutorials on using this functionality here, and a short introduction video here.

There were also several issues resolved, and other improvements. The Enable/Disable features dialog has been improved to provide additional information and help links. The new version of the SSIS Expression Editor is also included, and there are improvements to the variable move feature, the Expression list, and the Printer Friendly Dimension Usage feature.

I’m pretty excited about this release, since I’ve been working on the Biml language and technology since joining Varigence. I think it’s great that we’re able to provide some of this functionality for free to the community through BIDS Helper. I’m particularly interested in seeing what interesting uses the community comes up with for it. As I mentioned here, I think it will provide a nice way to package up complete, working samples that someone can just copy and paste from a web page, and use locally. On top of that, it will help automate a lot of rote tasks in SSIS, as I’ll continue to highlight in my series on using Biml in BIDS Helper.

We’ve already gotten some great feedback on the new version, along with some suggestions for continued improvements. If there’s functionality you’d like to see added to BIDS Helper, please suggest it at the Issue Tracker page – you can also use that to track new feature suggestions.

]]>
Configuring Visual Studio to Play Nicely with BimlScript http://agilebi.com/blog/2011/06/03/configuring-visual-studio-to-play-nicely-with-bimlscript/ Fri, 03 Jun 2011 23:03:55 +0000 http://6.1485 Continue reading ]]> Visual Studio has a great XML editor. Unfortunately, it doesn’t understand BimlScript. This can cause a few headaches when using the Biml feature in BIDS Helper, particularly when you copy and paste BimlScript code into the editor. Take this code as an example:

<#@ template language="C#" hostspecific="true"#>
      
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
      <Packages>
          <# for(var i = 0; i < 5; i++) {#>
            <Package Name="Package <#=i#>" ConstraintMode="Linear"/>
          <# } #>
      </Packages>
</Biml>

If you copy and paste this code into the XML editor with the default settings, you’ll see something like this:

image

Notice that Visual Studio helpfully added an =”” after the word “template” in the first line. That would be nice, expect that the first line is not XML, it’s actually BimlScript. The word “template” in this case is a directive, and shouldn’t have =”” after it. You have to delete the =”” for this BimlScript to work.

You’ll also notice the red and blue squiggles underneath some of the code. This is also caused by the XML editor not knowing how to parse this information. You can ignore the squiggles, as they won’t impact the execution of the BimlScript.

Fortunately, you can make the experience better. In Visual Studio, go to the Tools menu, and select Options. In the resulting dialog, expand the Text Editor item on the left, and then select XML.

SNAGHTML166f0b5b

Expand the XML item, and choose the Formatting item. In the right panel, in the Auto Reformat section, uncheck “On paste from clipboard” and “On completion of end tag”. Click OK to save your changes.

SNAGHTML167a9fae

Unfortunately, this doesn’t get rid of the squiggles, but it will stop Visual Studio from incorrectly reformatting your BimlScript. If you copy and paste the above sample after making these changes, you won’t get the extraneous =”” values.

image

Hopefully this makes the BimlScript experience in Visual Studio a little more enjoyable.

]]>
Creating Tables using Biml and BimlScript http://agilebi.com/blog/2011/05/26/creating-tables-using-biml-and-bimlscript/ Thu, 26 May 2011 13:00:00 +0000 http://6.1466 Continue reading ]]> This post is part 2 of a series on using Biml in BIDS Helper. This post builds on some of the information and the sample from the previous posts.

When I’m creating samples for SSIS, I often find it necessary to create supporting tables to go along with the package sample. One of the things I like about Biml is that you can define both your tables and packages in the language. Here’s an example of defining an OrderHeader and OrderDetail table in Biml:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="DbConnection" ConnectionString="Server=.;Initial Catalog=Sandbox;Provider=SQLNCLI10.1;Integrated Security=SSPI;"/>
    </Connections>
    <Tables>
        <Table Name="OrderHeader" ConnectionName="DbConnection">
            <Columns>
                <Column Name="OrderId" DataType="Int32" IdentityIncrement="1" IdentitySeed="1"/>
                <Column Name="SalesDate" DataType="DateTime"/>
                <Column Name="CustomerName" DataType="String" Length="50"/>
            </Columns>
            <Keys>
                <PrimaryKey Name="OrderHeaderPK">
                    <Columns>
                        <Column ColumnName="OrderId"/>
                    </Columns>
                </PrimaryKey>
            </Keys>
        </Table>
        <Table Name="OrderDetail" ConnectionName="DbConnection">
            <Columns>
                <Column Name="OrderDetailId" DataType="Int32" IdentityIncrement="1" IdentitySeed="1"/>
                <TableReference Name="OrderId" TableName="OrderHeader"/>
                <Column Name="ProductName" DataType="String" Length="50"/>
                <Column Name="Qty" DataType="Int16"/>
                <Column Name="UnitPrice" DataType="Currency"/>
            </Columns>
            <Keys>
                <PrimaryKey Name="OrderDetailPK">
                    <Columns>
                        <Column ColumnName="OrderDetailId"/>
                    </Columns>
                </PrimaryKey>
            </Keys>
        </Table>
    </Tables>
</Biml>

Tables are defined in a <Table> tag. They can have columns defined, as well as keys, and even indexes (not shown in the example above). Notice that the OrderId column doesn’t have a DataType attribute. Many of the attributes in Biml have default values, and data type is one of them. If it’s not specified, the column data type will default to Int32. The primary key for the table is defined with a <PrimaryKey> element.

The OrderDetail table includes a <TableReference> column. TableReference columns are a special class of columns, that define that this column should have a foreign key reference to another table. This one is referencing back to the OrderHeader table. It’s not shown, but you can also use a MultipleColumnTableReference, if your foreign key needs to span multiple columns.

Great – now you have your tables defined in Biml, but how do you make use of that? If only there were some way to run this against your database to create the tables… Well, fortunately, there is – by using BimlScript. BimlScript is a scripting layer that automates the production of Biml (similar in concept to the way ASP.NET produces HTML). To set this up, you need to add two Biml files to your project – one to hold the table definitions above, and one to hold the BimlScript.

First, add a new Biml file to the SSIS project (see Part 1 if you need a refresher on this). Copy the Biml above to this file, and rename the file to TableDefinitions.biml.

image

Second, add an additional Biml file. Name this one CreateTables.biml.

image

Open the CreateTables.biml file, and replace the contents with the following code:

<#@ template language="C#" hostspecific="True" #>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="Create Tables" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
            <Tasks>
                <# foreach(var table in RootNode.Tables) {#>
                <ExecuteSQL Name="Create <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
                    <DirectInput>
                        <#=table.GetTableSql()#>    
                    </DirectInput>
                </ExecuteSQL>
                <# } #>
                <# foreach(var table in RootNode.Dimensions) {#>
                <ExecuteSQL Name="Create <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
                    <DirectInput>
                        <#=table.GetTableSql()#>    
                    </DirectInput>
                </ExecuteSQL>
                <# } #>
                <# foreach(var table in RootNode.Facts) {#>
                <ExecuteSQL Name="Create <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
                    <DirectInput>
                        <#=table.GetTableSql()#>    
                    </DirectInput>
                </ExecuteSQL>
                <# } #>
            </Tasks>
        </Package>
    </Packages>
</Biml>

This file has a header at the beginning that indicates the script will use C#. The next section defines a package named “Create Tables”. The section inside the Tasks element is the interesting part:

<# foreach(var table in RootNode.Tables) {#>
<ExecuteSQL Name="Create <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
    <DirectInput>
        <#=table.GetTableSql()#>    
    </DirectInput>
</ExecuteSQL>
<# } #>

This code iterates over the tables that are part of the current model. For each table it finds, it creates an ExecuteSQL task, and embeds the SQL to create the table in the package. The code is repeated to iterate over Dimensions and Facts, which are special classes of tables.

Notice that there are no tables defined in the BimlScript file. The BimlScript can’t operate against objects defined in the same file, which is why we created the TableDefinitions.biml file separately. To produce the package, multi-select both TableDefinitions.biml, and CreateTables.biml, right-click, and choose Expand Biml File.

image

This will produce a new SSIS package in the project named Create Tables.dtsx. It contains two Execute SQL tasks, one for each table.

image

Each task includes the appropriate SQL to create the tables. As an example, here’s the OrderHeader SQL from the Execute SQL task.

SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO

-------------------------------------------------------------------
IF EXISTS (SELECT * from sys.objects WHERE object_id = OBJECT_ID(N'[OrderHeader]') AND type IN (N'U'))
DROP TABLE [OrderHeader]
GO

CREATE TABLE [OrderHeader]
(
-- Columns Definition
 [OrderId] int IDENTITY(1,1) NOT NULL
, [SalesDate] datetime NOT NULL
, [CustomerName] nvarchar(50) NOT NULL

-- Constraints
,CONSTRAINT [OrderHeaderPK] PRIMARY KEY CLUSTERED
(
  [OrderId] Asc) WITH(PAD_INDEX = OFF,IGNORE_DUP_KEY = OFF) ON [PRIMARY]

)
ON [PRIMARY]
WITH (DATA_COMPRESSION = NONE)
GO

-------------------------------------------------------------------

Note that the tables are ordered in the package in the same order they are defined in the Biml file. If you have tables with dependencies, make sure to order them correctly.

In the next post, we’ll look at some ways to copy data dynamically using BimlScript.

]]>
Creating a Basic Package Using Biml http://agilebi.com/blog/2011/05/13/creating-a-basic-package-using-biml/ Fri, 13 May 2011 20:53:21 +0000 http://6.1452 Continue reading ]]> This article is going to walk through the process of creating a simple package using Biml and the Biml Package Generator feature in BIDS Helper. To start out, you need to install the latest beta of BIDS Helper from CodePlex. Once that is set up, you should create a new Integration Services project in BIDS. In the project, right-click on the Project in the Solution Explorer. There’s a new item in this menu – Add New Biml File.

image

Clicking Add New Biml File will add a new file to the Miscellaneous folder in the solution named BimlScript.biml. (The name is automatically generated, so it may be BimlScript1.biml, etc). You can right-click on the file and choose rename to give the file a more specific name. For this example, rename the file “BasicPackage.biml”.

Double-clicking on the file will open the XML editor inside of BIDS. The editor supports Intellisense for Biml, so typing an opening tag (“<”) will give you a list of valid options for tags you can use. (If you aren’t seeing the Intellisense, please check this link for troubleshooting steps.)

image

For this example, copy and paste the following Biml into the document. Since the code below includes the document root tags (<Biml>), you’ll want to make sure you replace the entire contents of the Biml file.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <Connection Name="AdventureWorks" ConnectionString="Server=.;Initial Catalog=AdventureWorks;Integrated Security=SSPI;Provider=SQLNCLI10"/>
    </Connections>
    <Packages>
        <Package Name="Biml Sample" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
            <Tasks>
                <Dataflow Name="Extract Table List">
                    <Transformations>
                        <OleDbSource Name="Get Table List" ConnectionName="AdventureWorks">
                            <DirectInput>SELECT * FROM sys.tables</DirectInput>
                        </OleDbSource>
                        <Multicast Name="Multicast"/>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
</Biml>

The first section (<Connections>) of this Biml defines an OleDbConnection that points to the AdventureWorks database. The next section (inside the <Packages> tag) defines a single package that contains a Dataflow task (the <Dataflow> tag). The Dataflow task contains two components, an OleDb Source and an Union All transformation.

The next step is to take this definition of a package, and actually generate the package from it. To do this, right-click on the Biml file, and choose Expand Biml File from the context menu.

image

A new package will be added to the SSIS Packages folder, named Biml Sample.dtsx. If you review the generated package, you’ll see that it matches up to what was defined in the Biml code.

imageimage

That’s a quick introduction to the Biml functionality in BIDS Helper. In the next article, we’ll set the stage for some more advanced (read: more interesting) uses of Biml, including some scripting.

]]>