SSIS – Agile BI http://agilebi.com A community for sharing ideas about business intelligence development using agile methods.

Presentations at Atlanta’s BI Edition SQL Saturday http://agilebi.com/blog/2016/01/11/presentations-at-atlantas-bi-edition-sql-saturday/ Mon, 11 Jan 2016 20:55:23 +0000

I presented this weekend at SQL Saturday #477 in Atlanta. It was a great event, very well organized. I appreciate all the attendees at my sessions – there were some great questions and comments. I promised that I’d publish my slides and sample code, so here they are.

Getting Started with SSIS Script Tasks and Components

This session was an introduction to the scripting objects in SSIS, and how they can be used to extend the built-in functionality. Download the files here.

Testing Data and Data-Centric Applications

This session covered testing data-centric applications, both during development and in production, where you can continue validating your data. Download the files here.

Thanks again to everyone who attended!

Advanced Scripting in SSIS http://agilebi.com/blog/2015/08/17/advanced-scripting-in-ssis/ Mon, 17 Aug 2015 15:51:15 +0000

Last week I presented a session on Advanced Scripting for SSIS for Pragmatic Works. Thanks to everyone who attended, and all the great questions. I’ve had a few requests for the samples I used, so I wanted to make those available. You can download them from my OneDrive.

Q & A From “Unit Tests for SSIS Packages” http://agilebi.com/blog/2013/04/08/qa-from-unit-tests-for-ssis-packages/ Mon, 08 Apr 2013 20:26:14 +0000

I did a webinar this week for Pragmatic Works’ “Free Training on the T’s”. The topic was “Unit Tests for SSIS Packages”. If you attended, thanks for taking the time! If not, the recording is now available at the link above. You can also download the slides from my SkyDrive, and ssisUnit can be downloaded from CodePlex.

We had great turnout for the session, and a lot of great questions. I didn’t have time to address all of them during the webinar, and I had a number of requests to share my answers with all the attendees, so I thought I’d write up a blog post on the questions that I didn’t get a chance to answer.

Testing Practices

“For test drive development, should a “final test” (of sorts) be performed after the “refactor” phase to ensure that the act of refactoring didn’t negatively alter the code?”

Yes, absolutely. You want to run your tests after any code changes. This helps ensure that code continues working as expected, and that you don’t have hidden “side effects” associated with your changes.

“How would you unit test a package that is performing transforms in the source query?”

I would create a known set of data in a test version of the source database. I would then create a unit test for the data flow task that checks that the output includes the data with the proper transformations. This will get much easier in a future release, when ssisUnit will support data flow component testing in the unit test harness, allowing you to test the results of a source component directly.

“What is the best way to incorporate SSIS testing on existing packages to automate… as this also needs requirement breakdown?”

I approach this in the same way that I approach adding tests to an existing .NET application. Start by identifying a subset of existing packages to work with, and add tests to those. Once those have adequate coverage, move on to the next set. As far as requirements breakdown – yes, you do need to understand what the package is supposed to do in order to test it properly. You can add some simple test cases without a great deal of analysis (for example, did the task execute successfully?) but to get real value out of the tests, they do need to check that the task carries out the requirements as intended.

There is the option of generating unit tests automatically for existing packages via the ssisUnit API. While this can improve code coverage, I would caution you not to rely on it to verify real functionality.

“Sorry I missed the first part of the talk, so not sure if you already talked about this. But how do you recommend creating test data for dimensions if you need to use synthetic data? Are there any tools you recommend using?”

Remember that unit and integration testing is more about verifying functionality, and less about performance. So I like to create a small amount of handcrafted test data that hits the specific scenarios that I need to validate. I find that most of the tools out there for generating data tend to work well if I want large volumes of test data, but not so well for concrete examples. For example, when I need to validate that the package handles a Type 2 SCD update that affects multiple rows correctly, I need 3 to 5 rows of test data that are very specific. Data generators don’t do that very well.

Automating ssisUnit in Builds

“Can we automate the ssisUnit with build & deployments in TFS?” and “How would you enable automated testing with ssisUnit for Continuous Integration?”

The simplest way to incorporate ssisUnit into your builds is by calling the command line test execution tool. It’s called ssisUnitTestRunner2008.exe (substitute the appropriate version number for your version of SSIS), and you can find it in the folder where you installed ssisUnit.

Another approach is to use the ssisUnit API, but this requires some level of .NET or PowerShell coding.
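
Since continuous integration came up specifically, here is a minimal C# sketch of driving a test suite from a build step through the API. The type and member names shown (SsisTestSuite, Execute, Statistics, the failed-test statistic) are assumptions based on my recollection of the source, so verify them against the version of ssisUnit you download before relying on this.

[sourcecode language="csharp"]
// Rough sketch only: SsisTestSuite, Execute, Statistics, and the statistic
// name below are assumptions -- confirm them against your ssisUnit version.
using System;
using SsisUnit; // reference SsisUnit.dll from your ssisUnit installation

class BuildTestRunner
{
    static int Main()
    {
        // Load a test suite file authored with the ssisUnit UI tool.
        var suite = new SsisTestSuite(@"C:\Tests\MyPackageTests.ssisUnit");

        // Run every test in the suite.
        suite.Execute();

        // Return a non-zero exit code when any test failed, so TFS (or any
        // other CI server) can mark the build as broken.
        long failed = suite.Statistics.GetStatistic(
            SsisTestSuite.StatisticEnum.TestFailedCount);
        Console.WriteLine("Failed tests: {0}", failed);
        return failed == 0 ? 0 : 1;
    }
}
[/sourcecode]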

Compatibility

“Is ssisUnit backward compatible with VS2008?” and “Is this framework compatible with MS SQL Server 2012?”

It works with both 2008 and 2012. You can download the 2008 version directly from the CodePlex site. For 2012, you currently need to download and compile the source, which can be done with the free Express version of Visual Studio. The next release will have separate setups for each version.

“Does the SSIS unit testing work with evaluation edition of BIDS 2008 R2?”

Yes, it does.

“Will this connect with the Microsoft Parallel Data Warehouse?”

I have not tested this myself. However, because it uses standard OLEDB and ADO.NET connection technology for database access, I don’t see any reason why it wouldn’t work.

“I am a SQL 2008 R2 user.  What version of VS is John running?  What were those testing menu options?”

I was running SQL 2012 for the demo. The menu option for running ssisUnit was created using the External Tools menu option under the Tools menu in VS.

Setting Up and Using Tests

“I’m lost.  How did you set up the tests using the GUI and then link it to the SSIS package?  Did I miss that?”

During the demo, I showed a prebuilt ssisUnit test that I opened in the ssisUnit UI tool. The unit tests are linked to packages through Package Reference objects, which basically refer to a package by its file path, or its location in SQL Server.

“Can you use a different operator than equal to? Like greater than, less than, different than?”

Yes, the Asserts in ssisUnit can use expressions. These are C#-based expressions that evaluate to True or False. You can use an expression like “((DateTime)result).Date==DateTime.Now.Date” to check that the result of the Assert command is equal to today’s date. “result”, in the expression, represents the object returned by the command associated with the Assert. You can apply pretty much any C# / .NET operation to the result object. For more examples, check out this page.
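
To make the mechanics concrete, here is that same check written as ordinary C#. The value assigned to result below is just a hypothetical stand-in for whatever the Assert’s command actually returns.

[sourcecode language="csharp"]
using System;

class AssertExpressionDemo
{
    static void Main()
    {
        // "result" stands in for the object returned by the command
        // associated with the Assert; a DateTime value is assumed here.
        object result = DateTime.Now;
        bool assertPasses = ((DateTime)result).Date == DateTime.Now.Date;
        Console.WriteLine(assertPasses); // True when the value is today's date
    }
}
[/sourcecode]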

“When performing SCD operations, the details of the row need to be inspected rather than just count of rows. How would you test this? Within ssisUnit or another tool?”

Well, I’d use ssisUnit, but I’m not exactly unbiased. 🙂 Currently, you can do this using the SQL Command, which enables you to retrieve specific row details from the database. In the future, you will be able to do this more directly by testing it in the data flow.

“Did you say the file command can accommodate various file types such as xml, csv, pipe delimited, etc.?”

I didn’t say that (I don’t think), but it will work with pretty much any file. Internally, it uses standard .NET file operations, so it’s very similar to the capabilities of the File System task in SSIS. It doesn’t actually care what format the file is in; it can copy, move, and delete it regardless. For line counts, it counts the number of carriage return / line feeds in the file.
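
As a rough illustration of that approach (this is not ssisUnit’s actual code), counting carriage return / line feed pairs in C# looks something like this:

[sourcecode language="csharp"]
using System.IO;
using System.Text.RegularExpressions;

class LineCountDemo
{
    // Counts CR/LF pairs in a file, ignoring whatever format the file is in.
    // An approximation of the approach described above, not the ssisUnit code.
    static int CountLineBreaks(string path)
    {
        string contents = File.ReadAllText(path);
        return Regex.Matches(contents, "\r\n").Count;
    }
}
[/sourcecode]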

“Do you have a guide on how to setup ssisUnit with SSIS? We have tried using this before, but we couldn’t successfully run the test.”

I’m sorry to hear you had problems with it. The Getting Started guide is admittedly light on content, and I really need to provide a more detailed walkthrough. Please vote for this item here if you would like it prioritized over the other changes in the queue.

“I noticed in the code that ssisUnit doesn’t handle password protected packages. When will this be supported?”

I’ve added an issue to track this here. Please vote for it if you use password protected packages, as it will help me prioritize how quickly to address it.

“If possible, can you demo if a container can be executed? Especially a For loop or For Each loop?”

I didn’t have time to demo this during the presentation. Good thing too, because there was an error around handling of containers. This has now been fixed in the source code version of the project.

“When testing in a 64bit environment, is there a specific way to execute ssisUnit when data sources are 32bit?”

Currently, this requires that you compile a 32-bit version of the source. In the next release, I will provide both 32-bit and 64-bit versions of the test execution utility.

“Will the new version of ssisUnit for 2012 actually include some bug fixes? We’ve tried the current ssisUnit and it’s pretty buggy.” and “Not a question, just feedback: So there are over 30 bug reports and enhancement requests on CodePlex (including UI bugs, missing parts like handling complex Data Flow logic unit testing, etc.) posted since the posting date in 2008. If you address some of these (particularly the UI bugs) in a new release we might try it again. The biggest lack my organization found is that there’s no support for Data Flow tasks, which are 80% or more of our ETL testing. Just some feedback to keep in mind if you make any updates in the future. I’ll gather any additional bugs that we found in our evaluation back in 2011 and add them to the site when I have time.”

Again, sorry that you have had difficulties with it, and I’d definitely appreciate any feedback you can add to the site. For data flow testing, I do use that fairly successfully today, and a number of other users do as well. Admittedly, component level testing for the data flow would be nice (and it is being worked on) but I’m curious about what is blocking you from using it today. If you can submit your issues to the site, I will look at how to address them.

As far as the number of outstanding requests and UI bugs – well, unfortunately, I don’t get paid for working on open source. So focus on this project tends to take a backseat to demands from paying work. That being said, I do want to address as many of the issues as possible, but I have to prioritize my time pretty heavily these days. If there are items that particularly annoy you, please vote for them, as I do use the votes to determine what I should work on. For the UI, I really don’t enjoy UI work (nor, as evidenced by the current GUI, is it my strongest skill as a developer), so you are unlikely to see any significant updates on that front in the open source project. However, Pragmatic Works has taken an interest in unit testing as of late, and we are investigating offering an enhanced UI that’s integrated with Visual Studio as part of our BI xPress product.

“This makes sense to me, but I don’t understand how I setup a package to match up to the unit test.”

Each test case can reference tasks by name or GUID. GUID is more accurate, as names can conflict if you have multiple tasks with the same name in different containers.

“In package 1, you were getting a table count as a task and then writing a unit test to check the results of that test.  Are you returning the result of the SELECT to a variable and then checking the value stored in the variable or are you directly testing the return value from the execute SQL task?”

The Execute SQL task in that test was designed to store the results of the SQL statement in a variable, so the test is written to check the value of the variable (using a VariableCommand) after the task executes. If you need to get a value directly from the database as part of the test, you can use a SqlCommand.

“Do unit tests always fall under Miscellaneous?”

Yes – the SSIS project structure doesn’t allow for custom folders.

“Would you have a more complex example for the test than the result of a table count?”

The ProductSample test on the website illustrates a few other test scenarios. If there are additional scenarios that you’d like to see examples for, please add an issue or discussion item on the site.

Miscellaneous

“Where can I get some xml to parse my .dtsx packages and only extract all the SQL code and associated SSIS task name?”

I’m not really sure how this relates, honestly. However, to do this, you’d need to either write a fairly complex XSLT transform, or use the SSIS API. Unfortunately, I don’t know of any public example code that illustrates this.

Sharing the Presentation

Many, many variations of “Can I get the slides?”

I mentioned this at the top of the post, but just in case: The recording of the webinar is available here: “Unit Tests for SSIS Packages”. You can download the slides from my SkyDrive, and ssisUnit can be downloaded from CodePlex.

Thanks again to everyone who attended, and thanks for all the great questions.

Where’s John These Days? http://agilebi.com/blog/2012/02/20/wheres-john-these-days/ Tue, 21 Feb 2012 02:47:27 +0000

Apologies for the lack of updates to the blog recently. It’s been a very busy time, but hopefully things will settle down a bit now.

Exciting news today (for me at least)! It was my first day as a Pragmatic Works employee. I’ve joined their product group, and will be helping manage the development of their BI tools. As I’ve commented on this blog before, one of the things I’m really passionate about is enabling BI developers to create solutions faster and more easily, and I’m looking forward to the opportunities that Pragmatic Works presents to continue doing exactly that. I also get to work with a great group of developers and some really sharp BI people, so it promises to be a lot of fun.

My excitement is tempered somewhat by sadness at leaving another great group of developers at Varigence. I enjoyed working with everyone there, and wish them success in their future endeavors.

In other news, I have a number of presentations coming up. I’ll be at SQLBits in London on March the 29th, presenting a precon with Matt Masson on SSIS Performance Design Patterns (space is limited, register now!). I also have a session on SSIS Unit Testing at SQLBits.

On April 14th, I’ll be presenting at SQL Saturday #111 in Atlanta, which is always a great time. I’ll be presenting on Tuning SSAS Processing Performance.

Last, but definitely not least, I was thrilled to find out that I’ll be presenting the Tuning SSAS Processing Performance session at SQL Rally in Dallas on May 10-11 as well. Please vote for one of my other sessions in the community choice options, if you see one that appeals to you. I’m really looking forward to seeing some of my friends from Texas again.

Checking Whether a Character is Numeric http://agilebi.com/blog/2011/08/30/checking-whether-a-character-is-numeric/ Tue, 30 Aug 2011 15:45:00 +0000

In SSIS, checking whether a value is numeric has always been a bit painful. Unfortunately, the SSIS expression language doesn’t have an IsNumeric function. Jamie Thomson (blog | twitter) has a nice post here that explains one way to do this check in a script component.

However, there is a shortcut you can take if you only need to check a specific character to see if it’s numeric. You can use the following expression:

FINDSTRING("0123456789", [CharacterToCheck], 1) != 0

If this returns True, the character is numeric. False means it’s not numeric.

This works by checking for the existence of the character in the string that contains all the numeric characters. If it finds an occurrence, then the character must be numeric. Obviously, this has some limitations compared to a real IsNumeric function, but if you just need to check single characters, it’s a lot faster than writing a script.
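
If you do need a full IsNumeric-style check on longer strings, a script component is still the way to go. Here is a minimal sketch of what that check might look like; the column names (ValueToCheck, IsNumericFlag) are placeholders for your own pipeline columns.

[sourcecode language="csharp"]
// Minimal sketch of a numeric check inside a script component (transformation).
// This goes in the generated ScriptMain class; the column names are hypothetical.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    decimal parsed;
    Row.IsNumericFlag = !Row.ValueToCheck_IsNull
        && decimal.TryParse(Row.ValueToCheck, out parsed);
}
[/sourcecode]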

Naming Columns for the Script Component http://agilebi.com/blog/2011/08/18/naming-columns-for-script-component/ Thu, 18 Aug 2011 14:00:00 +0000

Do you use whitespace or special characters in your column names? Most people don’t, because of the additional headaches it creates. You have to delimit the column names, come up with a work-around for tools that don’t support column names with special characters, etc. Underscores, however, are used pretty extensively in place of spaces. If you are using Script Components in SSIS, though, you may encounter an all-new headache with special characters or even underscores in your column names.

When you use a script component in SSIS, it generates some .NET code for you automatically, based on the metadata in the pipeline connected to the script component. However, when this code is generated, SSIS strips out any whitespace or special characters from the names of inputs, outputs, and columns. It only retains the letters and numbers (alphanumeric characters) in these names.

Here are some examples of column name issues that I’ve run into with scripts (and while these specific items are made up, they represent real-world scenarios I’ve encountered – there are some really horrible naming approaches out there):

Original Column Name    Script Column Name
Account                 Account
Account#                Account
Account Number          AccountNumber
Account_Number          AccountNumber
TI_TXN_ID               TITXNID
TI_TXNID                TITXNID

As you can see, once the alphanumeric characters have been stripped from these column names, they are no longer unique. That can pose a few problems in your script code. What’s worse, because this code is auto-generated by SSIS, you can’t fix it without changing the column names in the data flow, even though this is really purely a script issue (and not even a .NET limitation – underscores are perfectly valid in .NET naming). What’s even worse than that – you don’t get an error until the binary code is recompiled.

So, if you are working with script components, make sure all your column names are unique even when all non-alphanumeric characters have been stripped from them. The same thing applies to your output names – they must be unique based only on the alphanumeric characters.
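
As a concrete (and hypothetical) illustration using the names from the table above: because TI_TXN_ID and TI_TXNID both become “TITXNID” once the non-alphanumeric characters are stripped, the auto-generated buffer class ends up with duplicate members, so the script won’t recompile (and even if it did, you couldn’t tell which source column the property refers to).

[sourcecode language="csharp"]
// Hypothetical script component code based on the columns in the table above.
// TI_TXN_ID and TI_TXNID both map to the same generated property, so the
// buffer class has duplicate members and the script fails to recompile.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Which source column is this -- TI_TXN_ID or TI_TXNID?
    string transactionId = Row.TITXNID;
}
[/sourcecode]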

Using OLE DB Connections from Script Tasks http://agilebi.com/blog/2011/08/17/oledb-connections-in-script-tasks/ Wed, 17 Aug 2011 14:00:00 +0000

I write scripts on a pretty regular basis, and often need to access database connections from them. It’s pretty easy to do this if you are using an ADO.NET connection. However, if you are using OLE DB, you have to go through a couple of additional steps to convert the connection to an ADO.NET version. Matt Masson posted a great code sample for doing this conversion.

I use a slightly altered version of this code pretty regularly. It’s been modified to support both OLE DB and ADO.NET connections, so that I can switch connections without having to change the script code.

To use it, you need to add a reference in your script project to Microsoft.SqlServer.DTSRuntimeWrap. Then, add the following to the usings section at the top of the script:

[sourcecode language="csharp"]
using System.Data.Common;
using Wrap = Microsoft.SqlServer.Dts.Runtime.Wrapper;
[/sourcecode]

For the code to get the connection, use the following:

[sourcecode language="csharp" padlinenumbers="true"]
ConnectionManager cm = Dts.Connections["MyConnection"];
DbConnection conn = null;
if (cm.CreationName == "OLEDB")
{
	Wrap.IDTSConnectionManagerDatabaseParameters100 cmParams =
	cm.InnerObject as Wrap.IDTSConnectionManagerDatabaseParameters100;
	conn = cmParams.GetConnectionForSchema() as DbConnection;
}
else
{
	conn = cm.AcquireConnection(null) as DbConnection;
}

if (conn.State == ConnectionState.Closed)
{
	conn.Open();
}

// TODO: Add your code here

conn.Close();
Dts.TaskResult = (int)ScriptResults.Success;
[/sourcecode]

You can use the “conn” object to perform actions against the connection. Since it uses the common DbConnection base class, you can use it against any database connection that you have an ADO.NET provider for (which includes OLE DB providers).
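
For example, in place of the TODO comment above, you could run a query through the provider-agnostic DbCommand. The query and table name below are just placeholders for your own SQL.

[sourcecode language="csharp"]
// Replace the TODO comment above with something like this; the query is a placeholder.
bool fireAgain = true;
using (DbCommand cmd = conn.CreateCommand())
{
    cmd.CommandText = "SELECT COUNT(*) FROM dbo.MyTable";
    int rowCount = Convert.ToInt32(cmd.ExecuteScalar());
    Dts.Events.FireInformation(0, "Script Task",
        "Row count: " + rowCount, string.Empty, 0, ref fireAgain);
}
[/sourcecode]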

Delivering a Pre-Con at SQLSaturday #89 (#sqlsat89) http://agilebi.com/blog/2011/06/27/ssis-data-warehouse-deep-dive-sqlsat89/ Mon, 27 Jun 2011 15:16:36 +0000

I’ll be delivering an all-day deep dive into using SSIS for data warehouse ETL processes the day before SQLSaturday #89, on Friday, September 16th. We’ll be taking an in-depth tour of implementing data warehouse extract, transform, and load processes with SSIS, with plenty of demonstrations and sample code. If you’ve ever wondered about how to handle data errors during your ETL, how to handle updates to large fact tables, or how to load a dimension table that combines type 1, 2, and 3 attributes, then come to this pre-con. We’ll cover all of that, plus a lot more. We have a reduced rate on the pre-con until July 1st, so now’s a great time to register.

Pre-cons like this are some of the most cost effective training you can get – plenty of time to both cover a topic from end to end, and to dive into the real implementation details that are often missing from shorter presentations because of the time constraints. I hope to see you there!

Data Warehousing with SSIS Deep Dive

Want to learn more about implementing data warehouse ETL with SQL Server Integration Services? Attend this full day seminar, and we’ll cover using SSIS for data warehousing in-depth. You’ll learn everything you need to know to populate your data warehouse with data. We’ll cover how to develop a common framework for your packages, automate the creation of rote packages for staging data, implement common patterns for handling various types of dimensions and fact tables, and how to instrument your packages to identify and recover from failures when loading data. We’ll be using the AdventureWorks databases for the examples, so bring along a laptop configured with SQL Server 2005 or later, and the AdventureWorks sample databases installed. We’ll also cover how the upcoming Denali release of SQL Server affects what we discuss in this seminar.

  • Laying out a framework for your ETL
    • Logging
    • Restartability and Recoverability
    • Auditing
  • Handling Dimensions
    • SCD Type 1
    • SCD Type 2
    • Advanced Dimension Types
  • Handling Facts
    • Transactional
    • Periodic Snapshot
    • Accumulating Snapshot
    • Advanced Fact Patterns
  • Errors
    • Handling Processing Errors
    • Handling Data Errors
    • Recovering from Errors
  • Best Practices for Managing Your ETL
Defining the Data Flow in Biml http://agilebi.com/blog/2011/06/15/defining-the-data-flow-in-biml/ Wed, 15 Jun 2011 16:36:28 +0000

This post is part 5 of a series on using Biml in BIDS Helper. This post builds on some of the information and the sample from the previous posts.

In the previous post in the series, I talked about controlling the order of execution in the control flow. In this post, the focus will be on the data flow, and controlling how the data in the pipeline flows from one component to the next. This post uses a new table as the target of the data flow, so you may want to review Part 2: Creating Tables using Biml and BimlScript to see how to create the table locally. The Biml below describes the table. You can create it in the database of your choice – I used a database named Target.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;"/>
        <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;"/>
    </Connections>
    <Tables>
        <Table Name="DimAccount_Test" ConnectionName="Target">
            <Columns>
                <Column Name="AccountKey" />
                <Column Name="ParentAccountKey" IsNullable="true" />
                <Column Name="AccountCodeAlternateKey" IsNullable="true" />
                <Column Name="ParentAccountCodeAlternateKey" IsNullable="true" />
                <Column Name="AccountDescription" DataType="String" Length="50" IsNullable="true" />
                <Column Name="AccountType" DataType="String" Length="50" IsNullable="true" />
                <Column Name="Operator" DataType="String" Length="50" IsNullable="true" />
                <Column Name="CustomMembers" DataType="String" Length="300" IsNullable="true" />
                <Column Name="ValueType" DataType="String" Length="50" IsNullable="true" />
                <Column Name="CustomMemberOptions" DataType="String" Length="200" IsNullable="true" />
            </Columns>
        </Table>
    </Tables>
</Biml>

With the table created, we can move on to the interesting part – transforming the data. In a simple, straightforward data flow, the Biml compiler will do most of the work for you. Take this data flow as an example:

<Dataflow Name="Dataflow 1">
    <Transformations>
        <OleDbSource Name="Source" ConnectionName="Source">
            <DirectInput>SELECT * FROM dbo.DimAccount</DirectInput>
        </OleDbSource>
        <OleDbDestination Name="Target" ConnectionName="Target">
            <ExternalTableOutput Table="dbo.DimAccount_Test"/>
        </OleDbDestination>
    </Transformations>
</Dataflow>

In this case, you don’t have to specify any data paths. The Biml compiler will infer that the OleDbSource’s output should be connected to the input of the OleDbDestination.

[Image: the generated data flow, with the Source component connected directly to the Target destination]

The compiler is able to do this by using default outputs. In Biml, most components have a default output defined. In the absence of other information, the compiler will automatically connect the default output of a transformation to the input of the next component defined in the Biml. So, if we use a slightly more complex data flow, like this:

<Dataflow Name="Dataflow 2">
    <Transformations>
        <OleDbSource Name="Source" ConnectionName="Source">
            <DirectInput>SELECT * FROM dbo.DimAccount</DirectInput>
        </OleDbSource>
        <Lookup Name="Check For Existing" OleDbConnectionName="Target" NoMatchBehavior="RedirectRowsToNoMatchOutput">
            <DirectInput>SELECT AccountKey FROM dbo.DimAccount</DirectInput>
            <Inputs>
                <Column SourceColumn="AccountKey" TargetColumn="AccountKey"/>
            </Inputs>
        </Lookup>
        <ConditionalSplit Name="Test ID Range">
            <OutputPaths>
                <OutputPath Name="High ID">
                    <Expression>AccountKey >= 100</Expression>
                </OutputPath>
            </OutputPaths>
        </ConditionalSplit>
        <OleDbDestination Name="Target" ConnectionName="Target">
            <ExternalTableOutput Table="dbo.DimAccount_Test"/>
        </OleDbDestination>
    </Transformations>
</Dataflow>

We end up with a data flow that still automatically connects data paths between components. In this case, though, it’s probably not doing exactly what we want, since it’s just connecting the default outputs. The Conditional Split (“Test ID Range”) in this example is connected by the default output, but we want to use the “High ID” output to filter out IDs less than 100. In the case of the Lookup (“Check For Existing”), the default output being used is the “Match” output, but we only want the non-matched records, so that only new rows are inserted.

[Image: the generated data flow, with the Lookup and Conditional Split connected through their default outputs]

I explicitly chose the option in BIDS to display the path source names for this screenshot – by default, they aren’t displayed in the generated package. You can change the setting in BIDS by selecting the path, opening the Properties tool window, and changing the PathAnnotation property to SourceName.

So how would we change the Biml to get the desired results? If we add an InputPath element to the appropriate components, we can control which output is tied to the component’s input. In this case, we need to add explicit InputPath instructions to the Conditional Split (that will reference the Lookup’s NoMatch output) and to the OleDbDestination (which will reference the ConditionalSplit’s High ID output).

<Dataflow Name="Dataflow 3">
    <Transformations>
        <OleDbSource Name="Source" ConnectionName="Source">
            <DirectInput>SELECT * FROM dbo.DimAccount</DirectInput>
        </OleDbSource>
        <Lookup Name="Check For Existing" OleDbConnectionName="Target" NoMatchBehavior="RedirectRowsToNoMatchOutput">
            <DirectInput>SELECT AccountKey FROM dbo.DimAccount</DirectInput>
            <Inputs>
                <Column SourceColumn="AccountKey" TargetColumn="AccountKey"/>
            </Inputs>
        </Lookup>
        <ConditionalSplit Name="Test ID Range">
            <InputPath OutputPathName="Check For Existing.NoMatch"/>
            <OutputPaths>
                <OutputPath Name="High ID">
                    <Expression>AccountKey >= 100</Expression>
                </OutputPath>
            </OutputPaths>
        </ConditionalSplit>
        <OleDbDestination Name="Target" ConnectionName="Target">
            <InputPath OutputPathName="Test ID Range.High ID"/>
            <ExternalTableOutput Table="dbo.DimAccount_Test"/>
        </OleDbDestination>
    </Transformations>
</Dataflow>

This gives you the following data flow.

[Image: the generated data flow, with the Lookup’s NoMatch output feeding the Conditional Split and the High ID output feeding the destination]

Those are a few examples of controlling the data paths in a data flow. There are a few other bits of information that are important to know about data paths in the data flow.

  • Most components have a default output named “Output”, and a second output named “Error” for the error output (if the component supports errors).
  • The Multicast component has no default output, so you always need to explicitly define the data path mapping from it to the next component.
  • The Union All, Merge, and Merge Join components need to be explicitly mapped, since they support multiple inputs.
  • The Slowly Changing Dimension (SCD) transformation has multiple outputs. The “New” output is the default. There are also outputs named “Unchanged”, “FixedAttribute”, “ChangingAttribute”, “HistoricalAttribute”, and “InferredMember”.
  • The Percentage Sampling and Row Sampling transformations have two outputs named “Selected” (the default) and “Unselected”.

The sample Biml for this post is on my SkyDrive. Please download it and try it out with the latest release of BIDS Helper.

Controlling the Control Flow in Biml http://agilebi.com/blog/2011/06/13/controlling-the-control-flow-in-biml/ Mon, 13 Jun 2011 19:01:23 +0000

This post is part 4 of a series on using Biml in BIDS Helper. This post builds on some of the information and the sample from the previous posts.

So far, we’ve looked at some relatively simple packages, in terms of their flow. In this post, we’re going to look at how to handle more complex control flow in Biml.

One feature of Biml is the ConstraintMode property that’s part of packages and containers. This property controls how precedence constraints are generated in the control flow. In the simple case, if you want all tasks to be run in sequence, you can set the ConstraintMode to Linear. This causes the package to be produced with all tasks connected sequentially by Success precedence constraints, in the order they were specified in the Biml. So, the following Biml:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="Control Flow Sample 1" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
            <Tasks>
                <Dataflow Name="Task 1"/>
                <Dataflow Name="Task 2"/>
                <Dataflow Name="Task 3"/>
            </Tasks>
        </Package>
    </Packages>
</Biml>

results in a package that looks like this:

[Image: the generated package, with Task 1, Task 2, and Task 3 connected in sequence by Success precedence constraints]

This type of linear flow is great in some situations, but sometimes you need more control. In those cases, you can change the ConstraintMode to Parallel. The generated package will not have any automatically created precedence constraints, so it will look like this:

[Image: the generated package, with Task 1, Task 2, and Task 3 and no precedence constraints]

Once a container is in Parallel constraint mode, you can start adding explicit precedence constraints. Let’s use an example to highlight this. Imagine you have a package that needs to run three data flows. (I’m using data flows for the example because they are simple to read in Biml, and I want the focus to be on the constraints, not the tasks.) I want one data flow to execute first – this data flow will be named “Me First”. If “Me First” succeeds, the data flow named “Me Next (Success)” should execute. The third data flow, named “I’m Last (Always)”, should always be executed, regardless of the success or failure of the other two tasks. So my package should look like this:

[Image: the desired package, with “Me First” connected to “Me Next (Success)” by a Success constraint, and both connected to “I’m Last (Always)”]

So how do we get this output from Biml? We can use the PrecedenceConstraints collection on each task. At its simplest, you just add an Input to the collection, and reference the output of the task that should execute prior to this one. In Biml, all tasks have a built-in output named Output. You can reference it using TaskName.Output (“Me First.Output” in the example below). This will create a regular, Success constraint between the tasks.

<Dataflow Name="Me Next (Success)">
    <PrecedenceConstraints>
        <Inputs>
            <Input OutputPathName="Me First.Output"/>
        </Inputs>
    </PrecedenceConstraints>
</Dataflow>

For the next set of constraints, we want to use the Or logical type (set with the LogicalType property), since either of the first two tasks should cause the third task to run. We also need to explicitly set the evaluation value on each input, using the EvaluationValue property.

<Dataflow Name="I'm Last (Always)">
    <PrecedenceConstraints LogicalType="Or">
        <Inputs>
            <Input OutputPathName="Me First.Output" EvaluationValue="Failure"/>
            <Input OutputPathName="Me Next (Success).Output" EvaluationValue="Completion"/>
        </Inputs>
    </PrecedenceConstraints>
</Dataflow>

You can also add expression constraints to the Inputs, to control whether tasks run based on the result of an expression. You use the EvaluationOperation and Expression properties to configure that.

<Package Name="Control Flow Sample 3" AutoCreateConfigurationsType="None" ConstraintMode="Parallel">
    <Variables>
        <Variable Name="Continue" DataType="Int32">0</Variable>
    </Variables>
    <Tasks>
        <Dataflow Name="Task 1"/>
        <Dataflow Name="Task 2">
            <PrecedenceConstraints>
                <Inputs>
                    <Input OutputPathName="Task 1.Output" EvaluationOperation="Expression" Expression="@Continue==1"/>
                </Inputs>
            </PrecedenceConstraints>
        </Dataflow>
    </Tasks>
</Package>

That Biml results in a package that looks like this.

[Image: the generated package, with Task 2 constrained on Task 1 by the expression @Continue==1]

That’s how to control the precedence constraints. I’ve uploaded the Biml from this post to my SkyDrive here, so you can download and experiment with this yourself. In the next post, we’ll look at controlling the data paths in a data flow.
