Biml Functionality in BIDS Helper

If you follow along with the BIDS Helper project on CodePlex, you may have noticed that we released a new beta version on April 20th. The new build includes some bug fixes and two new features.

Duplicate Role (SSAS)

The Duplicate Role feature allows you to create a copy of an existing SSAS role, along with all of its settings and permissions. Performing a regular copy and paste on a role in SSAS only copies the role membership, not the permissions. The new feature copies the role itself, along with all of the permissions associated with the original role. I’ve found this pretty useful over the past few months.

Biml Package Generator (SSIS)

The Biml Package Generator feature is one I’m pretty excited about, as it leverages some of the things I’ve been working on at Varigence over the past year. This feature enables you to use the Business Intelligence Markup Language (Biml) to generate SSIS packages. This is nice for a couple of reasons. First, Biml is an XML-based language with a readable syntax, and for a lot of packages, I find it easier to just type the Biml to create the package rather than using the designers. Second, the Biml compiler includes the ability to script the generation of Biml code, in much the same way that you can use ASP.NET to script the production of HTML. Using that, I can generate large numbers of packages very easily.
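
To give a flavor of what that scripting looks like, here’s a minimal BimlScript sketch. The table names and the Execute SQL statement are made up purely for illustration, and the connection string is the same AdventureWorks one used later in this series; the C# code nuggets (<# … #> and <#= … #>) are evaluated when the file is expanded, producing one package per table in the list.

<#@ template language="C#" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <Connection Name="AdventureWorks" ConnectionString="Server=.;Initial Catalog=AdventureWorks;Integrated Security=SSPI;Provider=SQLNCLI10"/>
    </Connections>
    <Packages>
        <# // Hypothetical table list - in a real solution this might come from a metadata query
           var tables = new[] { "Customer", "Product", "SalesOrder" }; #>
        <# foreach (var table in tables) { #>
        <Package Name="Load <#=table#>" ConstraintMode="Linear">
            <Tasks>
                <ExecuteSQL Name="Count <#=table#> Rows" ConnectionName="AdventureWorks">
                    <DirectInput>SELECT COUNT(*) FROM <#=table#></DirectInput>
                </ExecuteSQL>
            </Tasks>
        </Package>
        <# } #>
    </Packages>
</Biml>

Expanding this one file produces three packages (Load Customer, Load Product, Load SalesOrder) – exactly the kind of repetitive package creation the scripting is meant to take off your hands.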

I’m starting a series of posts that will illustrate how to use the Biml functionality in BIDS Helper. This post will serve as an index page for the series, and the posts will also appear in the BIDS Helper documentation and on the Varigence blog.

Part 1: Creating a Basic Package Using Biml
Part 2: Creating Tables using Biml and BimlScript
Part 3: Copy Data Dynamically with BimlScript
Part 4: Controlling the Control Flow with Biml
Part 5: Defining the Data Flow in Biml


As always, if you have questions, let me know in the comments.

Creating a Basic Package Using Biml

This article walks through the process of creating a simple package using Biml and the Biml Package Generator feature in BIDS Helper. To start out, you need to install the latest beta of BIDS Helper from CodePlex. Once that is set up, create a new Integration Services project in BIDS, then right-click on the project in the Solution Explorer. There’s a new item in this menu – Add New Biml File.


Clicking Add New Biml File will add a new file named BimlScript.biml to the Miscellaneous folder in the solution. (The name is generated automatically, so it may be BimlScript1.biml, etc.) You can right-click on the file and choose Rename to give it a more specific name. For this example, rename the file “BasicPackage.biml”.

Double-clicking on the file will open the XML editor inside BIDS. The editor supports IntelliSense for Biml, so typing an opening tag (“<”) will give you a list of the valid tags you can use at that point. (If you aren’t seeing the IntelliSense, please check this link for troubleshooting steps.)


For this example, copy and paste the following Biml into the document. Since the code below includes the document root tags (<Biml>), you’ll want to make sure you replace the entire contents of the Biml file.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <Connection Name="AdventureWorks" ConnectionString="Server=.;Initial Catalog=AdventureWorks;Integrated Security=SSPI;Provider=SQLNCLI10"/>
    </Connections>
    <Packages>
        <Package Name="Biml Sample" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
            <Tasks>
                <Dataflow Name="Extract Table List">
                    <Transformations>
                        <OleDbSource Name="Get Table List" ConnectionName="AdventureWorks">
                            <DirectInput>SELECT * FROM sys.tables</DirectInput>
                        </OleDbSource>
                        <Multicast Name="Multicast"/>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
</Biml>

The first section (<Connections>) of this Biml defines an OLE DB connection that points to the AdventureWorks database. The next section (inside the <Packages> tag) defines a single package that contains a Data Flow task (the <Dataflow> tag). The Data Flow task contains two components: an OLE DB Source and a Multicast transformation.

The next step is to take this definition of a package, and actually generate the package from it. To do this, right-click on the Biml file, and choose Expand Biml File from the context menu.


A new package named Biml Sample.dtsx will be added to the SSIS Packages folder. If you review the generated package, you’ll see that it matches what was defined in the Biml code.


That’s a quick introduction to the Biml functionality in BIDS Helper. In the next article, we’ll set the stage for some more advanced (read: more interesting) uses of Biml, including some scripting.

Presenting at Carolina Code Camp 2011

I’m going to be presenting a couple of sessions at the Carolina Code Camp this weekend. The Carolina Code Camp is always a good time – well organized, educational, and fun. I’m looking forward to it.

Do More (ETL) with Less (Effort) – Automating SSIS
SSIS is a great tool for transferring data from one data source to another, and for implementing complex ETL processes. However, for simple, straightforward data transfer tasks or packages that adhere to a pattern, creating SSIS packages by hand can be time-consuming and repetitious. By attending this session, you’ll learn how to automate package creation in SSIS, including the dynamic generation of data flows. We’ll cover some of the free and open source tools available for this, and discuss “roll your own” options.

 

Do You Know the Data Flow?
The Data Flow task is one of the most powerful and most complex tools available in SSIS. Whether you are brand new to SSIS, or you’ve been using it for a while, it’s likely you’ve had some questions about the Data Flow. Why are some components so much slower than others? Why can’t I store a value (like a row count) in one component, and use it in another component later in the Data Flow? And why does it always seem to be the part of my package that fails when I run it against real data? Well, you’re not alone. During this session, we’ll answer these questions (and many others) by learning how the Data Flow operates internally. After attending this session, you’ll know a lot more about getting the most out of Data Flows in SSIS.

Tuning SSAS Processing Performance

Thanks to everyone who attended either the webcast I did for the PASS BI virtual chapter or the presentation at SQLSaturday #74 in Jacksonville this weekend. I really appreciate all the great feedback I’ve received from the attendees at both events.

I’ve attached the slides from the presentation, along with a few definition files: one for a Performance Monitor data collector set, and two trace templates, one for SSAS and one for SQL Server. Feel free to customize these files as necessary for your own tuning efforts. Also, while the trace templates can be used in Profiler, the better approach is to script the traces from the templates and run them directly on the server – lower overhead and less impact on performance.

The files are located on my SkyDrive.

The Mentoring Experiment Applications

Andy Warren (blog | twitter) and Steve Jones (blog | twitter) have started another great initiative in the SQL Server community – The Mentoring Experiment. If you haven’t heard about it yet, I highly recommend reading up on it. This post is mainly a reminder that the deadline for applications is April 30th – so if you are interested and haven’t submitted one yet, hurry up!

Presenting at SQLSaturday #67–Chicago This Weekend

I’ll be presenting “Do More (ETL) With Less (Effort) – Automating SSIS” this weekend at the Chicago SQLSaturday. I’m looking forward to it – there are a lot of great speakers on the schedule. Since I have an early time slot at this one, I should get the chance to relax and enjoy some of the other presentations.

The abstract for my presentation is below. If you happen to be attending the event, please stop by.

Do More (ETL) With Less (Effort) – Automating SSIS

SSIS is a great tool for transferring data from one data source to another, and for implementing complex ETL processes. However, for simple, straightforward data transfer tasks, creating SSIS packages by hand can be time-consuming and repetitious. By attending this session, you’ll learn how to automate package creation in SSIS, including the dynamic generation of data flows. We’ll cover some of the free and open source tools available for this, and discuss “roll your own” options.

Handling Advanced Data Warehouse Scenarios in SSIS

To everyone who attended my presentation of “Handling Advanced Data Warehouse Scenarios in SSIS”, thank you. I’ve posted the slides, SSIS packages, and database script from the presentation on my SkyDrive. You are welcome to download them and take a look at the inner workings. A note of caution, though: please test these in your environment and with your data before applying the patterns everywhere. As I pointed out during the presentation, the data warehouse I used as a sample had no foreign keys, and several of the examples used the technique of generating your own keys in the data flow. If you are using foreign keys or identity / sequence keys, you will need to make some adjustments to the patterns.

The presentation covers the loading of late arriving dimensions, parent-child dimensions, Type 3 dimensions, and Type 6 dimensions. Over the next couple of weeks, I’ll be producing a series of blog posts covering the patterns from the presentation in more detail. I’ll also cover a couple of bridge table patterns that I couldn’t squeeze into the presentation.

Presenting at SQLSaturday #70

I’ll be doing two presentations at the SQLSaturday #70 (#SQLSat70) in Columbia, SC this weekend (Saturday, 3/19). Really looking forward to this event – the last Columbia SQLSaturday was a blast, and this one promises more of the same. There are a number of excellent speakers presenting, so there will be plenty of sessions to learn from.

Varigence will have a table at the event. If you are working in Microsoft BI, I encourage you to stop by and take a look at what we are doing – it’s very cool.

Do More (ETL) with Less (Effort) – Automating SSIS

SSIS is a great tool for transferring data from one data source to another, and for implementing complex ETL processes. However, for simple, straightforward data transfer tasks, creating SSIS packages by hand can be time-consuming and repetitious. By attending this session, you’ll learn how to automate package creation in SSIS, including the dynamic generation of data flows. We’ll cover some of the free and open source tools available for this, and discuss “roll your own” options.

*** If all goes well, you may even get a sneak preview of some exciting new BIDS Helper functionality at this session ***

Handling Advanced Data Warehouse Scenarios in SSIS

So you’ve used SSIS to populate a simple star schema data mart, and everybody’s happy. But now you have new requirements that require more advanced data warehouse approaches, like late arriving dimensions, bridge tables, parent child dimensions, and Type 3 or Type 6 slowly changing dimensions (SCD). How do you handle those in a scalable, efficient way in SSIS? This session will present some common patterns for handling these scenarios. You’ll learn when to use each advanced approach and the pros and cons associated with each pattern. You will learn how to implement these patterns in SSIS, and how to tune them for high performance.

T-SQL Tuesday #15: Automating XML Updates w/ PowerShell

This month’s T-SQL Tuesday is being hosted by Pat Wright (blog | twitter). He selected the topic of automation with T-SQL or PowerShell. As it happened, I’d just had a scenario come up that seemed appropriate for a little PowerShell, so I went with it for this post.

The Varigence toolset is based on a language called Biml (Business Intelligence Markup Language), an XML-based language for creating BI solutions. We needed to make some mass updates to our sample files, and there were quite a few individual files to change. I already had an XSLT transform that would update each file individually, so all I needed was a quick way to run it against every file in a directory.

I’m pretty new to PowerShell, but it seemed like a good fit for this. So, after a little trial and error, I came up with the following script:

# Pattern for the XML files to update
$path = "C:\temp\xml\*.xml"
# Rename the existing .xml files to *_backup.xml so the originals are preserved
Get-ChildItem $path | Rename-Item -NewName { $_.Name -replace '\.xml','_backup.xml' }
# Get the list of (now renamed) backup files; the .xslt file below has a different extension, so it isn't affected
$files = Get-ChildItem $path
# Load the XSLT transform that makes the actual changes
$xslt = New-Object System.Xml.Xsl.XslCompiledTransform
$xslt.Load("C:\temp\xml\DataType.xslt")
# Transform each backup file, writing the output to the original file name
foreach ($file in $files) {
    $oldFileName = $file.FullName
    $newFileName = $file.FullName -replace '_backup\.xml', '.xml'
    $xslt.Transform($oldFileName, $newFileName)
}

Basically, it goes through a directory and renames all the existing .xml files to create backups, then gets the list of renamed files. It then loads an XSLT transformation (XSLT is used to transform XML documents by inserting, removing, or changing markup). Finally, it loops through the backup files in the folder and uses the XSLT transform to create an updated version of each one under the original filename.
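
The actual DataType.xslt isn’t included here, but to give an idea of what this kind of transform looks like, here’s a small, hypothetical example built on the standard XSLT identity template. The DataType attribute and the Integer-to-Int32 rewrite are invented for the example; everything else in the documents passes through unchanged.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity template: copy every node and attribute through unchanged by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Override: rewrite any DataType="Integer" attribute as DataType="Int32" (hypothetical rule) -->
  <xsl:template match="@DataType[. = 'Integer']">
    <xsl:attribute name="DataType">Int32</xsl:attribute>
  </xsl:template>
</xsl:stylesheet>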

There are probably better ways to do this, but it does exactly what I needed, and it let me learn a little more about PowerShell. It also struck me that this could be very useful to transform XML into a friendlier shape for processing in SSIS.

T-SQL Tuesday: Fix the Glitch or Resolve the Problem

One of the interesting things about T-SQL Tuesday is that it’s a 24-hour event. So, being on the east coast of the US, I get the chance to read some of the blog posts for the event on Monday night, before I actually post mine, though I usually write them ahead of time. Last night I saw that Rob Farley (twitter | blog) had posted his T-SQL Tuesday post. Reading it, I came to a horrifying realization – Rob and I think alike! (If you know Rob, you’ll realize that’s a lot scarier than it sounds). His post and thought process was very similar to mine – though, while I did think about the screen resolution joke, I had the good sense not to mention it.

Seriously, since Rob’s one of the smartest guys I know, I think it’s pretty cool that we had the same approach.

This month’s T-SQL Tuesday is being hosted by Jen McCown (twitter | blog). She chose the topic of resolutions, with the new year in mind. I’ve never been big on the traditional idea of resolutions, and usually focus more on goals that I revisit on a regular basis.

So I started thinking about other ways that I could blog about resolutions, and the idea of resolving problems came to mind pretty quickly. In the roles I’ve filled over the last several years, I was often called in to deal with problems (technical and non-technical). The scope of the problems could vary quite a bit, from simple, 5-minute technical issues (“Why won’t my cube process?”) to business problems requiring full projects (“I need to analyze student performance and determine what factors impact it most heavily.”). One of the trends that I’ve noticed around solving problems is the shift from resolving the problem to simply fixing the glitch.

What’s the difference?

Well, just a second there, professor. We, uh, we fixed the *glitch*. So he won’t be receiving a paycheck anymore, so it’ll just work itself out naturally.
-Office Space

 

When you fix the glitch, you are basically looking for the quickest, cheapest, least effort patch to getting the immediate results you desire. When you resolve a problem, you are making sure the root cause of the issue is fixed, so that the problem doesn’t resurface in a new form a few months later, and so that you address the logical ramifications of the problem.

Of course, fixing the glitch is sometimes appropriate. If you have a server down situation and 50 people sitting around twiddling their thumbs till it’s back up, you need to get that fixed immediately. However, all too often, it stops there. Rather than investigating the root cause and how to prevent it, often the reaction is “It’s working – now back to business as usual.”

The biggest issue with fixing the glitch is that it sets you up for future failure. Most of the time, the patch fails to prevent further issues because it doesn’t address the problem completely. Even if the patch continues working, it creates something that will have to be worked around in the future. We’ve all dealt with those systems that have so many patches and quick fixes that everyone is scared to make any changes to them, for fear of breaking something else.

Resolving Problems

One of my goals for this year is to avoid quick fixes, and focus on resolving the root problems. That’s sometimes painful in the short term, as you don’t get that immediate satisfaction of resolving a problem 5 minutes into the conversation. The benefit, though, is that you won’t have to keep dealing with that same problem on a recurring basis for the next 5 years, so it’s a price I’m willing to pay.