Developer Gaps

I’ve been meaning to post about this ever since I saw Jamie’s post on “The SQL Developer Gap”. I couldn’t agree more with what Jamie expressed. Prior to getting into serious BI development, I was primarily an application developer. This was at a time when agile development was becoming very popular. As part of that, refactoring, test driven development, continuous integration, and automated unit testing were becoming accepted as good software engineering practices, and tool support was coming along very quickly.

Then I switched over to BI development in the SQL Server 7 time frame. In a lot of ways, it was like going back to the dark ages. No refactoring support, no automated testing, no concept of builds. Nothing significant changed until SQL Server 2005, when tools like SSIS and SSAS took their first steps toward becoming more “developer friendly” by leveraging Visual Studio to easily integrate into source control and the beginnings of multi-developer support. However, there haven’t really been any improvements in this since 2005. Refactoring, automated testing, automated builds, etc., can all be done, but they are painful and time consuming to set up, and require a fair amount of specialized knowledge to do correctly. In addition, these are all skills that the average BI developer usually doesn’t posses.

To join in Jamie’s rant, this is something that has aggravated me increasingly over the last few years. In many ways, BI is ideally suited to an agile approach and developer tools that increase productivity – requirements shift on the whim of the business, you need to deliver quickly and often, and you need easy mechanisms to confirm that what you are delivering provides the correct results. There are many tasks in developing BI solutions that are repetitive and could be easily automated, if only the tools provided better support for it. And developer productivity using the SQL Server BI tools hasn’t seen a significant increase since 2005.

I’m spending a fair amount of my time these days working in Visual Studio, where I have the luxury of a built in unit testing tool, the capability to switch between visual editing and text editing depending on which makes the most sense, the ability to easily do a diff between two versions in source control, a full undo-redo stack, etc. And I get to use add-ins like ReSharper (a fantastic tool that I can’t recommend enough). It really highlights the difference between developing traditional applications and BI applications these days.

That’s part of the reason I joined Varigence, where I have the opportunity to actually help developers deliver BI solutions faster and better. Our approach makes it much easier to support the same features that you see in traditional application development tools. I’ve been pretty pleased to see how easy it is for us to add productivity features to our tools – honestly, it makes me wonder why BI developers had to wait this long for these features to be available in the tools we use on a daily basis.

Posted in Uncategorized | Comments Off on Developer Gaps

Presenting ‘Enterprise Class Analysis Services Development’ at the Vancouver PASS Chapter

Hey Guys!

I’m going to be giving a presentation on Enterprise Class Analysis Services Development at the Vancouver PASS Chapter Friday, March 12th. 

http://vancouverbi.sqlpass.org/Home/tabid/1551/Default.aspx

I’ll be talking about some of the topics I’ve blogged about here, including working with multiple developers, using a custom MSBuild task to build SSAS Databases, and reviewing a monitoring/reporting solution for SSAS.  If you don’t happen to be in Vancouver, you can always attend virtually.  🙂

Title: Enterprise Class Analysis Services Development

SSAS is one of the most popular tools for OLAP, but many organizations experience challenges when attempting to use their standard development best practices with the tool. This session will cover many topics around enterprise development practices for SSAS, including how to effectively use source control with multiple developers, enable robust automated build/deployment strategies, implement usage monitoring and tracking solutions, and support unit testing for SSAS solutions.

Objective 1: Demonstrate how to use source control with multiple developers.

Objective 2: Show techniques to automate builds and enable robust deployment strategies.

Objective 3: Review strategies for robust monitoring of multiple SSAS deployments for development, administrative, and business purposes.

Posted in Uncategorized | Comments Off on Presenting ‘Enterprise Class Analysis Services Development’ at the Vancouver PASS Chapter

SQL Saturday #33 – Coming Up Next Weekend

The Charlotte SQL Saturday is coming up next weekend, March 6th. There is a great lineup of speakers presenting, and we are approaching capacity for the event, so get registered soon, if you haven’t already. It’s going to be a great day – free SQL training from a lot of well-known, well-respected names, a number of our local community members, and some Microsoft people as well.

I’ll be presenting on Creating Custom Components for SSIS – a great way to extend the out-of-the-box functionality of Integration Services, and on Vulcan, an open source framework for modeling and generating portions of your BI solution.

We’re got a great set of sponsors, including SQL Sentry (thanks Peter and Greg for all the work and support for the event – without them, it wouldn’t be what it is today – and if you are using SSAS, you really should check out Performance Advisor for Analysis Services), and Microsoft, who’s providing access to their campus for the event, among other things. Quest, Confio, and Red Gate provide some great tools for administering and developing for SQL Server, Intellinet provides services for SQL Server, and CozyRoc provides a great set of SSIS tasks and components.

Posted in Uncategorized | Comments Off on SQL Saturday #33 – Coming Up Next Weekend

Slides From the Columbia Code Camp

Thanks to everyone who made it out to the Columbia Code Camp this weekend, even with the sleet and snow in the area. I had a number of requests for the slides from my presentations, so I’ve uploaded them to my SkyDrive.


Introduction to SSIS (SpeakerRate link)


Creating Custom Components for SSIS (SpeakerRate link)  (the sample component used in this presentation is on CodePlex in the Community Tasks and Components project.


Thanks again for attending, and if you have any follow up questions, please leave them in the comments.

Posted in Uncategorized | Comments Off on Slides From the Columbia Code Camp

Slides From “Processing Flat Files with SSIS”

Thanks to the Columbia Enterprise Developer’s Guild for letting me present last night. The audience was great, and I got a lot of good questions. Several people asked if the samples could be made available, and I also had a request to post the slides for some people who weren’t able to make it. So, here they are. I’ve posted them to my SkyDrive here. If you have any questions or comments, please feel free to leave them here.

Posted in Uncategorized | Comments Off on Slides From “Processing Flat Files with SSIS”

A New Year and Upcoming Presentations

It’s a new year, and already a lot going on. The new job is going well, but keeping me extremely busy. I’ve got several upcoming presentations, and there’s a SQL Saturday event planned for Charlotte in March that I’m helping organize. I’m also happy to say that my MVP status was re-awarded for 2010.

I have an upcoming presentation at the Columbia Enterprise Developers Guild, next Wednesday the 13th. The presentation will be on handling flat files in SSIS.

Processing Flat Files with SSIS

When doing data integration, a common requirement is to work with flat files, whether for importing data into a system from an external source, or to export it to provide to other systems. SQL Server Integration Services (SSIS) supports flat files, but there can be a number of challenges when working with them. This is particularly true if your flat files have multiple data formats contained in a single file, the data has complex formatting, or the files have inconsistent formatting. This session will help you to be more efficient when working with these types of files. You’ll learn to handle missing delimiters in the files, and parsing files that have multiple data formats. You’ll also see how to produce complex output formats, like headers and footers that contain summary information.

I’ll also be doing a couple of presentations at the Columbia Code Camp on January 30th.

Creating Custom Components for SSIS

SSIS data flows are great tools for moving data. But what if you need to go beyond the out-of-the-box components provided with SSIS? Custom components are a great way to encapsulate and reuse functionality for the data flow in SSIS. We will discuss what it takes to create and deploy custom components in SSIS, review the pros and cons of using custom components instead of scripts, and discuss some of the common challenges and issues with creating them.

Introduction to SSIS

SQL Server Integration Services is a tool provided with SQL Server for moving data between data stores. It is the successor to DTS, but there are many fundamental changes in how SSIS works. This session will provide an overview of SSIS, with a focus on the key elements of SSIS that you need to know to get the most use out of it. This session will help developers efficiently use SSIS when they need to move data around the organization.

If you happen to be in the area, please drop by for these presentations.

Posted in Uncategorized | Comments Off on A New Year and Upcoming Presentations

SharePoint JavaScript Error: Library Not Registered

The other day one of my clients upgraded from Outlook 2003 to Outlook 2007.  Since then, whenever she went to her company’s SharePoint site, she received the following error: 

================================================================== 

Website error details 

User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; 
.NET CLR 1.1.4322; MS-RTC LM 8; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
Timestamp: Mon, 4 Jan 2010 18:27:45 UTC

Message: Library not registered.

Line: 1935
Char: 4
Code: 0
URL: http://server/_layouts/1033/init.js?rev=ck%2BHdHQ8ABQHif7kr%2Bj7iQ%3D%3D

==================================================================

Apparently the Outlook upgrade did something to a DLL (removed it?) that SharePoint needs.
If you have just upgraded from Outlook 2003 to Outlook 2007 and you receive a similar error, simply run Office Diagnostics and it should fix the problem!

To run Office Diagnostics follow these steps:
1. Open Microsoft Outlook 2007
2. On the Help menu, click Office Diagnostics
3. Click Continue, and then click Start Diagnostics

Posted in SharePoint | Tagged , , , , , , | Comments Off on SharePoint JavaScript Error: Library Not Registered

Performance of Raw Files vs. Flat Files

It’s pretty well accepted that raw files in SSIS are a very fast means of getting data in and out of the pipeline. Jamie Thomson has referenced the use of raw files a number of times and Todd McDermid recently posted about using them for staging data. It occurred to me, that even though I’d always heard they were faster than other options, I’d never actually tested it to see exactly how much of a difference it would make. So, below I’ve posted some admitted unscientific performance testing between raw files and flat (or text) files.

I tested two variations of flat files, delimited and ragged right. The delimited file was configured with a vertical bar (|) as the column delimiter and CR/LF as the row delimiter. The ragged right file was configured as a fixed width with row delimiters – each column had a fixed width, and a final, zero-width column was appended with CR/LF as the delimiter. The same data was used for each test, the following columns being defined:

Name Data Type Precision Scale Length
TestInt32 DT_I4 0 0 0
TestString DT_STR 0 0 50
TestBool DT_BOOL 0 0 0
TestCurrency DT_CY 0 0 0
TestDBTimestamp DT_DBTIMESTAMP 0 0 0
TestWString DT_WSTR 0 0 50
TestNumeric DT_NUMERIC 18 6 0

One thing to note is that when importing from flat files, everything was imported as strings, to avoid any data conversion issues. This is one of the strengths of raw files – no data conversion necessary. But for this test, I was primarily looking at speed of getting the data on and off disk. I also looked at the difference in file sizes between the formats.

I tested each option with 500,000, 1 million, and 10 million rows. I ran each one 4 times for each row count, and discarded the first run to offset the effects of file caching. The results of the runs were averaged for comparison.

When writing files, there’s no big surprises between the options. raw files are faster on 10 million rows by 9.8 seconds. The difference on smaller numbers of rows is pretty insignificant. Here’s a chart showing the times (the raw data is at the end of the post):

image

Reading files did show a difference that I didn’t expect. Read speeds on raw files and delimited files are fairly comparable, with raw files still having the edge in speed. However, reads on ragged right files are significantly slower – well over twice as slow when compared to raw files.

image

File sizes were also as expected, with delimited files having a slight edge over raw files, likely because the string values I used were not all 50 characters in length.

image

In summary, it’s clear that raw files have an advantage in speed. However, the differences weren’t as large as I was expecting, except in the case of ragged right files. So, in general, using raw files are best for performance, but if you are dealing with row counts of less than 1 million rows, it’s not a huge difference unless you are really concerned with performance. Of course, there are plenty of other differences between the formats, and I’d encourage you to research them before making a decision.

Here’s the raw data on the number of seconds to produce each file:

  500,000 1,000,000 10,000,000
Write To Delimited 2.61 5.16 47.02
Write To Ragged 2.66 5.31 49.03
Write To Raw 2.21 4.23 39.21
  500,000 1,000,000 10,000,000
Read From Delimited 0.77 1.52 16.59
Read From Ragged 2.74 5.89 35.39
Read From Raw 0.60 1.08 10.03

and the file size in KB for each:

  500,000 1,000,000 10,000,000
Delimited 44,624 89,792 946,745
Ragged 92,286 184,571 1,845,704
Raw 47,039 94,402 973,308

Please let me know if you’d like more details or have any questions.

Posted in Uncategorized | Comments Off on Performance of Raw Files vs. Flat Files

New Path, Same Focus

I’ve worked with Mariner for almost 12 years. It’s been a very good journey, with many great experiences. I’ve worked with a lot of great people, and delivered some really interesting BI solutions to clients in a number of industries. One aspect of my job that I always particularly enjoyed was helping developers be more productive when creating BI solutions, and reducing the repetitive (read: “boring”) aspects of developing solutions on the Microsoft stack.

Recently, a new opportunity to focus more heavily on that came along. As a result, after a long and enjoyable career with Mariner doing business intelligence consulting, I am taking a new position with Varigence, a company that is producing tools that will make implementing BI solutions faster and easier, as well as introduce new capabilities and better integration into the Microsoft BI stack.

I’m really looking forward to the new role and the new experiences it will offer. I will continue to be heavily involved in Microsoft BI, so I plan to maintain this blog and continue speaking and writing on it as often as often as possible.

Posted in Uncategorized | Comments Off on New Path, Same Focus

Sample Files for Introduction To Analysis Services 2008

We had good turnout at the Greenville, SC SSIG on Tuesday. If you attended, I hope you enjoyed the presentation. After the meeting, I promised several attendees that I would make the samples developed during the demo available, and here they are. The zip includes both the SSAS project files, and a backup the sample database that the cube was built on. Both are done using the 2008 version of SQL Server.

If you have any questions about , please post them in the comments.

Posted in Uncategorized | Comments Off on Sample Files for Introduction To Analysis Services 2008