Posts tagged ‘Agile Development’

You Don’t Have Time for Testing?!

I wrote a post for the Pragmatic Works blog that I thought would be interesting for my readers, so I’m posting a teaser here. If you want to read the whole post, go here.

I’ve been a big advocate of testing for applications, databases, data warehouses, BI and analytics for a while now. Not just any testing, but real tests that help you truly verify the state of your code, applications and data. I like Test Driven Development, but really any approach that focuses on automated, repeatable tests that verify meaningful functionality I find hugely beneficial. And almost no one I’ve ever talked to about this topic has disagreed with me. (There was that one guy, but he was a FoxPro developer, so…) But there’s often a point where the conversation goes sideways.

Continue reading….

Follow Up for Continuous Delivery Presentation at CBIG

I presented Continuous Delivery for Data Warehouses and Marts at the Charlotte BI Group Tuesday night. They have a great group there, and I look forward to going back.

This is one of my favorite topics, and I always get good questions. CBIG was no exception, with some great questions on managing database schema changes when using continuous delivery, how continuous delivery and continuous deployment differ, and how to manage this in a full BI environment.

One question came up that I needed to verify – “Can you call an executable from a post-deployment script in SSDT?” The scenario for this was running a third-party utility to handle some data updates. I have confirmed that the post-deployment scripts for SSDT can only execute SQL commands, so you can’t run executables directly from them. However, as we discussed at the meeting, you can add additional executable calls into the MSBuild scripts I demonstrated to manage that part of your deployment process.
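As a sketch of that workaround, an Exec task can be added to the deployment MSBuild script to run the utility after the database publish step. This is an illustrative fragment only: the target names, property names, and utility path are assumptions, not the actual scripts from the presentation.

```xml
<!-- Hypothetical fragment of a deployment MSBuild script. The target name
     "PublishDatabase" and the utility path are illustrative placeholders. -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <DataFixUtility>tools\DataFixUtility.exe</DataFixUtility>
  </PropertyGroup>
  <!-- SSDT post-deployment scripts can only run T-SQL, so the third-party
       executable is invoked here, after the dacpac publish target runs. -->
  <Target Name="RunPostDeployUtility" AfterTargets="PublishDatabase">
    <Exec Command="&quot;$(DataFixUtility)&quot; /server:$(TargetServer)" />
  </Target>
</Project>
```

The Exec task and AfterTargets attribute are standard MSBuild; only the surrounding names are invented for the example.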

I promised to make my presentation and demos available, so here they are. Please let me know if you have any questions.

Where’s John These Days?

Apologies for the lack of updates to the blog recently. It’s been a very busy time, but hopefully things will settle down a bit now.

Exciting news today (for me at least)! It was my first day as a Pragmatic Works employee. I’ve joined their product group, and will be helping manage the development of their BI tools. As I’ve commented on this blog before, one of the things I’m really passionate about is enabling BI developers to create solutions faster and more easily, and I’m looking forward to the opportunities that Pragmatic Works presents to continue doing exactly that. I also get to work with a great group of developers and some really sharp BI people, so it promises to be a lot of fun.

My excitement is tempered somewhat by sadness at leaving another great group of developers at Varigence. I enjoyed working with everyone there, and wish them success in their future endeavors.

In other news, I have a number of presentations coming up. I’ll be at SQLBits in London on March 29th, presenting a precon with Matt Masson on SSIS Performance Design Patterns (space is limited, register now!). I also have a session on SSIS Unit Testing at SQLBits.

On April 14th, I’ll be at SQL Saturday #111 in Atlanta, which is always a great time, presenting on Tuning SSAS Processing Performance.

Last, but definitely not least, I was thrilled to find out that I’ll be presenting the Tuning SSAS Processing Performance session at SQL Rally in Dallas on May 10-11 as well. Please vote for one of my other sessions in the community choice options, if you see one that appeals to you. I’m really looking forward to seeing some of my friends from Texas again.

Developing a Foundation While Doing Iterative Development

In my initial post about doing iterative development for BI/DW projects, I mentioned that one of the challenges was developing a solid foundation while doing iterative development, especially in the first few iterations. If you are starting from scratch on a new BI initiative, there is often a lot of work to do in getting environments established, developing processes for moving data and code between environments, and exploring and validating the source data to be used. Unfortunately, most of this work does not result in deliverables that an end user would consider valuable. Since, as part of each iteration, you want to have the possibility of delivering working functionality to the stakeholders, this can present a problem.

Since most end users consider working functionality to be something that they can see in a nice user interface, you need to look at ways to minimize the development time required to present data to the end user. Some of the common time-consuming tasks in the first couple of iterations are:

  • Establishing the environments
  • Exploring and validating the source data
  • Developing ETL processes to move and cleanse the data

There are really no quick workarounds to setting up the environments. In fact, I’ve usually found that taking shortcuts on the environments leads to much bigger problems down the road. However, what can be effective is to minimize the number of environments that you deal with in each iteration. While theoretically you should be able to deploy to production in the first iteration of a project, it’s rare that this is actually needed. So instead of creating development, QA, and production environments, consider establishing only the development and QA environments. I do think that having at least two environments is important, so that you can begin validating your deployment procedures.

Exploring and validating the source data is definitely important. In the first couple of iterations, though, it’s often necessary to limit and restrict what you explore. For example, a project I was involved in recently had some very serious issues with data quality. The source database did not enforce referential integrity, so a large percentage of the data in the source was not related to the rest of the data correctly. Rather than derailing the current iteration to completely research and resolve the data quality issues, the project team and the stakeholders made the decision to only import data that was related correctly. This enabled the project team to still present a set of working reports to the stakeholders at the end of the iteration, rather than not being able to demonstrate any working functionality. The subsequent iterations were adjusted to better reflect the level of data quality.
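The “only import correctly related data” decision above amounts to using an inner join as the extraction filter. Here is a minimal, self-contained sketch of the idea; the table and column names are hypothetical, and sqlite3 stands in for a source database that does not enforce referential integrity.

```python
import sqlite3

# In-memory stand-in for a source system with no enforced referential
# integrity (table and column names are invented for illustration).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sale (sale_id INTEGER PRIMARY KEY,
                       customer_id INTEGER,  -- note: no FK constraint
                       amount REAL);
    INSERT INTO customer VALUES (1, 'Contoso'), (2, 'Fabrikam');
    INSERT INTO sale VALUES (10, 1, 100.0),   -- related correctly
                            (11, 2, 250.0),   -- related correctly
                            (12, 99, 75.0);   -- orphan: customer 99 missing
""")

# The INNER JOIN is the filter: only sales with a matching customer are
# extracted for the iteration's load; orphaned rows are excluded.
rows = con.execute("""
    SELECT s.sale_id, s.customer_id, s.amount
    FROM sale AS s
    INNER JOIN customer AS c ON c.customer_id = s.customer_id
""").fetchall()

print(rows)  # the orphaned sale (sale_id 12) does not appear
```

The same join-based filter works in a staging query or an ETL source component; the point is that the filtering rule is explicit and agreed on with the stakeholders, rather than silently dropping data.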

ETL processes can be time-consuming to develop, particularly if the organization does not already have a framework in place for the ETL processes. In the first couple of iterations, an alternative approach is to load data in a more manual fashion, using SQL scripts or direct manipulation to get an initial set of data populated. This has a couple of benefits. One, it allows the time for building a full ETL process to be spread across multiple iterations. Two, it allows the end users to get a look at the data (in a friendly user interface, if you follow the advice above) and validate that it is correct.

A key part of developing the foundation, while still providing value in each iteration, is accepting that you can’t do it all in a single iteration. The foundation development will have to be spread across multiple iterations, and it is acceptable to build some scaffolding code in order to deliver functionality in each iteration. Clearly, you want to minimize the amount of code that can’t be reused. Generally, I’ve found that with the Microsoft toolset for BI, it’s pretty easy to build incrementally, with minimal rework. However, even if your tools don’t support this as well, in my experience the downsides of having some code that gets replaced in a later iteration are far outweighed by the benefits of being able to demonstrate tangible progress to the stakeholders in each iteration of the project.

Getting Business Value from BI Today

Today’s economy and business environment pose problems for many IT departments. Budgets are tight, companies are looking for ways to reduce headcount, and strategic investments are not very popular right now. Instead, companies are focusing on projects that have immediate ROI, primarily in the form of cutting costs. This has a particular impact on BI projects, since they often have longer time frames and, in many cases, don’t have clearly defined ROI. The effects of this are becoming evident: a number of consulting firms in the southeast US have cut back on the number of BI consultants they employ, and a number of companies in the region are cutting back on planned BI expenditures.

Since I believe very strongly that BI is a valuable investment, I hate to see this, but it is understandable. If you are beginning a large data warehouse project that will take 12 months or more before any value is delivered to the users, it’s not a very attractive investment right now. So, given the current environment, what can you do to ensure that your BI projects don’t end up on the chopping block?

Solve Existing Problems

Focus on solving existing problems, rather than trying to anticipate future needs. Don’t get me wrong, sometimes you do need to invest in creating the BI infrastructure to handle future needs. However, in the current environment, business decision makers are much more interested in hearing about the problems you can solve today, not the problems that they may encounter in a few years (if the business is still around).

Identify ROI

Identify the ROI up front. Too many BI projects get started on the vague promise of “When we have all this data together, we’ll be able to do great things!”. That’s not enough to justify expenditures today. Instead, you need a clear understanding of the problem, the cost of solving it, and the cost of not solving it. In a cost cutting environment, the latter point is often effective in communicating why the project needs to be done.

Incremental Wins

Look for incremental wins. As I’ve commented before, many BI initiatives take on a massive amount of scope, and run for many months before producing anything of value. An incremental approach allows you to demonstrate value to the stakeholders rapidly, and those incremental demonstrations of your ability to solve the problem often result in additional funding coming your way.

Do More With What You Have

Now that PerformancePoint Services is being rolled into SharePoint, there is an opportunity to provide additional BI with minimal or no investment in software, particularly if your company already uses SharePoint. By combining the capabilities of PerformancePoint with the workflow capabilities in SharePoint, you can provide some very interesting solutions in optimizing business processes, and making sure that the users of the process have the appropriate information at their fingertips.

By focusing on quick wins that have immediate ROI, BI teams can still provide a lot of value to the business. Demonstrating this value is the key to keeping your BI initiatives alive in a down economy.

Managing Scope for Iterative Development

As I mentioned in my previous post, defining an appropriate scope is difficult when you are attempting to do iterative development on BI/DW projects. The problems with this are not unique to BI/DW projects, of course. All projects have scope issues. Most of the same scope management techniques that work well on traditional application development projects also work for BI/DW projects. There are two unique aspects of scope for BI/DW projects, though. One is that there is often significant work required “behind the scenes” to deliver even small pieces of visible end user functionality. The other is that there can be significant “hidden” scope in properly cleansing the data and making it useful from a business perspective. This can be challenging because the end users may have a perception that they are not getting very much functionality, particularly in the first several iterations of the project.

What are some of the hidden aspects of BI/DW projects? ETL and data profiling are two of the most common. I consider these hidden because the end users of a BI application are rarely intimately involved in the ETL process development. They may be involved in the data profiling, but often are not involved in the data cleansing that usually accompanies it. These are time-consuming activities that the users only get to appreciate indirectly, so they often don’t put very much value on them.

How can this be addressed? I’ve found that there are two parts to it. First, you need to do a certain amount of education with the stakeholders on what happens behind the scenes, so that they have a better understanding of the effort that has to be expended. The benefits that they get from this effort need to be explained as well. Telling them that ETL processes are time-consuming to implement isn’t very effective unless you also explain the benefits of well-implemented ETL: cleaner, consistent, conformed data, with appropriate controls to verify that the right data is going into the data warehouse, and the bad data is being excluded.

However, education is not enough by itself. The second part of it is to show them that they can get incremental benefits. Again, as pointed out in the previous article, each iteration should deliver something of value to the users. It’s important to do this from the first iteration on, and to continue to do it consistently. One effective way to determine what would be of value for an iteration is to ask the users to decide on one or two questions that they want to be able to answer. The scope of the iteration becomes delivering the information that allows them to answer those questions. But what if the questions are complex, and you don’t feel that they can be addressed in a single iteration? I’ve generally found that if the questions are that complex, you can break them up into smaller questions and negotiate with the stakeholders to deliver them over multiple iterations.

This does require that the development team on the project has an agile mindset, and is focused on meeting the deliverables for the iteration. It also poses a more significant challenge when a project is in an initial phase, and the infrastructure is still being put in place. I’ll discuss this challenge further in my next post.

In conclusion, scope management is important on all projects, not just BI/DW projects. However, perceptions of scope may be more challenging on BI/DW projects, because of the hidden nature of some of the activities. It’s important to communicate the value of these activities to the project stakeholders, and to demonstrate that you can consistently produce incremental deliverables, while still carrying out these valuable activities.

As always, comments and feedback are welcome.

Challenges to an Iterative Approach to Business Intelligence

I’m a fan of agile development. Prior to focusing on business intelligence and data warehousing, I architected and developed client server and n-tier applications, and I found that agile development techniques delivered better end results. I believe that, in large part, this came about because of the focus on smaller, functional iterations over a more traditional waterfall approach. However, this approach is still not regularly used on BI projects. Since it can have a big impact on the success of your BI initiatives, I’d like to list some of the challenges that prevent adoption of this approach. I’ll go into more detail on each of these in my next few posts, and also cover some of the benefits you can see if you overcome the challenges.

First, though, let me define what I mean by an iteration [1]. An iteration is a short development cycle that takes a specific set of requirements and delivers working, useful functionality. Generally, I think the duration of an iteration should be 2 weeks to a month, but I don’t consider that an absolute rule. More than 2 months for an iteration, though, really impacts the agility of the project, so I prefer to keep them shorter. Delivering working functionality in an iteration doesn’t necessarily mean that it has to go to production. It does mean that it could go to production, if the project stakeholders choose to do that. Delivering useful functionality means that what’s been developed actually meets a stakeholder’s need.

There are a number of challenges that you might encounter in doing iterative development for BI. Fortunately, you can work around these, though it’s not always easy.

  1. Scope
    Many BI/DW initiatives are large and involve multiple systems and departments. Even smaller BI projects often have multiple audiences with differing needs and wants. Managing the scope of the effort appropriately can be challenging in that environment. This means that defining and managing scope is critical to successful iterative development.
  2. Foundation development
    Particularly when BI projects are getting off the ground, they need a lot of foundational setup – environments, software installations, data profiling, and data cleansing. This poses definite problems for the first couple of iterations, especially when you take into account the guideline of delivering working and useful functionality with each iteration. The foundation needs to be built in conjunction with delivering useful functionality.
  3. Existing infrastructure and architecture
    Iterative BI development requires an infrastructure for development that is flexible and adaptive, and it is much easier if existing solutions were architected to be easily modified. Sadly, this is not the case in many organizations. Existing processes and infrastructure tend to be rigid and do not support or encourage rapid development. Existing data warehouses tend to be monolithic applications that are difficult to modify to address changing business needs. And many BI development tools do not provide adequate support for rapidly changing BI applications.
  4. Changing requirements [2]
    Changing requirements is a fact of life in most IT projects, and it’s definitely the case in BI/DW projects. While agile and iterative development can help address changing requirements, it still poses a challenge. As requirements shift, they can have a ripple effect on the BI infrastructure – adding a new field to a report may require changes to the database and ETL processes in addition to the report itself. This can make seemingly inconsequential changes take much longer than expected, and lower the adaptability of the project.
  5. Design [2]
    Due to the scope and complexity of many BI/DW initiatives, there is a tendency to get bogged down in design. This is particularly true when you consider that the database is a key part of BI/DW projects, and many BI developers feel most comfortable having a complete and stable data model before beginning development (which is perfectly reasonable). However, design by itself does not produce working functionality, so an iteration that involves nothing but design doesn’t really meet my definition of iterative.

After looking at all these challenges, you may feel that it’s pointless to try iterative development on BI projects. However, I’m happy to say that all these items can be addressed (granted, some of them are much more easily fixed than others). If you do make some changes, you can get a number of benefits, including the ability to rapidly accommodate change, deliver working functionality more quickly, and facilitate emergent requirements and design. The end result? Happy project stakeholders who are more satisfied with the results of their projects, and less wear and tear on the BI developers. Everybody wins!

Over the next couple of months, I’ll be posting more information about the challenges, and how you can work around them. Please keep reading, and if you have questions or comments, don’t hesitate to get in touch with me.


[1] If you are familiar with Scrum, you will notice some similarities between what I describe as an iteration and a sprint. Scrum has influenced my thinking on agile methodologies quite a bit, and I’m a big fan of it. However, because I don’t follow the definitions in Scrum rigidly, I’ve found it better to not use the same terminology to avoid confusion. If I do refer to a sprint, though, I will be referring to the Scrum definition (a 30-day increment, resulting in potentially shippable product).
[2] Please don’t take the above to mean that I don’t believe in requirements or design – I feel that both are vital to iterative development. However, the approach that many BI practitioners take to requirements and design does not lend itself to iterative development.