Performance – Agile BI http://agilebi.com A community for sharing ideas about business intelligence development using agile methods. Mon, 02 May 2011 15:47:16 +0000 en-US hourly 1 https://wordpress.org/?v=6.2.2 Tuning SSAS Processing Performance http://agilebi.com/blog/2011/05/02/tuning-ssas-processing-performance/ Mon, 02 May 2011 15:47:16 +0000 http://6.1437 Continue reading ]]> Thanks to all those that attended either the webcast of this that I did for the PASS BI virtual chapter, or the presentation at SQLSaturday #74 in Jacksonville this weekend. I really appreciate all the great feedback that I’ve received from the attendees at both events.

I’ve attached the slides from the presentation. I’ve also attached a few definition files, one for a Performance Monitor data collector set, and two trace templates, one for SSAS and one for SQL Server. Feel free to customize these files as necessary for your own tuning efforts. Also, while these trace templates can be used in Profiler, the best thing to do is to generate the scripts from them to run the trace directly on the server – lower overhead and less impact on performance.

The file is located on my SkyDrive.

]]>
Death by SQL…an Act in Two Parts http://agilebi.com/blog/2010/09/20/death-by-sqlan-act-in-two-parts/ Mon, 20 Sep 2010 19:46:55 +0000 http://9.34 Continue reading ]]> CartoonHow common is it to run into performance issues with SQL Server? Daily?  Hourly? Maybe for you, it’s a common existence; and for that, I’m sorry. And how are some ways that you deal with performance degradation in SQL? I’d venture to say that, for most, it would involve juggling indexes, statistics, etc. on the tables in question. But what about going about this all differently?

What if we take a step back and look at the code itself? Maybe the code is the problem, not the server performance. Since running across SQLServerCentral in the early days of my BI experience, there were a few blog posts and articles which have stuck with me throughout. One such article, More RBAR and “Tuning” UPDATEs, has been of great help to me.

This article opened up my eyes to a completely different way of thinking when it comes to performance problems within SQL Server. I highly suggest reading it before continuing with the rest of my post here.

I ran into this “tuning” problem the other day when working with some Fact records that I was trying to tie to a Type 2 Dimension. I have about 37,000 school enrollment records for which I need to find the appropriate Student ID surrogate key among 273,000 different student records. It seemed pretty simple enough:

  • Link the Fact record to the Dimension record using the Student Number
  • Based upon the Fact record’s registration date column, place the record with the correct version of the Student

Act 1

There are two different ways to construct our SQL statement to get this job accomplished: either set-based or row-by-row (see RBAR above) Obviously, one is much more preferred above the other method. Take, for example, the code below (RBAR):

UPDATE [PreStage_FactSchoolEnrollment]
SET [AlternateStudentID] =
(SELECT
    (SELECT TOP 1 DS.[StudentID]
    FROM [DimStudent] DS
    WHERE DS.[EffectiveEndDate] >= [FactSchoolEnrollment].[RegistrationDate]
        AND DS.[StudentSISID] = [FactSchoolEnrollment].[Pupil_Number]
    ORDER BY DS.[EffectiveEndDate],DS.[StudentID])
FROM [FactSchoolEnrollment]
WHERE [PreStage_FactSchoolEnrollment].[ID] = [FactSchoolEnrollment].[ID])

Seems innocent enough, right? However, there is a huge performance issue with this query. Below is a screenshot of one particular piece of the actual execution plan from this query above. Remember our record counts: ~37,000 Fact records and ~273,000 Dimension records.

queryplan_indexspool

That’s right…that number circled above is over 8 BILLION rows that were created in memory!! (8,465,578,262 to be exact). This is the base problem with RBAR queries. In essence, this query, as it is currently structured, queried and stored the ENTIRE dimension (all 273,000 records) for EACH of the incoming Fact records (37,000). That is where the 8.4 Billion records are created. Notice that this update took over 48 minutes run. There isn’t an index in the world that is going to help this type of performance monster.

Act 2

Enter set-based SQL. How about we reconstruct this query as a set-based query instead? Look at the differences in the SQL below:

UPDATE PreStage_FactSchoolEnrollment
SET PreStage_FactSchoolEnrollment.AlternateStudentID = ISNULL(b.StudentID,-1)

FROM (SELECT ROW_NUMBER() OVER (PARTITION BY StudentSISID ORDER
BY EffectiveEndDate ASC) as rownum, Pupil_Number, sub_b.StudentID
FROM FactSchoolEnrollment sub_a
INNER JOIN DimStudent sub_b ON
    sub_a.Pupil_Number = sub_b.StudentSISID
WHERE (sub_b.EffectiveEndDate >= sub_a.RegistrationDate)) b
WHERE PreStage_FactSchoolEnrollment.Pupil_Number = b.Pupil_Number

This end result of this query is EXACTLY the same as the above query; the only difference is that this query took all of 9 seconds to return data. Now that’s a performance gain!

Followup

Now, understandably, it may not be feasible to rewrite your SQL code because of different constraints. But, if you can, at all, (and I’m pleading with you here), PLEASE try to rewrite the code itself. You will be surprised at how much of a difference syntax can make!

]]>