I’ve spent the past 15 months, off and on, working as the technical lead on a performance management project using PerformancePoint Server. The project was to create a SharePoint Portal where users could view the performance of a school system. The portal was initially internally facing, but the goal was to make it publicly accessible. My last project with this particular client was to implement an Internet facing version of the portal.
Since the school system in question has more than 120,000 students, scalability and performance was definitely an issue. I spent a week at the Microsoft Technology Center in Atlanta, GA testing and tuning the solution. We performed our testing in a lab with seven desktop machines (to test with) and three servers to simulate our production environment. I was joined for the week by one of the PPS M&A Architects, who helped me evaluate the solution. Everyone at the MTC was super helpful and friendly, and we got a lot of really good advice.
I want to share some of the things I learned during the performance testing of the solution.
Recommended Reading
A few good things to read before getting started on a venture like this (among all the other white papers about scaling SQL Server, SharePoint, and the like).
- Performance Tuning and Capacity Planning for PerformancePoint Monitoring Server
- Contention, poor performance, and deadlocks when you make Web service requests from ASP.NET applications
Steps
To performance test and tune our SharePoint site, we needed to…
- Deploy the site in the test lab.
- Create load tests.
- Tune Windows Server, SharePoint, IIS, PerformancePoint Server, SQL Server, and Analysis Services as needed.
- Run and analyze the load tests.
- Modify the PPS design as required.
Just five little items on the list… how hard can that be, right? Items #2 through #5 were done in iterations.
System Under Test
The application we created uses SharePoint Enterprise Server 2007 (MOSS), PerformancePoint Server Monitor & Analyze 2007 (SP1), and SQL Server 2005 (SP2 CU8). Both SQL Server RDBMS and SSAS reside on the same server. The RDBMS supports only SharePoint and PerformancePoint. Analysis Services is dedicated to the Dashboards.
Data Source Design
The data source is a SQL Server Analysis Services DB. It uses a straightforward star schema design. All of the KPI’s that we’re using in the PPS Dashboards are surfaced using calculated measures in the cube. This is definitely a good way to go. This design makes developing the dashboards very easy, it encapsulates the logic of the KPIs, and it helps with efficient caching inside of SSAS. The cube was less than 1 GB in size for this phase of the project. Some default aggregations were built, and usage based aggregations were applied against the cube. Query performance was base lined at a few dozen milliseconds for the largest/most complex queries once the cache was warmed.
Dashboard Design
Our portal has seven dashboards. Each dashboard has several pages.
- The Overall view contains a scorecard with a Year Member Selection filter. Each KPI on the scorecard conditionally displays an Analytic Grid and a Web Page report (pointing to a HTML file in SharePoint).
- The Compare view contains two scorecards, with up to 14 filters (primarily Member Selection… more on this later). Seven filters apply to each scorecard; users can make different selections, and compare the two views.
- The Look Inside the Data view contains a single scorecard with up to 12 filters (and a few more columns than the Compare view, to show historical values).
The Analytic Charts were all fairly straightforward, with a default view of a calculated measure against time. The users can drill into the charts.
The filters were primarily Member Selection, with at most a few dozen members. We have one MDX filter (for Schools), that returns about 400 members (this will come into play later).
The Dashboard was implemented using Application Security.
SharePoint
I enabled Anonymous Authentication for SharePoint. The site basically supports all of our Dashboards, as well as a number of Help files and some content to help people use the site and the data. I customized some of the master pages (to remove features such as SharePoint Help, Search, the breadcrumb trail, and stuff like that), but the site design was fairly trivial for this version of the site. Excel Services and other MOSS features were not used.
Deploying to a Test Environment
Our friendly neighborhood Microsoft rep arranged for us to have some time in the Microsoft Technology Center in Atlanta, GA. The provided an environment for us to test in, and they provided three servers (to mimic our production hardware) and seven desktop machines for us to run load tests on. It took a bit ~1.5 days to install/configure the software, install service packs, and deploy our solution (include a SharePoint site collection, PPS M&A Dashboards, and SSAS DB).
Hardware Under Test
We tested our solution on the following hardware. Our production hardware was similar. A hardware network load balancing appliance was used for the web front ends.
Web Front End (1)
- DL380 G5
- Dual Core x 2 – 2.66 GHz – 64 bit
- 8 GB RAM
- Windows Server 2003 (SP2)
- IIS 6.0
- SharePoint Enterprise Server 2007 (SP1+Infrastructure update)
- PerformancePoint Server Monitor & Analyze (SP1)
Web Front End (2)
- DL 382 G5
- Dual Core x 2 – 2.6 GHz – 64 bit
- 4 GB RAM
- Windows Server 2003 (SP2)
- IIS 6.0
- SharePoint Enterprise Server 2007 (SP1+Infrastructure update)
- PerformancePoint Server Monitor & Analyze (SP1)
SQL Server (RDBMS and SSAS)
- DL 585 G2
- Dual Core x4 – 2.66 GHZ – 64 bit
- 16 GB RAM
- Windows Server 2003 (SP2)
- SQL Server (SP2 CU8)
- RAID10 disk array
Test Rig
We used Visual Studio 2008 (SP1) Team Systems for Testers (VSTT) to perform our testing. We used one machine as a Controller and four machines as Agents. We saved our test results to SQL Server on a separate machine.
Test Machines (x5)
- Dual core 64 bit
- 2 GB RAM
- Vista
This configuration let us scale to 4,000+ users without severely taxing any of the machines. Due to the network appliance being used for our load balancing, IP Spoofing was not required for our test machines. We had some issues with using Vista for the agents… it was just a little bit trickier to configure than XP. We were never able to get the counter collection working for those machines (annoying, but not really an issue since we had so much capacity and all the machines were in the same room). We didn’t run into any major issues getting our test rig up and running, but it definitely took some time.
Creating the Load Tests
In order to do our tuning and performance testing, we first had to create everything needed to create a load test. VSTT is a good tool for load testing, but be aware that there is a learning curve. If you don’t have any experience performing web based load testing, then this could be a significant hurdle. Luckily, in a previous life I was a Load Test Architect, and I had experience with the tool.
Test Design
I first identified several use cases that I thought would be common for users of our portal. These consisted of actions such as viewing the top level summary Dashboards, using filters to change the view of the scorecards, clicking on analytic grids, and things like that. I created a mixture of coded and graphical Web Tests for each small work flow (such as changing a filter four times on a certain Dashboard, or clicking through eight charts on a Dashboard). I then used Web Tests that call other Web Tests to build up my use cases. The use cases were then combined in different ratios in each of my load tests so I could simulate a variety of user populations.
Designing your tests correctly is crucial. If you’re way off on what the users will be doing, all of your performance tuning could be for naught. I would recommend using a pilot group, or at least some test subjects, to figure out how people will use your portal.
Web Test Implementation
Microsoft Consulting Services provided a test harness for PPS. The test harness is a coded web test that can parse the XML snippets used by PPS, and perform actions such as changing filter values, activating charts, etc. It’s a pretty cool tool, and was very helpful. Unfortunately, it is proprietary, so I can’t share it… you’ll have to get engage MCS if you want to use it. I did make some modifications to it to change the way it would use filters, to make it work when Web Tests called other Web Tests, etc. If anyone is using this test harness (or if it does become publicly available), I can share the (minor) modifications that I made.
I constructed all of the visual web tests using Fiddler 2 (in conjunction with a home grown helper application to insert comments, and do a few other useful tasks). Thanks to the MCS test harness, a few extra tools, and the way the web tests were constructed, the actual test generation was very straightforward.
Testing Considerations
One of the key elements to consider when performing a load test against PPS is that, due to the architecture, the process of loading a single page may involve dozens of calls. For example, a dashboard that has 15 filters, two scorecards, and two charts would actually make 20 http requests… so if each request takes 1 second, it takes 20 seconds to load the page. I was primarily interested in the time to last byte for each page, so I used comments while recording my tests to know where to place the transactions, so I could analyze those instead of individual calls.
I implemented reasonable timing and pacing for the individual steps in each of the tests, and made sure to gradually ramp up my users when applying the load. Testing was slightly weighted towards the Dashboards that were the largest and would have the heaviest usage, though another use case was used to randomly hit every part of the site to bust the cache if possible.
System Tuning
We tuned a few things, or at least reviewed the settings, out of the gate. During our testing we continuously monitored the servers (and test machines), identified bottlenecks, tweaked settings, then did it all again.
During our initial testing, we found SSAS to be the bottleneck for our solution. I suspect that our web front ends (SharePoint and/or IIS) could be tweaked a little more, but it wasn’t really necessary in our scenario.
ASP.NET Tuning
We started tuning ASP.NET with some of the suggestions in the article Contention, poor performance, and deadlocks when you make Web service requests from ASP.NET applications. In the end, I found the performance to be good using the autoConfig=”true” (the formulas listed in the article are automagically applied) with a few tweaks. I increased the number of threads available significantly… that was a major bottleneck for awhile. I also increased the minWorkThreads to 50 (otherwise SharePoint stalled after adding a few hundred users while spinning up more threads, even when ramping users up fairly slowly). I also increased the minFreeThreads and minLocalRequestFreeThreads. I increased the number of connections to the DB server. Don’t just blindly apply any of these settings… your mileage will vary. I went through a number of test iterations carefully monitoring each of the servers and testing these settings.
I made the following change to the machine.config (on all web front ends):
1: …
2: <system.web>
3: <processModel autoConfig=“true” maxWorkerThreads=“400” maxIoThreads=“400” minWorkerThreads=“50” />
4: <httpRuntime minFreeThreads=“352” minLocalRequestFreeThreads=“304” />
5: …
6: </system.web>
7: …
8: <system.net>
9: <connectionManagement>
10: <add address=“10.1.11.11” maxconnection=“48”/> ? Database server ip address
11: </connectionManagement>
12: </system.net>
13: …
14: </configuration>
IIS Tuning
I adjusted the application pool (for both the Performance Management site as well as the Central Administration page). Pinging and Rapid Failure were turned off on the Health tab. I did NOT use web gardens in this configuration, even though I have seen it recommended. I did not see a difference in performance between using them and not, and was advised by one of the local IIS gurus not to use them.
SQL Server 2005 Tuning
We updated SQL Server to Service Pack 2 – Cumulative Update 8 (build 3257). Before applying the CU, we ran into some significant performance issues. SSAS did not like some of our very small dimensions, and performance was severely degraded. The CU resolved the issues.
Data files, transaction logs, and the Temp DB were spread out over multiple disks. Our usage of the RDBMS was very light, so we needed minimal tuning here.
SQL Server Analysis Services 2005 Tuning
I updated the Query Max Threads element of Analysis Services to 20 to provide additional threads for querying. Adjusting this element needs to be done carefully, as it can improve the performance of the web front ends at the expense of maxing out SQL Server. I actually ramped this number up quite a bit during testing (with good effect), but found 20 to be adequate after resolving some other issues. I suggest monitoring the queued requests in IIS along with the queued and active threads in SSAS to determine a good balance here. I initially found a lot of queuing in the Long Parsing Jobs and Short Parsing Jobs as well, due to some filters.
PerformancePoint Server M&A Tuning
OK, this is important. Since we Internet facing, we’re supporting anonymous users… a decent number of them, in fact. I updated the stored procedure [PPSMonitoring]. [dbo].[ParameterValuesCreate] to comment out a section that is not needed for an anonymous site, but that has an extremely negative impact on performance. This is the piece that stores the current filter selections. In addition to not being necessary, it was totally thrashing our transaction log.
I made the following change to the [PPSMonitoring]. [dbo].[ParameterValuesCreate] sproc:
1: — init
2: SET @TransactionIsOurs = 0
3: IF @@TRANCOUNT = 0
4: BEGIN — only if @@TRANCOUNT is 0, we do BEGIN TRAN
5: BEGIN TRANSACTION
6: SET @TransactionIsOurs = 1
7: END
8: — DD – 20080821 – Commented out the following section to increase performance
9: — –If this parameter value already exists (for this login), update. Otherwise, insert.
10: — IF(EXISTS(SELECT * FROM [ParameterValues] WHERE Login = @Login AND [ParameterUniqueName] = @ParameterUniqueName))
11: — BEGIN
12: — SELECT 1
13: — UPDATE [ParameterValues]
14: — SET
15: — [LastUpdated] = GETDATE(),
16: — [SerializedXml] = @SerializedXml
17: — WHERE [Login] = @Login AND [ParameterUniqueName] = @ParameterUniqueName
18: — IF @@ERROR <> 0
19: — BEGIN
20: — RAISERROR (5580001, 16, 1, @tErrMsg, 7, N’ParameterValues’) WITH LOG
21: — SET @ReturnCode = -1
22: — GOTO exit_label
23: — END
24: — END
25: — ELSE
26: — BEGIN
27: — –Insert record
28: — INSERT INTO [ParameterValues]
29: — ([Login], [ParameterUniqueName],[LastUpdated],[SerializedXml])
30: — VALUES
31: — (@Login, @ParameterUniqueName, GETDATE(), @SerializedXml)
32: — IF @@ERROR <> 0
33: — BEGIN
34: — RAISERROR (5580002, 16, 1, @tErrMsg, 8, N’ParameterValues’) WITH LOG
35: — SET @ReturnCode = -1
36: — GOTO exit_label
37: — END
38: — END
39:
40: IF @TransactionIsOurs = 1
41: BEGIN
42: COMMIT TRANSACTION
43: END
44: RETURN 0
SharePoint Tuning
There is a lot of tuning that can be done to SharePoint; tuning SharePoint was not a focus for us. We did perform a few changes.
Allowing the SharePoint Server worker process to consume large amounts of memory can decrease performance. For computers that have more than 4 GB of RAM, the ASP.NET cache size can be constrained with the privateBytesLimit attribute, set on the cache element of the Web.config file. By setting privateBytesLimit=”2576980378” (that is, 60% of 4 GB), you can avoid a scenario in which a server that has more than 4 GB of memory creates an oversized cache.
For a machine with 8 GB of RAM with a limit of 60% of the memory, we would add the following to the web.config. The private bytes limit number is calculated via the formula (<gigabytes of RAM available> * <percentage of RAM to use>* 1,024^3).
I made the following change to my web.config file for the SharePoint site:
1: …
2: <system.web>
3: …
4: <caching>
5: <cache privateBytesLimit = “5153960755” />
6: </caching>
7: …
8: </system.web>
9: …
Tweaking the Dashboard Design
Overall, I was expecting to be able to support about 500 concurrent users on each of the front end servers. I expected the WFE’s to be the bottleneck. With the original implementation of the Dashboard, I found we could support up to about 2,000 users… as long as they weren’t using our Comparison and Look Inside the Data pages.
When testing those pages, SSAS was getting slammed. I was able to support 1,000 users… but SSAS was at 80% utilization constantly, with occasional spikes and severe performance degradation. Not a really good situation.
Through testing and profiling, I identified the culprit… it was the School filter.
The School filter is an MDX filter, and it looks like this:
1: FILTER(
2: DESCENDANTS(
3: [Entity].[Entity]
4: ,[Entity].[Entity]
5: ,SELF_AND_BEFORE
6: )
7: ,[Entity].[Entity Type] = [Entity].[Entity Type].&[SCHOOL]
8: )
No big deal, right? Except… each time it is loaded (i.e., a lot) it fires off a few other MDX statements. They look like this:
1: SELECT
2: { [Entity].[Entity].[All]
3: ,[Entity].[Entity].&[School A]
4: ,[Entity].[Entity].&[School B]
5: //…about 400 more
6: } DIMENSION PROPERTIES MEMBER_TYPE on 0
7: FROM [MetricsDatamart]
8:
1: SELECT
2: {
3: IIF([Entity].[Entity].[All].Parent IS NULL,[Entity].[Entity].[All],[Entity].[Entity].[All].Parent)
4: ,IIF([Entity].[Entity].&[School A].Parent IS NULL,[Entity].[Entity].&[School A],[Entity].[Entity].&[School A].Parent)
5: // … about 400 more
6: } DIMENSION PROPERTIES MEMBER_TYPE on 0
7: FROM [MetricsDatamart]
8:
These are fairly long statements, and they are expensive to run. SSAS was having to queue dozens of these statements just to parse them when under load. They are fired for each page load, even though our data is static (i.e., no caching is occurring). All those inline IIF’s did horrible, terrible things to SSAS. SSAS was getting very backed up, which caused IIS to queue, which caused extremely long waits on the front end.
To fix the problem, I switched that filter to use a SQL Server RDBMS source. The data was static, so no big deal there. I did find that using more than 1 tabular source gave me an error in PPS… I wasn’t able to resolve that on our timeline, and I was able to fix the performance issue anyway.
Final Results
After that one change, the performance of the whole system skyrocketed. With our heaviest test case, I was able to run 1,500 concurrent users. The WFE processor was at ~70% utilization, averaging 300 requests per second with no contention or queuing. The database server was at ~20 utilization, no issues. Responses were < 1 second, with an average response time of ~.3 seconds. User experience excellent. Even when the response times would increase, page load times were still reasonable (under 10 seconds for pages with more 15 objects).
Next Steps
We got everything pretty much humming for this phase of the project… but we’re building a much larger version for our client in the very near future. Here are some thoughts on making our solution bigger, faster, better.
SharePoint
We can tune SharePoint quite a bit to increase performance. This includes lightening some of the templates and base pages, optimizing images and web pages, and making sure all the user supplied content is fairly lean (i.e., not using a 300kb file produced by Word when a 20kb html file could be used). We can use some of the caching features available in MOSS to improve the performance on the web servers.
PerformancePoint Filters
OK, I saw our filters as a bit of a bottleneck. For a large installation, I think performance could be significantly improved by building a custom filter using the Monitor SDK. Some of the key features would be enabling caching and perhaps using XMLA discover queries instead of some of the MDX. I’d also make the ordering of the elements a little more configurable, and perhaps try and implement a way to create a ‘global filter’ (I find I use a single filter on many different Dashboards… and I’d like to reduce the maintenance/development time/improve consistency.
Servers
Scaling out the web servers and scaling up the SQL Servers looks like a fairly effective strategy here. Coupled with some custom components and a well thought out design, this looks like an effective strategy with fairly predictable performance characteristics. We could easily double our capacity right now by snagging two extra WFEs. To scale by a factor of 10, we’d probably need an extra SQL Server and some minor changes to our strategy.
Key Takeaways
Our testing went well, and we were able to exceed the performance we were expecting. Here are a few key takeaways from this exercise.
- Plan on hunting down some performance bottlenecks. In a PPS deployment, they can be located on any tier, and it may take a little time to find and fix them. We had to make adjustments to basically every technology in this instance.
- You’re going to want to tune your servers. It took a little over a week for me to go through that exercise, but I had help, a lab, and experience with all the various components. You’re mileage may vary. We identified and corrected a number of significant performance issues… everything from tweaking IIS and SQL Server to ensuring the correct Cumulative Update was applied. Make only small changes for each test iteration, and analyze carefully. And document the changes you make.
- The components you use, and your design, will have extreme ramifications on performance. If your dashboards heavily use PAS views vs. Excel Services vs. Analytic Charts, your performance characteristics will change dramatically. Carefully consider the ramifications of the different technologies, and look at where the work is being done in your configuration.
- Overall, scalability and performance was good, and everything was pretty much in line with what I saw in the Performance Tuning and Capacity Planning for PerformancePoint Monitoring Server. Keep in mind that your implementation will probably be be somewhat different than the PPS Team’s test installation, so keep that in mind in your planning.
Happy PerformancePointing…
David