Performance Tuning the Forums

By Bill Graziano on 3 December 2001 | Tags: Performance Tuning

This article is a little different than what I normally write. SQLTeam uses forum software from Snitz. It's open source, free, and I really like it. I've been involved in some performance tuning of the Snitz forums on SQL Server. This article discusses how I evaluated performance, the tools I used, and what suggestions I made. It's a variation on a looong post in the Snitz support forums. Hopefully seeing what I went through will give others a head start on any performance tuning they might attempt.

The Snitz Forums are a cross-database, ASP-based bulletin board system. They support Access, SQL Server, and MySQL from a single code base. This article analyzes the database calls inside their default.asp page. I had three goals. First, I wanted to see if I could find better indexes that would improve the performance of DEFAULT.ASP; you can usually get a cheap, easy performance gain with proper indexing. Second, I wanted to find a standard set of SQL that could run against any installation to gauge the performance of its SQL Server. Third, I wanted to see if there were things we could do in the ASP itself to improve performance.

Please don't think I'm saying that Snitz has poor performance. Until I wrote this article the SQLTeam forums ran on an unmodified version of Snitz and handled 40,000 page views per week with no problems. Snitz is the BEST free ASP/SQL Server bulletin board I've found.

I'd also like to mention ORCSweb. They host SQLTeam and do a tremendous job. Their servers are fast and reliable, and they have some of the best support of any technology company I have EVER dealt with.

SQL Profiler

I used a tool called Profiler. It's part of a typical SQL Server installation and lets you record every statement that is sent to SQL Server, along with things like the amount of CPU and disk IO used to process each query. Remember that Profiler does put additional load on your SQL Server, so don't run it all the time.
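If you save the trace to a file, SQL Server 2000 will also let you load it into a table and query it, which makes it easy to sort the statements by Duration or Reads yourself. Something like this should do it; the file path and the TraceResults table name are just placeholders, not anything from the Snitz code:

-- Load a saved Profiler trace file into a table for ad hoc analysis.
SELECT TextData, Duration, CPU, Reads, Writes
INTO TraceResults
FROM ::fn_trace_gettable('C:\Traces\default_asp.trc', default)
GO

-- The ten most expensive statements by duration (milliseconds).
SELECT TOP 10 Duration, CPU, Reads, Writes, TextData
FROM TraceResults
ORDER BY Duration DESC
GO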
With that in mind I fired up Profiler and pulled up DEFAULT.ASP. Here is a list of all the SQL statements that default.asp sends to the database. I truncated each one at 100 characters so it would be easier to read :)

SELECT FORUM_MEMBERS.MEMBER_ID, FORUM_MEMBERS.M_LEVEL, FORUM_MEMBERS.M_NAME, FORUM_MEMBERS.M_PASSWOR
SELECT FORUM_MEMBERS.MEMBER_ID FROM FORUM_MEMBERS WHERE M_NAME = 'GRAZ'
SELECT COUNT(*) AS SubCount FROM FORUM_SUBSCRIPTIONS
SELECT FORUM_MEMBERS.MEMBER_ID, FORUM_MEMBERS.M_LEVEL, FORUM_MEMBERS.M_NAME, FORUM_MEMBERS.M_PASSWOR
SELECT FORUM_CATEGORY.CAT_ID, FORUM_CATEGORY.CAT_STATUS, FORUM_CATEGORY.CAT_NAME, FORUM_CATEGORY.CAT
SELECT FORUM_MEMBERS.MEMBER_ID, FORUM_MEMBERS.M_LEVEL, FORUM_MEMBERS.M_NAME, FORUM_MEMBERS.M_PASSWOR
Select Count(*) as PostCount FROM FORUM_TOPICS T, FORUM_CATEGORY C, FORUM_FORUM F WHERE
Select Count(*) as PostCount FROM FORUM_REPLY R, FORUM_CATEGORY C, FORUM_FORUM F WHERE
SELECT FORUM_FORUM.FORUM_ID, FORUM_FORUM.F_STATUS, FORUM_FORUM.CAT_ID, FORUM_FORUM.F_SUBJECT, FORUM_
SELECT mo.FORUM_ID FROM FORUM_MODERATOR mo, FORUM_MEMBERS me WHERE mo.FORUM_ID = 10 AND mo.MEMB
SELECT FORUM_ID FROM FORUM_FORUM WHERE CAT_ID=2
Select Count(*) as PostCount FROM FORUM_TOPICS T WHERE T.CAT_ID = 2 AND T.T_STATUS > 1
Select Count(*) as PostCount FROM FORUM_REPLY R WHERE R.CAT_ID = 2 AND R.R_STATUS > 1
Select Count(*) as PostCount FROM FORUM_TOPICS T WHERE T.FORUM_ID = 10 AND T.T_STATUS > 1
Select Count(*) as PostCount FROM FORUM_REPLY R WHERE R.FORUM_ID = 10 AND R.R_STATUS > 1
SELECT FORUM_MEMBERS.M_NAME FROM FORUM_MODERATOR , FORUM_MEMBERS WHERE (FORUM_MODERATOR.FORUM_ID
SELECT FORUM_MEMBERS.M_NAME FROM FORUM_MODERATOR , FORUM_MEMBERS WHERE (FORUM_MODERATOR.FORUM_ID
Select Count(*) as PostCount FROM FORUM_TOPICS T WHERE T.FORUM_ID = 10 AND T.T_STATUS > 1
Select Count(*) as PostCount FROM FORUM_REPLY R WHERE R.FORUM_ID = 10 AND R.R_STATUS > 1

. . . (a whole bunch of lines deleted for readability) . . .

SELECT mo.FORUM_ID FROM FORUM_MODERATOR mo, FORUM_MEMBERS me WHERE mo.FORUM_ID = 7 AND mo.MEMBE
Select Count(*) as PostCount FROM FORUM_TOPICS T WHERE T.FORUM_ID = 7 AND T.T_STATUS > 1
Select Count(*) as PostCount FROM FORUM_REPLY R WHERE R.FORUM_ID = 7 AND R.R_STATUS > 1
SELECT FORUM_MEMBERS.M_NAME FROM FORUM_MODERATOR , FORUM_MEMBERS WHERE (FORUM_MODERATOR.FORUM_ID
SELECT FORUM_MEMBERS.M_NAME FROM FORUM_MODERATOR , FORUM_MEMBERS WHERE (FORUM_MODERATOR.FORUM_ID
Select Count(*) as PostCount FROM FORUM_TOPICS T WHERE T.FORUM_ID = 7 AND T.T_STATUS > 1
Select Count(*) as PostCount FROM FORUM_REPLY R WHERE R.FORUM_ID = 7 AND R.R_STATUS > 1
SELECT M_NAME, MEMBER_ID FROM FORUM_MEMBERS WHERE M_STATUS=1 AND MEMBER_ID > 1 ORDER BY MEMBER_ID
SELECT COUNT(MEMBER_ID) AS U_COUNT FROM FORUM_MEMBERS WHERE M_POSTS > 0 AND M_STATUS=1
SELECT FORUM_TOPICS.TOPIC_ID, FORUM_TOPICS.T_SUBJECT, FORUM_TOPICS.T_LAST_POST, FORUM_TOPICS.T_LAST_
SELECT COUNT(FORUM_TOPICS.T_LAST_POST) AS NUM_ACTIVE FROM FORUM_TOPICS WHERE (((FORUM_TOPICS.T_LAS
SELECT FORUM_MEMBERS.MEMBER_ID, FORUM_MEMBERS.M_LEVEL, FORUM_MEMBERS.M_NAME, FORUM_MEMBERS.M_PASSWOR

Default.asp ended up sending 115 SQL statements to display the page. That's a lot. That's a whole lot! The sections I took out look a lot like the six lines above and below the cut. And yes, I noticed it's running the same code again and again for each FORUM_ID. My goal was to only modify the database, so that's not something I could solve within the scope of this article. More on that at the end, though.
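One quick way to see how much of those 115 statements is the same SQL run over and over is to aggregate the trace itself. This uses the same hypothetical trace file as above; TextData is an ntext column, so it has to be cast to nvarchar before it can be grouped, and anything past 400 characters gets lumped together.

-- Count how many times each distinct statement ran and what it cost in total.
SELECT CAST(TextData AS NVARCHAR(400)) AS StatementText,
       COUNT(*)      AS TimesExecuted,
       SUM(Duration) AS TotalDuration,
       SUM(Reads)    AS TotalReads
FROM ::fn_trace_gettable('C:\Traces\default_asp.trc', default)
GROUP BY CAST(TextData AS NVARCHAR(400))
ORDER BY TimesExecuted DESC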
Analyzing a Query

Let's break down a few of these queries a little further. Here are the 10 slowest queries based on Duration:

Duration  CPU  Reads  Writes  TextData
--------  ---  -----  ------  ------------------------------
      80   20    291       0  Select Count(*) as PostCount F
      60   40    385       0  Select Count(*) as PostCount F
      50   20    175       0  SELECT FORUM_MEMBERS.MEMBER_ID
      50    0      1       0  SELECT mo.FORUM_ID FROM FORUM
      50   20    291       0  Select Count(*) as PostCount F
      50   50    175       0  SELECT M_NAME, MEMBER_ID FROM
      50   40    385       0  SELECT COUNT(FORUM_TOPICS.T_LA
      40   20    292       0  Select Count(*) as PostCount F
      40   40    386       0  Select Count(*) as PostCount F
      40   20    144       0  SELECT FORUM_FORUM.FORUM_ID, F

Duration is the number of milliseconds it took to process the query (a millisecond is 1/1000th of a second). CPU is the number of milliseconds the CPU spent processing the query. Reads is the number of logical disk reads required to process the query, and Writes is the number of physical disk writes.

I'm going to spend a little time analyzing this query, which determines the number of replies that need to be moderated in a forum:

Select Count(*) as PostCount FROM FORUM_REPLY R
WHERE R.FORUM_ID = 10 AND R.R_STATUS > 1

FORUM_REPLY is the table that holds the replies to the topics; REPLY_ID is its primary key. This query is typical of many of the queries, and it appears to be run twice for each forum.

I'm going to look at the SHOWPLAN output (also called an estimated execution plan). There are two ways to see this. The first is the graphical showplan in Query Analyzer: highlight a query and choose Query -> Display Estimated Execution Plan (or press Ctrl+L). The second is to run the following batch:

SET SHOWPLAN_ALL ON
GO
Select Count(*) as PostCount FROM FORUM_REPLY R
WHERE R.FORUM_ID = 10 AND R.R_STATUS > 1
GO

I'd suggest running this with the results in grid format. The results will look something like this:

|--Compute Scalar(DEFINE:([Expr1001]=Convert([Expr1004])))
     |--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))
          |--Table Scan(OBJECT:([PerfBase].[dbo].[FORUM_REPLY] AS [R]), . . .

It will also have quite a bit of information about the query, such as how much CPU and disk IO SQL Server thinks the query will use. The biggest thing I look at is the TotalSubtreeCost of the query. The cost of the entire query (at least on my system) is 0.280, shown on the first line. FYI: these plans are much easier to read in the graphical format.

The last line of the execution plan says "Table Scan". That means SQL Server is going to scan the entire FORUM_REPLY table to get this count. The estimated cost of the table scan is 0.264, so it's the bulk of this query. Also note that the estimated cost doesn't depend on what's cached in memory at the time of the query.

There is an index on FORUM_ID, but SQL Server has decided not to use it. SQL Server compared the estimated cost (in terms of CPU and disk) of using the index and decided it was faster to scan the entire table. In order to use the index, SQL Server would first have to read through the index to find the required rows, then go read those rows to determine if R_STATUS is greater than 1. My guess is that SQL Server decided reading the extra rows was faster than having to jump around between the index and the table.
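If you're curious whether the optimizer got that right, you can force the index with a table hint and compare the estimated TotalSubtreeCost against the table scan's 0.280. On my database the FORUM_ID index shows up in the plans as FORUM_FORUM_ID; substitute your own index name if it differs.

SET SHOWPLAN_ALL ON
GO
-- Same query, but forced to use the FORUM_ID index.
-- If the estimated cost comes back higher than 0.280, the optimizer was right to scan.
Select Count(*) as PostCount
FROM FORUM_REPLY R WITH (INDEX(FORUM_FORUM_ID))
WHERE R.FORUM_ID = 10 AND R.R_STATUS > 1
GO
SET SHOWPLAN_ALL OFF
GO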
Forum number 10 is my articles comment forum. It has about 1,100 replies, or just over 5% of the total number of replies. I also have a forum that only has 6 replies. If we run this query against that forum we get the following plan:

|--Compute Scalar(DEFINE:([Expr1001]=Convert([Expr1004])))
     |--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))
          |--Filter(WHERE:([R].[R_STATUS]>1))
               |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([PerfBase].[dbo].[FORUM_REPLY] AS [R]))
                    |--Index Seek(OBJECT:([PerfBase].[dbo].[FORUM_REPLY].[FORUM_FORUM_ID] AS . . .

This has more steps, but the key part is that we're doing an Index Seek rather than a Table Scan. The estimated cost of this query is 0.0126, or roughly 20 times faster than having to do a table scan. Of course this forum only has 6 replies, so that helps :) The Bookmark Lookup is the step where SQL Server uses the values returned from the index to look up the actual rows in the table. It does this so it can check R_STATUS.

Let's assume for a second that we didn't care about R_STATUS in this query. If we run the first query with that piece commented out, we get this plan:

|--Compute Scalar(DEFINE:([Expr1001]=Convert([Expr1002])))
     |--Stream Aggregate(DEFINE:([Expr1002]=Count(*)))
          |--Index Seek(OBJECT:([PerfBase].[dbo].[FORUM_REPLY].[FORUM_FORUM_ID] AS ...

It has an estimated cost of 0.00789, which is faster yet. In this case the query doesn't even touch the table (no Bookmark Lookup); it just goes to the index and counts. This is 35 times faster than our original query.

Adding the Index

If you add up all the durations for the page, the SQL statements alone take 2,430 milliseconds (2.4 seconds) to execute. So how do we make this page faster? My first goal was to make a performance improvement without changing the ASP code, so my first guess was to add an index on FORUM_REPLY composed of FORUM_ID and R_STATUS. If we do that, our original query returns this execution plan:

|--Compute Scalar(DEFINE:([Expr1001]=Convert([Expr1002])))
     |--Stream Aggregate(DEFINE:([Expr1002]=Count(*)))
          |--Index Seek(OBJECT:([PerfBase].[dbo].[FORUM_REPLY].[IX_TEST_INDEX] . . .

This is what we'd hoped to see: no table scan and no bookmark lookup. This query doesn't even touch the table anymore. The estimated cost is 0.00641, which is even faster than before. It's 43 times faster than our original query. Holy smokes!

So how does this affect the total performance of the page? The total time spent executing SQL statements dropped from 2.43 seconds to 1.93 seconds, a 20% drop. Not bad, but I'd hoped to see more :) I did have to add an index to the table, which will slow down inserts and some updates. There is also an index we can remove, though. Here is the index list for FORUM_REPLY:

[CAT_ID], [FORUM_ID], [TOPIC_ID], [REPLY_ID]
[CAT_ID], [FORUM_ID], [TOPIC_ID]
[FORUM_ID]
[REPLY_ID]
[TOPIC_ID]
[FORUM_ID], [R_STATUS]   <-- my new index

The first index is created automatically when you define those fields as the primary key. The second is created by a CREATE INDEX statement. I think the second index is redundant, and I've removed it from my database with no ill effects so far. That gets me back to the same number of indexes as before. I could probably also remove the index on FORUM_ID, since those queries could be handled by my new index.
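For reference, here's roughly what those changes look like in T-SQL. The index names are just placeholders (my plan output above happens to show the new one as IX_TEST_INDEX), so check sp_helpindex for the real name of the redundant index on your installation before dropping anything.

-- Covering index for the per-forum moderation counts.
-- The name is arbitrary; pick whatever fits your naming convention.
CREATE INDEX IX_FORUM_REPLY_FORUM_STATUS
    ON FORUM_REPLY (FORUM_ID, R_STATUS)
GO

-- List the existing indexes so you can find the redundant
-- [CAT_ID], [FORUM_ID], [TOPIC_ID] index by name.
EXEC sp_helpindex 'FORUM_REPLY'
GO

-- Then drop it by name. The index name here is a placeholder;
-- use whatever name sp_helpindex reported.
-- DROP INDEX FORUM_REPLY.IX_FORUM_REPLY_CAT_FORUM_TOPIC
-- GO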
Other Thoughts

If you've really looked closely at the SQL at the top of the page, you can see that the same SQL is called for each forum. The way this code is written, the more forums you have, the slower your default page will load. Every forum you add generates two more table scans of FORUM_REPLY and two more table scans of FORUM_TOPICS. It's not just that more forums equal more posts; each additional forum means two extra scans of the entire table. This is one case where just changing the SQL doesn't do enough for performance.

I'm guessing this same data could be returned by a query like this:

SELECT FORUM_ID, COUNT(*) AS PostCount
FROM FORUM_REPLY
WHERE R_STATUS > 1
GROUP BY FORUM_ID

With the additional index this has an estimated cost of 0.103, and without the extra index the cost is also 0.103. I have thirteen forums, which generates 26 table scans of FORUM_REPLY at a total cost of 7.28. A little extra coding and caching on the client side can reduce that by a factor of 70! Very similar math can be done on FORUM_TOPICS, and it should help Access and MySQL just as much. A sketch of the FORUM_TOPICS version follows.
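Based on the per-forum queries captured earlier, the FORUM_TOPICS version would presumably look something like this, and by the same logic an index on FORUM_ID and T_STATUS should let it run from the index alone. The index name below is just a placeholder.

-- Per-forum topic counts (the FORUM_TOPICS analog of the reply query).
SELECT FORUM_ID, COUNT(*) AS TopicCount
FROM FORUM_TOPICS
WHERE T_STATUS > 1
GROUP BY FORUM_ID

-- A matching covering index, if you want it:
-- CREATE INDEX IX_FORUM_TOPICS_FORUM_STATUS
--     ON FORUM_TOPICS (FORUM_ID, T_STATUS)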
Unfortunately, rolling these counts up like this would require a pretty extensive rewrite of default.asp. Hopefully I'll talk them into it for a future version :)

Conclusion

I hope you found this interesting. Let me know if you're interested in seeing future articles like this, or let me know if you thought it was a waste of your time. I know there is a lot more I can do with this table and these queries; I just wanted to walk you through the thought process of some performance tuning.

And if you're looking for great forum software for your site, I highly recommend Snitz. In spite of the problems I found here, it's pretty darned fast and very flexible. And future versions are sure to be faster :)