Clustered indexes - So very confused!

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Administration

Clustered indexes - So very confused!

Author

Topic

dmeagor
Starting Member

16 Posts

Posted - 2012-01-30 : 16:49:55

Hi

I have a table with 250 million rows which stores answers from a number of web forms. It's structure looks like this...

id[int] (PK)
form_id[int]
session_id[int]
qu_id[int]
subQu_id[int]
intAns[int]
txtAns[varchar(4000)]

clustered index on id
Index1 on form_id
Index2 on session_id

When checking the queries against this table it appears that none are using the primary key "id" and no other table refers to the "id" column either.

All of the queries are against these columns.
form_id,session_id,qu_id,subQu_id
form_id,session_id,qu_id
form_id,session_id
form_id

From my limited understanding if I were to delete the unused "id" column and create a new primary key/clustered index on (form_id,session_id,qu_id,subQu_id) that that would cover all of the above.

However the table has a very high level of INSERTS and DELETES (almost no UPDATES) each on an average of about 40 rows at a time.

SELECTS tend to be occasional but retrieve large amounts of data in one go, querying against the "form_id" field (maybe 50,000 records retrieved per query.) though time is less important in this case.

What option would you experts recommend.

1. Leave it as it is with a clustered "ID" column which is not used but ensures new inserts are just added to the end of the index for speed.

3. Have no clustered index at all and just use the existing indexes on form_id and session_id.

2. have a clustered index of (form_id,session_id,qu_id,subQu_id)

4. something else.

Any advice really appreciated.

David Meagor

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2012-01-30 : 17:40:27

depends if new inserts get a higher [formID] value than old inserts.

If your application is insert heavy then this would be the deciding factor. Assuming that new formID's are higher than old ones then I don't see a problem clustering on that.

If formID's are random though then you'll be shuffling the table every time you insert. 250 million rows, you don't want that.

Do you every query the [txtAns] column? if people can shove a ton of text in there but you don't want to search it maybe you should hive it off to it's own table to get a higher page density for this one. I'm not sure if VARCHAR(MAX) might not also be a better option. I think that is just a pointer which would take up less space on the page.

Do you really never use the ID value? How do you do DELETES? are they also based on the formID...... searches?

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2012-01-30 : 17:41:31

My initial thoughts:

Leave the clustered index as-is (the new clustered index you propose is quite wide and not ever-increaseing). Create a nonclustered index on (form_id,session_id,qu_id,subQu_id), create it unique if those columns are unique. Delete both of the existing nonclustered indexes if there are no queries that filter only on session_id.

--
Gail Shaw
SQL Server MVP

dmeagor
Starting Member

16 Posts

Posted - 2012-01-31 : 13:29:16

quote:
Originally posted by Transact Charlie

depends if new inserts get a higher [formID] value than old inserts.

Each form has many sessions, each session has many qu_id's each qu_id has many subQu_id's. "form_id" is a foreign key autoincremented in another table but in this table rows inserted could be for any active form.

quote:

If formID's are random though then you'll be shuffling the table every time you insert. 250 million rows, you don't want that.

If I understand correctly this means that if a page in the table is full then 50% of the data is moved to a new page. Does the table size complicate this?

quote:

Do you every query the [txtAns] column? if people can shove a ton of text in there but you don't want to search it maybe you should hive it off to it's own table to get a higher page density for this one. I'm not sure if VARCHAR(MAX) might not also be a better option. I think that is just a pointer which would take up less space on the page.

No txtAns it's not searched. I will look into using the Varchar(MAX) instead. I didn't realize there was a difference.

quote:

Do you really never use the ID value? How do you do DELETES? are they also based on the formID...... searches?

only entire forms or sessions are ever deleted. no need to delete on an ID column.

David Meagor

dmeagor
Starting Member

16 Posts

Posted - 2012-01-31 : 13:42:25

quote:
Originally posted by GilaMonster
Leave the clustered index as-is (the new clustered index you propose is quite wide and not ever-increaseing). Create a nonclustered index on (form_id,session_id,qu_id,subQu_id), create it unique if those columns are unique. Delete both of the existing nonclustered indexes if there are no queries that filter only on session_id.

That makes sense. Yes they are unique.

What is the purpose of the clustered index in this case though if it's never going to be searched on? Wouldn't a heap be more suitable?

Thanks for your suggestions!

David Meagor

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2012-01-31 : 15:16:21

No, a heap would not be more suitable. SQL is partially designed with the idea that tables have clustered indexes. I like the cluster to organise the table, not primarily for queries.

--
Gail Shaw
SQL Server MVP

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources