Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
dmeagor
Starting Member
16 Posts |
Posted - 2012-01-30 : 16:49:55
|
HiI have a table with 250 million rows which stores answers from a number of web forms. It's structure looks like this...id[int] (PK)form_id[int] session_id[int] qu_id[int]subQu_id[int]intAns[int]txtAns[varchar(4000)]clustered index on idIndex1 on form_idIndex2 on session_idWhen checking the queries against this table it appears that none are using the primary key "id" and no other table refers to the "id" column either.All of the queries are against these columns.form_id,session_id,qu_id,subQu_idform_id,session_id,qu_idform_id,session_idform_idFrom my limited understanding if I were to delete the unused "id" column and create a new primary key/clustered index on (form_id,session_id,qu_id,subQu_id) that that would cover all of the above.However the table has a very high level of INSERTS and DELETES (almost no UPDATES) each on an average of about 40 rows at a time.SELECTS tend to be occasional but retrieve large amounts of data in one go, querying against the "form_id" field (maybe 50,000 records retrieved per query.) though time is less important in this case.What option would you experts recommend.1. Leave it as it is with a clustered "ID" column which is not used but ensures new inserts are just added to the end of the index for speed.3. Have no clustered index at all and just use the existing indexes on form_id and session_id.2. have a clustered index of (form_id,session_id,qu_id,subQu_id)4. something else.Any advice really appreciated.David Meagor |
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2012-01-30 : 17:40:27
|
depends if new inserts get a higher [formID] value than old inserts.If your application is insert heavy then this would be the deciding factor. Assuming that new formID's are higher than old ones then I don't see a problem clustering on that.If formID's are random though then you'll be shuffling the table every time you insert. 250 million rows, you don't want that.Do you every query the [txtAns] column? if people can shove a ton of text in there but you don't want to search it maybe you should hive it off to it's own table to get a higher page density for this one. I'm not sure if VARCHAR(MAX) might not also be a better option. I think that is just a pointer which would take up less space on the page.Do you really never use the ID value? How do you do DELETES? are they also based on the formID...... searches?Charlie===============================================================Msg 3903, Level 16, State 1, Line 1736The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2012-01-30 : 17:41:31
|
My initial thoughts:Leave the clustered index as-is (the new clustered index you propose is quite wide and not ever-increaseing). Create a nonclustered index on (form_id,session_id,qu_id,subQu_id), create it unique if those columns are unique. Delete both of the existing nonclustered indexes if there are no queries that filter only on session_id.--Gail ShawSQL Server MVP |
|
|
dmeagor
Starting Member
16 Posts |
Posted - 2012-01-31 : 13:29:16
|
quote: Originally posted by Transact Charlie depends if new inserts get a higher [formID] value than old inserts.
Each form has many sessions, each session has many qu_id's each qu_id has many subQu_id's. "form_id" is a foreign key autoincremented in another table but in this table rows inserted could be for any active form. quote: If formID's are random though then you'll be shuffling the table every time you insert. 250 million rows, you don't want that.
If I understand correctly this means that if a page in the table is full then 50% of the data is moved to a new page. Does the table size complicate this?quote: Do you every query the [txtAns] column? if people can shove a ton of text in there but you don't want to search it maybe you should hive it off to it's own table to get a higher page density for this one. I'm not sure if VARCHAR(MAX) might not also be a better option. I think that is just a pointer which would take up less space on the page.
No txtAns it's not searched. I will look into using the Varchar(MAX) instead. I didn't realize there was a difference.quote: Do you really never use the ID value? How do you do DELETES? are they also based on the formID...... searches?
only entire forms or sessions are ever deleted. no need to delete on an ID column.David Meagor |
|
|
dmeagor
Starting Member
16 Posts |
Posted - 2012-01-31 : 13:42:25
|
quote: Originally posted by GilaMonsterLeave the clustered index as-is (the new clustered index you propose is quite wide and not ever-increaseing). Create a nonclustered index on (form_id,session_id,qu_id,subQu_id), create it unique if those columns are unique. Delete both of the existing nonclustered indexes if there are no queries that filter only on session_id.
That makes sense. Yes they are unique.What is the purpose of the clustered index in this case though if it's never going to be searched on? Wouldn't a heap be more suitable?Thanks for your suggestions!David Meagor |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2012-01-31 : 15:16:21
|
No, a heap would not be more suitable. SQL is partially designed with the idea that tables have clustered indexes. I like the cluster to organise the table, not primarily for queries.--Gail ShawSQL Server MVP |
|
|
|
|
|
|
|