Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 SQL Server Administration (2005)
 cluster on column with 2 values?

Author  Topic 

inbs
Aged Yak Warrior

860 Posts

Posted - 2010-09-12 : 04:33:25
i have table with 1M rows with column "STATUS"

STATUS
1
2

half rows have status=1 and another have status=2

what is the selectivity?
should i build on it index (cluster or noncluster), i have some qury like
select * from table where STATUS=2 and startdate>20100101
select * from table where STATUS=1 and startdate>20100101

slimt_slimt
Aged Yak Warrior

746 Posts

Posted - 2010-09-12 : 05:28:00
if you have very frequent select distinquishing based on column status you definitly need index on this column.

i suggest you build nonclustered index on column STATE or STARTDATE, because you probably have clustered index on another column that uniquely distinguish each of the rows.

you might optimize this nonclustered index with column STARTDATE, if it turns out to be faster than just having nonclustered index on column STATE. but I assume that you have more different values in column STARTDATE than in column STATE so i suggest you build nonclustered index firstly on STARTDATE column, test it and then if it turns out that you might optimize your query, add also index on STATE or not.

but there are also downsize to this indexing. if this is production table you ought to be aware that adding additional index might slow the process of insert, update or delete in your application. as well you should be aware of space taken up for each extra index, etc. etc. and also what are the other queries running against this table or if you are joining this table with other tables, which would be best index.

you should also consider your select statement if you are writing query correctly -> using right format on date column STARTDATE, etc.

nevertheless, 1M rows is relatively small amount of data, so you can easly test which is the fastest solution.
Go to Top of Page

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-09-12 : 06:13:04
That's a really bad column for a clustered index. Clustered indexes should be as unique as possible.

Based on the queries you list, try a nonclustered index on (state, startdate). That's one index with the columns in that order. Then replace the SELECT * with an explicit list of columns that you need.

--
Gail Shaw
SQL Server MVP
Go to Top of Page

inbs
Aged Yak Warrior

860 Posts

Posted - 2010-09-12 : 06:25:55
i do not understsnd,

SELECT DISTINCT STATUS FROM TABLEA

i get 2 valuse(1,2)

is it right to build nonclustered on this column(even i have 2 distinct valuse in table with 1 M rows)?

2/1M =0.000000%

(i read about selectivity, and i have low ratio,so i didnt build index)
Go to Top of Page

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-09-12 : 07:20:44
Selectivity isn't just about the leading column. Your queries have two columns in them, the index should also have two columns in it.
A noncovering index with just the state column is almost certainly useless. A 2-column index may or may not be used, depending how many rows the combination will return.

If the index can be made covering, selectivity is no longer an issue. That's why I suggested removing the * and selecting just the columns absolutely required, as it's very hard to cover a SELECT * query

--
Gail Shaw
SQL Server MVP
Go to Top of Page

inbs
Aged Yak Warrior

860 Posts

Posted - 2010-09-12 : 15:01:17
how can i know if is high/low selectivity?

first query return 300K rows
Second query return 600K rows

what is mean for the index?
Go to Top of Page

namman
Constraint Violating Yak Guru

285 Posts

Posted - 2010-09-16 : 19:51:30
quote:

i have table with 1M rows with column "STATUS"

STATUS
1
2

half rows have status=1 and another have status=2

what is the selectivity?
should i build on it index (cluster or noncluster), i have some qury like
select * from table where STATUS=2 and startdate>20100101
select * from table where STATUS=1 and startdate>20100101

how can i know if is high/low selectivity?

first query return 300K rows
Second query return 600K rows




With that kind of data, your two queries are not much different from this: select * from table.

I think, in this case, index provides either little help or none. SQL will prefer to scan. If you force SQL to use index, performance may be worse.


Go to Top of Page

dinakar
Master Smack Fu Yak Hacker

2507 Posts

Posted - 2010-09-17 : 12:48:11
you can also use the INCLUDE clause to have "Status" type columns included..

Dinakar Nethi
************************
Life is short. Enjoy it.
************************
http://weblogs.sqlteam.com/dinakar/
Go to Top of Page
   

- Advertisement -