Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Optimising Free Text query

Author  Topic 

Kristen
Test

22859 Posts

Posted - 2007-10-06 : 05:52:58
If a query includes some "table stuff" and some "free text" stuff is the "free text stuff" constrained to the rows that are matched by the "table stuff"?

I've been doing some optimisation work for a client and they had something like

SELECT ...
FROM MyTable
WHERE Active = 0
AND CONTAINS(...)

I'm confident that the core table query is efficient (I know it doesn't look it in this snippet).

We moved 80% of the data (roughly from 1,000,000 rows to 200,000) to an "archive" table and the query speed has gone up dramatically, and "busy time" CPU has dropped from 80% to 30%

I'm inclined to think that the CONTAINS was looking at everything in the Free Text Catalogue, rather than just the rows from the table-stuff query.

Any thoughts?

Kristen

rmiao
Master Smack Fu Yak Hacker

7266 Posts

Posted - 2007-10-06 : 12:32:05
Yes, full text index query uses full text catalogs. That's why query result will be off if don't populate catalog properly.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-10-06 : 13:16:12
Sorry rmiao, but I haven't understood the significance of your comment.

We have a full text catalogue just for this table. it is correctly populated (as far as I know).

Does the CONTAINS always search the whole of the Catalogue, regardless of number of rows the rest of the (SQL) query returns?

Thanks,

Kristen
Go to Top of Page

rmiao
Master Smack Fu Yak Hacker

7266 Posts

Posted - 2007-10-06 : 19:48:38
Yes as I understand.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-10-07 : 02:33:30
OK, that explains the performance problems they have been having, thanks for that.

Kristen
Go to Top of Page

iminore
Posting Yak Master

141 Posts

Posted - 2007-10-07 : 12:51:16
More generally a statement such as:
Go to Top of Page

iminore
Posting Yak Master

141 Posts

Posted - 2007-10-07 : 12:53:46
More generally a statement such as:

select * from T where col1 = 'A' and col2 = 'B'

Does this read all the col1s and all the col2s or just the col2s where col1 = 'A' ?
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-10-07 : 13:17:29
In general (but you would have to check the specifics for a "real" case):

If neither column is index SQL Server will do a table scan - that's like thumb'ing through every record checking if col1 = 'A' and col2 = 'B'.

If one is indexed AND the index has high selectivity then that index will be used (so matching whichever column is indexed). Then the rows for that match are retrieved and the test is made for the other column.

If both columns are contained in a single index (and I think one of them will need to be the FIRST key in that index), then the index will "cover" the WHERE clause, and the col1 = 'A' and col2 = 'B' will be done and then the row will be retrieved. Again, the index needs to have high enough selectivity.

If the "SELECT *" was actually a list of a few columns which was ALSO "Covered" by the index then the whole query would be solved from the index without actually retrieving the record at all. Clever stuff eh?!

The "Selectivity" of the index is "guessed" by using the Statistics, which provides the Query Optimisers with a Hint/Best-Guess as to whether the query will be more quickly answered from the Index, or just by thumb'ing through all the records.

Make sure your Statistics are kept up to date through good housekeeping / maintenance

Kristen
Go to Top of Page

iminore
Posting Yak Master

141 Posts

Posted - 2007-10-08 : 04:21:06
I was under the impression that while an index helps in sorting it doesn't help in searching. From what you say I'm wrong.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-10-08 : 04:46:40
"From what you say I'm wrong"

Yup, an appropriate index will help MASSIVELY in searching

Kristen
Go to Top of Page

iminore
Posting Yak Master

141 Posts

Posted - 2007-10-09 : 04:17:20
hmm

That's a big word 'Appropriate'. Go on - give me a clue.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-10-09 : 04:38:16
"'Appropriate'"

Thought I'd covered it above.

Index must be Selective and Relevant to the query.

An index on a Date column won't help you search by the Name column

An index on a Yes/No column (where the data is 50:50) is not going to be sufficiently selective, and won't be used. (Although an index on Yes/No which is 1:99 might well be used to speed up finding the 1% choice).

An index for which the Statistics are not fairly regular updated, via a decent Housekeeping/Maintenance strategy, are going to send SQL Server down a wrong turn - i.e. may caused an Index to be used when it is a poor choice (e.g. lots of data deleted since Statistics updated) or fail to use an index (e.g. lots of data inserted)

Kristen
Go to Top of Page

jackv
Master Smack Fu Yak Hacker

2179 Posts

Posted - 2007-10-09 : 05:36:11
Another thing to consider , is that sometimes you can return far more records than required from the Full Text Catalog. For example , if you have a search of different cars , with all the various sub categories , which might be published in different publications , you might consider with coming up different strategies in filtering the data at the Full Text Catalog , PRIOR to it returning to sql server. For example , filter on "car" "red" "Publication=myPub"(which is something you add to the record) . This may require some string manipulation , to format text for the client side


Jack Vamvas
--------------------
Search IT jobs from multiple sources- http://www.ITjobfeed.com/SQL
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-10-09 : 06:43:21
The client is currently searching on any combination that the user provides. So if they ask for "Red Car" they do a CONTAINS for that. However, there are several columns in the Free Text Catalogue that could contain those terms, so in effect the query becomes

Col1 CONTAINS("Red Car")
OR Col2 CONTAINS("Red Car")

CONTAINS(Col1 , '"Red*" OR "Car*"')
OR CONTAINS(Col2 , '"Red*" OR "Car*"')

...

Maybe it is possible to have Col1 and Col2 share the same Catalogue, and then just search the Catalogue?

Edit: Slightly more realistic syntax for Contains!

Kristen
Go to Top of Page

jackv
Master Smack Fu Yak Hacker

2179 Posts

Posted - 2007-10-09 : 08:56:46
The following is dependant on the data model, which hasn't really been outlined. I was thinking more of adding like prefix information to a given column , which would allow the records to be further filtered such as : CONTAINS(Col1 , ('"Red*" OR "Car*" AND "Catlog=example"'),
you would place the text = "Catalog=example" in the column at the data entry point .Therefore , the recordset coming back from the Full Text would be less .
At the SELECT point you would do a Replace to make sure the "catlog=example" doesn't go through to the client side. This would decrease CPU pressure


Jack Vamvas
--------------------
Search IT jobs from multiple sources- http://www.ITjobfeed.com/SQL
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-10-09 : 09:35:00
That's an interesting idea.

The more I think about it the more I think their dynamic SQL is jacking up the size, and number of iterations , of the query.

I'll ask them for some real-world examples out of SQL Profiler and see what they look like.

Thanks for your help,

Kristen
Go to Top of Page
   

- Advertisement -