Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 SQL Server Administration (2005)
 Full Text With Fuzzy matching

Author  Topic 

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2007-01-11 : 06:53:58
Andrew Worral writes "I am currenly working on a website that uses Full Text search to search the name of companies.

We are having trouble figuring out what tools are best suited for this with SQL 2005 Standard/Enterprise and how to implement them.

The first issue to address would be Misspelling of words. Such as Looking For "Davids Shoe Repare" and returning "David's Shoe Repair"

Besides the spelling in "Davids Shoe Repare" there is also the issue of the " ' " in David's which we have not come up with a good solution for yet. So a search for David will not returns "David's"

I have done a little looking into Fuzzy matching with Integration Services but I am not sure this is the right tool, nor am I sure of the overhead involved and any speed issues with this. Nor am I in any way overly familiar with Integration Services.

What would you suggest?

Thank you in advance!

-Andy"

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2007-01-11 : 11:45:21
I don't have a direct answer but I can tell you that Integration Services won't be the solution. SSIS is for importing, exporting, transforming data, not for online queries or searches/operations.

Have you fully explored the options of the CONTAINS predicate with your fulltext index? I would think that using *, FORMSOF, INFLECTIONAL, THESAURUS, NEAR, and ISABOUT would give you the results you want?
Go to Top of Page

AndyNY
Starting Member

4 Posts

Posted - 2007-01-11 : 17:10:38
I'll have to look more into them I guess. Anyone know of some good guides for FullText search or some examples that would be beneficial?
Go to Top of Page

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2007-01-11 : 17:16:14
Have you even looked at CONTAINS and FREETEXT in Books Online, there are a bunch of examples.
Go to Top of Page

AndyNY
Starting Member

4 Posts

Posted - 2007-01-11 : 18:41:31
Thanks that got me a bit closer. What are the rules for trying to compound the FORMSOF and ISABOUT commands? Can you nest them or only and / or them again eachother on a containstable pull?

Also what is the best way do close spellings. That is probably my biggest problem right now. Is there any fast way to run a spell check on the terms against my own dictionary file built from one of my columns using SQL? (like word's suggested spellings)

Thanks!

-Andy
Go to Top of Page

AndyNY
Starting Member

4 Posts

Posted - 2007-01-11 : 18:55:18
By the way... I have looked at the MSDN documentation and online books. But for some reason all the examples for FREETEXT Search seem to be shallow.

I don't follow exactly how the FullText engine works with passed queries. (nesting etc like I mentioned in last post)

Also I know I can create a dictionary table and run a difference command against a parsed word list. But these seem like they should be pretty common functions in the business world. So I thought maybe I was missing something that might already exist that would be a good fit, rather the redesigning the wheel.

Thanks for any help you can give though!
Go to Top of Page

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2007-01-12 : 07:51:20
It might be a good idea if you can "cleanup" your search criteria first, so that you can provide some assistance to the search engine used....but that would mean building soem "celaup process"....but at least it would be tuned to your needs...rather than be generic.
Go to Top of Page

AndyNY
Starting Member

4 Posts

Posted - 2007-02-05 : 11:45:48
Well I apreciate everyone's help and have a solution about 90% ready

The final issue I am having is weighting on isabout in a full text search.

If I do an isabout pull with this weighting.

isabout("andy*" weight (1.00), "handy*" weight (.6), "handys*" weight (.5))

would a higher weighting be given to the word "Andy" or "Handys"? Because I am getting "Handys" sometimes.

Basically what my alogorithm does is it first take your search string. Splits it into words. Runs a double metaphone, and edit distance match on a word dictionary to get back other words and then I create a Full text search based on those words with weightings set based on how close the word was to original. with a perfect match being weight of 1.00.

For some reason my result set's ranking seem off according to the weightings I am using. Perhaps I should try a different weighting alogorithm?

Any suggestions?

Go to Top of Page
   

- Advertisement -