Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
DanielA
Starting Member
2 Posts |
Posted - 2011-11-02 : 11:56:10
|
| Hello,I've been doing some fairly basic things in SQL server for a few years, with some simple select and update statements. Recently I was asked to find all possible double companies in the companies CRM system. I got my hands on a JaroWinkler algorithm which works quite well at comparing address in a fuzzy method.Basically the function takes in two variables and returns a matching percentage. The tidbit of code compares the entries in the address table of the CRM to each other and finds out how much they are alike.When I run the comparison in the manner i just wrote, it takes rather long to execute.I have not been able to use the variable returned by the function in the query, so I nested the query and check on the comparison afterwards. However this leads to a rather massive performance issue. Would anyone know a way to do this trick more efficiently? I'd be much obliged.<CODE>Select AID,BID,AName,BName,NameScore, AAdres,BAdres,AdresScore, AZipCode,BZipCode,ZipCodeScoreFROM (select a.[ID Company] as AID,b.[ID Company] as BID, a.Name as AName, b.Name as BName, (select score from CompareJaroWinkler(a.Name,b.Name)) as NameScore, a.Adres + ' ' + a.Huisnummer as AAdres, b.Adres + ' ' + b.Huisnummer as BAdres, (select score from CompareJaroWinkler(a.Adres + a.Huisnummer,b.adres + b.Huisnummer)) as AdresScore, a.ZipCode as AZipCode, b.ZipCode as BZipCode, (select score from CompareJaroWinkler(a.ZipCode,b.ZipCode)) as ZipCodeScore from Zeno a, Zeno b where a.Name<>'' and b.Name<>'' ) compareWHERE AID<>BIDand NameScore > 0.8and (AdresScore > 0.8 or AdresScore is null)and (ZipCodeScore > 0.8 or ZipCodeScore is null)</CODE> |
|
|
Kristen
Test
22859 Posts |
Posted - 2011-11-02 : 12:11:34
|
| Dunno about JaroWinkler, I've only come across Levenshtein for this type of thing.Here's a post with a method that might work, and it also has a link to Levenshtein method toohttp://www.sqlteam.com/forums/topic.asp?TOPIC_ID=83128 |
 |
|
|
DanielA
Starting Member
2 Posts |
Posted - 2011-11-03 : 10:59:19
|
| Thank you, that post did help a bit.This query checks the companies before running which is quicker. It might be better to store the function result in a variable so it doesn't have to be called twice. It speeds things up by 50%. It remains slow, but at least it works now. Aside from that I first filtered the results into a temp table to decrease the amount of records by 50% as was suggested.<CODE>SELECT set1.companyID,set2.companyID,set1.Name,set2.Name,(select score from CompareJaroWinkler(set1.Name,set2.Name)) as NaamScore,set1.Adres,set2.Adres,(select score from CompareJaroWinkler(set1.adres,set2.Adres)) as AdresScore,set1.Zipcode,set2.Zipcode,(select score from CompareJaroWinkler(set1.Zipcode,set2.Zipcode)) as PostcodeScore FROM @zeno1 set1inner join @zeno1 set2 on set1.CompanyID<>set2.CompanyID and ((select score from CompareJaroWinkler(set1.Name,set2.Name)) > 0.8and (select score from CompareJaroWinkler(set1.adres,set2.adres)) > 0.9and (select score from CompareJaroWinkler(set1.Zipcode,set2.Zipcode)) > 0.9)</CODE> |
 |
|
|
|
|
|
|
|