Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Low Performance on query calling functions

Author  Topic 

DanielA
Starting Member

2 Posts

Posted - 2011-11-02 : 11:56:10
Hello,

I've been doing some fairly basic things in SQL server for a few years, with some simple select and update statements. Recently I was asked to find all possible double companies in the companies CRM system. I got my hands on a JaroWinkler algorithm which works quite well at comparing address in a fuzzy method.

Basically the function takes in two variables and returns a matching percentage.

The tidbit of code compares the entries in the address table of the CRM to each other and finds out how much they are alike.
When I run the comparison in the manner i just wrote, it takes rather long to execute.

I have not been able to use the variable returned by the function in the query, so I nested the query and check on the comparison afterwards. However this leads to a rather massive performance issue. Would anyone know a way to do this trick more efficiently? I'd be much obliged.

<CODE>

Select AID,BID,AName,BName,NameScore,
AAdres,BAdres,AdresScore,
AZipCode,BZipCode,ZipCodeScore
FROM (

select
a.[ID Company] as AID,b.[ID Company] as BID,
a.Name as AName, b.Name as BName,
(select score from CompareJaroWinkler(a.Name,b.Name)) as NameScore,
a.Adres + ' ' + a.Huisnummer as AAdres, b.Adres + ' ' + b.Huisnummer as BAdres,
(select score from CompareJaroWinkler(a.Adres + a.Huisnummer,b.adres + b.Huisnummer)) as AdresScore,
a.ZipCode as AZipCode, b.ZipCode as BZipCode,
(select score from CompareJaroWinkler(a.ZipCode,b.ZipCode)) as ZipCodeScore
from Zeno a, Zeno b
where a.Name<>'' and b.Name<>''
) compare
WHERE AID<>BID
and NameScore > 0.8
and (AdresScore > 0.8 or AdresScore is null)
and (ZipCodeScore > 0.8 or ZipCodeScore is null)

</CODE>

Kristen
Test

22859 Posts

Posted - 2011-11-02 : 12:11:34
Dunno about JaroWinkler, I've only come across Levenshtein for this type of thing.

Here's a post with a method that might work, and it also has a link to Levenshtein method too
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=83128
Go to Top of Page

DanielA
Starting Member

2 Posts

Posted - 2011-11-03 : 10:59:19
Thank you, that post did help a bit.

This query checks the companies before running which is quicker. It might be better to store the function result in a variable so it doesn't have to be called twice.

It speeds things up by 50%. It remains slow, but at least it works now. Aside from that I first filtered the results into a temp table to decrease the amount of records by 50% as was suggested.

<CODE>
SELECT set1.companyID,set2.companyID,
set1.Name,set2.Name,
(select score from CompareJaroWinkler(set1.Name,set2.Name)) as NaamScore,
set1.Adres,set2.Adres,
(select score from CompareJaroWinkler(set1.adres,set2.Adres)) as AdresScore,
set1.Zipcode,set2.Zipcode,
(select score from CompareJaroWinkler(set1.Zipcode,set2.Zipcode)) as PostcodeScore
FROM @zeno1 set1
inner join @zeno1 set2
on set1.CompanyID<>set2.CompanyID
and ((select score from CompareJaroWinkler(set1.Name,set2.Name)) > 0.8
and (select score from CompareJaroWinkler(set1.adres,set2.adres)) > 0.9
and (select score from CompareJaroWinkler(set1.Zipcode,set2.Zipcode)) > 0.9)
</CODE>
Go to Top of Page
   

- Advertisement -