Matching rows based off another row by score

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Programming

Matching rows based off another row by score

Author

Topic

realities
Starting Member

3 Posts

Posted - 2011-06-20 : 14:22:08

Greetings, I'm new here and fairly new to SQL programming.

My application needs to find all the rows in a database that are similar to a row of the same table, sorted by relevance (or matching score). This table consists of about 15 int fields, so I need to check each entry and assign a score based off how many fields are matching exactly and less value to a row partially matching all fields.

A example would be
Row 1
5 13 17 324 4
Row 2
5 15 17 3 1
Row 3
6 13 17 323 4
Row 4
7 2 25 68 5

Check against
6 13 17 324 1

and the result should be something like
Row 1 score 3
Row 2 score 2
Row 3 score 3
Row 4 score 0

And then sort that by score.

Is that something doable in a single query while keeping it fast enough?

Thanks,

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2011-06-20 : 14:54:51

Are these values in seperate columns?

realities
Starting Member

3 Posts

Posted - 2011-06-20 : 15:04:32

Yes, they are all in separate columns, thanks for asking :-)

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-06-20 : 15:31:25

[code]
Declare @t1 table (
r int,
a int,
b int,
c int,
d int,
e int,
Primary Key (r)
)

Insert Into @t1 Select 1, 5, 13, 17, 324, 4
Insert Into @t1 Select 2, 5, 15, 17, 3, 1
Insert Into @t1 Select 3, 6, 13, 17, 323, 4
Insert Into @t1 Select 4, 7, 2, 25, 68, 5

-- Generates ~2000 records

Insert Into @t1
Select
number,
a = convert(int,RAND(CHECKSUM(NEWID())%1000000000)*5) + 5,
b = convert(int,RAND(CHECKSUM(NEWID())%1000000000)*10) + 15,
c = convert(int,RAND(CHECKSUM(NEWID())%1000000000)*15) + 15,
d = convert(int,RAND(CHECKSUM(NEWID())%1000000000)*400) + 1,
e = convert(int,RAND(CHECKSUM(NEWID())%1000000000)*5) + 1
From master..spt_values
Where type = 'P'
and number > 4

Declare @t2 table (
r int,
a int,
b int,
c int,
d int,
e int,
Primary Key (r)
)

Insert Into @t2 Select 1, 6, 13, 17, 324, 1

--Query Starts Here --v
Select
A.r,
s = sum(ISNULL(B.n,0))
From
(
SELECT r, col, val
FROM @t1 p
UNPIVOT
(
val FOR col IN (a,b,c,d,e)
) AS unpvt1
) A
Left Join
(
SELECT r, col, val, n = 1
FROM @t2 p
UNPIVOT
(
val FOR col IN (a,b,c,d,e)
) AS unpvt2
) B
On A.col = B.col
and A.val = B.val
Group By A.r
Order By s desc
[/code]

Corey

I Has Returned!!

realities
Starting Member

3 Posts

Posted - 2011-06-20 : 15:44:59

Wow, this is great! Thanks!
Probably the longest query I've ever done!

I'm curious though, is my goal something that could potentially be quite inefficient, or can it be fairly fast on a table of several thousand entries?

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-06-20 : 15:57:33

I'm not very familiar with 'unpivot' performance... but see the code in blue above.

It seems to be ok in my sample... but it does do a lot of reads:


Table '#4F145267'. Scan count 1, logical reads 20471, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#4B43C183'. Scan count 1, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Corey

I Has Returned!!

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources