Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Number of matching columns...

Author  Topic 

NewMedia42
Starting Member

35 Posts

Posted - 2010-10-10 : 21:45:55
Hi, I have a tricky problem that I haven't found an elegant solution for. I have a large number of columns (84) that I'm storing different hash values in, and what I need to know is how many columns match, and ideally maybe even have them returned in some order (such as most matches to least matches). I also need to know which columns match eachother, but if I can get the number of matches, I can figure the rest out. For a realworld example, let's say my DB only had 4 columns:

V1 V2 V3 V4
Row1 1 5 14 2
Row2 1 2 44 88
Row3 3 2 44 81
Row4 2 5 22 46

So in the above example, ideally I could get back something like:

Row3, 2 matches
Row2, 2 matches (even though it matches Row1 for 1, I need it to be the most it shares with any other single row)
Row1, 1 match
Row4, 1 match

Even more helpful, but not necessary, would be if it would identify which rows it had the matches with, for example:

Row3, 2 matches, Row2
Row2, 2 matches, Row3
Row2, 1 match, Row1
Row1, 1 match, Row4
Row4, 1 match, Row1

The database is already pretty large (13m+ records), and will be growing quite a bit larger than this. Thanks in advance!

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2010-10-10 : 21:59:31
can you explain why Row2 only have 2 matches ?

and what do you mean by "(even though it matches Row1 for 1, I need it to be the most it shares with any other single row)" ?


KH
[spoiler]Time is always against us[/spoiler]

Go to Top of Page

NewMedia42
Starting Member

35 Posts

Posted - 2010-10-10 : 23:46:59
Sure, so Row 2 has basically has two matches in two rows, so for Row3, column 2 (value 2) and column 3 (value 44) match - for a total of two matches (out of a possible 4, V1 ... V4). Row2 also matches Row1, but in this case they only share the value in column 1 (value 1), for a total of one match (out of a possible 4).

Does that clarify things?
Go to Top of Page

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2010-10-11 : 04:53:11
[code]
DECLARE @tbl TABLE
(
row varchar(10),
v1 int,
v2 int,
v3 int,
v4 int
)

INSERT INTO @tbl
SELECT 'Row1', 1, 5, 14, 2 UNION ALL
SELECT 'Row2', 1, 2, 44, 88 UNION ALL
SELECT 'Row3', 3, 2, 44, 81 UNION ALL
SELECT 'Row4', 2, 5, 22, 46

; WITH cte AS
(
SELECT *
FROM @tbl t
unpivot
(
val
FOR col IN (v1, v2, v3, v4)
) p
),
row AS
(
SELECT DISTINCT row
FROM @tbl
)
SELECT c.row, matches = SUM(m.match), m_row = m.row
FROM cte c
CROSS APPLY
(
SELECT x.row, match = COUNT(*)
FROM cte x
WHERE x.row <> c.row
AND x.col = c.col
AND x.val = c.val
GROUP BY x.row
) m
GROUP BY c.row, m.row
ORDER BY c.row, matches DESC

/*
row matches m_row
---------- ----------- ----------
Row1 1 Row2
Row1 1 Row4
Row2 2 Row3
Row2 1 Row1
Row3 2 Row2
Row4 1 Row1

(6 row(s) affected)
*/
[/code]


KH
[spoiler]Time is always against us[/spoiler]

Go to Top of Page
   

- Advertisement -