| Author |
Topic |
|
NewMedia42
Starting Member
35 Posts |
Posted - 2010-10-10 : 21:45:55
|
| Hi, I have a tricky problem that I haven't found an elegant solution for. I have a large number of columns (84) that I'm storing different hash values in, and what I need to know is how many columns match, and ideally maybe even have them returned in some order (such as most matches to least matches). I also need to know which columns match eachother, but if I can get the number of matches, I can figure the rest out. For a realworld example, let's say my DB only had 4 columns: V1 V2 V3 V4Row1 1 5 14 2Row2 1 2 44 88Row3 3 2 44 81Row4 2 5 22 46So in the above example, ideally I could get back something like:Row3, 2 matchesRow2, 2 matches (even though it matches Row1 for 1, I need it to be the most it shares with any other single row)Row1, 1 matchRow4, 1 matchEven more helpful, but not necessary, would be if it would identify which rows it had the matches with, for example:Row3, 2 matches, Row2Row2, 2 matches, Row3Row2, 1 match, Row1Row1, 1 match, Row4Row4, 1 match, Row1The database is already pretty large (13m+ records), and will be growing quite a bit larger than this. Thanks in advance! |
|
|
khtan
In (Som, Ni, Yak)
17689 Posts |
Posted - 2010-10-10 : 21:59:31
|
can you explain why Row2 only have 2 matches ?and what do you mean by "(even though it matches Row1 for 1, I need it to be the most it shares with any other single row)" ? KH[spoiler]Time is always against us[/spoiler] |
 |
|
|
NewMedia42
Starting Member
35 Posts |
Posted - 2010-10-10 : 23:46:59
|
| Sure, so Row 2 has basically has two matches in two rows, so for Row3, column 2 (value 2) and column 3 (value 44) match - for a total of two matches (out of a possible 4, V1 ... V4). Row2 also matches Row1, but in this case they only share the value in column 1 (value 1), for a total of one match (out of a possible 4).Does that clarify things? |
 |
|
|
khtan
In (Som, Ni, Yak)
17689 Posts |
Posted - 2010-10-11 : 04:53:11
|
[code]DECLARE @tbl TABLE( row varchar(10), v1 int, v2 int, v3 int, v4 int)INSERT INTO @tblSELECT 'Row1', 1, 5, 14, 2 UNION ALLSELECT 'Row2', 1, 2, 44, 88 UNION ALLSELECT 'Row3', 3, 2, 44, 81 UNION ALLSELECT 'Row4', 2, 5, 22, 46; WITH cte AS( SELECT * FROM @tbl t unpivot ( val FOR col IN (v1, v2, v3, v4) ) p),row AS( SELECT DISTINCT row FROM @tbl)SELECT c.row, matches = SUM(m.match), m_row = m.rowFROM cte c CROSS APPLY ( SELECT x.row, match = COUNT(*) FROM cte x WHERE x.row <> c.row AND x.col = c.col AND x.val = c.val GROUP BY x.row ) mGROUP BY c.row, m.rowORDER BY c.row, matches DESC/*row matches m_row ---------- ----------- ---------- Row1 1 Row2Row1 1 Row4Row2 2 Row3Row2 1 Row1Row3 2 Row2Row4 1 Row1(6 row(s) affected)*/[/code] KH[spoiler]Time is always against us[/spoiler] |
 |
|
|
|
|
|