Author |
Topic |
LaurieCox
158 Posts |
Posted - 2011-02-24 : 16:03:24
|
Hi,I have two tables (ddl and sample data at end of post) and I want to find matches between the two tables where the data in at least r out n columns match. If say I wanted at least three of the columns to match I could code it this way:[CODE]SELECT PN.ID AS PROGID ,BH.ID AS BILLID from #ProgressNotes PN ,#BillingHistory BHwhere PN.PATID = BH.PATID -- I will always match on PATID and ( (PN.Serv_Start_Date = BH.Serv_Start_Date and PN.serv_start_time = BH.serv_start_time and PN.bill_duration = BH.bill_duration) or (PN.Serv_Start_Date = BH.Serv_Start_Date and PN.serv_start_time = BH.serv_start_time and PN.bill_program_value = BH.bill_program_value) )--... for every possible group of three ... order by PN.ID,BH.ID [/CODE]... but that is a lot* of typing. And what happens if I decide that four columns have to match (or only two)? Or what if the number of total columns to check for matching is not 6 but 8? quote: * I think the number of "or" clauses = (n!)/(r!(n-r)!)Where: n = number of columns r = number of columns that must matchand each "or" clause would have r "and" clauses
And once you start throwing factorials around things can get ugly really fast ...So unless I can come up with a different approach to this problem, it is pretty much hopeless.From the sample data (below) the code above gives this output:[CODE]PROGID BILLID====== ======NOTE1 BILL5 [/CODE]If I completed the code (and made no typos in the sample data) the output should be:[CODE]PROGID BILLID====== ======NOTE1 BILL1NOTE1 BILL2NOTE1 BILL5NOTE2 BILL4[/CODE]Thanks,LaurieTable DDL and sample data:[CODE]create table #ProgressNotes (ID varchar(5), PATID varchar(20), Serv_start_date datetime, serv_start_time varchar(10), bill_duration int, bill_program_value varchar(40), serv_clinician varchar(20), bill_location varchar(40))insert into #ProgressNotes SELECT 'NOTE1', '10000', '2010-05-11', '01:19 PM', 0, 'MIA SUPPORT', '008409', 'Jail' UNION ALLSELECT 'NOTE2', '20000', '2010-09-05', '08:20 PM', 0, 'MIA CRISIS', '008475', 'Home' create table #BillingHistory (ID varchar(5), PATID varchar(20), Serv_start_date datetime, serv_start_time varchar(10), bill_duration int, bill_program_value varchar(40), serv_clinician varchar(20), bill_location varchar(40))insert into #BillingHistorySELECT 'BILL1', '10000', '2010-05-11', '01:19 PM', 20, 'DDD SUPPORT', '008409', 'Other' UNION ALLSELECT 'BILL2', '10000', '2010-05-12', '03:20 PM', 10, 'MIA SUPPORT', '008409', 'Jail' UNION ALLSELECT 'BILL3', '10000', '2010-05-11', '03:20 PM', 0, 'INQUIRY', '009999', 'Home' UNION ALLSELECT 'BILL4', '20000', '2010-09-05', '08:11 PM', 0, 'INQUIRY', '008700', 'Home' UNION ALLSELECT 'BILL5', '10000', '2010-05-11', '01:19 PM', 10, 'MIA SUPPORT', '001111', 'Other'[/CODE] |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2011-02-24 : 16:29:39
|
This doesn't quite do what you want...or maybe it does...but anyway it might give you an idea:;with col6(PATID,Serv_start_date,serv_start_time,bill_duration,bill_program_value,serv_clinician,bill_location)as (select PATID,Serv_start_date,serv_start_time,bill_duration,bill_program_value,serv_clinician,bill_locationfrom #ProgressNotesintersectselect PATID,Serv_start_date,serv_start_time,bill_duration,bill_program_value,serv_clinician,bill_locationfrom #BillingHistory),col5(PATID,Serv_start_date,serv_start_time,bill_duration,bill_program_value,serv_clinician)as (select PATID,Serv_start_date,serv_start_time,bill_duration,bill_program_value,serv_clinicianfrom #ProgressNotesintersectselect PATID,Serv_start_date,serv_start_time,bill_duration,bill_program_value,serv_clinicianfrom #BillingHistory),col4(PATID,Serv_start_date,serv_start_time,bill_duration,bill_program_value)as (select PATID,Serv_start_date,serv_start_time,bill_duration,bill_program_valuefrom #ProgressNotesintersectselect PATID,Serv_start_date,serv_start_time,bill_duration,bill_program_valuefrom #BillingHistory),col3(PATID,Serv_start_date,serv_start_time,bill_duration)as (select PATID,Serv_start_date,serv_start_time,bill_durationfrom #ProgressNotesintersectselect PATID,Serv_start_date,serv_start_time,bill_durationfrom #BillingHistory),col2(PATID,Serv_start_date,serv_start_time)as (select PATID,Serv_start_date,serv_start_timefrom #ProgressNotesintersectselect PATID,Serv_start_date,serv_start_timefrom #BillingHistory),col1(PATID,Serv_start_date)as (select PATID,Serv_start_datefrom #ProgressNotesintersectselect PATID,Serv_start_datefrom #BillingHistory),all_CTEs(Match, PATID) AS (select 6, PATID from col6union all select 5, PATID from col5union all select 4, PATID from col4union all select 3, PATID from col3union all select 2, PATID from col2union all select 1, PATID from col1)select PATID, MAX(Match) MatchedColumns FROM all_CTEs GROUP BY PATID Each CTE compares a different set of columns, then all_CTEs unions them and finds the largest set of matched columns for each PATID. If you need to work all combinations for n columns, then you'll need n! CTEs plus an all_CTE that unions them. It's a quick copy/paste/remove-extra-columns operation.edit: it's probably not n!, but you can work out the math. |
 |
|
LaurieCox
158 Posts |
Posted - 2011-02-25 : 14:23:01
|
Hi robvolk,Thanks for the reply. I got pulled off to do something else today, but I look forward to trying to figure out your code. I will have to read up on cte to figure out what is going on there.Anyway, thanks again for your help.Laurie |
 |
|
|
|
|