| Author |
Topic |
|
cdex3
Starting Member
3 Posts |
Posted - 2011-09-15 : 11:37:21
|
| I have to merge multiple records into one master record. Some of the records have null values. I have deduped these records using the FuzzyGrouping in SSIS so each similar record has the same "CustFuzzyGroup_key_out" on which they should be grouped by. Looking for the best approach using SQL Server 2008 R2.Thanks in advance.The scripts are below.--Table SchemaCREATE TABLE [#CustomerInfo]( [CustRowID] [int] NULL, [CustFuzzyGroup_Key_Out_ID] [int] NULL, [CustLocID] [int] NULL, [CustNo] [varchar](17) NULL, [FirstName] [varchar](25) NULL, [MiddleName] [varchar](25) NULL, [LastName] [varchar](40) NULL, [AddressLine1] [varchar](45) NULL, [City] [varchar](35) NULL, [State] [varchar](3) NULL, [ZipCode] [varchar](10) NULL, [CellPhone] [varchar](10) NULL, [BusinessPhone] [varchar](15) NULL, [EmailAddress] [varchar](40) NULL, [CreateDate] [datetime] NULL) ON [PRIMARY]GO-- Test DataINSERT INTO #CustomerInfo ( CustRowID ,CustFuzzyGroup_Key_Out_ID ,CustLocID ,CustNo ,FirstName ,MiddleName ,LastName ,AddressLine1 ,City ,State ,ZipCode ,CellPhone ,BusinessPhone ,EmailAddress ,CreateDate )VALUES ( 25, 3, 204, '127489', 'Tommy', 'A', 'James', '123 Any Street', 'New York', 'NY', '10011', '2121231234', '', '', '2011-08-09 16:23:45' ),( 18, 3, 255, '36987', 'Tommy', '', 'James', '123 Any Street', 'New York', 'NY', '10011', '', '2124441212', 'corpemail@somecorp.com', '2011-09-09 00:00:00' ),( 1, 3,173, '68974', 'Tommy', '', 'James', '123 Any Street', 'New York', 'NY', '10011', '', '2124155000', 'email@somemail.com', '2011-07-09 23:23:23' ),( 31, 4, 89, '789657', 'Paula', 'P', 'Lady', '45 Main Street', 'Milwaukee', 'WI', '53111', '4148889797', '4148256000', 'email@somemail.com', '2011-08-07 23:00:00' ),( 22, 4,58, '9897865T', 'Paula', '', 'Lady', '45 Main Street', 'Milwaukee', 'WI', '53112', '4148011234', '', '', '2011-09-03 23:00:00' )GOSELECT * FROM [#CustomerInfo]GO--DROP TABLE [#CustomerInfo]-- Desired Results/*MasterCustID FirstName MiddleName LastName AddressLine1 City State ZipCode CellPhone BusinessPhone EmailAddress CreateDate1 Tommy A James 123 Any Street New York NY 10011 2121111212 2124441212 corpemail@somecorp.com 9/9/20112 Paula P Lady 45 Main Street Milwaukee WI 53112 4148011234 4148256000 email@somemail.com 9/3/2011*/ |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2011-09-15 : 12:46:52
|
| [code]SELECT ci.*FROM #CustomerInfo ciINNER JOIN (SELECT CustFuzzyGroup_Key_Out_ID,MAX(CreateDate) AS Latest FROM #CustomerInfo GROUP BY CustFuzzyGroup_Key_Out_ID) ci1ON ci1.CustFuzzyGroup_Key_Out_ID = ci.CustFuzzyGroup_Key_Out_IDAND ci1.Latest = ci.CreateDate[/code]------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
 |
|
|
cdex3
Starting Member
3 Posts |
Posted - 2011-09-15 : 15:21:22
|
| Thanks for the reply Visakhm. I mistakenly entered empty strings as opposed to nulls in all of the non null fields. What I really want returned is the row with the max createdate and all non null values. eg for "Paula Lady." I actually want returned "Paula P Lady 45 Main Street Milwaukee WI 53112 4148011234 4148256000 email@somemail.com 9/3/2011. Notice that the middle initial P, businessphone 4148256000 and email are from the earlier record. I need to return the max record and then any non null values that may exist in the previous rows if the row with the max date has nulls for those columns. I hope this is clear.Thanks again for the response. |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2011-09-16 : 11:58:42
|
| if there are multiple values existing in previous rows for a null valued field, which one should you return? what would be rule in that case?------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
 |
|
|
cdex3
Starting Member
3 Posts |
Posted - 2011-09-16 : 15:34:09
|
| In the case of multiple values I would take the one with the MAX CreateDate. |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2011-09-17 : 02:31:46
|
| [code]SELECT ci.CustRowID,ci.CustFuzzyGroup_Key_Out_ID,ci.CustLocID,ci.CustNo,ci.FirstName,ci.MiddleName,ci.LastName,COALESCE(ci.AddressLine1,ci2.AddressLine1) AS AddressLine1,COALESCE(ci.City,ci3.City) AS City,COALESCE(ci.State,ci4.State) AS State,COALESCE(ci.ZipCode,ci5.ZipCode) AS ZipCode,COALESCE(ci.CellPhone,ci6.CellPhone) AS CellPhone....FROM #CustomerInfo ciINNER JOIN (SELECT CustFuzzyGroup_Key_Out_ID,MAX(CreateDate) AS Latest FROM #CustomerInfo GROUP BY CustFuzzyGroup_Key_Out_ID) ci1ON ci1.CustFuzzyGroup_Key_Out_ID = ci.CustFuzzyGroup_Key_Out_IDAND ci1.Latest = ci.CreateDateOUTER APPLY(SELECT TOP 1 AddressLine1 FROM #CustomerInfo WHERE CustFuzzyGroup_Key_Out_ID=ci.CustFuzzyGroup_Key_Out_ID AND AddressLine1 IS NOT NULL ORDER BY CreateDate DESC)ci2OUTER APPLY(SELECT TOP 1 City FROM #CustomerInfo WHERE CustFuzzyGroup_Key_Out_ID=ci.CustFuzzyGroup_Key_Out_ID AND City IS NOT NULL ORDER BY CreateDate DESC)ci3OUTER APPLY(SELECT TOP 1 State FROM #CustomerInfo WHERE CustFuzzyGroup_Key_Out_ID=ci.CustFuzzyGroup_Key_Out_ID AND State IS NOT NULL ORDER BY CreateDate DESC)ci4OUTER APPLY(SELECT TOP 1 ZipCode FROM #CustomerInfo WHERE CustFuzzyGroup_Key_Out_ID=ci.CustFuzzyGroup_Key_Out_ID AND ZipCode IS NOT NULL ORDER BY CreateDate DESC)ci5OUTER APPLY(SELECT TOP 1 CellPhone FROM #CustomerInfo WHERE CustFuzzyGroup_Key_Out_ID=ci.CustFuzzyGroup_Key_Out_ID AND CellPhone IS NOT NULL ORDER BY CreateDate DESC)ci6....[/code]------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
 |
|
|
|
|
|