Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
craigwg
Posting Yak Master
154 Posts |
Posted - 2011-03-11 : 12:56:30
|
Here is my situation. I have a table that I am cleaning up. It used to contain multiple records, for each month for each client. We are reorganzing and it will now keep one record per client. It's my job to clean up the table.So I have multiple records and I have to delete the duplicates and keep the freshest data. (most fresh?)For example lets say I haveclient col1 col2 col3 date1 1 4 9 3/11 null 5 null 5/11 2 2 2 4/1In this example client 1 has three records the most recent of which is 5/1 Col1 and Col3 are null, col2 has a value. The new result should fill in the nulls from the next most recent value, in this case 4/1. So the new record should look like this:client col1 col2 col3 date1 2 5 2 5/1I think I need a self join. This is what I have but its not right:update table set a.[col1] = b.[col1]from table a join table b on a.id=b.idwhere a.[col1] is null and b.[col1] is not null There's gotta be a way to do this. Help!Craig Greenwood |
|
|
sunitabeck
Master Smack Fu Yak Hacker
5155 Posts |
Posted - 2011-03-11 : 13:30:43
|
You need to be able to tell the update statement from which row to pick up the values that stand in for the null value. The following code should do it. It compiles, but I have not tested it, if you are going to do this please test before you do it for real.with cte as( select row_number() over (partition by id order by date desc) as rowId,* from YourTable ) update a set col1 = coalesce(a.col1,b.col1), col2 = coalesce(a.col2,b.col2), col3 = coalesce(a.col3,b.col3)from cte a inner join cte b on a.id = b.id and a.rowId = 1 and b.rowId = 2 But, what if the second to most recent date has a null value as well? Do you want to dig deeper and go to the previous date? And what if that is null - ad infinitum (or is it ad nausem? Latin is not my native language :--)Edits: I had absolutely incorrect logic in the join condition. Hopefully fixed it. |
 |
|
|
Ifor
Aged Yak Warrior
700 Posts |
Posted - 2011-03-11 : 13:32:02
|
I am not quite sure what you want, but the following should get you started:-- *** Test Data ***CREATE TABLE #t( client int NOT NULL ,Col1 int NULL ,Col2 int NULL ,Col3 int NULL ,[date] datetime NOT NULL)INSERT INTO #tSELECT 1, 1, 4, 9, '20110103'UNION ALL SELECT 1, NULL, 5, NULL, '20110105'UNION ALL SELECT 1, 2, 2, 2, '20110104'UNION ALL SELECT 1, NULL, NULL, 9, '20110203'UNION ALL SELECT 1, NULL, 5, NULL, '20110205'UNION ALL SELECT 1, NULL, 2, 2, '20110204'-- *** End Test Data ***;WITH MaxDatesAS( SELECT client, Col1, Col2, Col3, [date] ,MAX(CASE WHEN Col1 IS NOT NULL THEN [date] END) OVER (PARTITION BY client, DATEADD(m, DATEDIFF(m, 0, [date]), 0)) MaxCol1Date ,MAX(CASE WHEN Col2 IS NOT NULL THEN [date] END) OVER (PARTITION BY client, DATEADD(m, DATEDIFF(m, 0, [date]), 0)) MaxCol2Date ,MAX(CASE WHEN Col3 IS NOT NULL THEN [date] END) OVER (PARTITION BY client, DATEADD(m, DATEDIFF(m, 0, [date]), 0)) MaxCol3Date ,MAX([date]) OVER (PARTITION BY client, DATEADD(m, DATEDIFF(m, 0, [date]), 0)) AS MaxDate FROM #t)SELECT client ,MAX(CASE WHEN [date] = MaxCol1Date THEN Col1 END) AS Col1 ,MAX(CASE WHEN [date] = MaxCol2Date THEN Col2 END) AS Col2 ,MAX(CASE WHEN [date] = MaxCol3Date THEN Col3 END) AS Col3 ,MaxDate AS [date]-- INTO #keepFROM MaxDatesGROUP BY client, MaxDate--UPDATE T-- SET Col1 = K.Col1-- ,Col2 = K.Col2-- ,Col3 = K.Col3--FROM #t T-- JOIN #keep K-- ON T.client = K.client-- AND T.[date] = K.[date]--DELETE T--FROM #t T-- LEFT JOIN #keep K-- ON T.client = K.client-- AND T.[date] = K.[date]--WHERE K.Client IS NULL |
 |
|
|
jcelko
Esteemed SQL Purist
547 Posts |
Posted - 2011-03-11 : 17:54:10
|
| >> I have a table that I am cleaning up. It used to contain multiple record sic sic, for each month for each client. <<Sorry, but this was not a table. Rows are not record sic and tables do not have duplicate rows because of keys. This is a deck of punch cards declared with SQL by someone who needs to be fired. >> We are re-organizing and it will now keep one record sic per client. It's my job to clean up the table. <<The fifth labor of Hercules was to clean the stables of King Augeas in a single day. The Augean stables held thousands of animals and were over a mile long. This story has a happy ending for three reasons: (1) Hercules solved the problem in a clever way (2) Hercules got one tenth of the cattle for his work (3) At the end of the story of the Labors of Hercules, he got to kill the bastard that gave him this job. >> For example lets say I have .. << Please post real DDL. Learn to use ISO-11179 rules for the data element names, avoid needless dialect and use ISO-8601 temporal formats, codes and so forth. People cannot read your mind, so post your code and clear specs if you really want help. In this example client 1 has three records sic the most recent of which is 5/1 Col1 and Col3 are NULL, col2 has a value. The new result should fill in the NULLs from the next most recent value, in this case 4/1. So the new record sic should look like this:client col1 col2 col3 date1 2 5 2 5/1You got the DML wrong and you should NEVER use the proprietary nightmare of UPDATE..FROM. It cannot detect cardinality problems. And you forgot to tell us how to handle two “records” on the same date for the same client. Can we assume that this does not happen? Since you are only going to do this once, let's do it in several passes CREATE TABLE Clients(client_id INTEGER NOT NULL PRIMARY KEY, col1 INTEGER, col3 INTEGER, col3 INTEGER,posting_date DATE NOT NULL);If the3re3 is a duplicate for the most recent data, this will fail and you have to fix it yourself. INSERT INTO Clients SELECT client_id, col1, col2, col3, posting date FROM (SELECT client_id, col1, col2, col3, posting date, MAX(posting_date) OVER (PARTITION BY client_id) AS last_posting _date FROM PunchCards) AS XWHERE X.posting _date = X.last_posting _date;Assuming that you got one row into the Clients table for everyone, then remove them from the Punch Cards.DELETE FROM PunchcardsWHERE EXISTS (SELECT * FROM Clients AS C WHERE C.client_id = Punchcards.client_id AND C.posting_date = Punchcards.posting_date);Now, we need to consolidate the remaining “records” starting with the now most current one for each client.MERGE INTO Clients USING (SELECT client_id, col1, col2, col3, posting date FROM (SELECT client_id, col1, col2, col3, posting date, MAX(posting_date) OVER (PARTITION BY client_id) AS last_posting _date FROM PunchCards) AS X1 WHERE X.posting _date = X1.last_posting _date) AS X2 ON Clients.client_id = X2.client_idWHEN MATCHEDTHEN UPDATE SET Clients.col1 = COALESCE(Clients.col1, X2.col1), Clients.col2 = COALESCE(Clients.col2, X2.col2), Clients.col3 = COALESCE(Clients.col3, X2.col3);Strip off the second newest dates and repeat the MERGE. It is a layer at a time approach. Yes, it is complicated and a bitch to code. --CELKO--Books in Celko Series for Morgan-Kaufmann PublishingAnalytics and OLAP in SQLData and Databases: Concepts in Practice Data, Measurements and Standards in SQLSQL for SmartiesSQL Programming Style SQL Puzzles and Answers Thinking in SetsTrees and Hierarchies in SQL |
 |
|
|
jcelko
Esteemed SQL Purist
547 Posts |
Posted - 2011-03-14 : 11:14:17
|
| Another idea. Get rid of the NULLs, get each column's last posting date value and put all of them into a SELECT clause. Here is the skeleton.WITH Last_Col1AS(SELECT X.client_id, X.value FROM (SELECT client_id, col_x, posting_date, MAX(posting_date) OVER (PARTITION BY client) AS last_posting_date FROM PunchCards WHERE col_x IS NOT NULL) AS X WHERE last_posting_date = posting_date),Last_Col2 AS (..), etc.SELECT Last_Col1.value, Last_Col2.value, etc FROM Last_Col1, Last_Col2, etc WHERE Last_Col1.client_id = Last_Col2.client_id AND Last_Col2.client_id = Last_Col3.client_id AND ..Performance will be awful, if you have a lot of data. --CELKO--Books in Celko Series for Morgan-Kaufmann PublishingAnalytics and OLAP in SQLData and Databases: Concepts in Practice Data, Measurements and Standards in SQLSQL for SmartiesSQL Programming Style SQL Puzzles and Answers Thinking in SetsTrees and Hierarchies in SQL |
 |
|
|
craigwg
Posting Yak Master
154 Posts |
Posted - 2011-03-15 : 16:50:07
|
| I resolved this by using a cte. (Thanks sunitabeck and Ifor.) There was more, mostly dropping the defunct records and recreating indexes. I realize I could have done it smoother but at least this made sense to me. Celko, part of life is deciphering cryptic issues, especially when listening to and helping those who are not as experienced as you (me). It's better to ask questions than rip me up and down for not asking questions exactly how you want them. Also there are issues of proprietary information and simple brevity that block me from posting the code I'm running verbatim. The queries I ran successfully were over 1000 printed lines. I'm doing the best I can, and appreciate the help I get here. I hope I can ask my questions here in a safe, comfortable, and helpful environment.Craig Greenwood |
 |
|
|
X002548
Not Just a Number
15586 Posts |
|
|
jcelko
Esteemed SQL Purist
547 Posts |
Posted - 2011-03-15 : 23:15:09
|
quote: [i]..hilarious...NOBODY knows Punch Cards until you've seen someone drop a deck on the floor
Been there; Done that. My story was four decades ago when I agreed to typeset an alumni directory for a small black college in Atlanta. The data was on a DG Nova minicomputer on disk. The AM-725 typesetter I used was on paper tape -- TTS as the native code. I was a clever young geek! I programmed the AM-725 to read ASCII and to use typesetting macros from 16 internal registers. We sorted the data by last name and by class year, adding macro calls for bold face, justified margins, italics and fancy stuff. The test data was beautiful. The printer was lined up.We sorted the data by class year and last name. But there was too much data (~85 years) to use the teletype on Herb's DG Nova to generate the paper tape.No problem! Georgia Tech had a high speed paper tape punch with industrial steel punch heads. I put the data on a mag tape, go over to the Computer Center and load it. I have a hand-cranked film winder to spool the paper tape as it comes out of the punch, so I an take it back on a reel. Did I stress the words "high speed"n in that last paragraph? The tape came out like a stream of water from a fire hydrant. It shot across the room and began piling up in a corner on the other side. I had no way to stop or buffer the paper tape in my code, which was a very simple (i,e, stupid) batch program for a big, fast CDC machine. Ever spend ~5 hours slowly winding paper tape onto spools, so you do not tear it? Oh, mush for brains did not think about how many spools would be needed. The punch was designed to produce minicomputer software (that is how we did in those days) so its paper supply was a reel about 1m in diameter. There was also this wonderful cutter that would chop off a length of paper tape under program control that I did not know about.--CELKO--Books in Celko Series for Morgan-Kaufmann PublishingAnalytics and OLAP in SQLData and Databases: Concepts in Practice Data, Measurements and Standards in SQLSQL for SmartiesSQL Programming Style SQL Puzzles and Answers Thinking in SetsTrees and Hierarchies in SQL |
 |
|
|
|
|
|
|
|