updating records through a self join

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Programming

updating records through a self join

Author

Topic

craigwg
Posting Yak Master

154 Posts

Posted - 2011-03-11 : 12:56:30

Here is my situation. I have a table that I am cleaning up. It used to contain multiple records, for each month for each client. We are reorganzing and it will now keep one record per client. It's my job to clean up the table.

So I have multiple records and I have to delete the duplicates and keep the freshest data. (most fresh?)

For example lets say I have

client col1 col2 col3 date
1 1 4 9 3/1
1 null 5 null 5/1
1 2 2 2 4/1

In this example client 1 has three records the most recent of which is 5/1 Col1 and Col3 are null, col2 has a value. The new result should fill in the nulls from the next most recent value, in this case 4/1. So the new record should look like this:

client col1 col2 col3 date
1 2 5 2 5/1

I think I need a self join. This is what I have but its not right:


update table set a.[col1] = b.[col1]
from table  a 
join table b on a.id=b.id
where a.[col1] is null and b.[col1] is not null

There's gotta be a way to do this. Help!

Craig Greenwood

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-03-11 : 13:30:43

You need to be able to tell the update statement from which row to pick up the values that stand in for the null value. The following code should do it. It compiles, but I have not tested it, if you are going to do this please test before you do it for real.

with cte as
( select row_number() over (partition by id order by date desc) as rowId,* from YourTable )   
update a set
   col1 = coalesce(a.col1,b.col1),
   col2 = coalesce(a.col2,b.col2),
   col3 = coalesce(a.col3,b.col3)
from
   cte a
   inner join cte b on
      a.id = b.id
      and a.rowId = 1 and b.rowId = 2

But, what if the second to most recent date has a null value as well? Do you want to dig deeper and go to the previous date? And what if that is null - ad infinitum (or is it ad nausem? Latin is not my native language :--)

Edits: I had absolutely incorrect logic in the join condition. Hopefully fixed it.

Ifor
Aged Yak Warrior

700 Posts

Posted - 2011-03-11 : 13:32:02

I am not quite sure what you want, but the following should get you started:


-- *** Test Data ***
CREATE TABLE #t
(
	client int NOT NULL
	,Col1 int NULL
	,Col2 int NULL
	,Col3 int NULL
	,[date] datetime NOT NULL
)
INSERT INTO #t
SELECT 1, 1, 4, 9, '20110103'
UNION ALL SELECT 1, NULL, 5, NULL, '20110105'
UNION ALL SELECT 1, 2, 2, 2, '20110104'
UNION ALL SELECT 1, NULL, NULL, 9, '20110203'
UNION ALL SELECT 1, NULL, 5, NULL, '20110205'
UNION ALL SELECT 1, NULL, 2, 2, '20110204'
-- *** End Test Data ***

;WITH MaxDates
AS
(
	SELECT client, Col1, Col2, Col3, [date]
		,MAX(CASE WHEN Col1 IS NOT NULL THEN [date] END) OVER (PARTITION BY client, DATEADD(m, DATEDIFF(m, 0, [date]), 0)) MaxCol1Date
		,MAX(CASE WHEN Col2 IS NOT NULL THEN [date] END) OVER (PARTITION BY client, DATEADD(m, DATEDIFF(m, 0, [date]), 0)) MaxCol2Date
		,MAX(CASE WHEN Col3 IS NOT NULL THEN [date] END) OVER (PARTITION BY client, DATEADD(m, DATEDIFF(m, 0, [date]), 0)) MaxCol3Date
		,MAX([date]) OVER (PARTITION BY client, DATEADD(m, DATEDIFF(m, 0, [date]), 0)) AS MaxDate
	FROM #t
)
SELECT client
	,MAX(CASE WHEN [date] = MaxCol1Date THEN Col1 END) AS Col1
	,MAX(CASE WHEN [date] = MaxCol2Date THEN Col2 END) AS Col2
	,MAX(CASE WHEN [date] = MaxCol3Date THEN Col3 END) AS Col3
	,MaxDate AS [date]
-- INTO #keep
FROM MaxDates
GROUP BY client, MaxDate

--UPDATE T
--	SET Col1 = K.Col1
--		,Col2 = K.Col2
--		,Col3 = K.Col3
--FROM #t T
--	JOIN #keep K
--		ON T.client = K.client
--			AND T.[date] = K.[date]

--DELETE T
--FROM #t T
--	LEFT JOIN #keep K
--		ON T.client = K.client
--			AND T.[date] = K.[date]
--WHERE K.Client IS NULL

jcelko
Esteemed SQL Purist

547 Posts

Posted - 2011-03-11 : 17:54:10

>> I have a table that I am cleaning up. It used to contain multiple record sic sic, for each month for each client. <<

Sorry, but this was not a table. Rows are not record sic and tables do not have duplicate rows because of keys. This is a deck of punch cards declared with SQL by someone who needs to be fired.

>> We are re-organizing and it will now keep one record sic per client. It's my job to clean up the table. <<

The fifth labor of Hercules was to clean the stables of King Augeas in a single day. The Augean stables held thousands of animals and were over a mile long. This story has a happy ending for three reasons: (1) Hercules solved the problem in a clever way (2) Hercules got one tenth of the cattle for his work (3) At the end of the story of the Labors of Hercules, he got to kill the bastard that gave him this job.

>> For example lets say I have .. <<

Please post real DDL. Learn to use ISO-11179 rules for the data element names, avoid needless dialect and use ISO-8601 temporal formats, codes and so forth. People cannot read your mind, so post your code and clear specs if you really want help.

In this example client 1 has three records sic the most recent of which is 5/1 Col1 and Col3 are NULL, col2 has a value. The new result should fill in the NULLs from the next most recent value, in this case 4/1. So the new record sic should look like this:

client col1 col2 col3 date
1 2 5 2 5/1
You got the DML wrong and you should NEVER use the proprietary nightmare of UPDATE..FROM. It cannot detect cardinality problems. And you forgot to tell us how to handle two “records” on the same date for the same client. Can we assume that this does not happen?
Since you are only going to do this once, let's do it in several passes

CREATE TABLE Clients
(client_id INTEGER NOT NULL PRIMARY KEY,
col1 INTEGER,
col3 INTEGER,
col3 INTEGER,
posting_date DATE NOT NULL);

If the3re3 is a duplicate for the most recent data, this will fail and you have to fix it yourself.

INSERT INTO Clients
SELECT client_id, col1, col2, col3, posting date
FROM (SELECT client_id, col1, col2, col3, posting date,
MAX(posting_date)
OVER (PARTITION BY client_id) AS last_posting _date
FROM PunchCards) AS X
WHERE X.posting _date = X.last_posting _date;

Assuming that you got one row into the Clients table for everyone, then remove them from the Punch Cards.

DELETE FROM Punchcards
WHERE EXISTS
(SELECT *
FROM Clients AS C
WHERE C.client_id = Punchcards.client_id
AND C.posting_date = Punchcards.posting_date);

Now, we need to consolidate the remaining “records” starting with the now most current one for each client.

MERGE INTO Clients
USING (SELECT client_id, col1, col2, col3, posting date
FROM (SELECT client_id, col1, col2, col3, posting date,
MAX(posting_date)
OVER (PARTITION BY client_id) AS last_posting _date
FROM PunchCards) AS X1
WHERE X.posting _date = X1.last_posting _date)
AS X2
ON Clients.client_id = X2.client_id
WHEN MATCHED
THEN UPDATE
SET Clients.col1 = COALESCE(Clients.col1, X2.col1),
Clients.col2 = COALESCE(Clients.col2, X2.col2),
Clients.col3 = COALESCE(Clients.col3, X2.col3);

Strip off the second newest dates and repeat the MERGE. It is a layer at a time approach.

Yes, it is complicated and a bitch to code.

--CELKO--
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL

jcelko
Esteemed SQL Purist

547 Posts

Posted - 2011-03-14 : 11:14:17

Another idea. Get rid of the NULLs, get each column's last posting date value and put all of them into a SELECT clause. Here is the skeleton.

WITH Last_Col1
AS
(SELECT X.client_id, X.value
FROM (SELECT client_id, col_x, posting_date,
MAX(posting_date) OVER (PARTITION BY client)
AS last_posting_date
FROM PunchCards
WHERE col_x IS NOT NULL) AS X
WHERE last_posting_date = posting_date),

Last_Col2 AS (..),
etc.

SELECT Last_Col1.value,
Last_Col2.value,
etc
FROM Last_Col1, Last_Col2, etc
WHERE Last_Col1.client_id = Last_Col2.client_id
AND Last_Col2.client_id = Last_Col3.client_id
AND ..

Performance will be awful, if you have a lot of data.

--CELKO--
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL

craigwg
Posting Yak Master

154 Posts

Posted - 2011-03-15 : 16:50:07

I resolved this by using a cte. (Thanks sunitabeck and Ifor.) There was more, mostly dropping the defunct records and recreating indexes. I realize I could have done it smoother but at least this made sense to me.

Celko, part of life is deciphering cryptic issues, especially when listening to and helping those who are not as experienced as you (me). It's better to ask questions than rip me up and down for not asking questions exactly how you want them. Also there are issues of proprietary information and simple brevity that block me from posting the code I'm running verbatim. The queries I ran successfully were over 1000 printed lines. I'm doing the best I can, and appreciate the help I get here. I hope I can ask my questions here in a safe, comfortable, and helpful environment.

Craig Greenwood

X002548
Not Just a Number

15586 Posts

Posted - 2011-03-15 : 16:52:59

JOE be Funny

Punch cards..hilarious...NOBODY knows Punch Cards until you've seen someone drop a deck on the floor

Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Want to help yourself?

http://msdn.microsoft.com/en-us/library/ms130214.aspx

http://weblogs.sqlteam.com/brettk/

http://brettkaiser.blogspot.com/

jcelko
Esteemed SQL Purist

547 Posts

Posted - 2011-03-15 : 23:15:09

quote:
[i]..hilarious...NOBODY knows Punch Cards until you've seen someone drop a deck on the floor

Been there; Done that.

My story was four decades ago when I agreed to typeset an alumni directory for a small black college in Atlanta. The data was on a DG Nova minicomputer on disk. The AM-725 typesetter I used was on paper tape -- TTS as the native code.

I was a clever young geek! I programmed the AM-725 to read ASCII and to use typesetting macros from 16 internal registers. We sorted the data by last name and by class year, adding macro calls for bold face, justified margins, italics and fancy stuff. The test data was beautiful. The printer was lined up.

We sorted the data by class year and last name. But there was too much data (~85 years) to use the teletype on Herb's DG Nova to generate the paper tape.

No problem! Georgia Tech had a high speed paper tape punch with industrial steel punch heads. I put the data on a mag tape, go over to the Computer Center and load it. I have a hand-cranked film winder to spool the paper tape as it comes out of the punch, so I an take it back on a reel.

Did I stress the words "high speed"n in that last paragraph? The tape came out like a stream of water from a fire hydrant. It shot across the room and began piling up in a corner on the other side. I had no way to stop or buffer the paper tape in my code, which was a very simple (i,e, stupid) batch program for a big, fast CDC machine.

Ever spend ~5 hours slowly winding paper tape onto spools, so you do not tear it? Oh, mush for brains did not think about how many spools would be needed. The punch was designed to produce minicomputer software (that is how we did in those days) so its paper supply was a reel about 1m in diameter. There was also this wonderful cutter that would chop off a length of paper tape under program control that I did not know about.

--CELKO--
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources