Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 datcleaning(Resolved)

Author  Topic 

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-08-06 : 15:35:23
I have to clean data as below, the 1,2,3... is just a sequence added, this data is in 2 columns only, one Id has sequence as seen below and other column Raw has entire data.

I have created 3 tables one having id & data containing only Path:/...
second having only data as Ora14.sdk.. and third having all commit .. complete kind of data

Few things common, commit and complete is common in all the records, and Path:/ data always starts with Path:/ based on this I was able to separate these into tables





11 GoodTypes: /usr2/lib/kpc
12 Tivoli.rte
13 3.7.1.0 Com Pas 01/25/12 12:32:14
14 3.7.1.0 Tom Pat 01/25/12 12:32:14


Now I need to join this table in such a way that I get the results as seen below, I am fine we can write a query and do this with out creating 3 tables...



11 GoodTypes: /usr2/lib/kpc
12 Tivoli.rte
13 3.7.1.0 Com Pas 01/25/12 12:32:14
14 3.7.1.0 Tom Pat 01/25/12 12:32:14





-Neil

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-08-06 : 15:51:02
your file data format is not consistent. why is there no path information for second and third rowS?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-08-06 : 16:25:17
Is it the same circumstances as this: http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=176908
Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-08-06 : 16:25:54
the Path is coming as a header say after 100+ odd records, the second row actually got split 2 rows.

This could be a sub header Tivoli.rte as you can see on 12 row. still entire file is in one column Raw which has ID as identity

I am fine if we can use cursor or query or splitting up into 3 tables and joining them.

size of data will not be huge, it would be of less than 5K lines

yes data is not consistent however we have commit and complete in all the records apart from headers & Path, based on which I was able to split in to different tables

-Neil
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-08-06 : 16:45:21
[code]-- test data
DECLARE @t TABLE (n INT NOT NULL PRIMARY KEY, RAW VARCHAR(255))
INSERT @t
SELECT 1,'Path: /usr/lib/kpc' UNION ALL SELECT
2,'Ora14.sdk' UNION ALL SELECT
3,'1.4.2.20 COMMIT COMPLETE 01/25/12 13:02:59' UNION ALL SELECT
4,'Tivoli.rte' UNION ALL SELECT
5,'3.7.1.0 COMMIT COMPLETE 01/25/12 12:32:14' UNION ALL SELECT
6,'X11.adt.bitmaps' UNION ALL SELECT
7,'5.3.0.0 COMMIT COMPLETE 01/25/12 13:07:11' UNION ALL SELECT
8,'Path: /usr/lib/xym' UNION ALL SELECT
9,'X11.adt.imake' UNION ALL SELECT
10,'5.3.0.30 COMMIT COMPLETE 01/25/12 13:07:11' UNION ALL SELECT
11,'Path: /usr2/lib/kpc' UNION ALL SELECT
12,'Tivoli.rte' UNION ALL SELECT
13,'3.7.1.0 COMMIT COMPLETE 01/25/12 12:32:14' UNION ALL SELECT
14,'3.7.1.0 COMMIT COMPLETE 01/25/12 12:32:14'

-- actual query
;WITH paths(PathID,PATH) AS (SELECT n,RAW FROM @t WHERE RAW LIKE 'Path:%')
, files(FileID,Filename) AS (SELECT n,RAW FROM @t WHERE RAW NOT LIKE '% %')
, rest(ID,DATA) AS (SELECT n,RAW FROM @t WHERE RAW LIKE '%COMMIT%COMPLETE%')
SELECT PATH,Filename,DATA --,PathID,FileID,id
FROM paths
CROSS JOIN files
CROSS JOIN rest
WHERE PathID=(SELECT MAX(PathID) FROM paths WHERE PathID<FileID)
AND FileID=(SELECT MAX(FileID) FROM files WHERE FileID<ID)[/code]I'm sure there's a more efficient way (not using CROSS JOIN) but my brain isn't working right now. Parsing the data columns shouldn't be too hard, I'll keep working on it and post a solution.
Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-08-06 : 17:04:25
I did not get any data when I ran in the actual env

Table name: Hist(Id,Raw)


;WITH paths(PathID,PATH) AS (SELECT id,Raw FROM Hist WHERE Raw LIKE '%Path:%')
est(ID,DATA) AS (SELECT id,Raw FROM Hist WHERE Raw LIKE '%COMMIT%COMPLETE%')

SELECT PATH,Filename,DATA --,PathID,FileID,id
FROM paths
CROSS JOIN files
CROSS JOIN rest
WHERE PathID=(SELECT MAX(PathID) FROM paths WHERE PathID<FileID)
AND FileID=(SELECT MAX(FileID) FROM files WHERE FileID<ID)


-Neil
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-08-06 : 17:24:39
Try it without the WHERE clauses. You may get quite a lot of data so SELECT TOP 100 might be a good idea.

If that doesn't return anything I don't know what else to look.
Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-08-06 : 17:35:16
Thanks Robvolk,

The second query was not returning any data, I am sorry I did not get it why '% %' in second query to return all the rec?

SELECT id,Raw FROM Hist WHERE Raw NOT LIKE '% %'
changed it to
SELECT id,Raw FROM Hist WHERE Raw NOT LIKE 'Path:%' AND raw NOT LIKE '%COMMIT%COMPLETE%'


After this I am getting the data

-Neil
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-08-06 : 17:54:41
Yeah, I meant to change it to what you posted but forgot.
Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-08-06 : 19:01:15
Thanks Robvolk/Visakh, I tested this for few of the records it looks like it is working. I'll concat all the 3 columns into one, and add proper delimiter.


-Neil
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-08-07 : 09:48:53
ok...cool

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page
   

- Advertisement -