| Author |
Topic |
|
aakcse
Aged Yak Warrior
570 Posts |
Posted - 2012-08-06 : 15:35:23
|
I have to clean data as below, the 1,2,3... is just a sequence added, this data is in 2 columns only, one Id has sequence as seen below and other column Raw has entire data.I have created 3 tables one having id & data containing only Path:/...second having only data as Ora14.sdk.. and third having all commit .. complete kind of dataFew things common, commit and complete is common in all the records, and Path:/ data always starts with Path:/ based on this I was able to separate these into tables11 GoodTypes: /usr2/lib/kpc12 Tivoli.rte13 3.7.1.0 Com Pas 01/25/12 12:32:1414 3.7.1.0 Tom Pat 01/25/12 12:32:14 Now I need to join this table in such a way that I get the results as seen below, I am fine we can write a query and do this with out creating 3 tables... 11 GoodTypes: /usr2/lib/kpc12 Tivoli.rte13 3.7.1.0 Com Pas 01/25/12 12:32:1414 3.7.1.0 Tom Pat 01/25/12 12:32:14 -Neil |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2012-08-06 : 15:51:02
|
| your file data format is not consistent. why is there no path information for second and third rowS?------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2012-08-06 : 16:25:17
|
| Is it the same circumstances as this: http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=176908 |
 |
|
|
aakcse
Aged Yak Warrior
570 Posts |
Posted - 2012-08-06 : 16:25:54
|
| the Path is coming as a header say after 100+ odd records, the second row actually got split 2 rows.This could be a sub header Tivoli.rte as you can see on 12 row. still entire file is in one column Raw which has ID as identityI am fine if we can use cursor or query or splitting up into 3 tables and joining them.size of data will not be huge, it would be of less than 5K lines yes data is not consistent however we have commit and complete in all the records apart from headers & Path, based on which I was able to split in to different tables-Neil |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2012-08-06 : 16:45:21
|
| [code]-- test dataDECLARE @t TABLE (n INT NOT NULL PRIMARY KEY, RAW VARCHAR(255))INSERT @tSELECT 1,'Path: /usr/lib/kpc' UNION ALL SELECT 2,'Ora14.sdk' UNION ALL SELECT 3,'1.4.2.20 COMMIT COMPLETE 01/25/12 13:02:59' UNION ALL SELECT 4,'Tivoli.rte' UNION ALL SELECT 5,'3.7.1.0 COMMIT COMPLETE 01/25/12 12:32:14' UNION ALL SELECT 6,'X11.adt.bitmaps' UNION ALL SELECT 7,'5.3.0.0 COMMIT COMPLETE 01/25/12 13:07:11' UNION ALL SELECT 8,'Path: /usr/lib/xym' UNION ALL SELECT 9,'X11.adt.imake' UNION ALL SELECT 10,'5.3.0.30 COMMIT COMPLETE 01/25/12 13:07:11' UNION ALL SELECT 11,'Path: /usr2/lib/kpc' UNION ALL SELECT 12,'Tivoli.rte' UNION ALL SELECT 13,'3.7.1.0 COMMIT COMPLETE 01/25/12 12:32:14' UNION ALL SELECT 14,'3.7.1.0 COMMIT COMPLETE 01/25/12 12:32:14'-- actual query;WITH paths(PathID,PATH) AS (SELECT n,RAW FROM @t WHERE RAW LIKE 'Path:%') , files(FileID,Filename) AS (SELECT n,RAW FROM @t WHERE RAW NOT LIKE '% %') , rest(ID,DATA) AS (SELECT n,RAW FROM @t WHERE RAW LIKE '%COMMIT%COMPLETE%')SELECT PATH,Filename,DATA --,PathID,FileID,idFROM pathsCROSS JOIN filesCROSS JOIN restWHERE PathID=(SELECT MAX(PathID) FROM paths WHERE PathID<FileID)AND FileID=(SELECT MAX(FileID) FROM files WHERE FileID<ID)[/code]I'm sure there's a more efficient way (not using CROSS JOIN) but my brain isn't working right now. Parsing the data columns shouldn't be too hard, I'll keep working on it and post a solution. |
 |
|
|
aakcse
Aged Yak Warrior
570 Posts |
Posted - 2012-08-06 : 17:04:25
|
I did not get any data when I ran in the actual env Table name: Hist(Id,Raw);WITH paths(PathID,PATH) AS (SELECT id,Raw FROM Hist WHERE Raw LIKE '%Path:%') est(ID,DATA) AS (SELECT id,Raw FROM Hist WHERE Raw LIKE '%COMMIT%COMPLETE%')SELECT PATH,Filename,DATA --,PathID,FileID,idFROM pathsCROSS JOIN filesCROSS JOIN restWHERE PathID=(SELECT MAX(PathID) FROM paths WHERE PathID<FileID)AND FileID=(SELECT MAX(FileID) FROM files WHERE FileID<ID) -Neil |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2012-08-06 : 17:24:39
|
| Try it without the WHERE clauses. You may get quite a lot of data so SELECT TOP 100 might be a good idea.If that doesn't return anything I don't know what else to look. |
 |
|
|
aakcse
Aged Yak Warrior
570 Posts |
Posted - 2012-08-06 : 17:35:16
|
Thanks Robvolk,The second query was not returning any data, I am sorry I did not get it why '% %' in second query to return all the rec?SELECT id,Raw FROM Hist WHERE Raw NOT LIKE '% %'changed it to SELECT id,Raw FROM Hist WHERE Raw NOT LIKE 'Path:%' AND raw NOT LIKE '%COMMIT%COMPLETE%' After this I am getting the data-Neil |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2012-08-06 : 17:54:41
|
| Yeah, I meant to change it to what you posted but forgot. |
 |
|
|
aakcse
Aged Yak Warrior
570 Posts |
Posted - 2012-08-06 : 19:01:15
|
| Thanks Robvolk/Visakh, I tested this for few of the records it looks like it is working. I'll concat all the 3 columns into one, and add proper delimiter.-Neil |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2012-08-07 : 09:48:53
|
| ok...cool------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
 |
|
|
|