Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Raw file cleaning

Author  Topic 

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-07-28 : 15:37:12
I am trying to write a sp, which adjust the number of columns based on the IN Parameter for the raw file.

Raw (Id Int,RawData varchar(max))

RawData has actual data with delimiter, this column has all the raw file imported as is.

e.g. data in RawData Column

A|B|C|D|E|F
X|B|C|D|E
Z
P|ABC|ZPD|BITX|E|ETC


Here above you can see the max number of delimiter are 5 number of column as 6, however for some records there are less delimiter, here comes the problem

For such records I want to add the extra delimter and take values for missing column as NULL or best would be''

plz help me in writing Generic SP to clean this.

We can pass the source & target table name and column names as a parameter, and create a clean table dynamically by name CleanedRaw etc.

here the file is | delimited, actually It can have any single char delimiter or space or tab also. I am ok if we need to pass the delimiter also as a parameter to this sp


-Neil

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-07-28 : 15:43:44
you mean padd delimiter and NULL to make up for maxmimum value of 6 per column?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-07-28 : 15:46:24
yes 6 is for this file, it can be more or less for other files

-Neil
Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-07-28 : 15:49:49
I mean padd extra delimiters to make up to the max of 6 for this file, it can vary for ther file

e.g.

A|B|C
x

to

A|B|C
x||

such that target cleaned table will have values as below


Col1 Col2 Col3
A B C
X



-Neil
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-07-28 : 16:25:39
do you mean this?

UPDATE Raw
SET RawData = RawData + REPLICATE('|',5-(LEN(RawData) - LEN(REPLACE(RawData,'|',''))))


------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2012-07-28 : 17:12:53
If there is a MAXIMUM of 6 columns in the file, create six columns in the table and used RAGGED RIGHT when importing.


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-07-29 : 10:32:51
Thanks Visakh & Peso.

The file is already imported, I do not have controls over the import of the raw file.

UPDATE Raw
SET RawData = RawData + REPLICATE('|',5-(LEN(RawData) - LEN(REPLACE(RawData,'|',''))))

will the above update works, If I pass the 5 hardcoded value as a paramter?


-Neil
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-07-29 : 15:02:34
quote:
Originally posted by aakcse

Thanks Visakh & Peso.

The file is already imported, I do not have controls over the import of the raw file.

UPDATE Raw
SET RawData = RawData + REPLICATE('|',5-(LEN(RawData) - LEN(REPLACE(RawData,'|',''))))

will the above update works, If I pass the 5 hardcoded value as a paramter?


-Neil


what do you mean by 5 hardcoded value? whats the value you're passing?
can you give an example?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-07-29 : 15:16:28
I would be knowing the number of columns of the target table before hand, I need to adjust the delimiter in the source to fit it inthe target table.

Say If I have less delimiters, then I need to add the X number of delimter to that row of Raw_data column of source table
e.g. PQT
Then I need to update by padding 5 deimiter to make it 5deimiter with 6 column
PQT|||||

Say If I have more delimiters, then I need to eliminate the extra delimiter from the end & put the entire data into last column for extra delimiter.
e.g. A|B|C|D|E|F|7AP9|Y56
need to update the above rows as
A|B|C|D|E|F7AP9Y56



-Neil
Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-07-29 : 15:26:21
for data as below

Id Raw_Data
1 PQT
2 A|B|C|D|E|F|7AP9|Y56

After Cleaning It should be as below

ID Raw_Data
1 PQT|||||
2 A|B|C|D|E|F7AP9Y56

So that the updated data can be easily pumped into target table which is of 6 columns, for this file. It can be more or less for other files which I would be knowing before hand, hence I can pass this as a parameter to the sp, as I also the know the column name of the target table I can pass the, number of column and Column name with data type as a paramter to the SP, which can dynamically create the target table & populate it from the updated source table.

e.g

Create procedure Fix
(
@Source_Table varchar(max), --e.g. Raw
@Target_Table_Name varchar(max), --e.g Cleaned_Raw
@Target_Table_Col varchar(max), -- e.g. 'Col1 Varchar(max),Col2 Varchar(max),...Col6 varchar(max)'
@Number_of_Col varchar(max) -- e.g. if '6', means there should be 5 delimter
)
begin


end



[/code]



-Neil
Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-07-29 : 16:11:04
I just would like to know the logic in between BEING...END

Assuming the target table is created with the specified number of columns passed in the SP

I would interested in the cleaning part of the Source table, I have an SP when passed the Cleaned Source table populates the target table perfectly.

Hence I would more interested in cleaning logic of the source table. If some one could help with that, rest I'll take care

Many thanks



-Neil
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-07-29 : 16:32:07
the logic i gave you will add missing delimiters in case the number of delimiters is less than 5

for the ones with more delimiters you need to use different logic to remove additional delimiters.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-07-29 : 17:46:38
Thanks Visakh,

So I can pass the hardcoded value 5 as a in parameter right?

I sometimes back developed a logic using REVERSE & STUFF function for removing additional delimiter, unfortunate I didnt kept that code..

-Neil
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-07-29 : 20:07:48
quote:
Originally posted by aakcse

Thanks Visakh,

So I can pass the hardcoded value 5 as a in parameter right?

I sometimes back developed a logic using REVERSE & STUFF function for removing additional delimiter, unfortunate I didnt kept that code..

-Neil


sorry i still didnt get which harcoded value you're asking about
if its below one ,yes you can pass it


UPDATE Raw
SET RawData = RawData + REPLICATE('|',5-(LEN(RawData) - LEN(REPLACE(RawData,'|',''))))


------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-07-30 : 02:07:52
Sorry to confuse you, yes the same below one, now I need to think about the next part of eliminating the extra delimiter..Thanks for your help


UPDATE Raw
SET RawData = RawData + REPLICATE('|',5-(LEN(RawData) - LEN(REPLACE(RawData,'|',''))))




-Neil
Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-07-30 : 15:27:21
Hi Visakh,

Any help on removing the extra delimiter from the extreem right side of each records which has extra delimiters compare to the parameter passed in, and including it in a single column.

Regards,

-Neil
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-07-30 : 15:56:14
not best of ways but still you can try like

;With CTE
AS
(
SELECT r.RawData,f.ID,f.Val
FROM Raw r
CROSS APPLY dbo.ParseValues(r.RawData,'|') f

)
UPDATE r1
SET r1.RawData = t.Modified
FROM Raw r1
INNER JOIN
(
SELECT r.Rawdata,
STUFF((SELECT CASE WHEN f.ID > 6 THEN '' ELSE '|' END + Val FROM CTE WHERE RawData = r.RawData FOR XML PATH('')),1,1,'') AS Modified
FROM (SELECT DISTINCT RawData FROM CTE)r
)t
ON t.RawData = r1.RawData


------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-07-30 : 16:05:12
hmm how about If I take cursor treat each record and put the 6 columns (in para) data up till 5 paramter in one variable and rest of the data in other variable,then from second variable remove the delimiter by replace function then concat and put it back in intermediate table ... that can be used to put it in final target table.

-Neil
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-07-30 : 16:06:40
sorry ParseValues can be found in below link

http://visakhm.blogspot.com/2010/02/parsing-delimited-string.html

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-07-30 : 16:08:05
quote:
Originally posted by aakcse

hmm how about If I take cursor treat each record and put the 6 columns (in para) data up till 5 paramter in one variable and rest of the data in other variable,then from second variable remove the delimiter by replace function then concat and put it back in intermediate table ... that can be used to put it in final target table.

-Neil


the posted method also works more of less like that. it parses out individual vales at first and then joins them dropping extra delimiters. it might perform a touch better than cursors approach

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2012-07-30 : 17:46:46
Thanks Visakh, I'll update you once I complete this SP, the performance would not be big question here as it will be run only once to clean this file ( file size is also less with <1000 Rec)

Regards,
Asif

-Neil
Go to Top of Page
    Next Page

- Advertisement -