Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 SSIS and Import/Export (2005)
 .txt to .csv to sql? what tools would YOU use?

Author  Topic 

dlorenc
Posting Yak Master

172 Posts

Posted - 2010-05-25 : 10:09:13
I have a undelimited (fixed width) .txt file that is an export from a mainframe DB...I want to import it into sql using ssis. This will be done weekly.

what process/tools would you use?...So far, this is my best solution:

1. use excel to import the text file into a csv..adding the delimiters by hand (ouch, I wish this could be automated). Format the field to align to the field types in the target sql DB.

2. ssis, use the excel connector, and the excel source widget, use data conversion widget for any fields that the csv didnt store correctly.

3. Result is the .txt data moved into the target sql DB.

Issues - I have 138 delimiters (parse the txt import in .csv to seperate into fields)...is there a way to automate the delimiters in a .csv import?; apply the formating (txt to integer/datetime).

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2010-05-25 : 10:14:07
I don't have to do this but I'd use bcp and a format file:

Start reading here.
http://msdn.microsoft.com/en-us/library/ms188609.aspx

In particular: the section starting:
To import data from data files with fixed-length or fixed-width fields, you must use a format file. For more information, see Understanding XML Format Files and Sample XML Format Files.


Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION
Go to Top of Page

webfred
Master Smack Fu Yak Hacker

8781 Posts

Posted - 2010-05-25 : 10:18:12
I would import the file via import wizard (using flatfile connection with fixed width) into a staging table and then via select and convert into the destination table.
If it is always the same fileformat (same fields, same width) then I would save the import as an SSIS package for future use.


No, you're never too old to Yak'n'Roll if you're too young to die.
Go to Top of Page

dlorenc
Posting Yak Master

172 Posts

Posted - 2010-05-25 : 10:24:28
yes, I see that bcp seems to be the right tool...

but all the example I have found related to a delimited text file...what would you recommend I do for an undelimited text file?...delimit it?...
Go to Top of Page

dlorenc
Posting Yak Master

172 Posts

Posted - 2010-05-25 : 10:28:52
webfred - with 138 fields, the import wizard is a bit of a pain in the tush to use....
1. I parse the undelimited file in the connector, use the flat file source to bring it in, then I need to convert the txt fields into the target sql types (datetime, int, nvarchar)?...without the data conversion on every field, it seems to throw errors when I try to match up fields (field size is different in the flat file import and the target sql field)...

is this better than bcp?

or should I use a text editor and insert a delimiter in the text file first?...(sed?..awk? if I was in unix..?)
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-05-25 : 13:48:01
Excel will, potentially, change the formatting of the data (strip any leading zeros off, convert 6 digit numbers into dates, convert domain names into hyperlinks, that sort of thing, and I hate it when I am given data that has been "collected" in Excel

I would use BCP to import fixed length records (or CSV for that matter).

I personally only use SSIS for one-off jobs. I find it easier to log faults, and then diagnose them, using BCP.

I agree with "pull it into a staging file and then sort it out into the target tables". Just pulling each column into a "generously wide" VARCHAR column means that the data, whatever errors it may have, will always import to the staging table, and then the "parsing" routine can react gracefully to badly formatted data (even if it evolves over time to recognise the various SNAFUs that arise). SSIS is just going to dump the whole lot on the floor and chuck its toys out of the pram when that happens!
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-05-25 : 13:51:53
Ah ... just send the P.S. about AWK / SED ... blast-from-the-past.

We used to use AWK for data conversion, but the scripts became horribly complicated as they had to fully process each row as-it-arrived.

In SQL you can do multiple whole-table passes to check for "Is this a date" type stuff, and multiple, rather than single, passes doing UPDATES etc. to step-by-step get the data into the correct shape. I have to say, now (looking back) I far prefer the SQL way.

If you have skills in AWK it would do to convert the fixed length to CSV (but I would import it as a fixed-length file, rather than converting it - in CSV what do you do with embedded commas? Double quotes? then what do you do with embedded quotes? double-them-up? what about embedded line-breaks? Convert to "/n" - OK then you also have to take care of escaping "/" ... its a minefield)

P.S. AWK and SED are available from the GNU project - if you need PC-compatible versions.
Go to Top of Page

dlorenc
Posting Yak Master

172 Posts

Posted - 2010-05-26 : 10:34:17
Kristen...well said...

My current approach is import into a staging table (all fields are nvarchar)...

Now I need to write the 'parsing' routine....

thanks for the help to validate a solid approach...(and yes, I prefer the sql approach to screwing with excel or text editors...
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-05-26 : 11:13:03
"Now I need to write the 'parsing' routine...."

Off the top of my head:

Add a column for "Error Message" - NULL= Data is OK

Then things like:

UPDATE U
SET Col1 = NullIf(RTrim(Col1), '')
FROM MyTable AS U
WHERE Col1 LIKE '% ' OR Col1 = '' -- End in space, or is completely empty (then convert to NULL)

UPDATE U
SET ErrMsg = COALESCE(ErrMsg+'; ', '') + 'Col1=Invalid date'
FROM MyTable AS U
WHERE U.Col1 IS NOT NULL AND IsDate(U.Col1) = 0

...

SELECT PKCol1, PKCol2, ..., ErrMsg
FROM MyTable
WHERE ErrMsg IS NOT NULL -- Rows with errors
Go to Top of Page
   

- Advertisement -