dlorenc
Posting Yak Master
172 Posts
Posted - 2010-05-25 : 10:09:13
I have an undelimited (fixed-width) .txt file that is an export from a mainframe DB. I want to import it into SQL using SSIS, and this will be done weekly. What process/tools would you use?

So far, this is my best solution:

1. Use Excel to import the text file into a CSV, adding the delimiters by hand (ouch, I wish this could be automated). Format the fields to align with the field types in the target SQL DB.
2. In SSIS, use the Excel connector and the Excel source, with a Data Conversion widget for any fields that the CSV didn't store correctly.
3. Result: the .txt data is moved into the target SQL DB.

Issues: I have 138 fields to delimit (parsing the .txt import into the .csv to separate it into fields). Is there a way to automate the delimiters in a .csv import, and to apply the formatting (text to integer/datetime)?
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts
Posted - 2010-05-25 : 10:14:07
I don't have to do this, but I'd use bcp and a format file. Start reading here:

http://msdn.microsoft.com/en-us/library/ms188609.aspx

In particular, the section starting: "To import data from data files with fixed-length or fixed-width fields, you must use a format file. For more information, see Understanding XML Format Files and Sample XML Format Files."

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION
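To give a rough idea of what that looks like for a fixed-width file: each FIELD in the format file gives the width of one field in the .txt file, each COLUMN maps it onto a column in a staging table, and bcp is then pointed at the data file and the format file. Everything below - database, table, file and field names, and the widths - is invented for illustration; a real format file would need one FIELD/COLUMN pair per field (138 in this case).

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <!-- one FIELD per fixed-width field; LENGTH is its width in the .txt file -->
    <FIELD ID="1" xsi:type="CharFixed" LENGTH="10"/>
    <FIELD ID="2" xsi:type="CharFixed" LENGTH="8"/>
    <!-- ... and so on; if each record ends in CR/LF, the line terminator also
         has to be accounted for (e.g. by a final terminated field) -->
  </RECORD>
  <ROW>
    <!-- map each FIELD onto a staging-table column -->
    <COLUMN SOURCE="1" NAME="Col1" xsi:type="SQLNVARCHAR"/>
    <COLUMN SOURCE="2" NAME="Col2" xsi:type="SQLNVARCHAR"/>
  </ROW>
</BCPFORMAT>

bcp MyDb.dbo.Staging_Weekly in weekly_export.txt -f weekly_export.xml -S MyServer -T

The bcp command can then be scheduled (e.g. as a SQL Agent job step) for the weekly run, or the same format file can be used from BULK INSERT ... WITH (FORMATFILE = ...).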
webfred
Master Smack Fu Yak Hacker
8781 Posts
Posted - 2010-05-25 : 10:18:12
I would import the file via the Import Wizard (using a flat file connection with fixed width) into a staging table, and then SELECT and CONVERT into the destination table. If it is always the same file format (same fields, same widths) then I would save the import as an SSIS package for future use.

No, you're never too old to Yak'n'Roll if you're too young to die.
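For the second step ("SELECT and CONVERT into the destination table"), a minimal sketch, assuming an all-text staging table and made-up table names, column names and date style:

INSERT INTO dbo.TargetTable (AccountNo, TradeDate, Amount)
SELECT CONVERT(int, LTRIM(RTRIM(Col1))),   -- text -> int
       CONVERT(datetime, Col2, 112),       -- assumes the export holds dates as YYYYMMDD
       CONVERT(decimal(12, 2), Col3)       -- text -> decimal
FROM dbo.Staging_Weekly

The actual conversions (and whether a style argument is needed on the dates) depend entirely on the real mainframe layout.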
dlorenc
Posting Yak Master
172 Posts
Posted - 2010-05-25 : 10:24:28
Yes, I see that bcp seems to be the right tool... but all the examples I have found relate to a delimited text file. What would you recommend I do for an undelimited text file? Delimit it first?
dlorenc
Posting Yak Master
172 Posts
Posted - 2010-05-25 : 10:28:52
webfred - with 138 fields, the Import Wizard is a bit of a pain in the tush to use.

1. I parse the undelimited file in the connector, use the flat file source to bring it in, and then I need to convert the text fields into the target SQL types (datetime, int, nvarchar)? Without a data conversion on every field it seems to throw errors when I try to match up fields (the field size is different in the flat file import and the target SQL field). Is this better than bcp?

Or should I use a text editor and insert delimiters into the text file first? (sed? awk? ...if I were on Unix?)
Kristen
Test
22859 Posts
Posted - 2010-05-25 : 13:48:01
Excel will, potentially, change the formatting of the data (strip any leading zeros off, convert 6-digit numbers into dates, convert domain names into hyperlinks, that sort of thing), and I hate it when I am given data that has been "collected" in Excel.

I would use BCP to import fixed-length records (or CSV, for that matter). I personally only use SSIS for one-off jobs; I find it easier to log faults, and then diagnose them, using BCP.

I agree with "pull it into a staging table and then sort it out into the target tables". Just pulling each column into a "generously wide" VARCHAR column means that the data, whatever errors it may have, will always import to the staging table, and then the "parsing" routine can react gracefully to badly formatted data (even if it evolves over time to recognise the various SNAFUs that arise). SSIS is just going to dump the whole lot on the floor and chuck its toys out of the pram when that happens!
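A minimal sketch of that staging table idea (names and widths invented) - every source field lands as generously wide text, so the weekly load itself can never be rejected for bad data:

CREATE TABLE dbo.Staging_Weekly
(
    Col1 nvarchar(100) NULL,   -- each of the 138 fixed-width fields comes in
    Col2 nvarchar(100) NULL,   -- as plain text, wider than the source field
    Col3 nvarchar(100) NULL
    -- ... and so on for the remaining fields
)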
Kristen
Test
22859 Posts
Posted - 2010-05-25 : 13:51:53
Ah ... just seen the P.S. about AWK / SED ... blast from the past.

We used to use AWK for data conversion, but the scripts became horribly complicated as they had to fully process each row as-it-arrived. In SQL you can do multiple whole-table passes to check for "is this a date" type stuff, and multiple, rather than single, passes doing UPDATEs etc. to step-by-step get the data into the correct shape. I have to say, now (looking back) I far prefer the SQL way.

If you have skills in AWK it would do to convert the fixed-length file to CSV (but I would import it as a fixed-length file rather than converting it - in CSV, what do you do with embedded commas? Wrap the field in double quotes? Then what do you do with embedded quotes? Double them up? What about embedded line breaks? Convert to "\n"? OK, then you also have to take care of escaping "\" ... it's a minefield).

P.S. AWK and SED are available from the GNU project, if you need PC-compatible versions.
dlorenc
Posting Yak Master
172 Posts
Posted - 2010-05-26 : 10:34:17
Kristen... well said. My current approach is to import into a staging table (all fields are nvarchar). Now I need to write the 'parsing' routine. Thanks for the help validating a solid approach... (and yes, I prefer the SQL approach to screwing with Excel or text editors).
Kristen
Test
22859 Posts
Posted - 2010-05-26 : 11:13:03
"Now I need to write the 'parsing' routine...."Off the top of my head:Add a column for "Error Message" - NULL= Data is OKThen things like:UPDATE USET Col1 = NullIf(RTrim(Col1), '')FROM MyTable AS UWHERE Col1 LIKE '% ' OR Col1 = '' -- End in space, or is completely empty (then convert to NULL)UPDATE USET ErrMsg = COALESCE(ErrMsg+'; ', '') + 'Col1=Invalid date'FROM MyTable AS UWHERE U.Col1 IS NOT NULL AND IsDate(U.Col1) = 0...SELECT PKCol1, PKCol2, ..., ErrMsgFROM MyTableWHERE ErrMsg IS NOT NULL -- Rows with errors |
|
|
|
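To round the routine off, a hedged sketch of the bookends around those UPDATEs (MyTable here being the staging table, as above; the target table, columns and conversions are invented): the error column is added up front, the cleanup/validation passes run, and then only the rows that survived get promoted.

ALTER TABLE MyTable ADD ErrMsg nvarchar(2000) NULL   -- NULL = data is OK

-- ... the cleanup / validation UPDATEs above run here, one pass per rule ...

-- then the staging-to-target INSERT ... SELECT (as sketched earlier in the thread)
-- is simply gated on the rows that passed every check:
INSERT INTO dbo.TargetTable (AccountNo, TradeDate, Amount)
SELECT CONVERT(int, Col1),
       CONVERT(datetime, Col2),
       CONVERT(decimal(12, 2), Col3)
FROM MyTable
WHERE ErrMsg IS NULL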