SergioM
Posting Yak Master
170 Posts
Posted - 2012-06-06 : 18:08:29
I am working with flat files and Excel files provided by several different distributors, so the quality of the data I receive varies widely. So far I have mapped everything properly, selected only the data I wanted, set the data types, and imported just the information I want. However, it has been suggested to me that the appropriate way is to import the flat file completely and then filter out what I don't want, as well as correct the data types, later on. Do you have a preference? Is there a large benefit to one over the other?
robvolk
Most Valuable Yak
15732 Posts
Posted - 2012-06-06 : 18:21:29
The full import method may be more accommodating of changes in file layout. It's also usually easier to add new data sources if you design generic import/staging tables. Without knowing how you do the selective method (SSIS? Pre-parse files? Magic?) it's hard to say which is better, but if the method works for you and you're happy with it, then it's fine.
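A minimal sketch of the kind of generic staging table this suggests - every column lands as a wide varchar so a new or slightly changed distributor layout doesn't break the load. The table and column names here are hypothetical:

-- Hypothetical generic staging table: every column is a wide varchar
-- so any distributor file can land here before data types are enforced.
CREATE TABLE dbo.Staging_DistributorFile
(
    StagingId  INT IDENTITY(1,1) PRIMARY KEY,
    SourceFile VARCHAR(260)  NULL,               -- which file the row came from
    LoadedAt   DATETIME      NOT NULL DEFAULT (GETDATE()),
    Col01      VARCHAR(4000) NULL,
    Col02      VARCHAR(4000) NULL,
    Col03      VARCHAR(4000) NULL,
    Col04      VARCHAR(4000) NULL
    -- add as many ColNN columns as the widest expected file needs
);

Typing, renaming and filtering the columns then happens in a later step, so the load itself rarely fails.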
yosiasz
Master Smack Fu Yak Hacker
1635 Posts
Posted - 2012-06-07 : 11:59:16
Classic ETL best practice (industry standard); that said, you will have to do your own thing:
1. Extract and ingest into staging table(s), regardless of source and type.
2. Transform and clean up.
3. Then load to the destination table.
I would prefer this.

<><><><><><><><><><><><><><><><><>
If you don't have the passion to help people, you have no passion
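A minimal sketch of that staging-transform-load flow, assuming the hypothetical wide-varchar staging table above and an imagined dbo.Product destination; the column meanings are illustrative, and TRY_CONVERT needs SQL Server 2012 or later:

-- 1. Extract: the raw file has already landed in dbo.Staging_DistributorFile (see above).

-- 2. Transform and clean up: trim whitespace and drop rows whose price column isn't numeric.
DELETE FROM dbo.Staging_DistributorFile
WHERE TRY_CONVERT(DECIMAL(10, 2), LTRIM(RTRIM(Col03))) IS NULL;

-- 3. Load the cleaned rows into the (imagined) typed destination table.
INSERT INTO dbo.Product (ProductCode, ProductName, Price)
SELECT LTRIM(RTRIM(Col01)),
       LTRIM(RTRIM(Col02)),
       CONVERT(DECIMAL(10, 2), LTRIM(RTRIM(Col03)))
FROM   dbo.Staging_DistributorFile;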
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts
Posted - 2012-06-07 : 12:49:48
I would do the import into a single-column table and work on that in stored procedures, if the data is bad and not too big. It's easy to do a bulk insert for this - missing row delimiters make it trickier, but it's still possible. Otherwise it might be better to use SSIS to parse the file and dump failed rows to an error table. It depends a lot on how much data and what types of errors you are expecting to handle.

==========================================
Cursors are useful if you don't know SQL. SSIS can be used in a similar way. Beer is not cold and it isn't fizzy.
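A minimal sketch of that single-column approach, with a hypothetical table, file path and terminators - in practice these would need adjusting per distributor:

-- Hypothetical single-column staging table: each file line lands as one raw string.
CREATE TABLE dbo.Staging_RawLine
(
    RawLine VARCHAR(MAX) NULL
);

-- Bulk insert the whole file, one row per line. The field terminator is a string
-- that should never appear in the data, so the whole line stays in the single column.
BULK INSERT dbo.Staging_RawLine
FROM 'C:\Import\distributor_file.csv'     -- placeholder path
WITH (ROWTERMINATOR = '\n', FIELDTERMINATOR = '|~|');

-- A stored procedure can then parse the raw lines, e.g. pull the first comma-separated field.
SELECT LEFT(RawLine, CHARINDEX(',', RawLine + ',') - 1) AS FirstField
FROM   dbo.Staging_RawLine;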
SergioM
Posting Yak Master
170 Posts
Posted - 2012-06-08 : 10:36:50
Great! I still don't know which way is best for me, but this gives me a few options. Moreover, I was curious to know what the industry standard is for this (as I'm sure I'm not the first guy to receive bad data), and now I know. Thanks everyone!
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts
Posted - 2012-06-08 : 11:32:10
There isn't really an industry standard. It depends on how bad the data is and how forgiving you want the process to be. I usually work on the principle that "bad data gets a file rejection" - but that often ends up as "bad data gets a file rejection, apart from these errors, which get automatically fixed or ignored".

==========================================
Cursors are useful if you don't know SQL. SSIS can be used in a similar way. Beer is not cold and it isn't fizzy.
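A minimal sketch of a "reject the whole file" gate along those lines, sitting inside the load procedure and using the hypothetical single-column staging table above; the layout rule (exactly four comma-separated fields) is made up for illustration:

-- Reject the file if any staged row fails a basic layout rule,
-- instead of silently fixing or skipping rows.
IF EXISTS (
    SELECT 1
    FROM   dbo.Staging_RawLine
    WHERE  RawLine IS NULL
        OR LEN(RawLine) - LEN(REPLACE(RawLine, ',', '')) <> 3   -- expect exactly 4 comma-separated fields
)
BEGIN
    RAISERROR('File rejected: one or more rows do not match the expected layout.', 16, 1);
    RETURN;   -- stop the load; any "auto-fix these known errors" exceptions would be handled before this check
END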
visakh16
Very Important crosS Applying yaK Herder
52326 Posts
Posted - 2012-06-08 : 12:23:55
We also do a full dump of the data into a single long varchar field table and then run a set of rules, defined in the system, against it to check for and extract the valid data. Depending on how complex your rules are, you can even build a customised rules engine by means of regular expression patterns and match against them to find out the valid/invalid rows. You may capture the invalid row details separately and feed them back to do some checks and reactive corrections if you want.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
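A minimal sketch of a rules table along those lines, run against the hypothetical single-column staging table above. Plain T-SQL only offers LIKE/PATINDEX-style patterns - full regular expressions would need a CLR function or similar - so the patterns here use LIKE syntax and are purely illustrative:

-- Hypothetical rules table: each rule is a pattern a raw line must match.
CREATE TABLE dbo.ImportRule
(
    RuleId      INT IDENTITY(1,1) PRIMARY KEY,
    RuleName    VARCHAR(100)  NOT NULL,
    LikePattern VARCHAR(4000) NOT NULL
);

INSERT INTO dbo.ImportRule (RuleName, LikePattern)
VALUES ('Starts with a numeric product code', '[0-9]%'),
       ('Contains at least one comma',        '%,%');

-- Flag rows that break any rule; these could be copied to an error table
-- and fed back to the distributor for correction.
SELECT s.RawLine, r.RuleName
FROM   dbo.Staging_RawLine AS s
JOIN   dbo.ImportRule      AS r
       ON s.RawLine NOT LIKE r.LikePattern;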