SergioM
Posting Yak Master
170 Posts
Posted - 2012-06-06 : 18:08:29
I am working with flat files and Excel files provided by several different distributors, so the quality of the data I receive varies widely. So far I have mapped everything properly, selected only the data I wanted, set the data types, and imported just the information I want. However, it has been suggested to me that the appropriate way is to import the flat file completely and then filter out what I don't want, as well as correct the data types, later on. Do you have a preference? Is there a large benefit to one over the other?
robvolk
Most Valuable Yak
15732 Posts
Posted - 2012-06-06 : 18:21:29
The full import method may be more accommodating of changes in file layout. It's also usually easier to add new data sources if you design generic import/staging tables. Without knowing how you do the selective method (SSIS? Pre-parse files? Magic?) it's hard to say which is better, but if the method works for you and you're happy with it, then it's fine.
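A minimal sketch of the kind of generic staging table this suggests - every column lands as a wide varchar so a new or slightly changed distributor layout doesn't break the load. The table and column names here are hypothetical:

-- Hypothetical generic staging table: every column is a wide varchar
-- so any distributor file can land here before data types are enforced.
CREATE TABLE dbo.Staging_DistributorFile
(
    StagingId  INT IDENTITY(1,1) PRIMARY KEY,
    SourceFile VARCHAR(260)  NULL,               -- which file the row came from
    LoadedAt   DATETIME      NOT NULL DEFAULT (GETDATE()),
    Col01      VARCHAR(4000) NULL,
    Col02      VARCHAR(4000) NULL,
    Col03      VARCHAR(4000) NULL,
    Col04      VARCHAR(4000) NULL
    -- add as many ColNN columns as the widest expected file needs
);

Typing, renaming and filtering the columns then happens in a later step, so the load itself rarely fails.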
yosiasz
Master Smack Fu Yak Hacker
1635 Posts
Posted - 2012-06-07 : 11:59:16
Classic ETL best practice (industry standard); that said, you will have to do your own thing:
1. Extract and ingest into staging table(s), regardless of source and type.
2. Transform and clean up.
3. Then load to the destination table.
I would prefer this.

<><><><><><><><><><><><><><><><><>
If you don't have the passion to help people, you have no passion
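A minimal sketch of that staging-transform-load flow, assuming the hypothetical wide-varchar staging table above and an imagined dbo.Product destination; the column meanings are illustrative, and TRY_CONVERT needs SQL Server 2012 or later:

-- 1. Extract: the raw file has already landed in dbo.Staging_DistributorFile (see above).

-- 2. Transform and clean up: trim whitespace and drop rows whose price column isn't numeric.
DELETE FROM dbo.Staging_DistributorFile
WHERE TRY_CONVERT(DECIMAL(10, 2), LTRIM(RTRIM(Col03))) IS NULL;

-- 3. Load the cleaned rows into the (imagined) typed destination table.
INSERT INTO dbo.Product (ProductCode, ProductName, Price)
SELECT LTRIM(RTRIM(Col01)),
       LTRIM(RTRIM(Col02)),
       CONVERT(DECIMAL(10, 2), LTRIM(RTRIM(Col03)))
FROM   dbo.Staging_DistributorFile;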
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts
Posted - 2012-06-07 : 12:49:48
I would do the import into a single-column table and work on that in stored procedures, if the data is bad and not too big. It's easy to do a bulk insert for this - missing row delimiters make it trickier, but it's still possible. Otherwise it might be better to use SSIS to parse the file and dump failed rows to an error table. It depends a lot on how much data and what types of errors you are expecting to handle.

==========================================
Cursors are useful if you don't know SQL. SSIS can be used in a similar way. Beer is not cold and it isn't fizzy.
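A minimal sketch of that single-column approach, with a hypothetical table, file path and terminators - in practice these would need adjusting per distributor:

-- Hypothetical single-column staging table: each file line lands as one raw string.
CREATE TABLE dbo.Staging_RawLine
(
    RawLine VARCHAR(MAX) NULL
);

-- Bulk insert the whole file, one row per line. The field terminator is a string
-- that should never appear in the data, so the whole line stays in the single column.
BULK INSERT dbo.Staging_RawLine
FROM 'C:\Import\distributor_file.csv'     -- placeholder path
WITH (ROWTERMINATOR = '\n', FIELDTERMINATOR = '|~|');

-- A stored procedure can then parse the raw lines, e.g. pull the first comma-separated field.
SELECT LEFT(RawLine, CHARINDEX(',', RawLine + ',') - 1) AS FirstField
FROM   dbo.Staging_RawLine;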
SergioM
Posting Yak Master
170 Posts
Posted - 2012-06-08 : 10:36:50
Great! I still don't know which way is best for me, but this gives me a few options. Moreover, I was curious to know what the industry standard is for this (as I'm sure I'm not the first guy to receive bad data), and now I know. Thanks everyone!
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts
Posted - 2012-06-08 : 11:32:10
There isn't really an industry standard. It depends on how bad the data is and how forgiving you want the process to be. I usually work on the principle that "bad data gets a file rejection" - but that often ends up as "bad data gets a file rejection, apart from these errors, which get automatically fixed or ignored".

==========================================
Cursors are useful if you don't know SQL. SSIS can be used in a similar way. Beer is not cold and it isn't fizzy.
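A minimal sketch of a "reject the whole file" gate along those lines, sitting inside the load procedure and using the hypothetical single-column staging table above; the layout rule (exactly four comma-separated fields) is made up for illustration:

-- Reject the file if any staged row fails a basic layout rule,
-- instead of silently fixing or skipping rows.
IF EXISTS (
    SELECT 1
    FROM   dbo.Staging_RawLine
    WHERE  RawLine IS NULL
        OR LEN(RawLine) - LEN(REPLACE(RawLine, ',', '')) <> 3   -- expect exactly 4 comma-separated fields
)
BEGIN
    RAISERROR('File rejected: one or more rows do not match the expected layout.', 16, 1);
    RETURN;   -- stop the load; any "auto-fix these known errors" exceptions would be handled before this check
END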
visakh16
Very Important crosS Applying yaK Herder
52326 Posts
Posted - 2012-06-08 : 12:23:55
We also do a full dump of the data into a single long varchar field table and then run a set of rules, defined in the system, against it to check for and extract the valid data. Depending on how complex your rules are, you can even build a customised rules engine by means of regular expression patterns and match against them to find out the valid/invalid rows. You may capture the invalid row details separately and feed them back to do some checks and reactive corrections if you want.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
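A minimal sketch of a rules table along those lines, run against the hypothetical single-column staging table above. Plain T-SQL only offers LIKE/PATINDEX-style patterns - full regular expressions would need a CLR function or similar - so the patterns here use LIKE syntax and are purely illustrative:

-- Hypothetical rules table: each rule is a pattern a raw line must match.
CREATE TABLE dbo.ImportRule
(
    RuleId      INT IDENTITY(1,1) PRIMARY KEY,
    RuleName    VARCHAR(100)  NOT NULL,
    LikePattern VARCHAR(4000) NOT NULL
);

INSERT INTO dbo.ImportRule (RuleName, LikePattern)
VALUES ('Starts with a numeric product code', '[0-9]%'),
       ('Contains at least one comma',        '%,%');

-- Flag rows that break any rule; these could be copied to an error table
-- and fed back to the distributor for correction.
SELECT s.RawLine, r.RuleName
FROM   dbo.Staging_RawLine AS s
JOIN   dbo.ImportRule      AS r
       ON s.RawLine NOT LIKE r.LikePattern;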