sql_er
Constraint Violating Yak Guru
267 Posts
Posted - 2013-02-15 : 17:02:12
Hi,

We have the following scenario: we receive CSV files every month, and SSIS packages were built to process the data. The following problems occur from time to time:

1. The structure of the CSV file changed (e.g. a column was added or removed).
2. There were no footers in the data, but now footers have started to appear.
3. The date format changed (e.g. it used to be mm/dd/yyyy but became mm.dd.yyyy).
4. The number format changed (e.g. from 2000 to 2,000).

Currently we have a person who manually opens each file and checks it against our "validation document" to make sure none of these or similar problems occur. We would like to move away from this manual process if possible and are looking for suggestions.

I understand that items 3 and 4 could be caught by loading the data into a staging table with VARCHAR data types and validating it before moving it any further.

Item 2 is less predictable: depending on the size of the footer, the SSIS load may or may not fail.

Item 1, however, is a sure failure for an SSIS package that loads the data directly into a table.

So I see two possible options:

1. Create a custom script that runs through the file row by row, applies all the necessary validations, and either reports an error or continues if everything checks out.
2. Use a third-party tool to validate the files (semi-manually) before kicking off the SSIS processing.

My questions are:

1. If you have encountered a similar problem, how did you resolve it? If you built a custom script, could you share it, or do you know of a framework that could be used more or less as plug and play?
2. Does anyone know of good third-party tools to assist in this process?

Thanks in advance!
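For problems 1 and 2, a file-level pre-check can run before the SSIS package is started. Below is a minimal sketch in Python, assuming a hypothetical three-column feed whose expected header is hard-coded in `EXPECTED_HEADER` (not taken from the thread). It compares the header row and flags any record whose field count differs, which is how an added column or a newly appearing footer line usually shows up.

```python
import csv
import io

# Hypothetical expected layout for one monthly feed; in practice it would
# come from the provider's entry in the "validation document".
EXPECTED_HEADER = ["AccountId", "TradeDate", "Amount"]


def check_structure(text, expected_header=EXPECTED_HEADER):
    """Return a list of structural problems found in CSV text.

    A changed header catches added or removed columns (problem 1); a record
    with the wrong number of fields usually means a footer or other stray
    line has appeared (problem 2).
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = [name.strip() for name in rows[0]]
    if header != expected_header:
        problems.append(f"header mismatch: expected {expected_header}, got {header}")

    width = len(expected_header)
    # Record numbers count the header as record 1.
    for record_no, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != width:
            problems.append(f"record {record_no}: expected {width} fields, got {len(row)}")
    return problems
```

A caller would read the file with `open(path, newline="")`, as the `csv` module documentation recommends, and skip the SSIS run whenever the returned list is not empty.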
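Problems 3 and 4 come down to field-format rules, which can be checked on the raw text either before the load or against a VARCHAR staging table as described in the question. Below is a minimal sketch, assuming dates must be mm/dd/yyyy and amounts must be plain digits with no thousands separators. The column positions passed to `check_fields` are hypothetical.

```python
import re
from datetime import datetime

# Assumed rule: an optional minus sign, digits, an optional decimal part,
# and no thousands separators.
AMOUNT_PATTERN = re.compile(r"-?\d+(\.\d+)?")


def is_valid_date(value):
    """True if value is a real calendar date written as mm/dd/yyyy."""
    try:
        datetime.strptime(value, "%m/%d/%Y")
    except ValueError:
        return False
    return True


def is_valid_amount(value):
    """True if value is a plain number such as 2000 or -12.50."""
    return AMOUNT_PATTERN.fullmatch(value) is not None


def check_fields(rows, date_col, amount_col):
    """Return (record_no, message) pairs for badly formatted fields.

    rows are the data records as lists of strings (header excluded), i.e.
    what a VARCHAR staging table would hold; record 1 is the header.
    """
    problems = []
    for record_no, row in enumerate(rows, start=2):
        date_value = row[date_col].strip()
        amount_value = row[amount_col].strip()
        if not is_valid_date(date_value):
            problems.append((record_no, f"bad date {date_value!r}"))
        if not is_valid_amount(amount_value):
            problems.append((record_no, f"bad amount {amount_value!r}"))
    return problems
```

Because `strptime` validates the calendar as well as the layout, a value like 02/30/2013 is rejected, not just 01.31.2013.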
yosiasz
Master Smack Fu Yak Hacker
1635 Posts
Posted - 2013-03-08 : 11:17:59
1. You could create a temp table dynamically based on the columns of the incoming CSV file.
2. This one is tough.
3. Load it to VARCHAR, like you said.
4. Load it to VARCHAR, like you said.

Questions for you:

1. How many CSV providers do you have? Are they internal or external?
2. Are you a consumer or a service provider?

Validating CSV files is a nightmare of endless tweaks, since you cannot possibly know everything CSV providers can mess up.

We provide product hosting for contributors and require CSV providers to meet a rigid CSV format. We still run every file through SSIS, and if a file does not meet the format we move it to an Archived folder and send a report of the offending files.

You will waste precious time trying to account for every possible deviation. Also, if at all possible, opt for another data format such as XML (my fav).

Hope this helps

If you don't have the passion to help people, you have no passion
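One way to read "create a temp table dynamically" is to generate the DDL from the file's header row, with every column as VARCHAR so nothing fails on load. Below is a minimal sketch in Python that only builds the T-SQL string; the table name `dbo.stg_csv_import` is hypothetical. It defaults to a regular staging table rather than a `#` temp table, because a local temp table created inside dynamic SQL (EXEC or sp_executesql) is dropped as soon as that batch finishes.

```python
import csv
import io


def quote_ident(name):
    """Quote a name as a T-SQL bracketed identifier (] is doubled)."""
    return "[" + name.replace("]", "]]") + "]"


def staging_ddl(csv_text, table_name="dbo.stg_csv_import"):
    """Build CREATE TABLE text with one VARCHAR(MAX) column per header field.

    table_name is a hypothetical staging table and is inserted as-is.
    """
    header = next(csv.reader(io.StringIO(csv_text)), None)
    if not header:
        raise ValueError("CSV text has no header row")
    names = [col.strip() for col in header]
    if any(not name for name in names):
        raise ValueError("header contains a blank column name")
    columns = ",\n    ".join(f"{quote_ident(name)} VARCHAR(MAX) NULL" for name in names)
    return f"CREATE TABLE {table_name} (\n    {columns}\n);"
```

Duplicate header names are not checked here; SQL Server would reject such a statement, so a real pre-check might flag them first.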
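The routing described in the answer, where files that miss the agreed format are moved to an Archived folder and reported, could be sketched as follows. This sketch assumes a hypothetical `validate(path)` callable that returns a list of problem strings (empty means the file is fine). It only collects the report lines; sending the report (for example by email) is left out.

```python
import shutil
from pathlib import Path


def route_files(inbox, archived, validate):
    """Move every CSV in inbox that fails validate() into archived.

    validate(path) is a hypothetical callable returning a list of problem
    strings. Returns (accepted_paths, report_lines); accepted files stay in
    the inbox so the SSIS package can pick them up.
    """
    inbox, archived = Path(inbox), Path(archived)
    archived.mkdir(parents=True, exist_ok=True)
    accepted, report = [], []
    for path in sorted(inbox.glob("*.csv")):
        problems = validate(path)
        if problems:
            shutil.move(str(path), str(archived / path.name))
            report.append(f"{path.name}: " + "; ".join(problems))
        else:
            accepted.append(path)
    return accepted, report
```

Keeping validation as a separate callable lets the same routing wrap whatever per-provider checks end up being needed.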