
 Looking for a simple-to-use big data ETL tool


SlinginParts
Starting Member

18 Posts

Posted - 2012-06-27 : 15:29:57
Well,
I talked to my data vendor and they basically offered no support for SSIS for their data. Right now I'm working on trying to import a 3gb file that is throwing all sorts of errors. It was recommended that I use a third party ETL tool. The question now is which one?
I really only need it to do one thing, take all this data and put it into SQL. I had a very small chunk of the data, 1/128 of it, and I couldn't get SSIS to even eat that. As a form of desperation I turned it into a CSV, imported it into ACCESS, output it as an MDB, and it went in on the small side. So as ghetto as that was, it seemed to work; but when I tried to apply that same process to the larger side, it choked trying to eat it all as one giant file.
I would prefer something free, but if I have to pay then I have to pay. A lot of these packages are way more complex than I want or need. Any recommendations would help.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-06-27 : 15:40:47
quote:
I really only need it to do one thing, take all this data and put it into SQL
Well that's certainly what SSIS can do, but that's about the vaguest possible way of describing your situation.
quote:
As a form of desperation I turned it into a CSV, imported it into ACCESS
How did you manage to turn it into a CSV file? If you did that, guess what? SSIS can import that into SQL Server. If you got it into Access then SSIS->SQL Server should be easy.
quote:
It was recommended that I use a third party ETL tool
Who recommended that? Which tool(s) did they recommend? Does the person making the recommendation understand that SSIS IS an ETL tool?
quote:
Right now I'm working on trying to import a 3gb file that is throwing all sorts of errors
It would help if you posted some of the errors and sample lines that generate them, plus some clean lines that import successfully. SSIS can handle files much larger than 3 GB.

SlinginParts
Starting Member

18 Posts

Posted - 2012-06-27 : 16:05:30
quote:
Originally posted by robvolk

quote:
I really only need it to do one thing, take all this data and put it into SQL
Well that's certainly what SSIS can do, but that's about the vaguest possible way of describing your situation.
quote:
As a form of desperation I turned it into a CSV, imported it into ACCESS
How did you manage to turn it into a CSV file? If you did that, guess what? SSIS can import that into SQL Server. If you got it into Access then SSIS->SQL Server should be easy.
quote:
It was recommended that I use a third party ETL tool
Who recommended that? Which tool(s) did they recommend? Does the person making the recommendation understand that SSIS IS an ETL tool?
quote:
Right now I'm working on trying to import a 3gb file that is throwing all sorts of errors
It would help if you posted some of the errors and sample lines that generate them, plus some clean lines that import successfully. SSIS can handle files much larger than 3 GB.



I'm aware that SSIS is an ETL tool; however, my vendor has refused to support both SQL and SSIS. They have other clients with the same problems as me, and their recommendation was to use a third-party tool to import it.

I think I may scramble the file contents and then upload it and see if someone else can help crack the code here.

SlinginParts
Starting Member

18 Posts

Posted - 2012-06-27 : 17:44:41
I'm chasing down another alternative to using a 3rd party ETL tool: trying to find an editor that can handle this file. There are plenty of text editors, but I'm trying to figure out how to give one the rules this file is supposed to be following anyway. Once I do that, I suspect I should be able to import it into SQL fine.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-06-27 : 17:58:42
Are you dealing with lines that:

- contain garbage data?
- contain malformed or missing columns?
- contain additional columns?
- are missing delimiters?

There are several ways to scrub these kinds of lines; they just require more than a simple data transformation.
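
For example, one rough way to do it (just a sketch; the table names, file path, the '|' delimiter, and the 22-delimiter count below are placeholders, swap in your real values) is to bulk load every raw line into a single wide column and then split good lines from bad by counting delimiters:

-- one wide column so every raw line lands here untouched
CREATE TABLE [dbo].[RawLines] ([RawLine] varchar(max))

-- pull the file in line by line; this assumes the file contains no tab
-- characters, since tab is the default field terminator when none is given
BULK INSERT [dbo].[RawLines]
FROM 'C:\data\vendor_extract.txt'
WITH (ROWTERMINATOR = '\n')

-- lines with the wrong number of delimiters go to a reject table for a closer look
-- (22 is only an example; use however many delimiters your layout calls for)
SELECT [RawLine]
INTO [dbo].[RejectLines]
FROM [dbo].[RawLines]
WHERE LEN([RawLine]) - LEN(REPLACE([RawLine], '|', '')) <> 22

DELETE FROM [dbo].[RawLines]
WHERE LEN([RawLine]) - LEN(REPLACE([RawLine], '|', '')) <> 22

What's left in RawLines can then be split into columns, or written back out as a clean file for a normal SSIS import.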

SlinginParts
Starting Member

18 Posts

Posted - 2012-06-27 : 22:35:16
quote:
Originally posted by robvolk

Are you dealing with lines that:

- contain garbage data?
- contain malformed or missing columns?
- contain additional columns?
- are missing delimiters?

There are several ways to scrub these kinds of lines; they just require more than a simple data transformation.



Yes. Here is the documented form of the data as specified by the vendor. However, not all of the data seems to conform to it. For instance, the Title column is specified as varchar(255), but one value might have 270 characters. These issues are numerous just by the nature of the size of the data. There are 5 primary tables, and they are in the tens of millions of rows.

CREATE TABLE [dbo].[EFC] (
[ItemID] decimal(18,0),
[Title] varchar(255),
[Subtitle] varchar(255),
[StartDate] datetime,
[EndDate] datetime,
[DurationDays] decimal(18,0),
[BidCode] decimal(18,0),
[CurrencyCode] decimal(9,0),
[Cat1] decimal(12,0),
[Cat2] decimal(12,0),
[SellerID] decimal(18,0),
[StartPrice] decimal(15,2),
[ReservePriceFlag] char(1),
[CurrentPrice] decimal(15,2),
[BINPrice] decimal(15,2),
[GMV] decimal(15,2),
[QTY_Sold] decimal(9,0),
[QTY_Available] decimal(9,0),
[BINFlag] varchar(1),
[Seller_Country_ID] decimal(9,0),
[Ended_Flag] varchar(1),
[RelistID] decimal(18,0),
[TrsFlag] varchar(1),
[MobilePurchaseFlag] varchar(1),
[MobileOSFlag] varchar(255),
[MobileListingFlag] varchar(1),
[MobileListingOSFlag] varchar(255)
)
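
One idea I'm toying with is a padded-out staging copy of that table so the over-long values at least load, and I can deal with them afterwards. The widths and the staging table name here are my own guesses, not anything from the vendor's spec:

CREATE TABLE [dbo].[EFC_Staging] (
[ItemID] varchar(50),
[Title] varchar(1000),    -- spec says 255, but 270+ shows up
[Subtitle] varchar(1000),
-- ...and so on for the rest of the columns, all as generous varchars...
[MobileListingOSFlag] varchar(1000)
)

-- then move rows into dbo.EFC once they're checked, truncating or converting as needed:
-- INSERT INTO [dbo].[EFC] (ItemID, Title, ...)
-- SELECT CAST(ItemID AS decimal(18,0)), LEFT(Title, 255), ...
-- FROM [dbo].[EFC_Staging]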

I'm not a programmer, so this is causing trouble for me. I'm already learning SQL just to get this done, because every DBA I've found, paid or unpaid, has bailed on me for the project. I guess I was really naive to think I was going to be able to marry these huge data sources without issue. An experienced DBA with a telecom background was impressed with the size and complexity of the data. So I'm really just having to learn this myself, given that I can't get anyone to stick around.

It took me about a day to get into SQL and start writing queries, but the other ancillary learning topics are quite large. It seems there are a ton of brute-force and stupid ways to do things, and you have to learn your way around doing them right!

Either way, I wanted to preface with that, since "write a Perl script" is probably more difficult for me than you might think. If that's what it takes, though, then I'll have to do it!

Here are the first two lines:

ItemID|Title|Subtitle|StartDate|EndDate|DurationDays|BidCode|CurrencyCode|Cat1|Cat2|SellerID|StartPrice|ReservePriceFlag|CurrentPrice|BINPrice|GMV|QTY_Sold|QTY_Available|BINFlag|Seller_Country_ID|Ended_Flag|RelistID|TrsFlag

110301180360|NEW Chevrolet Blazer Silverado Suburban Tahoe Key Blank||18-OCT-08 06:20:20|31-JAN-12 06:20:20|1200|9|1|40016|0|97733630|3.29|N|3.29|0.00|2220.75|675|782|N|1|N|110290786003|Y


They are delete delimited, but that character doesn't pop up in here.

SlinginParts
Starting Member

18 Posts

Posted - 2012-06-27 : 22:44:08
I'm thinking I'm probably going to have to figure out how to write a script for this, so I'm searching for VBA examples since I have VS10P for SSIS anyway.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-06-28 : 11:24:56
What do you do with data that's too long? Truncate it to fit? Or not import it at all?

I'm not clear on why your vendor's data is a problem. Didn't they provide the database schema? Or was that built internally? If their data doesn't fit their schema, the problem is on their side.
quote:
They are delete delimited, but that character doesn't pop up in here.
Never heard of "delete delimited", do you have an ASCII value for that?

SlinginParts
Starting Member

18 Posts

Posted - 2012-06-28 : 12:27:38
quote:
Originally posted by robvolk

What do you do with data that's too long? Truncate it to fit? Or not import it at all?

I'm not clear on why your vendor's data is a problem. Didn't they provide the database schema? Or was that built internally? If their data doesn't fit their schema, the problem is on their side.
quote:
They are delete delimited, but that character doesn't pop up in here.
Never heard of "delete delimited", do you have an ASCII value for that?



Their data doesn't match their schema. ASCII character 127! Delete, but also known as rubout ;)

Despite the data not matching, from what I gather it's basically "you have the data, deal with it."

The problem I'm mostly running into is that I'm a really, really small player as a buyer of their data. They're mostly banking on the fact that I'll be a subscriber, but I think they also want to make sure I'm not going to be too much trouble.

Normally with most kinds of data you have several companies offering it; not in this case. And the data is too valuable for me to get any sense of pride about what I should be getting.
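
In case it helps anyone else stuck with the same file: since the delimiter turns out to be CHAR(127), one thing I'm going to try (just a sketch; the file path and the EFC_Staging table are placeholders of mine) is building the BULK INSERT statement dynamically so the delete character can be spliced in as the field terminator:

-- build the statement as a string, since CHAR(127) can't be typed into a literal;
-- the staging table's columns need to line up with the file's fields
DECLARE @sql nvarchar(max)
SET @sql = N'BULK INSERT [dbo].[EFC_Staging]
FROM ''C:\data\efc_extract.txt''
WITH (FIELDTERMINATOR = ''' + CHAR(127) + N''',
ROWTERMINATOR = ''\n'',
FIRSTROW = 2)'    -- FIRSTROW = 2 skips the header line
EXEC (@sql)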

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-06-28 : 13:15:04
If you have the option, I'd modify the table schema to accommodate the crap they're sending you. I'd also suggest looking at their license agreement, or having someone in legal look at it. They're obviously selling you a defective product.

If that's not an option, you can certainly try the Perl or VBScript route to clean up the file. You can still use SSIS to manage importing the data, with the option of redirecting the problematic rows (error output) to another table for further examination and cleaning. It's not difficult to do. You certainly don't need (and won't benefit from) a 3rd party ETL tool.
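
If you do go the widen-the-schema route, it can be as simple as something like this (the new lengths are only examples; pick whatever covers the worst rows you've actually seen):

-- widen the columns that keep overflowing; existing rows are unaffected
ALTER TABLE [dbo].[EFC] ALTER COLUMN [Title] varchar(500)
ALTER TABLE [dbo].[EFC] ALTER COLUMN [Subtitle] varchar(500)

Anything SSIS still can't convert can go out the error output to a holding table, as mentioned above.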

SlinginParts
Starting Member

18 Posts

Posted - 2012-06-28 : 15:12:15
Yeah, I wish there were a better way around it, but when you have a company that sells data to universities, government, and intel, and they're willing to send some to me, I'm not complaining!
   
