Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 SQL Server parallelism Vs ETL tool parallelism

Author  Topic 

sura
Starting Member

10 Posts

Posted - 2011-07-03 : 21:57:24
Hi there,

I am from ETL background, and not so good about SQL Server parallelism capability. But I am in the position to compare merits and demerits of SQL vs Datastage (ETL tool from IBM) approach.

In ETL I can create 100 jobs which will run almost at the same time and finish. So all the jobs are going to run in parallel. But if I want to do it in SQL server I am not sure how this can be possible!!

What I need to know is, how efficient SQL Server is to handle the below given scenario?

OS Windows
DB SQL Server 2008 R2

Example 1
Pull the data from SQL Server DB and load into SQL Server DB (source to target)
I have 100 tables from the source SQL Server DB. I need to pull the data from source, find the changes (delta) by using check sum based on Key columns and load the delta data (newly changed data) into the target SQL Server table, using SQL Server procedures.

Example 2
Pull the data from Postgre SQL DB
I have 110 tables from Postgres SQL DB. I need to pull the data, find the changes (delta) by using check sum based on Key and load the delta data (newly changed data) into the target SQL Server table, using SQL Server procedures.

Example 3
Pull the data from Oracle DB
I have 75 tables from Oracle DB. I need to pull the data, find the changes (delta) by using check sum based on Key and load the delta data (newly changed data) into the target SQL Server table, using SQL Server procedures.

Questions:

1. Did the SQL Server will pull (Consider all the example) the tables in Sequence (take the first table, load and then another...) or is it capable to run in parallel?

1a. If it is capable, how to find, how many tables it is pulling at a time?

2. How efficient SQL Server is to pull the data other than SQL Server DB, change the data types (Consider date and other data types) and load into SQL Server?

It would be great if any one comment in this.

Thanks in advance,
SQL Server Starter

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2011-07-04 : 04:55:58
Your trying to compare apples with oranges here. SQL Server is a *database engine* and not an ETL tool. SQL Server does have an ETL tool called SQL Server Integration Services but I have no idea how that compares with IBMs tools.

When it comes to your questions of running things in parallel sql server is very capable, but getting things to run in parallel without using SSIS is cumbersome. When a procedure is executed it connects to the database engine and everything that procedure does it does in sequence using the same connection (with some exceptions). Getting several procedures to run in parallel means you have to open several different connections at the same time and the easiest way of doing this is to use SSIS or to chedule the procedures to run at the same time using SQL Server Agent.

Regarding the efficiency of ETL transformations you'll basically have to try. But SQL Server does hold the current world record for moving data through an ETL process -> http://blogs.msdn.com/b/sqlperf/archive/2008/02/27/etl-world-record.aspx

- Lumbago
My blog-> http://thefirstsql.com/2011/02/07/regular-expressions-advanced-string-matching-and-new-split-function-sql-server-2008-r2/
Go to Top of Page

sura
Starting Member

10 Posts

Posted - 2011-07-05 : 21:26:16
Thanks for the reply.

Sorry, I know that I am trying to compare apple with orange. But sometime need to compare the efficiency the way how these two will handle the same situation. This will help to choose the best approach to rech the goal.

Again Thanks for the comments.

Cheers,

Go to Top of Page

nigelrivett
Master Smack Fu Yak Hacker

3385 Posts

Posted - 2011-07-05 : 22:16:56
Don't know about 2 but for example 1 and 3 it will probably depend more on the experience of the implementer than the product.
Also efficiency probably isn't the most important factor - cost of implementation and mainenance would outweigh it - these would both be higher to get the same performance with less experienced staff, sometimes orders of magnitude.
That's without considering network and disk configurations.


==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page
   

- Advertisement -