Ali Chadorbaf
Starting Member
44 Posts
Posted - 2008-03-26 : 15:46:27
Hi,

We use lookups to join a few huge tables in SSIS (each has more than 40 million rows). The process took almost two days to complete when we selected partial cache on the lookups, and it stalls/locks when we select full cache. We have a 32-bit server, so SSIS can only use 2-3 GB of memory no matter how much RAM the machine has. The best long-term solution for our problem seems to be moving to a 64-bit server so SSIS can use up to 16 GB of RAM. For now I am researching remedies to get better performance from our current environment while we wait for the bigger server. I'd like to hear your thoughts on options that may improve the performance of our package. Does partitioning help? What else could be helpful?
|
nr
SQLTeam MVY
12543 Posts
Posted - 2008-03-26 : 15:58:00
Can you do a sorted Merge Join rather than a Lookup? Be careful about SSIS and SQL Server sort orders. Otherwise, how about loading the data to a staging table and then doing the lookup on SQL Server?

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
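For the staging-table route, a minimal sketch (all table and column names below are hypothetical, not from this thread): bulk-load the source rows into a staging table on the lookup server, index the join key, and let SQL Server do one set-based join instead of probing an SSIS Lookup cache row by row.

```sql
-- Hypothetical staging table on the server that holds the lookup data.
CREATE TABLE dbo.StageSource
(
    NaturalKey int           NOT NULL,
    Amount     decimal(18,2) NOT NULL
);

-- Load dbo.StageSource with a fast-load OLE DB destination in SSIS, then
-- index the join key and resolve the surrogate keys in one set-based pass:
CREATE INDEX IX_StageSource_NaturalKey ON dbo.StageSource (NaturalKey);

INSERT INTO dbo.FactSales (DimKey, Amount)
SELECT d.DimKey, s.Amount
FROM   dbo.StageSource AS s
JOIN   dbo.DimCustomer AS d
       ON d.NaturalKey = s.NaturalKey;
```

If you try the Merge Join route instead, both inputs must be sorted on the join key, and the source output's IsSorted/SortKeyPosition metadata has to match the actual ORDER BY; this is where the SQL Server vs. SSIS (Windows collation) sort-order mismatch mentioned above can bite.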
Ali Chadorbaf
Starting Member
44 Posts
Posted - 2008-03-26 : 18:24:38
Thanks so much for the input. For the merge, I'll need to get familiar with it to see if it helps. As for your other suggestion, crossing the large data between the two servers twice, I am not sure it would improve performance (just a thought; not sure). I'll have to do some testing on both ideas. Thanks once again, and I'd still like to hear other ideas.

More info about our case: we are doing data warehousing and dealing with a large amount of data. The staging and datamart servers are two separate physical servers. In this process, the SSIS package reads source data from staging (server1), then uses lookups against the datamart (server2) to find the keys, and inserts the results into the datamart fact table. We do not use a dblink because that is too slow for large data transformations, and Microsoft suggested using SSIS to get better performance.
nr
SQLTeam MVY
12543 Posts
Posted - 2008-03-26 : 21:42:29
I would add a staging database to the datamart server, or keep a copy of the PKs (or natural keys, or whatever you need) on the staging server; it depends on how you are using them. If possible, I try to keep a copy of the presentation table data in a repository; then all the processing can be done there and only the data that is needed is transferred to the presentation layer. It depends a lot on the architecture.

Doing this, you can end up using SSIS just to transfer data which is already flagged for transfer. SSIS is good at that sort of thing, and you can do a lot of the manipulation in SQL, which is easy, so you don't get stuck trying to figure out how to do things in SSIS (but you can do that at your leisure to see if it's more efficient).

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
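A sketch of the "flag then transfer" idea, assuming a local copy of the datamart keys is kept on the staging server (table and column names are hypothetical):

```sql
-- Resolve the surrogate keys locally against a copy of the datamart PKs
-- and mark the rows as ready, all on the staging server:
UPDATE s
SET    s.DimKey       = d.DimKey,
       s.TransferFlag = 1
FROM   dbo.StageSource AS s
JOIN   dbo.DimKeyCopy  AS d   -- local copy of the datamart keys
       ON d.NaturalKey = s.NaturalKey
WHERE  s.TransferFlag = 0;

-- The SSIS data flow then just moves pre-flagged rows; its source query
-- is a plain filtered read with no lookups at all:
SELECT DimKey, Amount
FROM   dbo.StageSource
WHERE  TransferFlag = 1;
```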
Ali Chadorbaf
Starting Member
44 Posts
Posted - 2008-03-27 : 12:51:47
nr, good point! We are working on/testing creating a little database on server2 (the one where the datamarts are located) and copying only the needed columns from staging (server1) over to it. It takes time to copy a large amount of data across the servers, but it may still give us better speed overall. We then assemble the fact table there and finally feed the datamart fact table. I'd like to keep this subject open to hear more ideas. Thanks once again for your help.
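Assuming a small work database on server2, the assembly step described above might look like this (database, table, and column names are all placeholders): the narrow source columns are copied across once, and then both the key lookup and the final load are local, set-based operations.

```sql
-- Resolve keys locally on server2: the copied source columns join to the
-- dimension table without any cross-server traffic.
INSERT INTO Work.dbo.FactStage (DimKey, Amount)
SELECT d.DimKey, s.Amount
FROM   Work.dbo.SourceColumns   AS s
JOIN   DataMart.dbo.DimCustomer AS d
       ON d.NaturalKey = s.NaturalKey;

-- Final feed into the datamart fact table, also local to server2:
INSERT INTO DataMart.dbo.FactSales (DimKey, Amount)
SELECT DimKey, Amount
FROM   Work.dbo.FactStage;
```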