Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 SSIS and Import/Export (2005)
 SSIS perf tuning - tables of 15M+ rows

Author  Topic 

Wanderer
Master Smack Fu Yak Hacker

1168 Posts

Posted - 2008-02-28 : 16:28:15
The challenge: I have to extract and convert data between 2 SQL server systems - only 4 tables on the source systems, 8 tables on the target system. Source tables have between 5,000 rows and 16,000,000 rows. For most of the tables (for example Customer, which goes into 4 target tables), there will be 1 row in target tables for each row in the mapped source system table - so my 13.5M customer rows will end up as around 40M rows across the 4 target tables. So far, so good. But - this is a 24x7 online retail web-site, and to get the data across as a clean process, we require the smallest possible duration.

I have progressed on the customer migration, and am testing on a test environment (2xdual core HT processors, 4 GB ram) which was 2.15 million rows. Live environment is likely to be a 4xdual core with 8-16 GB ram.

I am trying to optimize the extract data flow, and have read the SSISperfTuning doc. I am now trying to put that into practice.
I have a row size of approx 340 bytes, so based on that, and my test environment of 2.15 million rows, I work out at around 700 MB ram required to buffer the data. That is a factor of 7 times greater than the max buffer space for a data flow of 100 MB, which it seems, means I should divide the base MaxBufferRows (10000) by 7 to go down to 1400 rows?

I see a LOT of the following messages in my progress, when running with default settings:
[DTS.Pipeline] Information: The buffer manager detected that the system was low on virtual memory, but was unable to swap out any buffers. 30 buffers were considered and 30 were locked. Either not enough memory is available to the pipeline because not enough is installed, other processes are using it, or too many buffers are locked.

The design of the data flow at the moment is:


..........................................|--target table 1
SOURCE SP ---- MULTICAST---|--target table 2
..........................................|--target table 3
..........................................|--target table 4

any thoughts on Buffer tweaking, corrections to my assumption and other hints/techniques?


*##* *##* *##* *##*
Chaos, Disorder and Panic ... my work is done here!

Wanderer
Master Smack Fu Yak Hacker

1168 Posts

Posted - 2008-03-02 : 13:25:33
Some feedback....

Well, the problems APPEAR to have been relating to the server I was using. I was trying to do this on a test server, that had 4 Core, 4 GB ram. When I switched to a DEV server that is built in-line with our new live server configurations (similar to what th target servers will be) - 16 cores 16 GB ram, and not a single 'buffer manager' informational message. This may not help anyone else reading this problem, since not everyone is going to be able to jump onto a box with 4x the ram, but I thought I'd let you know.

Good news for me is that the live environment, when I have to run this conversion, will likely have 64 GB ram (Dell r900, 16 Cores, 64 GB ram - 6 of them in a 4 active, 2 passive cluster farm - that's some NICE ahrdware :D)

*##* *##* *##* *##*
Chaos, Disorder and Panic ... my work is done here!
Go to Top of Page

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-03-02 : 22:24:20
can I please have those 6 servers? drooling at the thought of pegging all 16 cpus with some number crunching...




elsasoft.org
Go to Top of Page

rmiao
Master Smack Fu Yak Hacker

7266 Posts

Posted - 2008-03-02 : 22:38:30
What's pagefile size on the server?
Go to Top of Page

mwjdavidson
Aged Yak Warrior

735 Posts

Posted - 2008-03-03 : 05:18:39
Have a look at the CPU usage during the multicast. Since this is synchronous transform, the same buffers are used for all outputs rendering it essentially single-threaded. Sticking an asynchronous transform (i.e. a Union All) into each path will start a new execution tree for each, each one using a making use of a separate processor.
See the following for more info:
[url]http://www.sqlis.com/182.aspx[/url]

NB. This workaround is no longer necessary in SS 2008.


Mark
Go to Top of Page

Wanderer
Master Smack Fu Yak Hacker

1168 Posts

Posted - 2008-03-03 : 05:55:44
quote:
Originally posted by mwjdavidson

Have a look at the CPU usage during the multicast. Since this is synchronous transform, the same buffers are used for all outputs rendering it essentially single-threaded. Sticking an asynchronous transform (i.e. a Union All) into each path will start a new execution tree for each, each one using a making use of a separate processor.
See the following for more info:
[url]http://www.sqlis.com/182.aspx[/url]

NB. This workaround is no longer necessary in SS 2008.


Mark



thanks MArk - I am aware of this thecnique, but not that link. My 'simplistic' design in the text was difficuly enough - I didn't cover that. The actual design has extended now, and I have 2 multicasts, and 9 union alls - 8 of them to try and ensure we get seperate threads, and one of them re-unioning 4 of the threads (this may go as I am currently doing 4 pivot's, the unioning the results - I may try and move that up the the sourcing SP to see if that improves performance)

*##* *##* *##* *##*
Chaos, Disorder and Panic ... my work is done here!
Go to Top of Page

Wanderer
Master Smack Fu Yak Hacker

1168 Posts

Posted - 2008-03-03 : 05:58:33
quote:
Originally posted by rmiao

What's pagefile size on the server?



On the dev server, with 16 GB ram, fixed size 16 GB (on the "D Drive" as opposed to the OS "C Drive").

*##* *##* *##* *##*
Chaos, Disorder and Panic ... my work is done here!
Go to Top of Page

mwjdavidson
Aged Yak Warrior

735 Posts

Posted - 2008-03-05 : 05:06:03
quote:
quote:
--------------------------------------------------------------------------------
Originally posted by mwjdavidson

Have a look at the CPU usage during the multicast. Since this is synchronous transform, the same buffers are used for all outputs rendering it essentially single-threaded. Sticking an asynchronous transform (i.e. a Union All) into each path will start a new execution tree for each, each one using a making use of a separate processor.
See the following for more info:
http://www.sqlis.com/182.aspx

NB. This workaround is no longer necessary in SS 2008.


Mark

--------------------------------------------------------------------------------



thanks MArk - I am aware of this thecnique, but not that link. My 'simplistic' design in the text was difficuly enough - I didn't cover that. The actual design has extended now, and I have 2 multicasts, and 9 union alls - 8 of them to try and ensure we get seperate threads, and one of them re-unioning 4 of the threads (this may go as I am currently doing 4 pivot's, the unioning the results - I may try and move that up the the sourcing SP to see if that improves performance)

*##* *##* *##* *##*
Chaos, Disorder and Panic ... my work is done here!

I didn't consider the limitations of the 'ASCII ETL Documenter'!

Mark
Go to Top of Page
   

- Advertisement -