
 Bulk inserts - performance and indexes!


duncanwill
Starting Member

20 Posts

Posted - 2011-05-09 : 05:43:59
Hi,

I am trying to work out the 'right' way, performance-wise, of handling a very specific import task in SQL Server 2005 and SSIS. My development solution works, but I am concerned about performance and am thinking about how best to optimise the process to balance the insert and search requirements.

The scenario and my current solution are as follows:

-- Daily import of between 1,000 and 2,000 CSV files (using an SSIS Foreach Loop to iterate over the files)

-- This produces between 1 and 2 million rows of data, which I currently load into a non-indexed staging table. This keeps the load quick, and the next step is to amalgamate some of the rows prior to (or in fact during) the main insert into the 'live' table, and also to create another normalised table for some of the data.

-- The final step is currently an "INSERT INTO ... SELECT" from the staging table into the main live table, executed from a stored procedure (a rough sketch follows this list). This inserts around 300,000 to 500,000 rows.

-- Data in the live table is retained for 90 days before being archived, so each daily insert represents roughly 1/90th of the live table's data.

-- The live table is queried by a search GUI, so it carries indexes so that those searches work efficiently for the end user.
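
To make the above concrete, here is a stripped-down sketch of the current setup - the table and column names are made up for illustration, not my real schema:

-- Heap staging table: no indexes at all, so each file load is just an append
CREATE TABLE dbo.ImportStaging
(
    EventTime datetime    NOT NULL,
    ItemCode  varchar(50) NOT NULL,
    Quantity  int         NOT NULL
);

-- Final step: amalgamate the staging rows while inserting into the indexed live table
INSERT INTO dbo.LiveData (EventTime, ItemCode, TotalQuantity)
SELECT EventTime, ItemCode, SUM(Quantity)
FROM   dbo.ImportStaging
GROUP  BY EventTime, ItemCode;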


So my 'problem' is essentially a trade-off between import performance and 'user front end' performance. I need the indexes in place to make the search usable, but they slow the import process. Currently, in DEV, the import runs reasonably well with the indexes in place - the load time varies between 1 and 2 hours. This is acceptable, but I always want to improve on things! Also, the live environment may well be slower, so it would be good to have an optimised process from the start.

I am toying with the idea of dropping the indexes on the live table prior to the insert and then recreating them. I have read, however, that this is only beneficial when the inserted rows are a large percentage of the total data in the target table (which is not true in my case, as they are only about 1/90th of the target data).
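
For reference, the kind of thing I was considering is the following (the index and table names are only examples):

-- Disable the non-clustered indexes before the big insert
-- (only the non-clustered ones - disabling the clustered index makes the table unreadable)
ALTER INDEX IX_LiveData_ItemCode ON dbo.LiveData DISABLE;

-- ... run the staging-to-live insert here ...

-- Rebuild afterwards; rebuilding also re-enables a disabled index
ALTER INDEX IX_LiveData_ItemCode ON dbo.LiveData REBUILD;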

The other thing I have noticed while Googling is the notion of logging and the effect it may have on insert performance. As I sit firmly on the developer side of the fence (rather than DBA), I have very little knowledge of logging options - is there anything here that I could change as part of my insert process?
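
From what I have picked up in my reading, the sort of thing people mention is switching the database to the BULK_LOGGED recovery model around the load - something like the following (database name and backup path are made up, and I realise this would need agreeing with a DBA since it affects point-in-time recovery):

-- Switch to bulk-logged recovery for the duration of the load
ALTER DATABASE MyImportDb SET RECOVERY BULK_LOGGED;

-- ... run the bulk load here (minimal logging only applies to bulk operations
--     such as BULK INSERT / SELECT INTO / SSIS fast load, not ordinary inserts) ...

-- Switch back to full recovery and take a log backup afterwards
ALTER DATABASE MyImportDb SET RECOVERY FULL;
BACKUP LOG MyImportDb TO DISK = 'D:\Backups\MyImportDb_after_load.trn';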

Finally, I wondered whether my entire process could be improved by changing the SSIS package or indeed by using a different method entirely? For example, I currently loop over all the import files, load them into the staging table first, and then do the final insert in one go. I assume this is more efficient than treating each file as a discrete operation and importing each one separately?
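
One variation I have seen suggested is to fire a plain BULK INSERT per file from the loop rather than using a data flow - roughly like this, with the file path and CSV options made up for illustration:

BULK INSERT dbo.ImportStaging
FROM 'D:\Imports\file_0001.csv'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,   -- skip the header row if the files have one
    TABLOCK                -- a table lock helps the load qualify for minimal logging into a heap
);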

I realise that I can benchmark quite a lot of this to 'suck it and see' - but my main interest lies in what is 'theoretically' the best way to handle such a process... so any thoughts or pointers or discussions would be interesting to hear.

Cheers!

dinakar
Master Smack Fu Yak Hacker

2507 Posts

Posted - 2011-05-09 : 14:08:37
Getting all the data from the CSVs into staging table(s) is a good idea, so you can do a batch insert into your live tables.

How much data is typically in the live tables before you do the insert from the staging table?

When you rebuild the indexes there is a table lock, and access to the table itself is blocked, so I am not sure whether that is acceptable to your users.
If that is not an option, you can try rebuilding the index online by specifying the ONLINE = ON option during the index rebuild. This places a momentary lock on the pages just before the index pages are switched out, but the rebuild itself is done online against a separate copy.
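
Something along these lines - the index and table names are just examples, and note that ONLINE = ON needs Enterprise Edition on 2005:

-- Rebuild the index online so queries can still read the table during the rebuild
ALTER INDEX IX_LiveData_ItemCode ON dbo.LiveData
REBUILD WITH (ONLINE = ON);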

If online indexing doesn't help much, you can try dropping the indexes, doing your insert, and then rebuilding the indexes.

Dinakar Nethi
************************
Life is short. Enjoy it.
************************
http://weblogs.sqlteam.com/dinakar/

duncanwill
Starting Member

20 Posts

Posted - 2011-05-09 : 17:07:47
Thanks for the reply - at least it sounds like I am on the right track.

Actually, the import package runs overnight, 'out of hours', so in theory it is no problem if the application is unavailable for a short while. I am just trying to deliver the 'best' and most efficient solution, so that any performance problems in the live environment are minimised and conflicts with any other processes that may be running are avoided.

The daily insert will be roughly half a million records into a table of 45 million records.

Will a batch insert be more efficient in any case, or would the benefit of a batch insert only be seen if the application is in use at the time?




dinakar
Master Smack Fu Yak Hacker

2507 Posts

Posted - 2011-05-09 : 17:51:44
I am not sure how long your rebuild will take and whether you can afford the extra downtime to rebuild the indexes on the table(s).
I would try the following: (1) manually run UPDATE STATISTICS and monitor performance, and (2) if you see performance degradation, then rebuild the indexes.
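
For (1), it can be as simple as the following (the table name is just an example):

-- Refresh the statistics on the live table after the daily load
UPDATE STATISTICS dbo.LiveData WITH FULLSCAN;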

As for batch inserts, do split the inserts into smaller batches so you don't have one long-running transaction - perhaps a couple of thousand rows at a time.
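
As a rough sketch of that kind of batching (table/column names and the batch size are only examples, and it assumes any amalgamation has already been done in the staging table):

-- Add a temporary key to batch on (the staging table stays a heap)
ALTER TABLE dbo.ImportStaging ADD RowID int IDENTITY(1,1);

DECLARE @BatchSize int, @MinId int, @MaxId int;
SET @BatchSize = 5000;
SELECT @MinId = MIN(RowID), @MaxId = MAX(RowID) FROM dbo.ImportStaging;

WHILE @MinId <= @MaxId
BEGIN
    -- Each iteration is its own short transaction
    INSERT INTO dbo.LiveData (EventTime, ItemCode, TotalQuantity)
    SELECT EventTime, ItemCode, Quantity
    FROM   dbo.ImportStaging
    WHERE  RowID >= @MinId AND RowID < @MinId + @BatchSize;

    SET @MinId = @MinId + @BatchSize;
END;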

Dinakar Nethi
************************
Life is short. Enjoy it.
************************
http://weblogs.sqlteam.com/dinakar/
   
