pachouds
Starting Member
3 Posts

Posted - 2011-05-26 : 21:57:44
Hi guys,

I have been struggling with this one for a few weeks now and don't seem to be getting anywhere. I am working on an application which will continuously bulk load data into a SQL Server 2000 database 24/7. Once the data is loaded it is read only and never modified. It will be a huge amount of data, probably more than 1 billion records for a 60 day period. Here are my requirements:

1. I need to keep this data for a rolling 60 day period (data is deleted/truncated/dropped once older than 60 days).
2. Data needs to be bulk loaded into this table very efficiently.
3. The latest data (loaded in the last 2 days) is summarised and inserted into summary tables every couple of hours so that the summary data is up to date (within a few hours).
4. The new data is summarised based on the date it was inserted, and this summary data is inserted into summary tables.
5. The summaries are quite often large GROUP BY queries.
6. Users need to be able to run reports and query all the detailed data (last 60 days) using an application which requires this data to be in one table or view.
7. Every night at midnight, data older than 60 days is deleted/truncated/dropped.

My thoughts are to have 60 database tables which hold data based on the day that it was loaded. Every night at midnight the current table would be renamed, a new current table would be created, and the table older than 60 days would be dropped. When the current table is created it has a column called created_date with a default value set to the date the table was created. This date is required so that I can easily summarise the data in this table into the summary tables. All summary data will be aggregated by created_date.

The current table needs to be summarised 3-5 times during the day, and at midnight I will re-summarise the last 2 tables that were created. I would then create a view of all 60 tables, including the current one, which can be queried by users.

Questions:

1. Is it possible to create a partitioned view using the created_date?
2. If it is not possible to create a partitioned view, could I use the created_date as an index on the view?
3. Would it be pointless to index the created_date in each table (as it is only one constant value) so the view could be queried by created_date?
4. Can the current table which is being bulk loaded into be in the view?
5. Would it be possible to reference the view when summarising the detailed data where the created_date > today - 2, etc.?
6. Is there a better structure to use for what I am trying to accomplish?

Thanks heaps in advance for any input!

Best regards
Sam
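A rough T-SQL sketch of the per-day scheme described above (table and column names are my own illustration, not from the thread). One point worth noting against question 1: for SQL Server 2000 to treat a UNION ALL view as a partitioned view and prune member tables, each table needs a CHECK constraint on the partitioning column; a DEFAULT alone is not enough.

```sql
-- Illustrative daily table; created_date carries both a DEFAULT
-- (so the bulk load never has to supply it) and a CHECK constraint
-- (so the optimizer can eliminate this table from view queries).
CREATE TABLE dbo.Detail_20110526 (
    detail_id    INT          NOT NULL,
    payload      VARCHAR(100) NOT NULL,
    created_date DATETIME     NOT NULL
        CONSTRAINT DF_Detail_20110526 DEFAULT ('20110526')
        CONSTRAINT CK_Detail_20110526
            CHECK (created_date >= '20110526' AND created_date < '20110527')
)
GO

-- UNION ALL view over the daily tables (only two of the 60 shown)
CREATE VIEW dbo.DetailAll
AS
SELECT detail_id, payload, created_date FROM dbo.Detail_20110526
UNION ALL
SELECT detail_id, payload, created_date FROM dbo.Detail_20110527
GO
```

With the CHECK constraints in place, a query such as `SELECT ... FROM dbo.DetailAll WHERE created_date >= '20110526'` should touch only the matching member tables, which also answers question 3: the per-table index on a single-valued column matters less than the constraint.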
tkizer
Almighty SQL Goddess
38200 Posts

SQL Server 2000? Wow, this is going to be tough.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/
Subscribe to my blog
pachouds
Starting Member
3 Posts

Posted - 2011-05-26 : 22:33:51
Hi Tara,

Yeah, I think so. The application that I am working on is to replace an existing app that is doing this task using SQL Server 2000. I would like to come up with a solution using 2000, but it also would be a prime opportunity to upgrade. As an aside, would this be much easier to do with 2005 or 2008?

Thanks
Sam

quote: Originally posted by tkizer

SQL Server 2000? Wow, this is going to be tough.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/
Subscribe to my blog
Lumbago
Norsk Yak Master
3271 Posts

Posted - 2011-05-27 : 10:06:38
This is gonna be tough but also FUN! I love performance tuning in limited environments :)

I've done something remotely similar before for a phone company I used to work for. What we did was basically partition the data like you say into a number of tables (we had one for every 10 million records) with a constraint on the ID column of each table, so e.g. on one of the tables we had a constraint like "ID >= 10000000 AND ID < 20000000". Then we added a UNION ALL view with all active partitions, and every time we wanted to add new partitions or delete old ones we just altered the view (and created/dropped table partitions of course). You can see a VERY simple sample here: http://thefirstsql.com/2010/06/18/partitioning-the-old-fashion-way/

We also added a metadata table that had information about the first and last ID, CreateTime value etc. in each partition, which we used for joining since the ID was usually synthetic and hence didn't have any business value. As far as I remember it worked out halfway decent...

- Lumbago
My blog -> http://thefirstsql.com/2011/02/07/regular-expressions-advanced-string-matching-and-new-split-function-sql-server-2008-r2/
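A minimal sketch of the ID-range approach Lumbago describes, with illustrative table and column names (only two partitions shown; the linked sample has the fuller version):

```sql
-- Each partition table carries a CHECK constraint on the ID range,
-- mirroring the "ID >= 10000000 AND ID < 20000000" example above.
CREATE TABLE dbo.Calls_1 (
    ID       INT         NOT NULL PRIMARY KEY,
    CallData VARCHAR(50) NOT NULL,
    CONSTRAINT CK_Calls_1 CHECK (ID >= 10000000 AND ID < 20000000)
)
GO
CREATE TABLE dbo.Calls_2 (
    ID       INT         NOT NULL PRIMARY KEY,
    CallData VARCHAR(50) NOT NULL,
    CONSTRAINT CK_Calls_2 CHECK (ID >= 20000000 AND ID < 30000000)
)
GO

-- UNION ALL view over the active partitions; adding or retiring a
-- partition is just an ALTER VIEW plus a CREATE/DROP TABLE.
CREATE VIEW dbo.CallsAll
AS
SELECT ID, CallData FROM dbo.Calls_1
UNION ALL
SELECT ID, CallData FROM dbo.Calls_2
GO
```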
tkizer
Almighty SQL Goddess
38200 Posts
pachouds
Starting Member
3 Posts

Posted - 2011-05-29 : 20:02:19
Thanks for the input Lumbago!

I guess the guts of my problem is that there is an application that is going to load data into a database table 365 days a year, 24 hours a day, except at midnight when I intend to do the table switch. I need this process to be automatic, with a stored proc that runs at midnight to drop the view, create a table, rename a table, and recreate the view. I also need the inserted date in the tables (unique to each table) so that I can use it to delete and add summary data. In the view this inserted_date also needs to be usable so that users can select the amount of data they wish to query, so I need this data to be indexed. It would seem silly to index a column in a table that has only one value in it, but there are going to be 60 tables in the view, and selecting from the view using this date should speed up those queries?

It doesn't seem that I can have a partitioned view using a default date as the partitioning column. I also need to bulk load into the current table, which I am not sure is possible with a partitioned view.

Thanks again
Regards
Sam

quote: Originally posted by Lumbago

This is gonna be tough but also FUN! I love performance tuning in limited environments :)

I've done something remotely similar before for a phone company I used to work for. What we did was basically partition the data like you say into a number of tables (we had one for every 10 million records) with a constraint on the ID column of each table, so e.g. on one of the tables we had a constraint like "ID >= 10000000 AND ID < 20000000". Then we added a UNION ALL view with all active partitions, and every time we wanted to add new partitions or delete old ones we just altered the view (and created/dropped table partitions of course). You can see a VERY simple sample here: http://thefirstsql.com/2010/06/18/partitioning-the-old-fashion-way/

We also added a metadata table that had information about the first and last ID, CreateTime value etc. in each partition, which we used for joining since the ID was usually synthetic and hence didn't have any business value. As far as I remember it worked out halfway decent...

- Lumbago
My blog -> http://thefirstsql.com/2011/02/07/regular-expressions-advanced-string-matching-and-new-split-function-sql-server-2008-r2/
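One way that midnight proc could be sketched in T-SQL, using dynamic SQL since the table names change daily (all object, column, and parameter names here are illustrative; in practice the view body would be generated from a metadata table listing the 60 active days):

```sql
CREATE PROCEDURE dbo.RotateDailyTables
    @NewDay CHAR(8),   -- e.g. '20110530': the table to create
    @OldDay CHAR(8)    -- the day that just aged past 60 days
AS
BEGIN
    DECLARE @sql VARCHAR(2000)

    -- 1. Drop the table that has aged out of the 60 day window
    SET @sql = 'DROP TABLE dbo.Detail_' + @OldDay
    EXEC (@sql)

    -- 2. Create the new current table. The CHECK constraint on
    --    created_date (not just the DEFAULT) is what lets queries
    --    against the UNION ALL view skip the other 59 tables.
    SET @sql = 'CREATE TABLE dbo.Detail_' + @NewDay + ' (
                    detail_id    INT NOT NULL,
                    payload      VARCHAR(100) NOT NULL,
                    created_date DATETIME NOT NULL
                        DEFAULT ''' + @NewDay + '''
                        CHECK (created_date = ''' + @NewDay + ''')
                )'
    EXEC (@sql)

    -- 3. Rebuild the 60-table view. Using ALTER VIEW rather than
    --    DROP/CREATE keeps the outage to milliseconds.
    -- EXEC ('ALTER VIEW dbo.DetailAll AS SELECT ... UNION ALL ...')
END
GO
```

The proc would be fired from a SQL Server Agent job scheduled at midnight, keeping the whole rotation automatic.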
Lumbago
Norsk Yak Master
3271 Posts

Posted - 2011-05-30 : 02:23:43
Some comments that might open a few doors:

- You can do a BULK INSERT/INSERT INTO a view, which means that you can change the tables behind the view while the inserts stay the same. In essence, you can insert into different tables every day by using the same view name.
- If you do the bulk loading into a table that is not part of the reporting/grouping queries, you can load data into one table while the others are being queried. Then, when the loading is finished, you alter the reporting/grouping view so it also includes the newly loaded data.
- If you ALTER the views instead of DROP/CREATE, you don't need a maintenance window (the downtime will be milliseconds).

Here is the route I would probably take (or at least try to take):

Prerequisites:
- 60 tables (MyTable1 -> MyTable60)
- 2 views (MyReportingView - queried by end users, MyInsertView - used for bulk loading)
- for the sake of the example, let's say that you want to load data into MyTable43

1. Alter MyReportingView by removing MyTable43 from the UNION ALL query.
2. Truncate or drop/create MyTable43.
3. Alter MyInsertView by replacing the table MyTable42 with MyTable43.
4. BULK INSERT into MyInsertView (which will actually insert data into MyTable43).
5. When the BULK INSERT is finished, add MyTable43 to MyReportingView again.

By doing things like this you will "fake" adding and removing partitions to your table, and it should work pretty well and without downtime (at least in theory).

- Lumbago
My blog -> http://thefirstsql.com/2011/02/07/regular-expressions-advanced-string-matching-and-new-split-function-sql-server-2008-r2/
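The five steps above, sketched in T-SQL (column list and the data file path are illustrative assumptions; MyInsertView references exactly one table at a time, which is what makes it a valid bulk load target):

```sql
-- 1. Remove MyTable43 from the reporting view (other members elided)
ALTER VIEW dbo.MyReportingView AS
SELECT detail_id, payload FROM dbo.MyTable42
-- UNION ALL ... the other 58 active tables, with MyTable43 omitted
GO

-- 2. Recycle the partition that is about to be reloaded
TRUNCATE TABLE dbo.MyTable43
GO

-- 3. Repoint the insert view at the new current table
ALTER VIEW dbo.MyInsertView AS
SELECT detail_id, payload FROM dbo.MyTable43
GO

-- 4. Load through the insert view; the rows land in MyTable43
BULK INSERT dbo.MyInsertView FROM 'C:\loads\today.dat'
GO

-- 5. Afterwards, ALTER MyReportingView again so its UNION ALL
--    includes MyTable43; each ALTER VIEW takes only milliseconds.
```

Because the loader always targets the stable name MyInsertView, the application never needs to know which physical table is current.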