Creating a Data Mart through replication

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Replication (2008)

Creating a Data Mart through replication

Author

Topic

NifflerX
Starting Member

29 Posts

Posted - 2012-06-05 : 14:21:24

Hello,

I've got a database that is part of a stand alone application living on its own server. I'd like to access that database for use with my general reporting database server. Both databases are running SQL 2008 R2. I've tried setting up a linked server from my reporting database to the standalone application, and the connection works but whenever I try to do anything more than query a table directly the speed slows down considerably. I've also tried caching the data locally on the reporting database, and when I do that I get great speed, because everything is local, but I have issues with the caching. So far my caching is done with stored procedures that clear and reload local copies of the tables in the stand alone database.

What'd I'd like to do is use replication to create a replica of the standalone application database on my reporting server and then query that. So the replication would be one-way only and that way I'd never hit the stand alone server with any of my reporting queries.

Is something like that possible? In all my reading about replication I've seen that the destination database is usually in a state where you can't query against it. It sits there in a warm state waiting for the replication link to break, or be stopped, and then the database can be brought live. Is it possible to have a database that is both receiving replicated data, and still available for regular T-SQL queries?

If something like that is not possible, does anyone have any ideas on other things I can try to get similar results? Thank you so much.

-NifflerX

nigelrivett
Master Smack Fu Yak Hacker

3385 Posts

Posted - 2012-06-06 : 04:21:57

How are you accessing the database? It sounds like the reporting app is getting the data then doing the joins itself - could be that it's just slow or there is a lot of data in the resultset.

Copying the data locally might be a good idea but I would find out what is causing the problem first.
If you always use stored procedures as the source for reports then it will be obvious where the issue is and it gives you more flexibility - also means that you might not need to copy all the data as the SP can extract data from some remote sources - even a file if needed.

You are probably thinking of log shipping where the database needs to be in standby mode to query - then the recovery is reversed to apply more logs. For replication the statements are applied against the destination and committed so the database is queryable. If you have a lot of batch processing or full table updates on the source then replication might not keep up - the source needs to be designed for it and there ae often tables that need to be copied outside replication. I would look at simpler solution first - like a backup and restore if it's just a daily static copy, copies of changed rows if less latency.

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.

NifflerX
Starting Member

29 Posts

Posted - 2012-06-06 : 14:18:06

Hi,

Thank you for the response. You are correct, the reporting database is grabbing the data from the stand alone server via a linked server and the joining it with data it has local. I think part of the slowness is that when pulling the data via a linked server the SQL engine is not able to do the same type of optimization.

As far as trying to determine what is causing the issues, usually when I look at the Actual Execution Plan, the objects with the highest percentages are the remote calls from via the linked server, which is why I started there. I don't know how to read the Actual Execution plans all the well, so if I should be looking at more than just the percentages, please let me know.

You are correct, I was thinking of log shipping, and as I looked more into publishing and subscribing I found that I can publish tables from the standalone database base and subscribe to them on my reporting database which gives me a readable local copy. I've just got that set up so I don't know if it fixes the speed issues, but is at least a promising start. Unfortunately I can't use a backup and restore method because the data needs to be as close to real-time as possible. The 5 minute window I was creating with the stored procedure cache method was acceptable, but much more than that would not be. I think I should be okay because the standalone database is not a very high transaction database, and only does major batch processing during off hours, when the publish/subscription would have much more time, 30 minutes, to catch up. So as long as the link didn't break, I think it would catch up okay.

Do you think I should continue to pursue the publish/subscribe method I'm currently working on, or are there things I'm missing that make this a bad idea? Are there other, better, ways of doing what I'm trying to do? Thank you again for your replies.

-NifflerX

nigelrivett
Master Smack Fu Yak Hacker

3385 Posts

Posted - 2012-06-07 : 04:12:46

Are these ad hoc queries? If yoou want to allow the developers to do whatever they wish then you have no real option other than to copy the data (or at least the large tables) as yoou can't optimise their processes.
If they are releasing processes then yoou can enforce the queries to be placed in stored procedures and optimise them before release. In this case yoou can limit the data that is copied - probably most of the time yoou can extract the data when needed (don't join to the remote server - copy data to one or the other then join, often best to copy the smaller tables to the remote server). A join will often table scan to get the data needed rather than just the rows needed so yoou will need to split up queries. I would look at that before trying to implement replication or anything else as that would be a big admin overhead and affect the source server.

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.

NifflerX
Starting Member

29 Posts

Posted - 2012-06-07 : 16:57:03

Hi,

Thank you very much for the response. Unfortunately almost all of the queries being done are views, and asking the developers to change all their code is not an option, so I need to get the data to them using views. So far in my testing the publication/subscription has actually worked quite well. One of the sticking points was that the standalone server uses a different collation, but once I resolved that I've seen speed increases in almost all the queries.

So my current testing setup is publication/subscription to my reporting database, and the views created on top of the published tables that the developers use in their queries. So far so good, but does anyone see a potential issue here that I'm missing? Am I using publication/subscription in ways they are intended to be used? Thank you very much.

-NifflerX

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources