Replication vs. T-SQL MERGE? - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Replication (2008)

Replication vs. T-SQL MERGE?

Author

Topic

malachi151
Posting Yak Master

152 Posts

Posted - 2012-02-16 : 06:54:18

So I'm looking into the best way to implement replication from our production environment to our internal data warehouse.

I've never setup replication before, but for the life of me, having looked into this,the first question that jumps out is, why can't I just create a Job that runs T-SQL MERGE statements using linked servers?

I'd want to start with a snapshot of course, but after that we are looking for a nightly update of the warehouse with the transactional data.

Also, and this may be a factor as well, the production server is SQL 2005, but the data warehouse server is SQL 2008 R2.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-02-16 : 07:56:29

quote:
why can't I just create a Job that runs T-SQL MERGE statements using linked servers?

You can. Be sure to watch the glaciers as they fly by while it runs.

The last things you want to do over linked servers are JOINs, MERGE, or UNION/INTERSECT/EXCEPT. The underlying architecture of linked servers will convert these operations to cursors of the worst kind. You would quite literally be better off copying all the production data over to the data warehouse server and processing the MERGE locally.

The huge benefit of replication for your scenario is that it will only copy changes. MERGE would have to evaluate all rows in both sets.

If you don't want to deal with replication and its limitations, you can log ship your production database to your data warehouse server in standby mode, and then perform a MERGE. It's still likely that replication will be more efficient overall, but it's another option.

malachi151
Posting Yak Master

152 Posts

Posted - 2012-02-16 : 08:03:08

quote:
Originally posted by robvolk

quote:
why can't I just create a Job that runs T-SQL MERGE statements using linked servers?
You can. Be sure to watch the glaciers as they fly by while it runs.

The last things you want to do over linked servers are JOINs, MERGE, or UNION/INTERSECT/EXCEPT. The underlying architecture of linked servers will convert these operations to cursors of the worst kind. You would quite literally be better off copying all the production data over to the data warehouse server and processing the MERGE locally.

The huge benefit of replication for your scenario is that it will only copy changes. MERGE would have to evaluate all rows in both sets.

If you don't want to deal with replication and its limitations, you can log ship your production database to your data warehouse server in standby mode, and then perform a MERGE. It's still likely that replication will be more efficient overall, but it's another option.

Thanks, this certainly helps get me pointed in the right direction.

So essentially, MERGE should only be used between tables on the same instance, got it...

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-02-16 : 08:11:55

I should clarify something. MERGE can be efficient if you have a datetime or identity column that can be used in the WHERE clause to limit the rows being processed. You'd have to track the last time the warehouse was updated. If you already have a trigger that updates a ModifiedDate column or similar, that would be the best option. But if you don't then it's not worth adding just to perform a MERGE.

malachi151
Posting Yak Master

152 Posts

Posted - 2012-02-16 : 10:52:34

quote:
Originally posted by robvolk

I should clarify something. MERGE can be efficient if you have a datetime or identity column that can be used in the WHERE clause to limit the rows being processed. You'd have to track the last time the warehouse was updated. If you already have a trigger that updates a ModifiedDate column or similar, that would be the best option. But if you don't then it's not worth adding just to perform a MERGE.

The publishing database is basically insert only, no updates, and the tables all have a sequential primary key, so a MERGE would be pretty simple. All it would have to do is INERT records WHEN NOT MATCHED BY TARGET, that's pretty much it...

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2012-02-16 : 11:11:25

Replication was invented for this and is going to almost certainly be the most efficient way.

malachi151
Posting Yak Master

152 Posts

Posted - 2012-02-16 : 11:42:21

quote:
Originally posted by russell

Replication was invented for this and is going to almost certainly be the most efficient way.

Yeah, the only complicating factor here is that we have multiple separate databases with identical schema's (each client has their own database), and there is no unique id in the tables across databases, i.e. there are overlapping primary keys.

We want to pull the data from each client into a single data warehouse, but in order to do that we will have to append a client id column to the data. I figured this would be easy using T-SQL MERGE, but I'm not sure if this will complicate things using Replication.

I know you can replicate from multiple sources into a single source, but how does that work if there are overlapping primary keys?

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2012-02-16 : 14:03:15

You would need to manage the primary keys so they don't collide. Or add a column to each table indicating which server "owns" the data, to "uniqueify" it.

But for a data warehouse, I'd say neither replication nor merge are the right solution. Instead, why not a nightly ETL?

malachi151
Posting Yak Master

152 Posts

Posted - 2012-02-16 : 16:30:55

quote:
Originally posted by russell

You would need to manage the primary keys so they don't collide. Or add a column to each table indicating which server "owns" the data, to "uniqueify" it.

But for a data warehouse, I'd say neither replication nor merge are the right solution. Instead, why not a nightly ETL?

Looks like things are turning in the SSIS direction. I've only used SSIS for manually run ETL processes, running them through the BIDS, but I assume running them from SQL Agent isn't a big problem...

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2012-02-17 : 07:15:07

Not a big problem at all. In fact, that's the normal (meaning most common) way they are run.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources