Author |
Topic |
kzakoski
Starting Member
6 Posts |
Posted - 2011-07-12 : 12:52:17
|
I'm encountering a strange issue with a SQL 2005 Transactional publication with two subscriptions, one subscriber being a SQL 2005 server and the other a SQL 2008 R2 server with transactions being applied over a WAN from our London Datacenter (publisher) to our US Datacenter (subscribers).The SQL 2008 R2 server will eventually replace the SQL 2005 server, but I need to run in parallel until our UAT teams signs off on the new server.The Issue:The publication was dropped and recreated and the two subscriptions added at the same time. Snapshot ran without any issues and replication monitor showed tables being BCP'd on both servers. Process was expected to run approx. 4-5 days pushing data using test case results. However, connectivity was lost to the Distributor for approx. 90 minutes about 8 hrs into the process. Connectivity was re-established and BCP transfers in the event viewer looked like it was successfully pushing data but not showing any row count info. After running a couple of days like this, the SQL 2008 R2 subscription starting reporting 'Subscription Expired' messages, but subscription expiration is set to 14 days and the entire subscription was only a couple of days old. This expired SQL 2008 subscription appeared to be holding up the entire publication. Once it was dropped, the SQL 2005 subscription successfully completed the BCP process and now the SQL 2008 R2 subscription will need to be recreated.I'm still trying to hunt down the connectivity issue but I'm baffled as to why the 2008 subscription would expire prior to the expiration period and then eventually hold up the other subscription until it was dropped. Has anyone else encountered any issue like this or is this a SQL 2008 R2 configuration I missed? |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2011-07-12 : 16:03:07
|
What do you see in the event logs, sql server logs and distribution..msrepl_errors?Sure the expiration is 14 days?Perhaps the 2008 R2 box encountered a faatal error. |
|
|
kzakoski
Starting Member
6 Posts |
Posted - 2011-07-13 : 15:28:01
|
quote: Originally posted by russell What do you see in the event logs, sql server logs and distribution..msrepl_errors?Sure the expiration is 14 days?Perhaps the 2008 R2 box encountered a faatal error.
Not much is posted in the event logs or msrepl_errors. It lists the communication link failure around the time the connection was dropped and the distributor timing out. Connections appear to be re-established and then about 6hrs later log entries appear stating the SQL 2008 subscription has been marked inactive and must be reinitialized. Expiration was set to 336 hrs (14 days). Publication and Subscription was added via a script and not GUI so I know the settings were exactly the same for both subscriptions. |
|
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2011-07-13 : 21:10:30
|
What's the history and publication retention settings?See this blog entry |
|
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2011-07-14 : 06:47:34
|
How big is the database / snapshot? Have you considered creating the subscriber database using a backup of the publisher instead of a snapshot? If the replicated database is large it will save you *massive* amounts of time... I have a recepie (how do you spell that anyway?) that I keep in my code repository that I can post if it's relevant for you. Unfortunately I have no clue about the stuff you're really asking for.- LumbagoMy blog-> http://thefirstsql.com/2011/07/08/how-to-find-gaps-in-identity-columns-at-the-speed-of-light/ |
|
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2011-07-14 : 08:43:37
|
That's a good point. I would consider 4-5 days synch time unacceptable. |
|
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2011-07-14 : 08:55:11
|
Steps to set up replication using backup files instead of snapshots1. Run the Create Publication wizard on the publisher. Make sure NOT to create a snapshot2. Once the wizard has finished, right-click the new publication, choose Properties and make sure that the "Allow initialization from backup files" is set to True3. Disable the "Distribution clean up: distribution" job to make sure that no commands are deleted from MSrepl_commands before the entire backup/restore operation is complete4. Create a full database backup to disk and make sure that the file is available for the publisher until the entire replication setup is finished. 5. Transfer the backup file to the subscriber server and restore it there. If you do regular backups to disk make sure to use one that was taken *after* the publication was created. Also restore trans log backups taken after the full backup. The last backup should be restored using WITH RECOVERY, and keep in mind that a more recent backup saves you time in the syncing process when the replication is being initialized.6. Disable all triggers on the subscriber database: EXEC sp_msforeachtable 'ALTER TABLE ? DISABLE TRIGGER all'7. Disable all constraints on subscriber database: EXEC sp_msforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL' 8. Run the following script ON THE PUBLISHING SERVER IN THE PUBLISHING DATABASE to enable the replication: EXEC sp_addsubscription @publication ='myPublication', --> Name of the publication @subscriber = 'myserver.myDomain.com', --> Fully qualified name or IP of subscriber server @destination_db ='MySubscriberDB', --> Name of the database you just restored (doesn't have to be the same as the publisher) @sync_type = 'initialize with backup', --> no need to change this @backupdevicetype = 'disk', --> no need to change this @backupdevicename = 'F:\backupfile.bak' --> Pointer to the last backupfile that was restored, but from the folder on the on the publishing server. If you restored trans logs also the last translog file is what you need to put here 9. Enable the "Distribution clean up: distribution" job again- LumbagoMy blog-> http://thefirstsql.com/2011/07/08/how-to-find-gaps-in-identity-columns-at-the-speed-of-light/ |
|
|
kzakoski
Starting Member
6 Posts |
Posted - 2011-07-14 : 12:15:02
|
Yes, 4-5 days is unacceptable to us as well. Prior to a recent WAN change, this process would take approx. 6-8 hours to complete and that's pushing appox. 75 gig of data across the pond. The WAN change took away our Riverbed WAN accelerator which we believe is the issue causing the huge difference in the process. We're now in the process of convincing management that they were needed and that's another 3 to 6 month battle to resolve and another $100,000 added to the bottom line.In the meantime, I'm kind of stuck with the process. The Backup restore method was also used in the past prior to installing the Riverbed accelerators. However, in this case it gets a lot more complicated because I'm creating a central global data repository where 3 regional datacenters are replicating data into a single database using different schemas to separate the data in the database.For example, a region 1 table (dbo.xyzTable) is replicated to the global repository with a new schema owner as [region1.xyzTable], region 2's copy of the same table is replicated to the repository as [region2.xyxTable] and so on. In the repository, a view is created as [dbo.xyxTable] which unions the regional tables together to give me global representation of the table. It allows our front-end application to run against this global data store without maintaining two different binaries because it thinks the database schema is the same as the regional DB.The process has been fully tested for a year, if not more, in our QA environment and out performs other models, like using separate DB's using the backup restore method. The current WAN issue definitely exposes us to a wider window for errors to occur, but that aside, I'm more concerned as to why my new SQL 2008 R2 subscriber is failing and expiring well before the expiration window and if anyone's run into similar issues. I have run across some blurbs about 2008's merge replication changing the default to 1 day, but haven't been able to confirm that for regular transactional replication and haven't found the setting if there is one.Since posting this, I've successfully sync'd my SQL 2005 instance, but continue to struggle with getting my new 2008 environment up and running using the same publication. |
|
|
kzakoski
Starting Member
6 Posts |
Posted - 2011-07-14 : 12:27:28
|
quote: Originally posted by russell What's the history and publication retention settings?See this blog entry
Transactoin retention: 0-72 hrsHistory retention: 48 hrs |
|
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2011-07-14 : 12:42:35
|
Change them to exceed the longest possible outage. |
|
|
kzakoski
Starting Member
6 Posts |
Posted - 2011-07-14 : 14:32:47
|
quote: Originally posted by russell Change them to exceed the longest possible outage.
We bummped these up to 96 hours which should be plenty of time for our team to discover the outage and respond to it. Going through our timeline, it appears we may have hit the 72 hour limit but can't explain why both subscriptions didn't get flagged and expire.Since it's a production publication, I'm going to be forced to create a second publication for the 2008 subscription. Not my choice, but easier to monitor and fix should it continue to fail. |
|
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2011-07-14 : 14:49:35
|
How is 2 publications easier to monitor than one?Are both subscribers over the WAN? |
|
|
kzakoski
Starting Member
6 Posts |
Posted - 2011-07-15 : 12:17:54
|
It's not, management just doesn't want anything to hose up the current production environment now that we have it back in sync should the 2008 subscription fail and cause use another 4-5 days to sync data again. As I stated before, this 2008 environment is new and is being stagged to run in parallel with our old 2005 instance so any production outage needs to be avoided. Yes, both subscriptions are over the WAN. |
|
|
|