| Author |
Topic |
|
G²
Starting Member
6 Posts |
Posted - 2003-04-14 : 12:42:22
|
| For the past 6-8 months we have been experiencing a problem with an application that our company developed for internal use that utilizes MS SQL2000, MS Host Integration Server and an AS400 (DB2). We have been working with Microsoft , but to no avail. There does not seem to be anyone at MS with expertise in all areas of this problem. SQL Specs - Clustered SQL 2000 Servers - Both nodes are Compaq Prolient ML370 - 4GB of memory - dual 1.2 GHZ Xeon Processors - Windows Advanced Server SP3 - SQL 2000 SP3 Problem - We are executing a VB6 program every ½ hour on the half hour that fires off a stored procedure on SQL Server2000. This stored procedure is primarily cursor driven and as the main cursor loops through the available records entered by our data entry department - it will call subsequent stored procedures that also use cursors to loop through records. Ultimately we are gathering values and storing them in host variables used to build insert statements across platforms to an AS400 DB2 environment. In short, our data entry unit now writes to SQL Server only and we backfill our legacy AS400 through this automated program. In an attempt to give this automated backfill process a degree of commitment control, we employ an HIS Server and the provider used in the VB6 program is SQLOLEDB.1. We use linked servers configured on SQL Server to the AS400 and the OPENQUERY function when writing data to the AS400.Our success rate is intermittent and confusing. As we filter through our business logic, we can implicitly rollback when necessary, but we seem to be plagued by rollbacks that are not called implicitly. When the program reattempts these records, as it will, they can be committed at some later point in the day with no change to the data. No change to the data types of the host variables, no change in length, no changes whatsoever. Since implementing the above scenario we also experiencing a situation where the SQL server stops functioning and we have to fail to the other node in the cluster. We don't know if this is related to the situation above, but simply failing to the other node makes the server functional again. Here is the error in the log we receive:CImageHelper::Init () Failed load of imagehlp.dll - Error Code 0x7e***Symbol load failed - Proceed to dump without any symbols*Stack Dump being sent to Z:\Program Files\Microsoft SQL Server\MSSQL\log\SQL00950.dmp* ********************************************************************************* BEGIN STACK DUMP:* 02/21/03 16:00:20 spid 70** Exception Address = 069C960B* Exception Code = c0000005 EXCEPTION_ACCESS_VIOLATION* Access Violation occurred writing address FFFFFFF8* Input Buffer 92 bytes -* EXEC USP_WINS_BACKFILL 'PAC6271557','01/27/03'* ** MODULE BASE END SIZE** Edi: 03C7F928: 069B1544 03C4D008 4F540000 00000003 00000000 00000000 * Esi: 03C470F0: 03C7F928 49000000 204F544E 48544150 4757442E 30323150 * Eax: FFFFFFFF: * Ebx: 00000000: * Ecx: 03C470F0: 03C7F928 49000000 204F544E 48544150 4757442E 30323150 * Edx: 00000000: * Eip: 069C960B: * Ebp: 095E3918: 069B1298 00000000 09524F80 00000001 00000000 03CC0112 * SegCs: 0000001B: * EFlags: 00010246: 00650074 0033006D 003B0032 003A0043 0057005C 004E0049 * Esp: 051F7AC0: 00000000 00000025 00000005 03C470F0 00450050 004E004E * SegSs: 00000023: * ******************************************************************************** -------------------------------------------------------------------------------* Short Stack Dump* 069C960B Module(UNKNOWN+00000000)* ----------------------------------- This happens with no real pattern, it can happen once a week or once a day. The reason we possibly relate this problem to the one above is because on days with heavy rollbacks and developers performing heterogeneous ad hoc queries using the same linked severs we are bringing down SQL Server We seem to get the largest grace period when we in fact reboot one of the servers before seeing this condition again. If anyone has any ideas as to what may be causing this, it would be greatly appreciated. TIAThanks,Greg |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2003-04-14 : 13:01:41
|
| Have you checked Microsoft's knowledge base (support.microsoft.com)? That's the first place that you will want to start to figure out why you are getting this access violation. It is very hard for us to help on this because access violations occur for many, many, many different reasons. You will want to search the knowledge base on "c0000005 EXCEPTION_ACCESS_VIOLATION". You will find lots of articles come back from this search. You will have to weed through the results to figure out which ones apply to your environment. If you still can't figure it out, post the stored procedure code and we'll have a look at it. Tara |
 |
|
|
G²
Starting Member
6 Posts |
Posted - 2003-04-14 : 13:17:18
|
| I have been working with Microsoft for at least six months. I have been through several layers of their support. This issue has been passed from the SQL team to the HIS team and back again. Now I am currently working with a Solutions Integration Engineer. I cannot say for sure that the Access Violation has anything to do with the rollbacks. They are probably two separate issues because we can go a couple of weeks without receiving an Access Violation. The rollbacks, on the other hand, happen fairly regularly. I will try to post the SP to see if that sheds any light on the problem. Thanks,Greg |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2003-04-14 : 22:59:36
|
| I assume that during the six months MS was helping you that someone mentioned linked servers and remote procedure calls. These would let you perform batch operations without cursors, and you can probably eliminate the HIS layer as well. The transactions would be controlled entirely by SQL Server and the Distributed Transaction Coordinator, which is less likely to exert too much control over the transaction(s). I personally have never been fully satisfied with trying to have an external process/server handle transaction control for a database server. I've seen and heard too many complaints about how it either doesn't work at all or performs much more poorly than having the database do it all. I'm not sure if you have any alternatives in your situation, but if the current setup is not working and you've done all that you can, it may be time to move onto another method or setup.It goes without saying that cursors are notoriously poor performers, nested cursors are even worse, and putting the whole thing inside a transaction, especially one controlled outside the SQL Server, is pretty much guaranteed to grind to a halt. I wouldn't be surprised if the access violation error is due to excessive resources consumed by the cursor(s) and the transaction manager(s) that are running during the process.Another idea to consider is exporting the SQL Server data out to a file of some kind, and running whatever file import utilities you have available for your DB2 machine. There's no point in applying business logic on a row-by-row basis at the moment of import into DB2, when the data can be filtered, manipulated, and validated on SQL Server beforehand, then exported and imported into DB2 (indeed, if people are doing data entry to SQL Server, all of this validation should be done at that stage so that it is perfectly valid in SQL Server alone)HTH |
 |
|
|
G²
Starting Member
6 Posts |
Posted - 2003-04-15 : 11:44:19
|
| Rob,Thanks for your feedback regarding our post. You should know that we opted for using the HIS Server because Microsoft told us that was our only option in order to gain commitment control in the DB2 environment. Are you aware of another mechanism for rolling back transactions on the AS400 in the event our logic fails on SQL Server, without using the HIS Server? The nested cursor environment was the result of the relationship between all the related tables in this process. There are three primary tables that the cursor loops are predicated on. Table A has a one:many relationship to table B, and table B has a one:many relationship to Table C. We opted for cursors because we can’t process the data in batch, we need to read each individual record in Table C for its parent in Table B, and do this for each record in Table B that has a parent in Table A. We also have stand alone processes that use linked servers and HIS and do not involve cursors that are also subject these errors. |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-04-15 : 12:13:19
|
Again, why can't this be 3 SQL Queries set up as views, and then unload from them and copy to DB2 and have DB2 do a load?You mention:quote: As we filter through our business logic, we can implicitly rollback when necessary, but we seem to be plagued by rollbacks that are not called implicitly.
If you know a condition to cause a rollback scenario (because of business logic?) Why attempt the insert at all.Can't you create a result set that you know won't cause a failure?Yeah, love to see the sproc.Brett8-) |
 |
|
|
G²
Starting Member
6 Posts |
Posted - 2003-04-15 : 14:32:22
|
| There are actually several stored procedures involved with this process. I’m not sure that I can post them all to this forum. However, if anyone emails me at [url]goodge@penn-america.com[/url], I would be happy to send them for your review. If this process could be done a better way, we would really love to hear it. I know that there are a lot of extremely knowledgeable people on this board and being able to get someone else’s perspective would very helpful.Thanks,Greg |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-04-15 : 16:07:35
|
Greg, I've gotta admit, I've never seen this before. You check @@TRANCOUNT on the way in to the sproc because you don't know if you've got a transaction hanging out there? My first suggestion would be to make your transactions AIR-TIGHT.I try to keep for transaction short (where possible) When you do the following below (from your code), do you know what you're rolling back? BEGIN TRANSACTION UPDATE pdecpage SET transstep = 'ROLLBACK', winstrans_trdatetime = GETDATE() WHERE policy = @policy AND effdte = @effdte COMMIT TRANSACTION --SET NOCOUNT ON SET XACT_ABORT ON IF @@trancount <> 0 ROLLBACK BEGIN TRANSACTION I'll keep looking, but theres a lot of stuff here. Are all of the sprocs meant to synch up DB2? Also, theres a lot of business logic involved in this layer. Seems to me you're doing more than synching.Mayb if you just add a batch window datetime column to each table and just bcp unload since the last time the window was shut?Brett8-)<edit> to fix display </edit>Edited by - robvolk on 04/15/2003 21:42:38 |
 |
|
|
G²
Starting Member
6 Posts |
Posted - 2003-04-16 : 11:01:32
|
| Brett,We’re fairly confident that the transactions are tight. In looking at various texts and web references when first writing this, it was suggested that we double check the trans count before continuing with a new transaction. Apparently it can get hosed up, and since it governs the rollback call, we simply wanted to be certain.That piece of code you’re looking at is the only place we check it, right after we establish a default transstep field value of ‘Rollback’ in the event there is a failure in the middle of a run. Thanks,Greg |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-04-16 : 14:16:11
|
| So you have no nested transaction...simple I like (not always possible).What about the row added timestamp column and a batch window? Would that work?Brett8-) |
 |
|
|
G²
Starting Member
6 Posts |
Posted - 2003-05-08 : 17:07:38
|
| I apologize for taking so long to respond. It looks like we have finally fixed this issue. Because we are using Host Integration server, there is a HIS piece on each of the two SQL servers in our cluster. I applied HISSP1 on our HIS server a while ago, but never applied it to the SQL servers. Once I applied the SP and a couple of post SP1 fixes. All of our issues disappeared. Thanks for everyone's input.Greg |
 |
|
|
|