[Progress Communities] [Progress OpenEdge ABL] Forum Post: Retry Question - user was not disconnected properly?

Status
Not open for further replies.
D

dbeavon

Guest
Given that OE clients are allowed to start transactions and keep them open indefinitely, we are occasionally in the habit of killing long-running client transactions with proshut commands. We have even introduced some auto-killing scripts, for better or worse. This past week when I was out of the office, there was a user that we had attempted to kill (via an auto-killing script that detects long-running transactions). But the user was never fully killed, and it seemed to continue to hold onto resources, and it may have even caused the entire database to eventually crash. Here are the logs that show the crashing of the database. P26345 was a remote server, and user 1532 was a PASOE client-server connection: [2019/02/14@12:31:38.457-0500] P-26345 T-1 F SRV 11: (6495) Out of free shared memory. Use -Mxs to increase. [2019/02/14@12:31:38.457-0500] P-26345 T-1 I SRV 11: (5028) SYSTEM ERROR: Releasing regular latch. latchId: 17 [2019/02/14@12:31:38.457-0500] P-26345 T-1 I SRV 11: (5029) SYSTEM ERROR: Releasing multiplexed latch. latchId: 11916353 [2019/02/14@12:31:38.457-0500] P-26345 T-1 I SRV 11: (2522) User 1532 died holding 2 shared memory locks. [2019/02/14@12:31:38.457-0500] P-26345 T-1 I SRV 11: (2520) Stopped. [2019/02/14@12:31:38.468-0500] P-5097 T-1 I RPLS 63: (453) Logout by root on batch. [2019/02/14@12:31:38.472-0500] P-26212 T-1 I SRV 10: (2520) Stopped. [2019/02/14@12:31:38.475-0500] P-5120 T-1 I BIW 67: (2520) Stopped. [2019/02/14@12:31:38.475-0500] P-5120 T-1 I : (453) Logout by root on batch. [2019/02/14@12:31:38.480-0500] P-16375 T-1 I SRV 48: (2520) Stopped. [2019/02/14@12:31:38.483-0500] P-5123 T-1 I AIW 68: (2520) Stopped. [2019/02/14@12:31:38.483-0500] P-5123 T-1 I : (453) Logout by root on batch. [2019/02/14@12:31:38.494-0500] P-8276 T-1 I OEMAGE126: (453) Logout by DB_Agent on batch. [2019/02/14@12:31:38.510-0500] P-5111 T-1 I APW 65: (453) Logout by root on batch. [2019/02/14@12:31:38.527-0500] P-12441 T-27035 I SRV 5: (453) Logout by APP_Olm on . [2019/02/14@12:31:38.529-0500] P-21175 T-1 I APPSRV119: (453) Logout by APP_ArMgmt on batch. [2019/02/14@12:31:38.531-0500] P-24198 T-1 I SRV 37: (2520) Stopped. [2019/02/14@12:31:38.544-0500] P-16463 T-1 I SRV 55: (2520) Stopped. [2019/02/14@12:31:38.546-0500] P-5117 T-1 I APW 66: (453) Logout by root on batch. [2019/02/14@12:31:38.560-0500] P-5085 T-1 I BROKER 0: (15192) The database will complete shutdown within approximately 60 seconds. [2019/02/14@12:31:38.560-0500] P-5085 T-1 I BROKER 0: (2249) Begin ABNORMAL shutdown code 2 About fifteen minutes earlier was the point in time when our automated script had tried to kill that long-running transaction (user 1532 connected via client-server). Here is that part of the log. [2019/02/14@12:14:39.820-0500] P-9056 T-1 I SHUT 163: (452) Login by root on batch. [2019/02/14@12:14:39.826-0500] P-9056 T-1 I SHUT 163: (-----) User 1532 disconnect initiated [2019/02/14@12:14:39.826-0500] P-9056 T-1 I SHUT 163: (453) Logout by root on batch. [2019/02/14@12:14:39.826-0500] P-26345 T-1 I SRV 11: (1166) Server disconnecting user 1532. [2019/02/14@12:14:39.827-0500] P-26345 T-1 I SRV 11: (2252) Begin transaction backout. At most, the attempt to kill the user was responsible for the eventual crash. In the least, the attempt to kill the user was unsuccessful. We can see that fifteen minutes later this user was still holding memory locks. I should also mention that between the time that we attempted to kill this user, and the time the database crashed, we had lots of "lock table overflow" errors in the logs as follows: [2019/02/14@12:20:58.155-0500] P-8795 T-1 I SRV 29: (915) Lock table overflow, increase -L on server ... [2019/02/14@12:20:58.383-0500] P-5144 T-1 I ABL 70: (915) Lock table overflow, increase -L on server ... [2019/02/14@12:20:59.481-0500] P-5660 T-1 I ABL 96: (915) Lock table overflow, increase -L on server ... [2019/02/14@12:21:00.528-0500] P-5703 T-1 I ABL 98: (915) Lock table overflow, increase -L on server I'm guessing that something bad happened during the attempt to disconnect the user from the database. I am not familiar with some of these messages in the logs but they seem pretty scary, eg: "Out of free shared memory. Use -Mxs to increase". and "SYSTEM ERROR: Releasing regular latch. latchId: 17 " I'm trying to find KB articles to figure out what actually went wrong. If anyone can interpret that from the logs above, I would be happy to know. We are fairly new to the use of "remote servers". We currently do not use the -Mxs parameter to start the database. I had hoped that disconnecting client-server users was going to be safer than disconnecting shared-memory users! Should we avoid auto-killing the long-running transactions when they are associated with a remote client-server user?

Continue reading...
 
Status
Not open for further replies.
Top