Locks From Dead Users - Resolve Limbo Transactions

pinne65

Member
Progress 10.2B03 / RHEL 5.11 - yep, I know both are ancient. We'll be upgrading to OE 11.3 in Dec but I can't wait that long.

We have a situation where a SELF service user got disconnected and left a transaction with subsequent locks. Proshut disconnect doesn't let us finish him off.

1. His associated /u/progress/dlc/bin/_progres process lingers with Parent of PID 1 (one) - I've been toying with the thought of totally clobbering (kill -9) that process (not 1 but the child _progres). But I'm hesitant.

I thought a better way would be to roll back his transaction using promon's "Resolve 2PC Limbo Transactions" option.
The record locks seem to indicate the transaction is in limbo:
Usr  Name  Chain  #      Rec-id    Table  Lock  Flags  Tran State  Tran ID
354  dank  REC    9422   64335249  154    EXCL  L      Active      196001859
354  dank  REC    19990  83078865  154    EXCL  L      Active      196001859
354  dank  REC    20785  10902328  154    EXCL  L      Active      196001859
354  dank  REC    20787  10902330  154    EXCL  L      Active      196001859

But

Transaction Control:
Usr  Name    Trans      Login     Time   R-comm?  Limbo?  Crd?  Coord  Crd-task
354  dank10  196001859  11/10/16  09:55  no       no      no           0

says it's not in Limbo.
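
To make the comparison between the two promon excerpts above explicit, here is a minimal Python sketch that counts the locks per transaction and looks up the "Limbo?" flag for the same transaction ID. The rows are copied from this post; the column positions and the function names (locks_by_transaction, limbo_flags) are assumptions made for illustration, not anything promon provides.

```python
# Rows copied from the promon output quoted in the post. The column positions
# used below are an assumption based on the headers shown there.
LOCK_ROWS = """\
354 dank REC 9422 64335249 154 EXCL L Active 196001859
354 dank REC 19990 83078865 154 EXCL L Active 196001859
354 dank REC 20785 10902328 154 EXCL L Active 196001859
354 dank REC 20787 10902330 154 EXCL L Active 196001859
"""
TRAN_ROWS = """\
354 dank10 196001859 11/10/16 09:55 no no no 0
"""


def locks_by_transaction(text):
    """Map transaction ID -> (user number, lock count, set of Tran State values)."""
    summary = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip headers and blank lines
        # First field is Usr; the last two are Tran State and Tran ID.
        usr, state, tran_id = int(fields[0]), fields[-2], int(fields[-1])
        _, count, states = summary.get(tran_id, (usr, 0, set()))
        summary[tran_id] = (usr, count + 1, states | {state})
    return summary


def limbo_flags(text):
    """Map transaction ID -> True/False taken from the 'Limbo?' column."""
    flags = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        # Assumed order: Usr, Name, Trans, Login date, Login time, R-comm?, Limbo?, Crd?
        flags[int(fields[2])] = fields[6] == "yes"
    return flags


if __name__ == "__main__":
    limbo = limbo_flags(TRAN_ROWS)
    for tran_id, (usr, count, states) in locks_by_transaction(LOCK_ROWS).items():
        print(usr, tran_id, count, sorted(states), "limbo:", limbo.get(tran_id))
```

For the rows above this reports four locks held by user 354 under one transaction whose state is Active and whose limbo flag is false, which matches what the Transaction Control screen says.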


2. Looks like we don't have 2PC enabled. So I guess this is not an option. Or will the Resolve option still work without any problems?

Any suggestions on how to get rid of the locks / roll back the transaction other than restarting the database are greatly appreciated.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
We have a situation where a SELF service user got disconnected
His associated /u/progress/dlc/bin/_progres process lingers with Parent of PID 1
If the user is actually disconnected from the database, and not just flagged for disconnection, then you shouldn't see it listed in promon 1 1. How did it disconnect -- programmatically? If it is flagged for disconnection it will not show up in a proshut user list but it will still appear in promon 1 1.

The concern with killing the client process (aside from using the -9 switch, which you really shouldn't if there's any option) is that if the client holds locks on any shared resources or latches then killing it will very likely cause the database to shut down abnormally. However, if promon 1 1 shows the client to be disconnected then I would expect the watchdog, or in its absence the broker, to roll back the client's transaction and release any locks such as record locks that the client previously held, once the client no longer exists in the process list. If the client is disconnected it should be safe to kill it. But don't use -9.

What do you see in the database log relative to this client? Are you running the watchdog?

Looks like we don't have 2PC enabled. So I guess this is not an option. Or will the Resolve option still work without any problems?
I've never used 2PC. My understanding is that those 2PC features in promon are only relevant if your database is already enabled for 2PC. Check promon R&D 1 11 to see if it is. Or check your structure to see if you have a transaction log area. But it doesn't sound like this is relevant here.
 

pinne65

Member
Thanks for responding!

Not sure how the user got disconnected. His ssh session was gone but his login was still listed for each of the databases encompassing the system.

[2016/11/10@09:55:41.235-0800] P-19600 T-2032285504 I ABL 354: (708) Userid is now dank. - LOGON
.
.
[2016/11/10@11:40:08.140-0800] P-26222 T-798841280 I SHUT 824: (-----) User 354 disconnect initiated - NOT SURE what happened here, he might just have closed his session
.
.
.
[2016/11/10@12:13:23.294-0800] P-5777 T--1793238592 I BROKER 0: (-----) Sending signal 12 to user 354 - TRYING to proshut
[2016/11/10@12:13:38.516-0800] P-5777 T--1793238592 I BROKER 0: (-----) Sending signal 12 to user 354
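
To pull everything the log says about one user number out of a larger .lg file, something like the following Python sketch may help. The regular expression is inferred only from the four lines quoted above (with my trailing notes stripped), so treat the field layout as an assumption; events_for_user is a hypothetical helper name.

```python
import re

# Layout inferred from the lines quoted in this post:
# [timestamp] P-<pid> T-<thread> <level> <process> <num>: (<msg number>) <text>
LOG_LINE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\]\s+P-(?P<pid>\d+)\s+T-(?P<tid>-?\d+)\s+"
    r"(?P<level>\w)\s+(?P<proc>\w+)\s+(?P<num>\d+):\s+"
    r"\((?P<msg>[^)]*)\)\s+(?P<text>.*)$"
)

# The four log lines from the post, without the poster's annotations.
SAMPLE_LOG = """\
[2016/11/10@09:55:41.235-0800] P-19600 T-2032285504 I ABL 354: (708) Userid is now dank.
[2016/11/10@11:40:08.140-0800] P-26222 T-798841280 I SHUT 824: (-----) User 354 disconnect initiated
[2016/11/10@12:13:23.294-0800] P-5777 T--1793238592 I BROKER 0: (-----) Sending signal 12 to user 354
[2016/11/10@12:13:38.516-0800] P-5777 T--1793238592 I BROKER 0: (-----) Sending signal 12 to user 354
"""


def events_for_user(lines, usr):
    """Return (timestamp, process, text) for lines written by or mentioning a user number."""
    mention = re.compile(r"\buser %d\b" % usr, re.IGNORECASE)
    events = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # not a log line in the assumed layout
        if int(m.group("num")) == usr or mention.search(m.group("text")):
            events.append((m.group("stamp"), m.group("proc"), m.group("text")))
    return events


if __name__ == "__main__":
    for stamp, proc, text in events_for_user(SAMPLE_LOG.splitlines(), 354):
        print(stamp, proc, text)
```

Matching on both the writer's user number and a "user N" mention in the text is a design choice: the disconnect and signal messages are written by other processes (SHUT 824, BROKER 0) but are about user 354.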

He is listed in PM 1.1
354 dank SELF/ABL -- 0 2 196001859 19600 177 0 11/10/16 09:55

We do have the WDOGs running.

Our maintenance window is starting in an hour - so I'll just go ahead and restart the databases.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
The watchdog won't clean up after the process if it's still in the OS process list. If the client is not in a state where its signal-handler can fire (e.g. waiting on an I/O completion, making a system call, something like that) then it may not immediately disconnect.
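
One way to see what state a lingering client is in on Linux is to read /proc/<pid>/stat. The sketch below is Python 3 and only illustrative: parse_proc_stat and describe are hypothetical helper names, and the mapping from the state letter to "the signal handler cannot run yet" is a rough guide, not something Progress documents.

```python
def parse_proc_stat(stat_text):
    """Return (pid, comm, state, ppid) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' rather than on whitespace.
    left, _, right = stat_text.rpartition(")")
    pid_text, _, comm = left.partition("(")
    rest = right.split()
    return int(pid_text), comm, rest[0], int(rest[1])


# Standard Linux process state letters (see proc(5)).
STATE_NOTES = {
    "R": "running",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (usually waiting on I/O)",
    "T": "stopped",
    "Z": "zombie",
}


def describe(pid):
    """Summarise the state and parent of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        pid, comm, state, ppid = parse_proc_stat(f.read())
    note = STATE_NOTES.get(state, "other")
    return f"{pid} {comm}: state {state} ({note}), parent {ppid}"
```

Calling describe(19600) on the database host would report on the _progres process from the promon listing in this thread. A process stuck in state D generally will not act on a pending signal until the blocking call returns.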
 

TomBascom

Curmudgeon
What James meant to say is that you should be targeting 11.6 for your upgrade rather than 11.3.

You absolutely *should* upgrade.
 

cj_brandt

Active Member
We used to have the issue you describe. Since upgrading to 11.5.1, I don't think we have had one in 16 months.

Check the OS to see if the process has spawned another process and is waiting for it to complete. If so, terminating the child process can allow the parent to exit on its own.
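
A rough Python 3 sketch of that check, assuming a Linux host where `ps -eo pid=,ppid=,stat=,comm=` is available. The helper names (parse_ps, descendants, live_descendants) are made up for illustration.

```python
import subprocess


def parse_ps(text):
    """Parse `ps -eo pid=,ppid=,stat=,comm=` output into (pid, ppid, stat, comm) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)
        if len(parts) == 4:
            rows.append((int(parts[0]), int(parts[1]), parts[2], parts[3]))
    return rows


def descendants(rows, root_pid):
    """Return every row whose chain of parents leads back to root_pid."""
    by_parent = {}
    for row in rows:
        by_parent.setdefault(row[1], []).append(row)
    found, stack = [], [root_pid]
    while stack:
        for child in by_parent.get(stack.pop(), []):
            found.append(child)
            stack.append(child[0])  # look for grandchildren too
    return found


def live_descendants(root_pid):
    """Run ps and list the processes spawned, directly or indirectly, by root_pid."""
    out = subprocess.run(
        ["ps", "-eo", "pid=,ppid=,stat=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return descendants(parse_ps(out), root_pid)
```

For the client in this thread, live_descendants(19600) would list anything the _progres process started and may still be waiting on; an empty list suggests it is not blocked on a child.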
 