Database crash.....

Casper

ProgressTalk.com Moderator
Staff member
Hi all,

I have a strange problem: one of our customers' databases just crashed.
The following are the entries in the logfile which lead to that crash.
Usr 27 and 28 are batch programs which are run.

The same thing happened last night. Apparently (I was told today) the two programs which are run have had a 'slight' modification.
I read in KB P88837 that error 6495 can happen because of a rollback from a lock table overflow. Is this what is happening here?

The -L on this particular server is 204800 (this seemed to me already quite large for our application).

My first guess would be that the problem is somewhere in the changed programs (not such a wild guess, I believe). But since I have never encountered this error before, maybe one of you guys can tell me if there could be something else wrong as well?

What I did so far:
Since this happened last night as well, I checked today whether a large transaction had been introduced, but that didn't seem to be the issue. (I logged string(transaction) in various places where there should be no transaction, and every log entry returned no.)

Can it be that transaction doesn't always work, or should I dive deeper into this somewhat epic program?

T-1 I Usr 27: (452) Login by dosbvd on batch.
T-1 I Usr 27: (12699) Database dossier Options:
T-1 I Usr 27: (453) Logout by on batch.
T-1 I Usr 27: (452) Login by dosbvd on batch.
T-1 I Usr 27: (12699) Database dossier Options:
T-1 I Usr 28: (452) Login by dosbvd on batch.
T-1 I Usr 28: (12699) Database dossier Options:
T-1 I Usr 28: (915) Lock table overflow, increase -L on server
T-1 I Usr 28: (2252) Begin transaction backout.
T-1 I Usr 28: (2253) Transaction backout completed.
T-1 I Usr 28: (453) Logout by on batch.
T-1 F Usr 27: (6495) Out of free shared memory. Use -Mxs to increas
T-1 I Usr 27: (5028) SYSTEM ERROR: Releasing regular latch. latchId:
T-1 I Usr 27: (5029) SYSTEM ERROR: Releasing multiplexed latch. latc
T-1 F Usr 27: (5026) User 27 died holding 2 shared memory locks.
T-1 I Usr 27: (439) ** Save file named core for analysis by Progres
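
For anyone who wants to scan a longer log for the same pattern (a 915 lock table overflow followed by fatal messages), here is a minimal Python sketch. It assumes every line is shaped like the excerpt above (a prefix token, a severity letter, "Usr N:", then the message number in parentheses); a real .lg file may have a different prefix, so the regular expression would need adjusting. The function names and the SAMPLE list are made up for illustration and are not part of any Progress tool.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   "T-1 I Usr 28: (915) Lock table overflow, increase -L on server"
LINE_RE = re.compile(
    r"^\S+\s+(?P<sev>[A-Z])\s+Usr\s+(?P<usr>\d+):\s+\((?P<num>\d+)\)\s*(?P<text>.*)$"
)

def parse(lines):
    """Yield (user, severity, message number, text) for each line that matches."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield int(m.group("usr")), m.group("sev"), int(m.group("num")), m.group("text")

def overflow_users(lines):
    """Return the sorted user numbers that logged message 915 (lock table overflow)."""
    return sorted({usr for usr, _sev, num, _text in parse(lines) if num == 915})

def fatal_messages(lines):
    """Return (user, message number) pairs for lines with severity F, in log order."""
    return [(usr, num) for usr, sev, num, _text in parse(lines) if sev == "F"]

# A few lines copied from the excerpt above (truncated exactly as in the log).
SAMPLE = [
    "T-1 I Usr 28: (915) Lock table overflow, increase -L on server",
    "T-1 I Usr 28: (2252) Begin transaction backout.",
    "T-1 I Usr 28: (2253) Transaction backout completed.",
    "T-1 F Usr 27: (6495) Out of free shared memory. Use -Mxs to increas",
    "T-1 F Usr 27: (5026) User 27 died holding 2 shared memory locks.",
]

if __name__ == "__main__":
    print("Users with lock table overflow:", overflow_users(SAMPLE))
    print("Fatal messages:", fatal_messages(SAMPLE))
```

This only shows which user hit the overflow and which user logged the fatal messages; it says nothing about why the locks piled up.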


TIA,
Casper
 

bjag

New Member
Hi, Casper

I'm still a relative newbie at Progress, but I did have this error recently (w/o the line that says "Out of free shared memory. Use -Mxs to increas..."). To remedy the situation, all I did was increase the size of the lock table. But your lock table is already humongous! I recently bumped ours up from 16,000 to 20,000 per the "Locking table high water mark" that is displayed in the promon utility. At almost 300 GB database size and up to 800 simultaneous users, we do a lot of table locking. The fact that your customer's lock table is more than 10 times the size of ours, and still not big enough, is suspect.

I'm not sure how much you delve into the customer's software, but you might want to zero out the locking table high water mark and run one of these suspect programs, to get a feel for how many locks it takes and see if that number is reasonable, based on what the program is doing. You could have some unnecessary locking going on. Good luck! (or, should I say "good LOCK") :D
 

Casper

ProgressTalk.com Moderator
Staff member
Thanks for your reply.

I forgot to post the answer here as well, but the problem turned out to be a combination of a programming error and a bug in Progress (introduced in 10.1B).

Because of the programming error, many records were created which already existed. The undo of this caused a lot of so-called PURG locks in the lock table. They are supposed to be cleaned up after the undo, but here the bug jumps in: they are not cleaned up.

It is supposed to be fixed in 10.1C and in a hotfix for 10.1B SP3 (SP309). I didn't test this; I changed the program and everything works fine now....

The lock table is this large because of the import programs we have. All the files which are imported are placed in tables before further processing, and this happens in one big transaction. Plans have already been made to change this; unfortunately there are many programs and I can't seem to get this high on the priority list at the company where I work. (Well, I can, but then I have to do it myself... :)).

So we are working on that.....:D

Regards,

Casper.
 