WebSpeed Broker Interference??

whablow

New Member
I have some very strange behaviour with our live environment, and seem to have stumbled across a possible indication of something.....

I'll start by explaining the setup we have. We have 2 live servers: one is hosting the DB and the other is hosting the code and the WebSpeed Brokers for our internal Call Centre. Further to that we have a 3rd server which sits in the DMZ, facing out to the world, and this has a copy of the code and another set of WebSpeed Brokers.....connecting to the same DB as our Call Centre.

Now the code in places isn't the best, and there are certain screens/functions that are heavy. However, nothing has changed recently in the code, so although it's a bit cack, it has been for ages! Anyway, the problem is every so often we get one or many locked Brokers. In the logs we see the old classic "WTA: npp_send() failed while sending terminate message!" error, the agent is then permanently at "Busy", and in the DB a load of records that that user was accessing remain locked permanently.

This began escalating over the last few days and has hit a critical point a few times where all the agents have locked and the system has pretty much gone down.

Now, this is where it gets weird. Today, during testing and bringing it all back up, I left the external server in the DMZ turned off. Well, the server was on, but the brokers on that server were off..........and there were no problems. I left it like this for over an hour. Then I turned the brokers back on, and within the next half hour there were 3 or 4 instances of locked agents again. But they were locked on the internal brokers, not the external ones. So when the external brokers are turned on, the internal ones start to lock up!

Does any of this madness ring a bell with anyone??

Many Thanks

Ste
 

Stefan

Well-Known Member
I am going to make some assumptions based on your madness:

1. the database is OpenEdge
2. share-locks are ruling the database
3. users on the external web server use different applications than the internal web server
4. one or more of the external web server apps are littering share-locks around, causing the rest to go down since they are trying to lock the already-locked records and end up timing out
5. the external web server address is being used more often / by a new user who does things differently

-> monitor the _Lock table (a sketch below)
-> check the apps used most frequently on the external web server for exclusive-locks going out of scope and being downgraded to share-locks (also sketched below)
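
For the first arrow, a minimal sketch of a _Lock monitor, assuming the stock OpenEdge _Lock VST (field names per the standard schema - check them against your version, and note that scanning _Lock can itself be slow when -L is large, so treat this as a spot check):

/* walk the in-use portion of the lock table */
FOR EACH _Lock NO-LOCK:
    /* an unknown user marks the end of the in-use entries */
    IF _Lock._Lock-Usr = ? THEN LEAVE.
    DISPLAY
        _Lock._Lock-Usr   LABEL "Usr"
        _Lock._Lock-Name  LABEL "Table"
        _Lock._Lock-RecId LABEL "RecId"
        _Lock._Lock-Type  LABEL "Type"  /* lock type */
        _Lock._Lock-Flags LABEL "Flags" /* status flags, e.g. queued / limbo;
                                           exact values vary by version */
        WITH WIDTH 80.
END.

And for the second arrow, a hypothetical fragment (the customer table and its Name field are made up for illustration) showing the classic downgrade: the record buffer outlives the transaction, so when the transaction block ends the EXCLUSIVE-LOCK is downgraded to a SHARE-LOCK instead of being released - exactly the kind of lock that hangs around until the agent dies:

DO TRANSACTION:
    FIND FIRST customer EXCLUSIVE-LOCK.
    ASSIGN customer.Name = "changed".
END.
/* the buffer "customer" is still in scope here, so the record is
   now SHARE-LOCKed - not free - until the buffer goes out of scope */

/* fix: drop the lock explicitly once the transaction is done */
FIND CURRENT customer NO-LOCK.
/* (or RELEASE customer, or scope the buffer to the block
   with DO FOR customer TRANSACTION) */

If the permanently locked records show up in _Lock as share-locks owned by a "Busy" agent, assumption 4 is probably your culprit.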
 