Question CPU Spike 100%

After restarting the server and app servers, our Windows 2008 Server will run for about 10-15 minutes until the _proapsv.exe processes (usually two of them) take up 100% of the CPU.

We do not have any services or processes running out of the ordinary and this database has been running solid for many years now.

Starting just this Monday, a couple of app servers have been hogging all the CPU. I'm not sure whether the database needs to be defragmented; it's only 10GB.

OpenEdge 10.1B HF33

Thanks in advance.
 

Cringer

ProgressTalk.com Moderator
Staff member
What's the AppServer doing at the time? Has anything changed in the AppServer code recently? I'd be looking for handles getting instantiated and not cleaned up.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
our Windows 2008 Server will run for about 10-15 minutes until the _proapsv.exe processes usually two take up all 100% of the CPU.
Do you mean the AppServer agents take all CPU cycles? Or does each of the two AppServer agents take 100% of one core? If you have more than two cores in this machine I assume you mean the latter, as the AVM is single-threaded. Do the AppServers and their code reside on the DB server?

You want to know what the agents are doing, i.e. what code they're running. With 10.1B your diagnostic options are limited. You can't use Client Statement Caching, as that was added in 10.1C. I don't think you have the ability to run proGetStack either, though if you do, it will give you a stack trace of the client so you can see what code it's running when it spikes.

You can at least gather CRUD stats, assuming -tablerangesize/-indexrangesize are set appropriately for your schema, either using your own code or ProTop. That will tell you which tables and indexes you're hitting, and when, which may give you a clue about what code is running. Using -logentrytypes is also helpful, though be cautious in production as some of them can give you very verbose logs, and I don't remember which of those are available in 10.1B. Check your version of the Debugging and Troubleshooting manual for details.
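If you want to roll your own rather than use ProTop, a minimal sketch of reading CRUD stats from the _TableStat VST looks something like this (assumes the DB was started with -tablerangesize covering your table numbers, and -basetable at its default of 1 so _TableStat-Id lines up with _File-Number; verify the field names against your version's VST schema):

```abl
/* Dump read/create/update/delete counts per table from the _TableStat VST.
   Rows with no matching _File record are skipped by the FOR EACH join. */
FOR EACH _TableStat NO-LOCK,
    FIRST _File NO-LOCK
        WHERE _File._File-Number = _TableStat._TableStat-Id:
    DISPLAY _File._File-Name               FORMAT "x(25)"
            _TableStat._TableStat-read     LABEL "Reads"
            _TableStat._TableStat-create   LABEL "Creates"
            _TableStat._TableStat-update   LABEL "Updates"
            _TableStat._TableStat-delete   LABEL "Deletes"
        WITH WIDTH 90.
END.
```

Run it twice, a few minutes apart, and diff the counts: the deltas while the CPU is spiking point at the hot tables.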

It is also possible to enable profiling with AppServers, but again that can result in a lot of data in a hurry, especially in a case of CPU spikes. At the OS level, keep an eye on perfmon as that will tell you which resources are being used most.

As Cringer said, look for changes. New code, different client startup parameters, propath changes, missing r-code and doing session compiles are a few common client problems. Also, double-check the DB and make sure it wasn't restarted with different params, and didn't have any params changed online (e.g. via promon).
 
Thanks for the great feedback.

The trigger seems to be our front-end software, Vantage, when trying to complete a job or end an operation. This is only happening with a select group of jobs.

I ran an IDXBUILD and don't think we need to run a dump and load. Not sure what else we can do, or why these jobs are the only ones giving us trouble.

Thanks again.
 

Cringer

ProgressTalk.com Moderator
Staff member
IDXBUILD and D&L are not going to help you in the slightest.
Have you looked at what has changed in these jobs? Are they instantiating handles without cleaning them up? One simple catch-all would be to insert an unnamed widget pool declaration at the top of each .p that is run on the AppServer. Any handle-based objects created in that .p will then be scoped to the widget pool and will be cleaned up when the .p goes out of scope. If that clears it up, then find the culprit code and deal with the handles it creates.
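As a sketch of what that looks like (the buffer and table name here are just illustrative, not from the app in question):

```abl
/* Top of each .p run on the AppServer. */
CREATE WIDGET-POOL.   /* unnamed pool, scoped to this procedure */

DEFINE VARIABLE hBuf AS HANDLE NO-UNDO.

/* Dynamic objects created from here on land in the unnamed pool... */
CREATE BUFFER hBuf FOR TABLE "customer".  /* "customer" is a made-up example */

/* ... use hBuf ... */

/* ...and are destroyed automatically when this .p goes out of scope,
   so a forgotten DELETE OBJECT no longer leaks the handle. */
```

The point isn't to leave this in forever; it's a quick way to confirm leaked handles are the cause before you hunt down the offending CREATE statements.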
Also check for client side code that instantiates AppServer sessions and doesn't clear them down.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
IDXBUILD and D&L are not going to help you in the slightest.
I think we may need more context to be able to make the case so definitively. D&L and idxbuild are by no means silver bullets for all performance problems. But they are tools in the toolbox and for some jobs they are the right tool.

For example, if the DB was created under 9.x or some other old version and hasn't been dumped and loaded in a decade, I'd say it's appropriate to do it again. Or if a very large table was recently purged to 10% of its former size (logically), then its index blocks will be very sparsely populated. That could lead to suboptimal caching performance until the indexes are compacted or rebuilt. But detailed information is needed to assess such things. And if, as techjohnny said, the indexes have been rebuilt, it won't hurt performance unless the index packing factor was set too high and record updates are now causing large numbers of block splits.

All that said, after re-reading the original post this has the feel of an application or configuration issue rather than a back-end issue.

Not sure what else we can do and why these jobs are the only jobs giving us trouble?
The triggers seems to be our front-end software Vantage when trying to complete a job or end an operation.
Without knowing what the jobs do, particularly their completion, it is hard to say where to look. For example, does completion involve network I/O? Maybe there is high latency or packet loss. Does it involve writing a flat file? Maybe there is a physical volume with bad sectors, or a flaky network file system mount, or a caching disk controller whose battery is no longer recharging so the controller turned off caching and that killed throughput.

If source code access is available then this type of investigation will be a lot more straightforward. If it is a third-party application then they should be engaged for support.

Another option is to upgrade your licenses to a more modern release. At least 10.2B08, or preferably a later 11.x release. The former shouldn't even require a recompile or DB change. The latter requires both but obviously positions you much better for the future and has many more years of future support available. Either one offers a lot of diagnostic capabilities that are unfortunately not available in 10.1B.
 

TheMadDBA

Active Member
You need to get to the bottom of what changed before you start flipping switches. Did you install a new version of your application? Any patches? Etc.

10.1B was the first release for _UserTableStat and _UserIndexStat so that can help a lot. Download and install ProTop. It will make it much easier for you to get a little insight into where these appserver processes are spending their time.
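If you'd rather query the VSTs directly than install ProTop, something like this shows per-connection reads (field names are from my memory of the 10.x VST schema, and the +1 offset between the user number and _Connect-Id is a common convention; verify both against your DB before relying on it):

```abl
/* Top tables by read count, per connection, so you can see which
   AppServer agent is doing the reading. */
FOR EACH _UserTableStat NO-LOCK
    WHERE _UserTableStat._UserTableStat-read > 0,
    FIRST _Connect NO-LOCK
        WHERE _Connect._Connect-Id = _UserTableStat._UserTableStat-Conn + 1,
    FIRST _File NO-LOCK
        WHERE _File._File-Number = _UserTableStat._UserTableStat-Num
    BY _UserTableStat._UserTableStat-read DESCENDING:
    DISPLAY _Connect._Connect-Name              FORMAT "x(12)"
            _File._File-Name                    FORMAT "x(25)"
            _UserTableStat._UserTableStat-read  LABEL "Reads".
END.
```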

Also check the appserver logs for any error messages.
 

TomBascom

Curmudgeon
Are the users doing something new?

Sometimes "the change" isn't code or the system but rather a new behavior on the part of the users.
 

cj_brandt

Active Member
As suggested, the _UserTableStat and _UserIndexStat would be a good place to start looking. You will probably have to set up the DB so more than the first 50 tables will be monitored and that will require a restart.
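For example, a restart with the range parameters widened might look like this (the values are illustrative only; size them from your schema, e.g. the highest _File-Number and highest index number):

```shell
# Illustrative values -- check your schema before picking numbers.
proserve mfgsys -tablerangesize 500 -basetable 1 -indexrangesize 1000 -baseindex 1
```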

If these can identify a table that is read millions of times then it would be a place to start looking.

Also changing the logging level of the app server would allow you to see what code is being run when the CPU spike occurs. I don't remember if 10.1B supports dynamic changes in logging, I'll guess not.
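For an AppServer, the agent logging is set in ubroker.properties and picked up when the broker restarts; a fragment along these lines (whether 10.1B honors these exact entry types should be checked against its docs):

```properties
# ubroker.properties, under your AppServer's [UBroker.AS.<name>] section
srvrLoggingLevel=3
srvrLogEntryTypes=4GLTrace,DB.Connects
```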

Service Pack 3 of 10.1B offered some nice enhancements - I don't know what the latest SP of 10.1B was, but SP3 helped us significantly until we went on to 10.1C.
 
Fixed. The developers of the software had a file, WRITE.r, that was causing our mfgsys.bi file to grow very large. Turns out it was triggered when certain users went to complete an operation for a job.
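For anyone finding this later: once the runaway code is fixed, an oversized bi file can be truncated offline (the database must be shut down first; "mfgsys" is the DB name from this thread):

```shell
# Offline only -- shut the database down before running this.
proutil mfgsys -C truncate bi

# Optionally, guard against future runaway growth by adding -bithold
# (a size threshold, in MB) to the broker startup so the DB stalls
# instead of filling the disk.
```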

The software we are using is Vantage 8.03

Thanks for all the feedback! You definitely were right that rebuilding the index or dump and load would not have made any difference.

Regards.
 