CPU being devoured

UKStratfordMike

New Member
My background : Progress V8 (and before)/ Unix. Some DBA knowledge. Ancient IT guy brought out of (happy) retirement to help a company with performance problems with new app. Being told all sorts of conflicting things by suppliers.

Site : New Dell Server 16 procs (4 dedicated to Progress env) , 4Gb physical memory (6Gb virtual) , SAN, Windows and VMware
Software house's package running under V9.1D


Symptoms:
1) When Progress broker kicks in, uses around 25% CPU then beds in at around 5%
2) During day – see big spikes in CPU utilisation. Have tracked these down using Promon to ODBC SQL queries and Crystal. (Incredibly high request/DB read ratios). Can fix this
3) Overnight batch application devours CPU, low IO and runs like a dog. Eats around 45%-50% of processor availability
memory – using physical acc to vmware

Supplier input
1) Vmware and infrastructure confirm all set up OK
2) End application supplier (software house) suggesting it's SAN, VMware, but not their App. They set up Progress parms

Diagnostics checked so far

APWs (X4) running 99%-100% - AOK. BIW similar. No checkpointing.

Where now?

Seem to remember things like un-indexed temp tables causing massive hike in CPU util.
Has been a suggestion that Application is riddled with triiggers but we have no visibilty of source and I'm not sure it it's relevevant

Can you help?

Any ideas what's burning cpu and how can track?
 

TomBascom

Curmudgeon
Problem #1 is that 9.1D is ancient, obsolete and unsupported. This is particularly problematic with regards to ODBC. Suppliers who deliberately deploy 9.1D on new hardware shouldn't be trusted.

As I mentioned in another post this morning if you're going to be using SQL you need to be regularly running UPDATE STATISTICS.

If you are running Crystal for reporting purposes you should also be setting the isolation level to READ UNCOMMITTED.

(You can get more details regarding all that in the SQL forum.)

Since you have APWs running we can rule out the "Workgroup performance sucks on multi-core processors" issue.

No checkpointing? That seems highly unlikely ;) What is the interval between checkpoints?

You can determine if there are a lot of triggers by looking at the data dictionary. You don't need the source for that.

You say that the vendor setup the parameters. The vendor has shown that they cannot be trusted by deploying 9.1D on a new system. So I suggest that we focus on the startup parameters. Both client and server parameters would be of interest.

I wouldn't rule out the VMWare or SAN bits -- those are often issues too. VMWare does carry a performance penalty and SANs are frequently poorly configured for databases. But you need to get the application under control before you can start tackling that.
 

UKStratfordMike

New Member
Hi Tom

Thanks for prompt response.

Will need to get the params to you tomorrow. I am 150 miles away from client site and may need to get them from supplier - we do have some R&D stats from Promon though.

Re triggers in data dictionary, sorry but my lack of knowledge immedately exposed by you. Could you tell me where to look?

Along same lines, I take it there are other tables in data dictionary. Is there a dummies guide (which i admit to being) so that we can have a look around?

Thanks again:D
 

TomBascom

Curmudgeon
From the dictionary:

Database -> Reports -> Trigger

Warning -- PROMON stats are going to be most meaningful if they are relatively short interval samples taken during a busy period. 200 hour averages are pretty much useless.
 

UKStratfordMike

New Member
Background - Monitoring vst by our expert highlighted one table with 1.8 Bn (US) requests in 1 hour. Request/actual DB read ratio around 161k:1

App vendor had been investigating, 'added some indexes to a table' something about a trigger and throughput increased 30 (yes 30) fold. Offending table had around 151k rows. (Table scan caused but no/duff indexing? Progress servicing requests from own memory areas? Hence massive request to read ratio, CPU burn, terrible throughput etc)

All now OK - thanks for help
 
Top