_mprosrv -m1 high CPU usage

troup1998

New Member
On our Windows Server 2012 R2 64-bit machine with 85 GB of memory, the CPU has become pegged. We have OpenEdge 11.7.3 64-bit. The _mprosrv -m1 processes each end up eating about 12% until we get to 100%. It seems to happen on just one DB (8192-byte block size, 84 GB in size). We adjusted -B to 40 GB thinking it would utilize more memory, but it seems to have caused the CPU to peg.

I've reduced -B to 500,000 (500,000 * 8192 = 4 GB) and set -B2 to 2,000,000 (16 GB), but we have nothing assigned to -B2 yet.
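
Because -B and -B2 are counted in database blocks rather than bytes, each pool's memory is roughly the buffer count times the 8192-byte block size. A minimal Python sketch of that arithmetic using the values from this post; the helper names are hypothetical, and reading the earlier "40GB" as 40 GiB is an assumption.

```python
# Sketch: OpenEdge -B and -B2 are counts of database blocks, not bytes.
BLOCK_SIZE = 8192          # bytes per block (-blocksize in the log)
GIB = 1024 ** 3

def pool_bytes(buffers: int, block_size: int = BLOCK_SIZE) -> int:
    """Approximate buffer pool memory: buffer count times block size."""
    return buffers * block_size

def buffers_for(target_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """How many buffers fit in target_bytes (rounded down)."""
    return target_bytes // block_size

if __name__ == "__main__":
    for name, buffers in (("-B", 500_000), ("-B2", 2_000_000)):
        size = pool_bytes(buffers)
        print(f"{name} {buffers:,} -> {size:,} bytes ({size / GIB:.2f} GiB)")
    # Buffer count for the earlier 40 GB setting, assuming 40 GiB was meant.
    print(f"40 GiB -> -B {buffers_for(40 * GIB):,}")
```

The "4 GB" and "16 GB" figures are decimal approximations (about 3.81 GiB and 15.26 GiB), and they cover only the buffers themselves, not the rest of the shared memory segment.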

This happened when we had the spin lock (-spin) set to 25,000. I just set -spin to 50,000 on all five databases.

We run WebSpeed on this server too, with shared-memory connections. The _progres processes for WebSpeed don't peg the CPU.
WebSpeed parameter file:
-db F:\Progdb\OrangeDB -ld sao9 -Mm 16384 -U ws48 -P ws48
-db F:\Progdb\TablesDB -U ws48 -P ws48 -Mm 16384
-db F:\Progdb\ssdb -Mm 16384
-db F:\Progdb\icjis -Mm 16384
-db F:\progdb\DP -Mm 16384
-h 10
-yy 1930 # set offset date
-yr4def # use 4 digit year default to output a four digit year for EXPORT,MESSAGE,PUT UNFORMATTED
-inp 16384 # max no. of chars per stmt (default=4096; max=32000)
-tok 1600 # maximum number of tokens per statement (default=1024)
-T C:\Temp # Directory for temporary user files
-c 1000 # Index cursors
-s 256 # stack size (vms set to 30); default = 40;
-rereadnolock
-lkwtmo 30
-Bt 16384
-l 32768
-rand 2

The remote processes connecting are generally report programs and AppServers. They use the following parameter file:
-db OrangeDB -S OrangeDB_Prod -H wsprod1.sao9.org -ld sao9 -Mm 16384
-db TablesDB -S TablesDB_Prod -H wsprod1.sao9.org -U report -P report -Mm 16384
-db ssdb -S ssdb_Prod -H wsprod1.sao9.org -Mm 16384
-db icjis -S icjis_Prod -H wsprod1.sao9.org -Mm 16384
-db DP -S DP_Prod -H wsprod1.sao9.org -Mm 16384
-Bp 16 # Private Buffers added 12/13/2019
-b # Initiate a batch session
-yy 1930 # set offset date
-h 10 # maximum number of databases
-inp 16384 # max no. of chars per stmt (default=4096; max=32000)
-tok 1600 # maximum number of tokens per statement (default=1024)
-T C:\Temp # Directory for temporary user files
-c 1000 # Index cursors
-s 256 # stack size (vms set to 30); default = 40;
-rereadnolock
-Bt 16384
-l 16384
-rand 2

[Screenshots: wsprod1Processes.JPG, wsprod1CPU.JPG]
Any ideas why the CPU gets pegged? Why would adjusting -B up cause the CPU to get pegged?
 


troup1998

New Member
The database configuration follows:
Progress OpenEdge Release 11.7 build 1592 SP03 on WINNT .
Server started by proman on batch.
Started using pid: 4968.
Large database file access has been enabled.
Encryption enabled: 0
Multi-tenancy enabled: 0
Table Partitioning enabled: 0
Authentication Gateway enabled: 0
LRU mechanism enabled.
Parameter File: Not Enabled.
Created shared memory with segment_id: 1
Before-Image Cluster Size: 524288.
Before-Image Block Size: 8192.
After-image Management Archival Directory List (-aiarcdir): Not Enabled
Create After-image Management Archival Directory(s) (-aiarcdircreate): Not Enabled
After-image Management Archival Interval (-aiarcinterval): -1
Number of After-Image Buffers (-aibufs): 64
After-Image Stall (-aistall): Not Enabled
Starting index number for statistics range (-baseindex): 1
Starting table number for statistics range (-basetable): 1
Starting index number per user for statistics range (-baseuserindex): 1
Starting table number per user for statistics range (-baseusertable): 1
Number of Before-Image Buffers (-bibufs): 64
BI File Threshold Stall (-bistall): Disabled.
BI File Threshold size (-bithold): 0.0 Bytes
Database Blocksize (-blocksize): 8192
BIW writer delay (-bwdelay): 0
Allowed index cursors (-c): 1004
CDC cache size (-cdcsize): 200k bytes
SSL Certificate Store Path (-certstorepath): Not Enabled
Character Set (-cpinternal): ISO8859-1
Physical Database Name (-db): F:\Progdb\OrangeDB
Diagnostic directory (-diagDir): Not Enabled
Diagnostic events value (-diagEvent): LockTable:0,BiThold:0,SysErr:0
Diagnostic event level (-diagEvtLevel): 0
Diagnostic field separator (-diagFS): ' '
Diagnostic data format (-diagFormat): csv
Diagnostic pause length (-diagPause): 0
Diagnostic prefix value (-diagPrefix): diagEvent_
Direct I/O (-directio): Enabled
Database Type (-dt): PROGRESS
Encryption cache size (-ecsize): 1000
Group delay (-groupdelay): 10
Hash Table Entries (-hash): 740951
Buffer pool hash table latch percentage (-hashLatchFactor): 10
Crash Recovery (-i): Enabled
Number of indexes included in statistics collection (-indexrangesize): 50
TCP/IP Version (-ipver): IPV4
SSL Key Alias Name (-keyalias): Not Enabled
Database log file archive directory (-lgArchiveDir): Not Enabled
Archive database log file before truncation (-lgArchiveEnable): Not Enabled
Frequency of database log file truncation (-lgTruncateFrequency): -1
The maximum size the database log file can grow before truncation (-lgTruncateSize): 0MB
Time of day for the database log file truncation (-lgTruncateTime): 00:00
Limit .lg file payload (-limitLgPayload): Not Enabled
Lock table hash table size (-lkhash): 1237
Original Lock Release Algorithm (-lkrela): Not Enabled
Number of LRU force skips (-lruskips): 100
Number of LRU2 force skips (-lru2skips): 0
Maximum Area Number (-maxArea): 32000
Size of JTA transaction table (-maxxids): 100
Maximum Port for Auto Servers (-maxport): 3300
Minimum Port for Auto Servers (-minport): 3000
Multi-tenancy partition cache size (-mtpmsize): 1024
Use muxlatches (-mux): 1
Maximum Number of Users (-n): 251
Minimum time to nap at first -spin exhaustion (-nap): 10
Maximum time to nap at -spin exhaustion (-napmax): 250
SSL No Host Verify (-nohostverify): Not Enabled
Disable LRU mechanism (-nolru): Not Enabled
SSL No Session Cache (-nosessioncache): Not Enabled
SSL No Session Reuse (-nosessionreuse): Not Enabled
Login Governor (-nGovernor): 0 of 251
Number of checkpoint statistics to record (-numCheckpointStats): 32
Omit .lg file messages (-omitLgMsgs):
Storage object cache size (-omsize): 1024
Database Service Manager - IPC Queue Size (-pica): 64.0 KBytes
Shared memory segments locked (-pinshm): Not Enabled
Use pollset mechanism for client/server (-pollset): Not Enabled
Delay first prefetch message (-prefetchDelay): Not Enabled
Prefetch message fill percentage (-prefetchFactor): 0
Minimum records in prefetch msg (-prefetchNumRecs): 16
Suspension queue poll priority (-prefetchPriority): 0
APW queue scan cycle time in milliseconds (-pwqdelay): 100
APW minimum queue length before write (-pwqmin): 1
APW buffer scan cycle time in seconds (-pwsdelay): 1
APW maximum number of buffers to scan per cycle (-pwscan): 4166
APW maximum number of buffers to write per cycle (-pwwmax): 25
Before-Image File I/O (-r -R): Reliable
Record free chain search depth factor (-recspacesearchdepth): 5
Security cache size (-secsize): 512
Number of Semaphore Sets (-semsets): 3
SSL Session Timeout (-sessiontimeout): 0
Maximum Shared Memory Segment Size (-shmsegsize): 32768 Mb
Current Spin Lock Tries (-spin): 50000
SSL Encryption for TCP/IP connections (-ssl): Not Enabled
STS Debug Logging Level default (-stslogginglevel): 0
Number of tables included in statistics collection (-tablerangesize): 50
User Notification Time (-usernotifytime): 0 seconds
Number of indexes per user included in statistics collection (-userindexrangesize): 50
Number of tables per user included in statistics collection (-usertablerangesize): 50
Area block consistency check (-AreaCheck): Not Enabled
Number of Database Buffers (-B): 500000
Number of Alternate Database Buffers (-B2): 2000000
Maximum private buffers per user (-Bpmax): 64
Database block consistency check (-DbCheck): Not Enabled
Database Service Manager - Service(s) to start (-DBService): Not Enabled
Enhanced Read-Only mode (-ERO): Not Enabled
Force Access (-F): Not Enabled
Before-Image Truncate Interval (-G): 0
Host Name (-H): WSPROD1
Index block consistency check (-IndexCheck): Not Enabled
Current Size of Lock Table (-L): 8192
Lock Governor (-LGovernor): 0%
Maximum Number of Clients Per Server (-Ma): 8
Memory overwrite check (-MemCheck): Not Enabled
Delay of Before-Image Flush (-Mf): 3
Minimum Clients Per Server (-Mi): 1
Message Buffer Size (-Mm): 16384
Maximum Number of Servers (-Mn): 146
Servers per Protocol (-Mp): 0
Maximum Servers Per Broker (-Mpb): 100
Excess Shared Memory Size (-Mxs): 132
Network Type (-N): TCP
Server network message wait time (-Nmsgwait): 2
Pending client connection timeout (-PendConnTimeout): 0
Service Name (-S): 2560
Broker server group support (-ServerType): ABL
SQL Server Max Open Cursors (-SQLCursors): 0
SQL Server Stack Size (-SQLStack): 0
SQL Server Statement Cache Size (-SQLStmtCache): 0
Size [1K byte units] of SQL Server temp table buffer (-SQLTempStoreBuff): 0
Size [1K byte units] of SQL Server temp table disk storage (-SQLTempStoreDisk): 0
Size [1K byte units] of SQL Server temp table data page (-SQLTempStorePageSize): 0
Authorized data truncation (-SQLTruncateTooLarge): OFF
SQL Autonomous Schema Update (-SQLWidthUpdate): OFF
Record block consistency check (-TableCheck): Not Enabled
TXE Lock retry limit (-TXERetryLimit): 0
TXE Commit lock skip limit (-TXESkipLimit): 10000
Database connections are not allowed at this time.
Database connections have been enabled.
Login by proman on batch.
Started for 2501 using TCP IPV4 address 0.0.0.0, pid 7552.
This is an additional broker for this protocol.
This broker supports SQL server groups only.
 

TomBascom

Curmudgeon
The most likely cause is that you have a "rapid reader" banging away at some table. Probably a smallish table that completely fits in memory. (This anti-pattern is incredibly common in the wild.) Less likely, but still very possible, is a large table with a very poor index choice. (These show up as high CPU utilization when -B is large enough to eliminate most IO operations.)
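
The "rapid reader" diagnosis comes down to comparing how often a table is read with how big it is. A minimal Python sketch of that heuristic, assuming you already have per-table logical read counts over a sampling interval (for example from the _TableStat virtual system table, which only covers tables inside the -basetable/-tablerangesize window) plus approximate record counts. The table names, numbers and the threshold of 10 reads per record per second are hypothetical, and this check will not catch the large-table/poor-index case, which also needs index statistics.

```python
from dataclasses import dataclass

@dataclass
class TableSample:
    name: str
    reads_delta: int    # logical record reads during the sample interval
    record_count: int   # approximate number of records in the table

def rapid_readers(samples, interval_s: float, threshold: float = 10.0):
    """Return (name, reads/sec, reads/sec per record) for suspect tables, worst first.

    A table read more than `threshold` times per record per second is being
    re-read over and over (hypothetical threshold)."""
    suspects = []
    for s in samples:
        rate = s.reads_delta / interval_s
        ratio = rate / max(s.record_count, 1)
        if ratio > threshold:
            suspects.append((s.name, rate, ratio))
    return sorted(suspects, key=lambda t: t[2], reverse=True)

if __name__ == "__main__":
    # Hypothetical 10-second sample.
    samples = [
        TableSample("codes", 4_500_000, 300),
        TableSample("customer", 120_000, 2_000_000),
        TableSample("orderline", 900_000, 50_000_000),
    ]
    for name, rate, ratio in rapid_readers(samples, interval_s=10):
        print(f"{name}: {rate:,.0f} reads/s, {ratio:,.0f} reads/s per record")
```

When every one of those reads is a buffer pool hit, nothing waits on disk, so the server process serving that client simply burns CPU; that is consistent with the symptom appearing after -B was raised.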

I suggest that you download ProTop (it is free). The real-time character client works just fine on Windows and will probably show you pretty quickly what is driving your activity. The web-based portal will show trending (but is limited to just a day or so in the free version).

To get meaningful information about table and index activity, you need -tablerangesize and -indexrangesize set high enough to cover all of your tables and indexes. You have the default values of 50, so you won't be able to see which tables are in use until you change those and restart.
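
The base and range-size parameters define a window of object numbers that get statistics: base through base + size - 1. With a base of 1 and a size of 50, as in the log above, any table or index numbered above 50 is invisible. A small Python sketch of that arithmetic; the object numbers in the example are hypothetical, and in practice you would use the highest _File._File-Number and _Index._Idx-num from your own schema.

```python
def stats_window(base: int, range_size: int) -> range:
    """Object numbers that get statistics: base through base + range_size - 1."""
    return range(base, base + range_size)

def uncovered(numbers, base: int, range_size: int) -> list[int]:
    """Object numbers that fall outside the statistics window."""
    window = stats_window(base, range_size)
    return sorted(n for n in numbers if n not in window)

def min_range_size(highest_number: int, base: int = 1) -> int:
    """Smallest range size whose window still reaches highest_number."""
    return max(highest_number - base + 1, 1)

if __name__ == "__main__":
    # Hypothetical: highest table number 180, highest index number 420.
    print(uncovered([12, 49, 50, 51, 180], base=1, range_size=50))  # [51, 180]
    print(min_range_size(180), min_range_size(420))  # 180 420
```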

A casual review of your posted startup parameters is worrisome. You do not appear to be running after-imaging (so your data recoverability is at risk), you are using -directio and, in spite of being on Windows and having increased -Mm, you are not running with any of the -prefetch* parameters set to non-default values. You would clearly benefit from some experienced help.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Other thoughts:
  • You should review your client/server and user-related parameter values; they don't make sense.
    • -n (total connections allowed): 251
    • primary broker remote connections allowed (-Mpb * -Ma): 800
    • secondary (SQL) broker connections allowed: configuration unknown
    • # of shared memory connections: unknown but non-zero
    • primary broker's servers (-Mpb): 100
    • primary broker's server ports (-maxport - -minport + 1): 301
  • Given a DB size of 84GB, there must be some reasonable amount of transaction activity. The default BI cluster size of 512 KB and block size of 8 KB are probably inappropriate. I like to start at 16 MB and 16 KB respectively.
  • Your primary buffer pool has gone from 40 GB to 4 GB. You have 16 GB allocated to the alternate buffer pool but no objects assigned to it. You would likely benefit from increasing -B. It is unclear whether you would benefit from using -B2 and assigning objects without seeing your CRUD stats.
  • What was the thinking behind this?
    "-Bp 16 # Private Buffers added 12/13/2019 "
  • If you have (I am guessing) a remote 4GL user count somewhere in the low 200s, 100 4GL servers are more than you need. They are consuming RAM and CPU cycles.
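
The checks in the list above are simple arithmetic on values from the posted log. A minimal Python sketch that reproduces them for the primary broker only (the secondary SQL broker, shared-memory connections and -Mn are deliberately left out). The 220-user figure is a hypothetical stand-in for "the low 200s", and these are sanity checks, not Progress's own validation rules.

```python
import math
from dataclasses import dataclass

@dataclass
class BrokerConfig:
    n: int         # -n: maximum number of users
    mpb: int       # -Mpb: maximum servers per broker
    ma: int        # -Ma: maximum clients per server
    minport: int   # -minport: lowest port for auto servers
    maxport: int   # -maxport: highest port for auto servers

    def remote_capacity(self) -> int:
        """Remote clients the primary broker could accept: -Mpb * -Ma."""
        return self.mpb * self.ma

    def port_count(self) -> int:
        """Ports available to auto servers: -maxport - -minport + 1."""
        return self.maxport - self.minport + 1

    def warnings(self) -> list[str]:
        out = []
        if self.remote_capacity() > self.n:
            out.append(f"-Mpb * -Ma = {self.remote_capacity()} exceeds -n = {self.n}")
        if self.port_count() < self.mpb:
            out.append(f"only {self.port_count()} ports for {self.mpb} servers")
        return out

def servers_needed(remote_users: int, ma: int) -> int:
    """Servers required if every server is filled to -Ma clients."""
    return math.ceil(remote_users / ma)

if __name__ == "__main__":
    cfg = BrokerConfig(n=251, mpb=100, ma=8, minport=3000, maxport=3300)
    print(cfg.remote_capacity(), cfg.port_count())  # 800 301
    for w in cfg.warnings():
        print("warning:", w)
    # Hypothetical estimate in line with "the low 200s" remote users.
    print(servers_needed(220, cfg.ma))  # 28
```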
 

troup1998

New Member
Thank you, Tom and Rob. We'll look into ProTop. AI is on our to do list.
We hope to start using -B2 soon, and we'll increase -B again. We just need to solve this "rapid reader/index issue"; then we can hopefully utilize memory better.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
troup1998 said: "AI is on our to do list."
It shouldn't be.

"On our to do list" generally means "someday, but not today". I understand you may have change-management processes to follow, but they should have a provision for quickly making emergency fixes. Enabling AI should be classified as an emergency fix. You don't even need downtime. Literally all you need to do is add extents online and run an online backup.

If you wonder why I am insistent about this, let me share an anecdote. A while back, a client of mine wasn't using AI. On a site visit I explained why they urgently needed it and asked them to let me enable it. They declined. Their reason was that they had an upcoming DR test in two months' time. All the procedure documentation was already written and they felt this change would be too much for them to accommodate. They had failed their DR test for the past three years and they "needed a win with their board", so they said. They would implement AI after the DR test. One month later they had a problem which, long story short, caused them to cold-boot their DB server, and that corrupted the BI file of one of the production databases. If they had had AI files to recover with, there would have been minimal data loss. As it was, they had no AI, so they lost several hours of transactions, along with several hours of downtime while we reconstructed what we could from the dead database. One of my post-incident remediations was to enable AI.

Think about that. Their priority was to achieve "success" on a limited and entirely synthetic test (a simulation of a recovery with up to 24 hours of data loss!), as opposed to achieving actual reduction of RPO and protection of the business.

Please, don't wait to enable AI.
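
The online procedure described in this post (add AI extents while the database is running, then take an online backup that turns after-imaging on) maps to prostrct addonline followed by probkup online with the enableai option. A minimal Python sketch that only builds the structure-file text and prints the commands as a dry run, assuming that OpenEdge 11.7 syntax; confirm it in the documentation for your release before running anything. The G:\ai directory, backup path and extent count are hypothetical; AI extents are normally placed on a different disk from the database and BI files.

```python
from pathlib import PureWindowsPath

def ai_structure_lines(db_name: str, ai_dir: str, extents: int = 3) -> str:
    """Structure (.st) text with one variable-length after-image extent per 'a' line."""
    base = PureWindowsPath(ai_dir)
    lines = [f"a {base / (db_name + '.a' + str(i))}" for i in range(1, extents + 1)]
    return "\n".join(lines) + "\n"

def enable_ai_commands(db_path: str, st_file: str, backup_file: str) -> list[list[str]]:
    """Commands in order: add the AI extents online, then back up online and enable AI."""
    return [
        ["prostrct", "addonline", db_path, st_file],
        ["probkup", "online", db_path, backup_file, "enableai"],
    ]

if __name__ == "__main__":
    # Hypothetical locations; only the database path comes from the thread.
    print(ai_structure_lines("OrangeDB", r"G:\ai"), end="")
    for cmd in enable_ai_commands(r"F:\Progdb\OrangeDB", "add_ai.st", r"G:\backup\OrangeDB.bck"):
        print(" ".join(cmd))  # dry run: print instead of executing
```

Keeping it a dry run is deliberate: the structure file has to be saved and reviewed, and the backup target checked for space, before either command is run against production.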
 

troup1998

New Member
Rob Fitzpatrick said: "Please, don't wait to enable AI."
I'm going to do that now. Actually, I had the AI structure built long ago in our development environment but didn't enable it. Can you enable it online? I don't see an option to do so with rfutil. It looks like I'll need to bring down the DB to enable it.
 