Question Performance metrics

ron

Member
RH - OE10.2B

I have started looking at a very busy s l o w system. I am sure that most of the slowness is caused by the disk config ... which has everything in one big RAID set (0+1). And by "everything" I mean "everything" - including ai and bi.

My question is - are there any particular o/s and/or db metrics that would highlight the fact that the ai and bi configuration is dragging-down the performance? Or is it one of those things that you can really only prove by changing it?

Ron.
 

TheMadDBA

Active Member
One big RAID 10 doesn't always mean bad. I have run several databases in excess of 2TB with thousands of users like that. It is also the preferred disk setup for some very large Oracle databases... as in hundreds of TB.

I would suggest looking at promon R&D 2 1 for issues first and/or download ProTop to show most of the important metrics.

For the OS side iostat is your friend. There are several options to show disk/adapter activity and wait/queue times.

Also you can post the promon sample (5 minutes or so during a busy period) and we can help point out any issues. The first section of the DB startup log will also show us a lot of configuration information that would be helpful to diagnose.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
There are certainly worse situations to be in than having all your DB files in a RAID 10 array. So I wouldn't automatically conclude that I/O throughput is the bottleneck.

In Red Hat you can use iotop or nmon to look at disk usage. In the database, it would help to have more information about your very busy system. Is it very busy with reads, and very few writes? Is it very busy with writes? If the latter, your BI or AI configuration could be a bottleneck. If it is very read-heavy then buffer or LRU latch contention could be a bottleneck.

  • What are -aibufs and -bibufs set to?
  • What are BI block size and cluster size?
  • How many APWs are running?
  • Are you using -directio?
  • Are you running a BIW and an AIW?
  • During periods of heavy transaction activity, how often are you checkpointing?
  • What are your checkpoint sync times, on average?
  • How many BI buffers flushed at checkpoint do you have in a day?
  • How many empty buffer waits do you have in a day?
The answers are found in promon, proutil describe, and your startup parameters.
 

cj_brandt

Active Member
I like to look at the Sync times for the database checkpoints - R&D 3 4
I like to look at the bi stats - R&D 2 5
I like to know how many disk reads / sec the db is doing. 5 - summary screen.

sometimes people use the bigrow test on an empty database to see how long it takes the disk to allocate some space.

create an empty db with an 8k blocksize.
set a 32mb clustersize
bigrow 4 to allocate 4 clusters and grow the bi file.
look at the db log to see how long it took to allocate the space. You want something under 20 seconds, my system takes about 7 seconds and hosts a 1,400 gb OE system just fine - also all extents on a single RAID 10 system.
 

ron

Member
The figures ...

-aibufs = 600
-bibufs = 600
BI block size = 16K
cluster size = 16 MB
APWs = 6
-directio = no
AIW and BIW = yes/yes
Checkpoints about every 2 to 4 minutes
BI buffers flushed each day at checkpoint = 820,000 (roughly)
Empty buffer waits = 0

DB block size = 8K
DB size = 160 GB
-B = 128000
Total RAM = 250 GB
max users ~ 800

I have only just started looking at this server. I think 600 for aibufs and bibufs is a bit over the top - but they won't slow-down anything. -B at 128000 looks to me to be very lean - I'm thinking about bumping that up times 8.
 

TheMadDBA

Active Member
With that much ram you could certainly bump up the -B. But if it using that much memory already you might just have some coding/design issues to deal with. Just make sure you actually have free memory before you make any drastic increases. Red Hat will grab all of the available memory for file system caching by default.

I don't see a -spin value. You need to have that set to some reasonable value (5000 would be a good starting point).

BI cluster size needs to go up to get rid of those buffers flushed.

How does the logical/physical read ratio look with the current -B setting?
 

ron

Member
-spin is 50000

promon shows "Buffer Hits" to be fairly constant at about 98.2% - which is a hit ration of about 50:1

I'll do an analysis of memory usage next week - but my initial feeling is that there is quite enough memory to allow -B to go up by a factor of about 8.
 

Cringer

ProgressTalk.com Moderator
Staff member
promon shows "Buffer Hits" to be fairly constant at about 98.2% - which is a hit ration of about 50:1

That's pretty bad actually. So a -B will probably help. Is your -tablerangesize parameter set adequately? If so you can look at the table read stats for the database. Any tables that are reasonably small and have very high reads and very little writes (along with all the indexes for that table) can be loaded into -B2 (assuming you're using Type II storage areas). Whilst you're at it, load the whole schema area into -B2 as well. Make sure you size -B2 to have enough space for what you're loading.
That will mean they are always cached, but will also mean you're not muddying the primary cache with them.
With that in mind I think -LRUSkips is available in 10.2B. Set that to around 50.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
No, -B2 was part of 10.2B FCS. -lruskips was added in SP06. That's why ABP is covered in the 10.2B documentation and later features like -lruskips and idxbuild enhancements are not.
 

cj_brandt

Active Member
Your buffers flushed seems way to high if you are hitting a checkpoint every couple minutes. Were you able to check the sync times on the checkpoints ?

adding the memory buffers should help reduce the burden on the disk.
 

TheMadDBA

Active Member
I "think" it just came with 10.2B.

I have never went into production with an unpatched Progress release so it is possible I am wrong :)
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
ABP is documented in the NeRF manual for 10.2B which means it was in FCS. And there's no mention of it in the SP02 release notes.
 
Top