Question Performance metrics

ron · May 15, 2015

RH - OE10.2B

I have started looking at a very busy s l o w system. I am sure that most of the slowness is caused by the disk config ... which has everything in one big RAID set (0+1). And by "everything" I mean "everything" - including ai and bi.

My question is - are there any particular o/s and/or db metrics that would highlight the fact that the ai and bi configuration is dragging-down the performance? Or is it one of those things that you can really only prove by changing it?

Ron.

TheMadDBA · May 15, 2015

One big RAID 10 doesn't always mean bad. I have run several databases in excess of 2TB with thousands of users like that. It is also the preferred disk setup for some very large Oracle databases... as in hundreds of TB.

I would suggest looking at promon R&D 2 1 for issues first and/or download ProTop to show most of the important metrics.

For the OS side iostat is your friend. There are several options to show disk/adapter activity and wait/queue times.

Also you can post the promon sample (5 minutes or so during a busy period) and we can help point out any issues. The first section of the DB startup log will also show us a lot of configuration information that would be helpful to diagnose.

Rob Fitzpatrick · May 15, 2015

There are certainly worse situations to be in than having all your DB files in a RAID 10 array. So I wouldn't automatically conclude that I/O throughput is the bottleneck.

In Red Hat you can use iotop or nmon to look at disk usage. In the database, it would help to have more information about your very busy system. Is it very busy with reads, and very few writes? Is it very busy with writes? If the latter, your BI or AI configuration could be a bottleneck. If it is very read-heavy then buffer or LRU latch contention could be a bottleneck.

What are -aibufs and -bibufs set to?
What are BI block size and cluster size?
How many APWs are running?
Are you using -directio?
Are you running a BIW and an AIW?
During periods of heavy transaction activity, how often are you checkpointing?
What are your checkpoint sync times, on average?
How many BI buffers flushed at checkpoint do you have in a day?
How many empty buffer waits do you have in a day?

The answers are found in promon, proutil describe, and your startup parameters.

cj_brandt · May 15, 2015

I like to look at the Sync times for the database checkpoints - R&D 3 4
I like to look at the bi stats - R&D 2 5
I like to know how many disk reads / sec the db is doing. 5 - summary screen.

sometimes people use the bigrow test on an empty database to see how long it takes the disk to allocate some space.

create an empty db with an 8k blocksize.
set a 32mb clustersize
bigrow 4 to allocate 4 clusters and grow the bi file.
look at the db log to see how long it took to allocate the space. You want something under 20 seconds, my system takes about 7 seconds and hosts a 1,400 gb OE system just fine - also all extents on a single RAID 10 system.

ron · May 15, 2015

The figures ...

-aibufs = 600
-bibufs = 600
BI block size = 16K
cluster size = 16 MB
APWs = 6
-directio = no
AIW and BIW = yes/yes
Checkpoints about every 2 to 4 minutes
BI buffers flushed each day at checkpoint = 820,000 (roughly)
Empty buffer waits = 0

DB block size = 8K
DB size = 160 GB
-B = 128000
Total RAM = 250 GB
max users ~ 800

I have only just started looking at this server. I think 600 for aibufs and bibufs is a bit over the top - but they won't slow-down anything. -B at 128000 looks to me to be very lean - I'm thinking about bumping that up times 8.

TheMadDBA · May 15, 2015

With that much ram you could certainly bump up the -B. But if it using that much memory already you might just have some coding/design issues to deal with. Just make sure you actually have free memory before you make any drastic increases. Red Hat will grab all of the available memory for file system caching by default.

I don't see a -spin value. You need to have that set to some reasonable value (5000 would be a good starting point).

BI cluster size needs to go up to get rid of those buffers flushed.

How does the logical/physical read ratio look with the current -B setting?

ron · May 15, 2015

-spin is 50000

promon shows "Buffer Hits" to be fairly constant at about 98.2% - which is a hit ration of about 50:1

I'll do an analysis of memory usage next week - but my initial feeling is that there is quite enough memory to allow -B to go up by a factor of about 8.

Cringer · May 18, 2015

TheMadDBA said:
BI cluster size needs to go up to get rid of those buffers flushed.

And when that's done you can probably trim down your APWs as 6 seems excessive.

Cringer · May 18, 2015

ron said:
promon shows "Buffer Hits" to be fairly constant at about 98.2% - which is a hit ration of about 50:1

That's pretty bad actually. So a -B will probably help. Is your -tablerangesize parameter set adequately? If so you can look at the table read stats for the database. Any tables that are reasonably small and have very high reads and very little writes (along with all the indexes for that table) can be loaded into -B2 (assuming you're using Type II storage areas). Whilst you're at it, load the whole schema area into -B2 as well. Make sure you size -B2 to have enough space for what you're loading.
That will mean they are always cached, but will also mean you're not muddying the primary cache with them.
With that in mind I think -LRUSkips is available in 10.2B. Set that to around 50.

TomBascom · May 18, 2015

Naked 10.2b lacks -B2 and -lruskips. You need at least service pack 6. SP8 is preferred.

Cringer · May 18, 2015

Ah I'd forgotten -B2 is a 'recent' addition. Thanks Tom.

Rob Fitzpatrick · May 18, 2015

No, -B2 was part of 10.2B FCS. -lruskips was added in SP06. That's why ABP is covered in the 10.2B documentation and later features like -lruskips and idxbuild enhancements are not.

cj_brandt · May 18, 2015

Your buffers flushed seems way to high if you are hitting a checkpoint every couple minutes. Were you able to check the sync times on the checkpoints ?

adding the memory buffers should help reduce the burden on the disk.

Rob Fitzpatrick · May 18, 2015

ron said:
BI buffers flushed each day at checkpoint = 820,000 (roughly)

Was this number extrapolated from buffers flushed per second (i.e. multiplied by 86400)? Or did total buffers flushed change by that much in one day?

TomBascom · May 18, 2015

Didn't B2 come with service pack 2?

TheMadDBA · May 18, 2015

I "think" it just came with 10.2B.

I have never went into production with an unpatched Progress release so it is possible I am wrong

Rob Fitzpatrick · May 18, 2015

ABP is documented in the NeRF manual for 10.2B which means it was in FCS. And there's no mention of it in the SP02 release notes.

TomBascom · May 19, 2015

I seem to be wrong! I'll try to remember that next time.

Question Performance metrics

ron

Member

TheMadDBA

Active Member

Rob Fitzpatrick

ProgressTalk.com Sponsor

cj_brandt

Active Member

ron

Member

TheMadDBA

Active Member

ron

Member

Cringer

ProgressTalk.com Moderator

Cringer

ProgressTalk.com Moderator

TomBascom

Curmudgeon

Cringer

ProgressTalk.com Moderator

Rob Fitzpatrick

ProgressTalk.com Sponsor

cj_brandt

Active Member

Rob Fitzpatrick

ProgressTalk.com Sponsor

TomBascom

Curmudgeon

TheMadDBA

Active Member

Rob Fitzpatrick

ProgressTalk.com Sponsor

TomBascom

Curmudgeon

Similar threads