Periodic freeze of database

us055410 · Oct 30, 2013

and tools that are able to recognize them.

What -
°°ÿ9.1E 003634125 4GL Development SystemThu Apr 13 13:39:05 2006
°°ÿ9.1E 003634126 Query/RESULTSThu Apr 13 13:39:05 2006
°°ÿ9.1E 003634127 Enterprise DBThu Apr 13 13:39:05 2006
°°ÿ9.1E 003634128 WebSpeed Trans SvrThu Apr 13 13:39:05 2006
°°ÿ9.1E 003634129 Progress AppServerThu Apr 13 13:39:05 2006
°°ÿ9.1E 003651785 Fathom Repl PlusThu Apr 27 16:12:10 2006

Mfg/pro EB2 SP1

SunOS mfgdbp01 5.10 Generic_147440-25 sun4u sparc SUNW,SPARC-Enterprise

Problem -
Periodically / various times of the day / every few weeks / a database will go into a freeze state. Typically users complain that the screen is blank or the transaction they were in won't move or other users cannot log into the database. We user Powerterm to connect to the server. I have looked at the server / database with iostat /sar / promon / protop / etc... and cannot detect the exact cause.

Question -
Are there other tools available to help me analyze the problem? Are there any other recommendations you have or could suggest? TIA

TomBascom · Oct 30, 2013

"Periodic freezes" are the hallmark of long checkpoints.

You have an "Enterprise" license -- are you running one or two APWs, a BIW and an AIW process?

What is your bi cluster size set to?

Is replication plus actually configured and running? Or is that just a license that you own but do not use? If it is running -- what is -pica set to?

us055410 · Oct 30, 2013

TomBascom said:
"Periodic freezes" are the hallmark of long checkpoints.

You have an "Enterprise" license -- are you running one or two APWs, a BIW and an AIW process?

6 7489 Fri Sep 27 21:51:51 2013 apw no
7 7478 Fri Sep 27 21:51:51 2013 DB_Agent batch no
8 7521 Fri Sep 27 21:51:51 2013 apw no
9 7529 Fri Sep 27 21:51:51 2013 apw no
10 7537 Fri Sep 27 21:51:51 2013 biw no
11 7570 Fri Sep 27 21:51:51 2013 wdog no

What is your bi cluster size set to?
TRUNC_PARAM='-C truncate bi -bi 16384 -biblocksize 8'
DB01='/pro/ep/db/smfgeb2 -directio -spin 40000 -B 270000 -bibufs 25 -L 320000 -H mfgdbp01 -S 4232 -Mf 0 -groupdelay 1 -n '$USERS

Is replication plus actually configured and running? Or is that just a license that you own but do not use? If it is running -- what is -pica set to?

database replication is not currently running.

TomBascom · Oct 30, 2013

What about the rest of the questions?

us055410 · Oct 30, 2013

TomBascom said:
What about the rest of the questions?

There's an expand button below what you can see. Near the bottom. There lies the rest...

TomBascom · Oct 30, 2013

Answers aren't usually embedded in the quoted text. I missed them. Sorry about that.

You have none of the helper processes running.

This is probably your biggest problem.

You need to start an APW, a BIW and an AIW (you *do* have after-imaging enabled don't you?)

WDOG would be nice but isn't critical.

The cluster size of 16384 is probably a good start if it is actually set to that.

-directio is probably not helping.

-spin 40,000 is probably high -- but it also probably isn't relevant to the issue at hand.

Likewise -B 270,000 is probably ok.

You should not be setting -Mf 0 nor -groupdelay.

-bibufs could be higher. Try 200.

You should also have -aibufs at the same value -- I'm beginning to think after-imaging might not be running. Is your data not worth protecting?

us055410 · Oct 30, 2013

TomBascom said:
What about the rest of the questions?

Could you not see all these?
"Periodic freezes" are the hallmark of long checkpoints.
You have an "Enterprise" license -- are you running one or two APWs, a BIW and an AIW process?
6 7489 Fri Sep 27 21:51:51 2013 apw no
7 7478 Fri Sep 27 21:51:51 2013 DB_Agent batch no
8 7521 Fri Sep 27 21:51:51 2013 apw no
9 7529 Fri Sep 27 21:51:51 2013 apw no
10 7537 Fri Sep 27 21:51:51 2013 biw no
11 7570 Fri Sep 27 21:51:51 2013 wdog no
What is your bi cluster size set to?
TRUNC_PARAM='-C truncate bi -bi 16384 -biblocksize 8'
DB01='/pro/ep/db/smfgeb2 -directio -spin 40000 -B 270000 -bibufs 25 -L 320000 -H mfgdbp01 -S 4232 -Mf 0 -groupdelay 1 -n '$USERS
Is replication plus actually configured and running? Or is that just a license that you own but do not use? If it is running -- what is -pica set to?

us055410 · Oct 30, 2013

us055410 said:
Could you not see all these?
"Periodic freezes" are the hallmark of long checkpoints.
You have an "Enterprise" license -- are you running one or two APWs, a BIW and an AIW process?
6 7489 Fri Sep 27 21:51:51 2013 apw no
7 7478 Fri Sep 27 21:51:51 2013 DB_Agent batch no
8 7521 Fri Sep 27 21:51:51 2013 apw no
9 7529 Fri Sep 27 21:51:51 2013 apw no
10 7537 Fri Sep 27 21:51:51 2013 biw no
11 7570 Fri Sep 27 21:51:51 2013 wdog no
What is your bi cluster size set to?
TRUNC_PARAM='-C truncate bi -bi 16384 -biblocksize 8'
DB01='/pro/ep/db/smfgeb2 -directio -spin 40000 -B 270000 -bibufs 25 -L 320000 -H mfgdbp01 -S 4232 -Mf 0 -groupdelay 1 -n '$USERS
Is replication plus actually configured and running? Or is that just a license that you own but do not use? If it is running -- what is -pica set to?

We have these AI files active -
-rw-r--r-- 1 root root 409600000 Oct 30 16:45 smfgeb2.a12
-rw-r--r-- 1 root root 409600000 Oct 30 16:35 smfgeb2.a11
-rw-r--r-- 1 root root 409600000 Oct 30 16:05 smfgeb2.a10
-rw-r--r-- 1 root root 409600000 Oct 30 15:35 smfgeb2.a9
-rw-r--r-- 1 root root 409600000 Oct 30 15:05 smfgeb2.a8
-rw-r--r-- 1 root root 409600000 Oct 30 14:35 smfgeb2.a7
-rw-r--r-- 1 root root 409600000 Oct 30 14:05 smfgeb2.a6
-rw-r--r-- 1 root root 409600000 Oct 30 13:35 smfgeb2.a5
-rw-r--r-- 1 root root 409600000 Oct 30 13:05 smfgeb2.a4
-rw-r--r-- 1 root root 409600000 Oct 30 12:35 smfgeb2.a3
-rw-r--r-- 1 root root 409600000 Oct 30 12:05 smfgeb2.a2
-rw-r--r-- 1 root root 409600000 Oct 30 11:35 smfgeb2.a1
-rw-r--r-- 1 root root 409600000 Oct 30 11:05 smfgeb2.a16
-rw-r--r-- 1 root root 409600000 Oct 30 10:35 smfgeb2.a15
-rw-r--r-- 1 root root 409600000 Oct 30 10:05 smfgeb2.a14
-rw-r--r-- 1 root root 409600000 Oct 30 09:35 smfgeb2.a13

TomBascom · Oct 30, 2013

On the bright side after-imaging is running.

On the not so bright side you don't have an AIW running and -aibufs isn't being set.

You need to start an APW, a BIW and an AIW. Execute the following:

probiw /pro/ep/db/smfgeb2
proaiw /pro/ep/db/smfgeb2
proapw /pro/ep/db/smfgeb2

(You can do this online -- no need to restart the db...)

And add those commands to your startup script.

Increase -bibufs and -aibufs the next chance you get to adjust parameters.

Rob Fitzpatrick · Oct 30, 2013

Based on the provided proshut output I believe the background processes are already running, apart from the AIW. Though we haven't established whether Fathom is in use (and the value of -pica if it is).

TomBascom · Oct 30, 2013

Where do you see proshut output?

Rob Fitzpatrick · Oct 30, 2013

In the third post. It's easy to miss as it isn't within CODE tags. Here it is cleaned up:

Code:

6    7489   Fri Sep 27 21:51:51 2013    apw              no
7    7478   Fri Sep 27 21:51:51 2013    DB_Agent batch   no
8    7521   Fri Sep 27 21:51:51 2013    apw              no
9    7529   Fri Sep 27 21:51:51 2013    apw              no
10   7537   Fri Sep 27 21:51:51 2013    biw              no
11   7570   Fri Sep 27 21:51:51 2013    wdog             no

us055410 · Oct 31, 2013

I received this from one of my colleagues -

You may want to share more information.
The file system is ZFS
Questions:
· Should the file system block size and the bi block size match?
· Should other types of Progress files use another ZFS block size?
· Are there other Progress file types that would benefit from specific file system block sizes?

I know Oracle databases so here is an Oracle example:
· .dbf files – typically 8k or 16k – the ZFS block size matches the data file size
· Archive log files – sequential writes – the ZFS block size is set to the maximum ZFS write size (128k)
· Control and redo files - sequential reads & writes – the ZFS block size is set to the maximum ZFS write size (128k)
· Essentially, every file type and usage characteristic has a separate file system with separate settings.

us055410 · Oct 31, 2013

Rob Fitzpatrick said:
Based on the provided proshut output I believe the background processes are already running, apart from the AIW. Though we haven't established whether Fathom is in use (and the value of -pica if it is).

Fathom Replication is not. Fathom Management(which doesn't really impact this) is.

us055410 · Oct 31, 2013

Rob Fitzpatrick said:

In the third post. It's easy to miss as it isn't within CODE tags. Here it is cleaned up:

Code:

6    7489   Fri Sep 27 21:51:51 2013    apw              no
7    7478   Fri Sep 27 21:51:51 2013    DB_Agent batch   no
8    7521   Fri Sep 27 21:51:51 2013    apw              no
9    7529   Fri Sep 27 21:51:51 2013    apw              no
10   7537   Fri Sep 27 21:51:51 2013    biw              no
11   7570   Fri Sep 27 21:51:51 2013    wdog             no

Will the AIW have an impact on database performance?

Rob Fitzpatrick · Oct 31, 2013

If you're using after imaging and you have an Enterprise database license you should always run the AIW. If we have seen all of your DB startup parameters then you're using the defaults for -aibufs and -bibufs for whatever your version of Progress is. You should increase both.

Rob Fitzpatrick · Oct 31, 2013

us055410 said:
I received this from one of my colleagues -

You may want to share more information.
The file system is ZFS
Questions:
· Should the file system block size and the bi block size match?
· Should other types of Progress files use another ZFS block size?
· Are there other Progress file types that would benefit from specific file system block sizes?

I know Oracle databases so here is an Oracle example:
· .dbf files – typically 8k or 16k – the ZFS block size matches the data file size
· Archive log files – sequential writes – the ZFS block size is set to the maximum ZFS write size (128k)
· Control and redo files - sequential reads & writes – the ZFS block size is set to the maximum ZFS write size (128k)
· Essentially, every file type and usage characteristic has a separate file system with separate settings.

Your file system block size should never be larger than your database block size. If it is, performance will be poor as you will be doing far too much physical I/O for the amount of logical I/O required by your application.

Rob Fitzpatrick · Oct 31, 2013

Do you know who chose your database startup parameters? Can you ask them why they specify -Mf and -groupdelay?

TomBascom · Oct 31, 2013

I see it now -- with it unlabelled and embedded like that (and missing all of the headings and OE10 data columns) I took it for some sort of report output with "no" meaning that the respective helpers were not running.

I wouldn't be so sure that Fathom Management has no impact -- it builds a huge trending database and is usually something of a pig from a performance standpoint.

An AIW will have a positive impact. Lack of an AIW will have a negative impact. Ditto BIW. But they seem to be running.

Regarding: ZFS -- I have never seen excellent performance with ZFS. I have seen really bad performance. Mostly it is just "meh". Lots of rope to hang yourself with. What sort of disk subsystem is it on top of? What kind of RAID is being used?

The "bi block size" can be larger than the fs block size without a major problem (same for the ai block size). But the db block size should be the same.

I know Oracle databases so here is an Oracle example:
· .dbf files – typically 8k or 16k – the ZFS block size matches the data file size

This would be "data extents" in Progress speak. The *.d# files.

· Archive log files – sequential writes – the ZFS block size is set to the maximum ZFS write size (128k)
· Control and redo files - sequential reads & writes – the ZFS block size is set to the maximum ZFS write size (128k)

bi files, *.b# = undo log, ai files, *.a# = redo log

Setting these things up as you do for Oracle would be the correct thing to do.

With regards to parameter settings -- showing the contents of config files and variables shows what you think they are. Showing the 50 to 75 lines of the .lg file that immediately follows the most recent db restart (search for message "(333)") will show what they actually are. They values in use don't always match expectations. Post the startup message within CODE tags and it will be readable. The output of "proutil dbname -C describe" is also useful.

The best thing that you could do for performance would be to upgrade to OpenEdge 10.2B service pack 7 or better and convert your db to properly configured type 2 storage areas. v9 is ancient, obsolete and unsupported...

Periodic freeze of database

us055410

New Member

TomBascom

Curmudgeon

us055410

New Member

TomBascom

Curmudgeon

us055410

New Member

TomBascom

Curmudgeon

us055410

New Member

us055410

New Member

TomBascom

Curmudgeon

Rob Fitzpatrick

ProgressTalk.com Sponsor

TomBascom

Curmudgeon

Rob Fitzpatrick

ProgressTalk.com Sponsor

us055410

New Member

us055410

New Member

us055410

New Member

Rob Fitzpatrick

ProgressTalk.com Sponsor

Rob Fitzpatrick

ProgressTalk.com Sponsor

Rob Fitzpatrick

ProgressTalk.com Sponsor

TomBascom

Curmudgeon

Similar threads