Progress (UNIX 8.2C) Fails at Startup

matt_

New Member
I have an offline backup script that runs at night, and for the past few days only, Progress has failed to start up normally after files have been copied to a backup folder. When I am notified, I am able to log into the system and run /progress/startup/start-all-cui, which brings the system up fine. It is the same command that is run in the backup script.

There is no indication that Progress shuts down incorrectly (no cron log output).

This is a partial cron log output of the backup script when Progress tries to start:

00:30:07 SERVER : ** The database /p_dtr/p_dbs/custom10 is in use in multi-user mode. (276)
00:30:07 SERVER : ** The server terminated with exit code 2. (800)
00:30:07 BROKER 0: Multi-user session begin. (333)
00:30:07 BROKER 0: Multi-user session begin. (333)
00:30:08 BROKER 1: Started for common using TCP, pid 22842. (5644)
00:30:12 BROKER 0: Multi-user session begin. (333)
00:30:20 BROKER 1: Started for tmmtmp10 using TCP, pid 21269. (5644)
00:30:27 BROKER 1: The port 10015 is already in use. (785)
00:30:27 BROKER 1: Begin normal shutdown (2248)
00:30:27 APW 2: Started. (2518)
00:30:27 APW 4: Started. (2518)
00:30:27 BIW 6: The before image writer is already executing (2516)
00:30:27 BIW ** The server terminated with exit code 0. (800)
00:30:27 APW There is no server for database /p_dtr/p_dbs/common. (1423)
00:30:27 APW ** The server terminated with exit code 2. (800)
00:30:28 WDOG 6: The watchdog is already executing. (2517)
00:30:28 BIW There is no server for database /p_dtr/p_dbs/common. (1423)
00:30:28 BIW ** The server terminated with exit code 2. (800)
00:30:28 BROKER 1: Multi-user session end. (334)
00:30:28 WDOG ** The server terminated with exit code 0. (800)
00:30:28 WDOG There is no server for database /p_dtr/p_dbs/common. (1423)
00:30:28 WDOG ** The server terminated with exit code 2. (800)
00:30:28 APW 3: Started. (2518)
00:30:28 BIW 5: The before image writer is already executing (2516)
00:30:28 BIW ** The server terminated with exit code 0. (800)
00:30:29 APW 4: Started. (2518)
00:30:29 WDOG 5: The watchdog is already executing. (2517)
00:30:29 WDOG ** The server terminated with exit code 0. (800)
00:30:29 BIW 5: The before image writer is already executing (2516)
00:30:29 BIW ** The server terminated with exit code 0. (800)
00:30:29 SERVER : ** The database /p_dtr/p_dbs/tmm15 is in use in multi-user mode. (276)
00:30:29 SERVER : ** The database /p_dtr/p_dbs/tmmtmp15 is in use in multi-user mode. (276)
00:30:29 SERVER : ** The database /p_dtr/p_dbs/custom15 is in use in multi-user mode. (276)
00:30:29 WDOG 5: The watchdog is already executing. (2517)
00:30:29 SERVER : ** The server terminated with exit code 2. (800)
00:30:29 SERVER : ** The server terminated with exit code 2. (800)
00:30:29 SERVER : ** The server terminated with exit code 2. (800)
00:30:29 WDOG ** The server terminated with exit code 0. (800)
00:30:29 SERVER : ** The server terminated with exit code 2. (800)
There is no server for database /p_dtr/p_dbs/tmm10. (1423)

Does anyone know how to interpret these errors, or have any insight they can provide about this? It is frustrating because this has only started happening recently. No system changes have occurred, so I'm not sure what to think about this. I have never seen these errors when starting Progress from the command line.
 

RealHeavyDude

Well-Known Member
What does your backup script do? The error messages ( which should also be present in the database log file ) give no clear picture to me.

It would help a lot if you would post your backup script.

Regards, RealHeavyDude.
 

matt_

New Member
I'll post the script, but here is a log from tmm10.lg

Code:
 Tue Oct 26 00:30:19 2010
00:30:19 BROKER  0: Multi-user session begin. (333)
00:30:29 BROKER  1: Started for tmm10 using TCP, pid 8013. (5644)
00:30:29 BROKER  1: The port 10015 is already in use. (785)
00:30:29 BROKER  1: Begin normal shutdown (2248)
00:30:29 APW     1: Started. (2518)
00:30:29 APW     2: Started. (2518)
00:30:29 APW     3: Started. (2518)
00:30:29 APW     4: Started. (2518)
00:30:29 BIW     5: Started. (2518)
00:30:29 BIW     6: The before image writer is already executing (2516)
00:30:30 WDOG    7: Started. (2518)
00:30:30 BROKER  1: PROGRESS Version 8.2C on DEC_UNIX. (4234)
00:30:30 BROKER  1: Server started by root on batch. (4281)
00:30:30 BROKER  1: Physical Database Name (-db): /p_dtr/p_dbs/tmm10. (4235)
00:30:30 BROKER  1: Database Type (-dt): PROGRESS. (4236)
00:30:30 BROKER  1: Direct I/O (-directio): Not Enabled. (4238)
00:30:30 BROKER  1: Number of Database Buffers (-B): 24000. (4239)
00:30:30 BROKER  1: Excess Shared Memory Size (-Mxs): 16413. (4240)
00:30:30 BROKER  1: Shared Memory Page Table Entry Optimization (-Mpte); 0. (5556)
00:30:30 BROKER  1: Current Size of Lock Table (-L): 90016. (4241)
00:30:30 BROKER  1: Hash Table Entries (-hash): 6661. (4242)
00:30:30 BROKER  1: Current Spin Lock Tries (-spin): 0. (4243)
00:30:30 BROKER  1: Crash Recovery (-i): Enabled. (4244)
00:30:30 BROKER  1: Delay of Before-Image Flush (-Mf): 3. (4245)
00:30:30 BROKER  1: Before-Image File Name (-g):  (4246)
00:30:30 BROKER  1: Before-Image File I/O (-r -R): Reliable. (4247)
00:30:30 BROKER  1: Before-Image Truncate Interval (-G): 60. (4249)
00:30:30 BROKER  1: Before-Image Cluster Size: 1024. (4250)
00:30:30 BROKER  1: Before-Image Block Size: 8192. (4251)
00:30:30 BROKER  1: Number of Before-Image Buffers (-bibufs): 5. (4252)
00:30:30 BROKER  1: After-Image File Name (-a): Not Enabled. (4253)
00:30:30 BROKER  1: After-Image Stall (-aistall): Not Enabled. (4254)
00:30:30 BROKER  1: After-Image Block Size: 8192. (4255)
00:30:30 BROKER  1: Number of After-Image Buffers (-aibufs): 1. (4256)
00:30:30 BROKER  1: Maximum Number of Clients Per Server (-Ma): 4. (4257)
00:30:30 BROKER  1: Maximum Number of Servers (-Mn): 16. (4258)
00:30:30 BROKER  1: Minimum Clients Per Server (-Mi): 1. (4259)
00:30:30 BROKER  1: Maximum Number of Users (-n): 101. (4260)
00:30:30 BROKER  1: Host Name (-H): epc4000. (4261)
00:30:30 BROKER  1: Service Name (-S): tmm10. (4262)
00:30:30 BROKER  1: Network Type (-N): TCP. (4263)
00:30:30 BROKER  1: Character Set (-cpinternal): iso8859-1. (4264)
00:30:30 BROKER  1: Stream (-cpstream): iso8859-1. (4265)
00:30:30 BROKER  1: Parameter File: /progress/startup/tmm-tmm.pf. (4282)
00:30:30 BROKER  1: Minimum Port for Auto Servers (-minport): 1025. (5648)
00:30:30 BROKER  1: Maximum Port for Auto Servers (-maxport): 2000. (5649)
00:30:30 APW     2: Stopped. (2520)
00:30:30 APW     1: Stopped. (2520)
00:30:30 APW     3: Stopped. (2520)
00:30:30 APW     4: Stopped. (2520)
00:30:30 WDOG    6: The watchdog is already executing. (2517)
00:30:31 WDOG    7: Stopped. (2520)
00:30:59 BROKER  1: Sending signal 14 to 1 connected user(s). (2261)
00:30:59 BIW     5: Stopped. (2520)
00:31:00 BROKER  1: Multi-user session end. (334)
00:31:34 BROKER  1: /p_dtr/p_dbs/tmm10.lk is missing, shutting down... (4195)
00:31:34 BROKER  1: Begin ABNORMAL shutdown code 2 (2249)
00:31:35 BROKER  1: Multi-user session end. (334)

                Tue Oct 26 00:59:56 2010
00:59:56 BROKER  0: Multi-user session begin. (333)
01:00:06 BROKER  1: Started for tmm10 using TCP, pid 25074. (5644)
01:00:06 APW     1: Started. (2518)
01:00:06 APW     2: Started. (2518)
01:00:07 BIW     3: Started. (2518)
01:00:07 WDOG    4: Started. (2518)
01:00:07 BROKER  1: PROGRESS Version 8.2C on DEC_UNIX. (4234)
01:00:07 BROKER  1: Server started by root on batch. (4281)
01:00:07 BROKER  1: Physical Database Name (-db): /p_dtr/p_dbs/tmm10. (4235)
01:00:07 BROKER  1: Database Type (-dt): PROGRESS. (4236)
01:00:07 BROKER  1: Direct I/O (-directio): Not Enabled. (4238)
01:00:07 BROKER  1: Number of Database Buffers (-B): 24000. (4239)
01:00:07 BROKER  1: Excess Shared Memory Size (-Mxs): 16413. (4240)
01:00:07 BROKER  1: Shared Memory Page Table Entry Optimization (-Mpte); 0. (5556)
01:00:07 BROKER  1: Current Size of Lock Table (-L): 90016. (4241)
01:00:07 BROKER  1: Hash Table Entries (-hash): 6661. (4242)
01:00:07 BROKER  1: Current Spin Lock Tries (-spin): 0. (4243)
01:00:07 BROKER  1: Crash Recovery (-i): Enabled. (4244)
01:00:07 BROKER  1: Delay of Before-Image Flush (-Mf): 3. (4245)
01:00:07 BROKER  1: Before-Image File Name (-g):  (4246)
01:00:07 BROKER  1: Before-Image File I/O (-r -R): Reliable. (4247)
01:00:07 BROKER  1: Before-Image Truncate Interval (-G): 60. (4249)
01:00:07 BROKER  1: Before-Image Cluster Size: 1024. (4250)
01:00:07 BROKER  1: Before-Image Block Size: 8192. (4251)
01:00:07 BROKER  1: Number of Before-Image Buffers (-bibufs): 5. (4252)
01:00:07 BROKER  1: After-Image File Name (-a): Not Enabled. (4253)
01:00:07 BROKER  1: After-Image Stall (-aistall): Not Enabled. (4254)
01:00:07 BROKER  1: After-Image Block Size: 8192. (4255)
01:00:07 BROKER  1: Number of After-Image Buffers (-aibufs): 1. (4256)
01:00:07 BROKER  1: Maximum Number of Clients Per Server (-Ma): 4. (4257)
01:00:07 BROKER  1: Maximum Number of Servers (-Mn): 16. (4258)
01:00:07 BROKER  1: Minimum Clients Per Server (-Mi): 1. (4259)
01:00:07 BROKER  1: Maximum Number of Users (-n): 101. (4260)
01:00:07 BROKER  1: Host Name (-H): epc4000. (4261)
01:00:07 BROKER  1: Service Name (-S): tmm10. (4262)
01:00:07 BROKER  1: Network Type (-N): TCP. (4263)
01:00:07 BROKER  1: Character Set (-cpinternal): iso8859-1. (4264)
01:00:07 BROKER  1: Stream (-cpstream): iso8859-1. (4265)
01:00:07 BROKER  1: Parameter File: /progress/startup/tmm-tmm.pf. (4282)
01:00:07 BROKER  1: Minimum Port for Auto Servers (-minport): 1025. (5648)
01:00:07 BROKER  1: Maximum Port for Auto Servers (-maxport): 2000. (5649)
01:00:34 Usr     5: Login by root on /dev/ttyp3. (452)
01:00:41 Usr     6: Login by root on batch. (452)
01:00:46 Usr     7: Login by root on batch. (452)
01:00:54 Usr     5: Logout by root on /dev/ttyp3. (453)
01:01:05 Usr     5: Login by hosted2 on /dev/ttyp2. (452)
01:01:38 Usr     8: Login by wilsod2 on /dev/ttyp4. (452)
01:01:43 Usr     9: Login by root on /dev/ttyp3. (452)
01:01:58 WDOG    4: Disconnecting dead user 6. (2527)
01:01:58 WDOG    4: SYSTEM ERROR: User 6 died during microtransaction. (2256)
01:01:58 APW     2: Stopped. (2520)
01:01:58 APW     1: Stopped. (2520)
01:01:58 Usr     9: Logout by root on /dev/ttyp3. (453)
01:01:59 BROKER  1: Begin ABNORMAL shutdown code 2 (2249)
01:01:59 WDOG    4: Stopped. (2520)
01:01:59 Usr     5: Logout by hosted2 on /dev/ttyp2. (453)
01:01:59 Usr     8: Logout by wilsod2 on /dev/ttyp4. (453)
01:02:00 BROKER  1: Multi-user session end. (334)

                Tue Oct 26 01:07:52 2010
01:07:52 BROKER  0: Multi-user session begin. (333)
01:07:52 BROKER  0: ** The last session was abnormally terminated. (852)
01:07:52 BROKER  0: ** Any incomplete transactions are being backed out. (459)
01:08:01 BROKER  0: ** Database recovery complete. Rerun all active transactions. (36)
01:08:01 BROKER  0: 
01:08:02 BROKER  1: Started for tmm10 using TCP, pid 32350. (5644)
01:08:02 APW     1: Started. (2518)
01:08:02 APW     2: Started. (2518)
01:08:02 BIW     3: Started. (2518)
01:08:02 WDOG    4: Started. (2518)
01:08:03 BROKER  1: PROGRESS Version 8.2C on DEC_UNIX. (4234)
01:08:03 BROKER  1: Server started by root on batch. (4281)
01:08:03 BROKER  1: Physical Database Name (-db): /p_dtr/p_dbs/tmm10. (4235)
01:08:03 BROKER  1: Database Type (-dt): PROGRESS. (4236)
01:08:03 BROKER  1: Direct I/O (-directio): Not Enabled. (4238)
01:08:03 BROKER  1: Number of Database Buffers (-B): 24000. (4239)
01:08:03 BROKER  1: Excess Shared Memory Size (-Mxs): 16413. (4240)
01:08:03 BROKER  1: Shared Memory Page Table Entry Optimization (-Mpte); 0. (5556)
01:08:03 BROKER  1: Current Size of Lock Table (-L): 90016. (4241)
01:08:03 BROKER  1: Hash Table Entries (-hash): 6661. (4242)
01:08:03 BROKER  1: Current Spin Lock Tries (-spin): 0. (4243)
01:08:03 BROKER  1: Crash Recovery (-i): Enabled. (4244)
01:08:03 BROKER  1: Delay of Before-Image Flush (-Mf): 3. (4245)
01:08:03 BROKER  1: Before-Image File Name (-g):  (4246)
01:08:03 BROKER  1: Before-Image File I/O (-r -R): Reliable. (4247)
01:08:03 BROKER  1: Before-Image Truncate Interval (-G): 60. (4249)
01:08:03 BROKER  1: Before-Image Cluster Size: 1024. (4250)
01:08:03 BROKER  1: Before-Image Block Size: 8192. (4251)
01:08:03 BROKER  1: Number of Before-Image Buffers (-bibufs): 5. (4252)
01:08:03 BROKER  1: After-Image File Name (-a): Not Enabled. (4253)
01:08:03 BROKER  1: After-Image Stall (-aistall): Not Enabled. (4254)
01:08:03 BROKER  1: After-Image Block Size: 8192. (4255)
01:08:03 BROKER  1: Number of After-Image Buffers (-aibufs): 1. (4256)
01:08:03 BROKER  1: Maximum Number of Clients Per Server (-Ma): 4. (4257)
01:08:03 BROKER  1: Maximum Number of Servers (-Mn): 16. (4258)
01:08:03 BROKER  1: Minimum Clients Per Server (-Mi): 1. (4259)
01:08:03 BROKER  1: Maximum Number of Users (-n): 101. (4260)
01:08:03 BROKER  1: Host Name (-H): epc4000. (4261)
01:08:03 BROKER  1: Service Name (-S): tmm10. (4262)
01:08:03 BROKER  1: Network Type (-N): TCP. (4263)
01:08:03 BROKER  1: Character Set (-cpinternal): iso8859-1. (4264)
01:08:03 BROKER  1: Stream (-cpstream): iso8859-1. (4265)
01:08:03 BROKER  1: Parameter File: /progress/startup/tmm-tmm.pf. (4282)
01:08:03 BROKER  1: Minimum Port for Auto Servers (-minport): 1025. (5648)
01:08:03 BROKER  1: Maximum Port for Auto Servers (-maxport): 2000. (5649)
 

matt_

New Member
This is the contents of the backup script:

Code:
#!/bin/ksh
TERM="vt100-80"
export TERM
# Global Environment Settings
PATH=$HOME:/:/etc:/bin:/sbin:/usr/sbin:/progress/startup:
export PATH ENV EDITOR FCEDIT PS1

# Set Progress Variables
DLC=/progress/dlc
PROMSGS=/progress/dlc/promsgs
PROPATH=/progress/dlc:/progress/dlc/bin:/p_dtr:/p_dtr/p_dbs:/p_dtr/p_dotr:
PROCFG=/progress/dlc/progress.cfg
PROTERMCAP=/progress/startup/mod_proterm
PROEXE=${PROEXE-$DLC/bin/_progres}
export DLC PROMSGS PROPATH PROCFG PROTERMCAP PROEXE
#
# Warn users 5 minutes
#
echo '\nWarning users 5 minutes\n'
date
w
echo '** SYSTEM SHUTDOWN in 5 Minutes\nPlease logoff!' | wall
#
# Shutdown inventory processors
#
echo '\nShutdown Inventory processors\n'
date
/tmm-users/ims/stop_processors.sh
sleep 240
#
# Warn users 1 minute
#
echo '\nWarning users 1 minutes\n'
date
w
echo '** SYSTEM SHUTDOWN in 1 Minutes\nPlease logoff!' | wall
sleep 60
#
# delete old backup
#
echo '\nDeleting Old Backup\n'
date
rm -R /backup/p_dbs
rm -R /backup/bi
rm -R /dbdump/backup/p_dbs
#
# shutdown databases
#
echo '\nShutting down databases\n'
date
/progress/startup/shutdown-all-cui
sleep 60
#
# Show users
#
echo '\nUsers left logged in\n'
date
w
#
# backup data to disk
#
echo '\nBackup data to disk\n'
date
mkdir /dbdump/backup/p_dbs
mkdir /backup/bi
cp -R /p_dtr/p_dbs/k* /backup
cp -R /p_dtr/p_dbs/n* /backup
cp -R /p_dtr/p_dbs/p* /backup
cp -R /p_dtr/p_dbs/tmm10* /backup
cp -R /p_dtr/p_dbs/tmmtmp10* /backup
cp -R /p_dtr/p_dbs/c* /dbdump/backup/p_dbs
cp -R /bi/tmm10* /backup/bi
#
# start up databases again
#
echo '\nStart databases\n'
date
/progress/startup/start-all-cui
#
# Startup inventory processors
#
echo '\nStartup Inventory processors\n'
date
/tmm-users/ims/start_processors.sh
#
# backup data to tape
#
echo '\nBacking up data to tape\n'
date
/sbin/vdump -0 -u -v -f /dev/nrmt0h /backup
/sbin/vdump -0 -u -v -f /dev/nrmt0h /dbdump
#
# 10-13-03 SLG - added next line to backup CARaS data to tape every night
#
/sbin/vdump -0uvf /dev/rmt0h /progress
#
# finished with unatt backup
#
echo '\nFinished\n'
date
 

RealHeavyDude

Well-Known Member
The backup script uses other custom scripts ( /progress/startup/shutdown-all-cui and /progress/startup/start-all-cui ) to shut down and start the databases respectively. I suspect that the problem lies in one or both of those scripts.

Furthermore: You do an OS copy of the database files instead of using the Progress backup utility - is there a specific reason for that? Plus, with the Progress backup utility you could back up the databases while they're online - no need to shut them down.

Regards, RealHeavyDude.
 

matt_

New Member
Furthermore: You do an OS copy of the database files instead of using the Progress backup utility - is there a specific reason for that? Plus, with the Progress backup utility you could back up the databases while they're online - no need to shut them down.
I have inherited this system, so I do not know of a specific reason to do an offline backup. I have heard of online backups, but am not sure how to implement one. We don't have any documentation on this version of Progress, but I do have documentation for 8.3C... I don't know what changed between those versions.

I am interested in doing an online backup, but I don't even know the name of the binary I am looking for, nor its syntax.

These are the custom shutdown and startup scripts that we're using:

shutdown-all-cui:

Code:
#!/bin/ksh

# Change path to insure we are able to find "proserve" at system startup.
PATH="/progress/dlc/bin:$PATH"
DLC="/progress/dlc"
export PATH DLC

_mprshut -by -b /p_dtr/p_dbs/tmm10 > k.k
_mprshut -by -b /p_dtr/p_dbs/tmm15 > k.k
_mprshut -by -b /p_dtr/p_dbs/common > k.k
_mprshut -by -b /p_dtr/p_dbs/tmmtmp10 > k.k
_mprshut -by -b /p_dtr/p_dbs/tmmtmp15 > k.k
#
_mprshut -by -b /p_dtr/p_dbs/custom10 > k.k
_mprshut -by -b /p_dtr/p_dbs/custom15 > k.k
start-all-cui:

Code:
#!/bin/ksh
# Change path to insure we are able to find "proserve" at system startup.
PATH="/progress/dlc/bin:$PATH"
DLC="/progress/dlc"
export PATH DLC
#
nohup _mprosrv -b -pf  /progress/startup/tmm-tmm.pf /p_dtr/p_dbs/tmm10 -S tmm10 &
nohup _mprosrv -b -pf  /progress/startup/tmm-com.pf /p_dtr/p_dbs/common -S common &
nohup _mprosrv -b -pf  /progress/startup/tmm-tmp.pf /p_dtr/p_dbs/tmmtmp10 -S tmmtmp10 &
nohup _mprosrv -b -pf /progress/startup/tmm-cus.pf /p_dtr/p_dbs/custom10 -S custom10 &
#
# Automatically start all APWs, BIWs and WATCHDOGS
#
# wait 5 seconds for databases to start
#
sleep 5
proapw /p_dtr/p_dbs/tmm10
proapw /p_dtr/p_dbs/tmm10
probiw /p_dtr/p_dbs/tmm10
nohup prowdog /p_dtr/p_dbs/tmm10 &
proapw /p_dtr/p_dbs/common
probiw /p_dtr/p_dbs/common
nohup prowdog /p_dtr/p_dbs/common &
proapw /p_dtr/p_dbs/tmmtmp10
probiw /p_dtr/p_dbs/tmmtmp10
nohup prowdog /p_dtr/p_dbs/tmmtmp10 &
proapw /p_dtr/p_dbs/custom10
probiw /p_dtr/p_dbs/custom10
nohup prowdog /p_dtr/p_dbs/custom10 &
#
# modified for seperate pf file for play database with smaller -B
#
nohup _mprosrv -b -pf  /progress/startup/tmm-tmm-15.pf /p_dtr/p_dbs/tmm15 -S tmm15 &
nohup _mprosrv -b -pf  /progress/startup/tmm-tmp-15.pf /p_dtr/p_dbs/tmmtmp15 -S tmmtmp15 &
nohup _mprosrv -b -pf  /progress/startup/tmm-cus-15.pf /p_dtr/p_dbs/custom15 -S custom15 &
 

TomBascom

Curmudgeon
You have a number of problems.

1) Progress 8.2c is ancient, obsolete and unsupported. Actually that's an understatement.

2) The scripts are assuming that things work without actually checking them or handling errors. This is a recipe for disaster.

3) Two serious errors show in your .lg file extracts:

00:31:34 BROKER 1: /p_dtr/p_dbs/tmm10.lk is missing, shutting down... (4195)
...

And, later, after a restart:
01:01:58 WDOG 4: SYSTEM ERROR: User 6 died during microtransaction. (2256)
...

To me, these errors suggest that other processes or scripts are doing things. The first error is what it says -- someone deleted the .lk file. This will cause the db to crash. I have no idea why the .lk file was deleted (if it is in your scripts I missed it) but it's a really, really bad idea to delete the .lk file. And an even worse one to let a script do it.

The second error is a typical result of careless "kill" commands. Also likely from somewhere else and also not generally a good idea to script. Especially not a "kill -9".

4) The backup process does not check to see if the db is really down before it starts. Making OS level copies of a live db is a waste of time. You cannot successfully restore them or use them.

5) The command that you want for a proper offline backup is "probkup dbname dbname.pbk". This will create a backup file called "dbname.pbk", you can name it whatever you'd like. If the db is larger than 2GB you will have to get a bit fancier to break it into "extents".

6) "probkup online dbname dbname.pbk" will make an online backup.
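
Points 2), 4) and 5) could be combined roughly as follows. This is only a sketch: the function names and the PROBKUP variable are made up for illustration, and it assumes that an existing dbname.lk file means the database is still in use (or has crashed) and must not be backed up offline.

```shell
#!/bin/sh
# Sketch only: refuse to back up a database that still has a .lk file,
# and report every failure instead of carrying on blindly.
# PROBKUP, db_is_down and backup_offline are illustrative names.

PROBKUP=${PROBKUP:-probkup}

# Succeeds (returns 0) only if no lock file exists for the database.
db_is_down() {
    [ ! -f "$1.lk" ]
}

# Offline backup of one database: backup_offline /path/to/db /path/to/file.pbk
backup_offline() {
    db=$1
    target=$2
    db_is_down "$db" || {
        echo "ERROR: $db.lk exists, database is not down; skipping backup" >&2
        return 1
    }
    "$PROBKUP" "$db" "$target" || {
        echo "ERROR: probkup failed for $db" >&2
        return 2
    }
    echo "OK: $db backed up to $target"
}
```

In the nightly script this could be called as, for example, `backup_offline /p_dtr/p_dbs/tmm10 /backup/tmm10.pbk || exit 1` before anything is written to tape, so a failed backup stops the run instead of being silently archived.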
 

matt_

New Member
I found syntax for probkup in the 8.3C documentation. I want to do an online backup, but I'm concerned about using both vdump (I'm on OSF1 V4.0 1229 alpha) and probkup to write to the tape drive. I don't have a lot of disk space to spare, and it looks like I have quite a large amount of data. I need to back up EDI/CaRAS data as well as the databases. If probkup writes to the tape and then stops there, then I may be able to vdump after what is written, but I'm not positive that they are compatible and the DEC documentation isn't much help here.

Code:
root:/p_dtr/p_dbs => ls -salFt
total 13093403
2097152 -rwxrwx---   1 root     system   2147483648 Oct 26 11:59 custom10.db*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 11:59 tmm10.d10*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 11:59 tmm10.d11*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 11:59 tmm10.d12*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 11:59 tmm10.d13*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 11:59 tmm10.d14*
1539728 -rwxrwx---   1 root     system   1576665088 Oct 26 11:59 tmm10.d15*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 11:59 tmm10.d3*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 11:59 tmm10.d9*
   2048 -rwxrwx---   1 root     system      2097152 Oct 26 11:59 common.bi*
  92160 -rwxrwx---   1 root     system     94371840 Oct 26 11:59 custom10.bi*
  51128 -rwxrwx---   1 root     system     52349842 Oct 26 11:58 common.lg*
  51008 -rwxrwx---   1 root     system     52104488 Oct 26 11:58 custom10.lg*
  21920 -rwxrwx---   1 root     system     22323936 Oct 26 11:58 tmm10.lg*
  51088 -rwxrwx---   1 root     system     52187347 Oct 26 11:58 tmmtmp10.lg*
   3456 -rwxrwx---   1 root     system      3538944 Oct 26 11:58 common.db*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 11:57 tmm10.d6*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 11:56 tmm10.d4*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 11:56 tmm10.d7*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 11:56 tmm10.d2*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 11:55 tmm10.d1*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 11:55 tmm10.d5*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 11:55 tmm10.d8*
  65024 -rwxrwx---   1 root     system     66584576 Oct 26 11:52 tmmtmp10.bi*
   4040 -rwxrwx---   1 root     system      4131586 Oct 26 11:00 common.lic*
   4040 -rwxrwx---   1 root     system      4131109 Oct 26 11:00 custom10.lic*
   3224 -rwxrwx---   1 root     system      3299147 Oct 26 11:00 custom15.lic*
   2024 -rwxrwx---   1 root     system      2070171 Oct 26 11:00 tmm10.lic*
   1912 -rwxrwx---   1 root     system      1956930 Oct 26 11:00 tmm15.lic*
   4040 -rwxrwx---   1 root     system      4133425 Oct 26 11:00 tmmtmp10.lic*
   3744 -rwxrwx---   1 root     system      3826696 Oct 26 11:00 tmmtmp15.lic*
  77568 -rwxrwx---   1 root     system     79429632 Oct 26 10:17 tmmtmp10.db*
   6736 -rwxrwx---   1 root     system      6893901 Oct 26 01:08 custom15.lg*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 01:08 tmm15.d1*
   3952 -rwxrwx---   1 root     system      4044102 Oct 26 01:08 tmm15.lg*
      1 -r--r--r--   1 root     system           38 Oct 26 01:08 tmm15.lk
1733504 -rwxrwx---   1 root     system   1775108096 Oct 26 01:08 custom15.db*
      1 -r--r--r--   1 root     system           38 Oct 26 01:08 custom15.lk
 707584 -rwxrwx---   1 root     system    724566016 Oct 26 01:08 tmm15.d13*
  29696 -rwxrwx---   1 root     system     30408704 Oct 26 01:08 custom15.bi*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 01:08 tmm15.d10*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 01:08 tmm15.d11*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 01:08 tmm15.d12*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 01:08 tmm15.d2*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 01:08 tmm15.d3*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 01:08 tmm15.d4*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 01:08 tmm15.d5*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 01:08 tmm15.d6*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 01:08 tmm15.d7*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 01:08 tmm15.d8*
 250016 -rwxrwx---   1 root     system    256016384 Oct 26 01:08 tmm15.d9*
  18944 -rwxrwx---   1 root     system     19398656 Oct 26 01:08 tmmtmp15.bi*
   9600 -rwxrwx---   1 root     system      9830400 Oct 26 01:08 tmmtmp15.db*
   7600 -rwxrwx---   1 root     system      7776924 Oct 26 01:08 tmmtmp15.lg*
      1 -r--r--r--   1 root     system           38 Oct 26 01:08 tmmtmp15.lk
     16 -rwxrwx---   1 root     system        16384 Oct 26 01:08 tmm15.db*
      8 drwxrwxrwx   2 root     system         8192 Oct 26 01:08 ./
      1 -r--r--r--   1 root     system           38 Oct 26 01:08 tmm10.lk
     16 -rwxrwx---   1 root     system        16384 Oct 26 01:07 tmm10.db*
      1 -r--r--r--   1 root     system           38 Oct 26 01:07 custom10.lk
      1 -r--r--r--   1 root     system           38 Oct 26 01:07 tmmtmp10.lk
      1 -r--r--r--   1 root     system           38 Oct 26 01:07 common.lk
      8 drwxrwxrwx  11 radley   users          8192 Oct 20 10:05 ../
      0 -rwxrwx---   1 root     system            0 Dec 28  2006 testrights*
      1 -rwxrwx---   1 root     system          507 Feb 11  2006 tmm10.st*
      1 -rwxrwx---   1 root     system          413 Apr 11  2005 protrace.14729*
      1 -rwxrwx---   1 root     system          487 Apr 11  2005 tmm15.st*
      1 -rwxrwx---   1 root     system          413 Mar 17  2005 protrace.21858*
      1 -rwxrwx---   1 root     system          242 Mar  8  2005 nohup.out*
      1 -rwxrwx---   1 root     system           59 Mar  7  2005 k.k*
      1 -rwxrwx---   1 root     system          413 Mar  1  2005 protrace.3429*
      1 -rwxrwx---   1 root     system          413 Jan 22  2004 protrace.29054*
 

matt_

New Member
3) Two serious errors show in your .lg file extracts:

00:31:34 BROKER 1: /p_dtr/p_dbs/tmm10.lk is missing, shutting down... (4195)

The file exists, and from what I see in the backup scripts, there is no command which deletes this file. I am not sure if there is a reason why Progress is telling me it's missing...

Code:
root:/ => ls -salF /p_dtr/p_dbs/tmm10.lk
1 -r--r--r--   1 root     system        38 Oct 26 01:08 /p_dtr/p_dbs/tmm10.lk
And, later, after a restart:
01:01:58 WDOG 4: SYSTEM ERROR: User 6 died during microtransaction. (2256)
...
The second error is a typical result of careless "kill" commands. Also likely from somewhere else and also not generally a good idea to script. Especially not a "kill -9".

This was me trying to figure out why /progress/dlc/bin/_mprshut is still running when it shouldn't be. I was killing some _mprshut processes last night before the backup, in hopes that it would complete. I am starting to think this has something to do with Progress not restarting correctly, but it doesn't explain why I am able to log in and run start-all-cui from the command line. I actually still have _mprshut processes running right now:

Code:
root        110      1  0.0 01:08:02 ??           0:00.21 /progress/dlc/bin/_mprshut /p_dtr/p_dbs/tmmtmp10 -C biw
root        277      1  0.0 01:08:02 ??           0:02.37 /progress/dlc/bin/_mprshut /p_dtr/p_dbs/common -C apw
root       2481      1  0.0 01:07:51 ??           0:03.31 /progress/dlc/bin/_mprshut /p_dtr/p_dbs/tmm10 -C apw
root       3115      1  0.0 01:08:02 ??           0:00.87 /progress/dlc/bin/_mprshut /p_dtr/p_dbs/tmm10 -C watchdog
root      11853      1  0.0 01:08:03 ??           0:00.73 /progress/dlc/bin/_mprshut /p_dtr/p_dbs/custom10 -C watchdog
root      12198      1  0.0 01:08:02 ??           0:03.36 /progress/dlc/bin/_mprshut /p_dtr/p_dbs/tmm10 -C apw
root      17349      1  0.0 01:08:03 ??           0:00.92 /progress/dlc/bin/_mprshut /p_dtr/p_dbs/tmmtmp10 -C watchdog
root      26315      1  0.0 01:08:02 ??           0:00.17 /progress/dlc/bin/_mprshut /p_dtr/p_dbs/common -C biw
root      26892      1  0.0 01:08:03 ??           0:02.75 /progress/dlc/bin/_mprshut /p_dtr/p_dbs/custom10 -C apw
root      28727      1  0.0 01:08:02 ??           0:02.54 /progress/dlc/bin/_mprshut /p_dtr/p_dbs/tmmtmp10 -C apw
root      29746      1  0.0 01:08:03 ??           0:00.22 /progress/dlc/bin/_mprshut /p_dtr/p_dbs/custom10 -C biw
root      31709      1  0.0 01:08:02 ??           0:00.93 /progress/dlc/bin/_mprshut /p_dtr/p_dbs/common -C watchdog
root      32482      1  0.0 01:08:02 ??           0:00.97 /progress/dlc/bin/_mprshut /p_dtr/p_dbs/tmm10 -C biw

I don't know how to gracefully shut these down without adversely affecting Progress.

4) The backup process does not check to see if the db is really down before it starts. Making OS level copies of a live db is a waste of time. You cannot successfully restore them or use them.

Could not agree with you more!

6) "probkup online dbname dbname.pbk" will make an online backup.

I would like to do this. Ideally I would back up to a file, and then use the OSF1 vdump tool to write the data to a tape so that I can also back up CaRAS/EDI files, but I don't really have a lot of disk space to use for that. The databases are quite large, so I'm not sure what I can do there aside from writing to an NFS share somewhere, which I would have to create. The other option I'm considering is using probkup to write to the tape device, and then using vdump. I don't know if this would work though, or if I would be able to use prorest against a tape that has data other than what was backed up to it using probkup.
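
For a rough idea of how much room a backup file would need, the sizes of one database's files can be added up. A minimal sketch, where the function name db_size_mb is made up and column 5 of `ls -l` is assumed to hold the size in bytes; the actual probkup file may well come out at a different size, so this is a guide rather than an exact figure.

```shell
#!/bin/sh
# Sketch: total size, rounded to the nearest megabyte, of all files whose
# names start with the given prefix (e.g. /p_dtr/p_dbs/tmm10 matches
# tmm10.db, tmm10.d1, tmm10.lg, ...).
# db_size_mb is an illustrative name; column 5 of "ls -l" is assumed to be bytes.
db_size_mb() {
    ls -ld "$1"* | awk '{ bytes += $5 } END { printf "%.0f\n", bytes / 1048576 }'
}
```

Running `db_size_mb /p_dtr/p_dbs/tmm10` and comparing the result with the free space reported by `df -k` for the target filesystem would show whether a file-based backup is even an option.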

I really appreciate your assistance! The circumstances of using this outdated software and hardware are out of my control.
 

LarryD

Active Member
1. The .lk file is created on startup of the Progress db. Performing a dir list showing it there now proves nothing, as it should be there as long as the db is up and running. What Tom was telling you is that someone logged in as root rm'd it while the db was up and still running. The only other explanation I can think of is the possibility of a bug in proshut in 8.2C that allowed multiple shutdowns to run, with one of them somehow rm'ing the .lk while the other proshut was still running. But that's mere conjecture.

2. _mprshut is the executable that Progress runs for the bi and ai writers as well as the watchdog. It has NOTHING to do with shutting down the db. Perhaps they should have named it something else, but that's been in place for as long as I can remember. Don't confuse the naming convention of the executable with the process. They will 'gracefully' go away during the actual proshut process, but until the db is fully shut down they will look exactly as you see them: as running processes.
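
To see this in a process listing, the role of each _mprshut process can be read from its -C argument. A small sketch, assuming ps output laid out like the listing posted above (the function name list_helpers is made up):

```shell
#!/bin/sh
# Sketch: read ps output on stdin and print "<database> <role>" for every
# _mprshut helper process, where role is whatever follows -C
# (apw, biw or watchdog in the listing above).
# list_helpers is an illustrative name, not a Progress utility.
list_helpers() {
    awk '
        /_mprshut/ {
            for (i = 2; i < NF; i++) {
                if ($i == "-C") {
                    # The database path is the field just before -C.
                    print $(i - 1), $(i + 1)
                }
            }
        }
    '
}
```

Piping ps output into it, for example `ps -ef | list_helpers | sort | uniq -c`, gives a count of helpers per database and role, which makes it easier to spot duplicates or helpers left over for a database that is supposed to be down.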

I might suggest that you consider buying a cheap 2TB USB drive, attach it to your system, then backing up the database to a file on that USB drive. Then all you have to do is to run your tape backup getting the file from the USB drive.
 

matt_

New Member
1. The .lk file is created on startup of the Progress db. Performing a dir list showing it there now proves nothing, as it should be there as long as the db is up and running. What Tom was telling you is that someone logged in as root rm'd it while the db was up and still running. The only other explanation I can think of is the possibility of a bug in proshut in 8.2C that allowed multiple shutdowns to run, with one of them somehow rm'ing the .lk while the other proshut was still running. But that's mere conjecture.

I was the only root user, and it wasn't me. I'm going with your conjecture.

2. _mprshut is the executable that Progress runs for the bi & ai writers as well as the watchdog. It has NOTHING to do with shutting down the db. Perhaps they should have named it something else, but that's been in place for as long as I can remember. Don't confuse the naming convention of the executable with the process. They will 'gracefully' go away during the actual proshut process, but until the db is fully shut down they will look exactly as you see them, as running processes.

Thanks for the insight here. Can you tell I'm new at this?

I might suggest that you consider buying a cheap 2TB USB drive, attach it to your system, then backing up the database to a file on that USB drive. Then all you have to do is to run your tape backup getting the file from the USB drive.

I would if the server had USB ports, but it's a DEC Alpha 4000.
 

RealHeavyDude

Well-Known Member
Just a few more thoughts from my point of view: You're sitting on (to use Tom's words) an ancient, obsolete and unsupported thing and, to me, it was never set up in a way that holds water. For example, I am missing the after image (IIRC in V8 you had to specify the -a parameter to tell it where the AI was ...) and, as Tom suggested, there is no error handling whatsoever. In practice things will go wrong for reasons one can't imagine.

Here are my suggestions:

  • The documentation on database administration you have should be sufficient for the version of Progress you run. At that time ( 1992? ) Progress didn't change the dba utilities much within one major version.
  • You can use proutil <db-name> -C busy to check the state of the database.
  • Use proserve and proshut instead of _mprosrv and _mprshut. These are actually scripts that do some things you are missing when you use the executables directly.
  • Don't know if it was available in V8, but you can try it: There is the -com option on probkup that will compress the backup. If you look into the documentation you might find other useful options if you want to backup to tape directly - but I strongly recommend you not to.
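
A rough sketch of how the proutil -C busy check could replace a fixed sleep after proshut. It assumes the check prints message 276 ("... is in use in multi-user mode. (276)", as in the cron log at the top of the thread) while the db is up; the exit codes of proutil in 8.2C are not relied on. db_in_use, wait_for_shutdown and the PROUTIL override are made-up names for illustration only.

```sh
# Command used for the check; override PROUTIL to point at a stand-in
# when trying this out without a database (hypothetical convenience).
PROUTIL=${PROUTIL:-proutil}

# db_in_use DBPATH
# Returns 0 if "proutil DBPATH -C busy" prints message 276, i.e. the
# database is reported as in use in multi-user mode.
db_in_use() {
    "$PROUTIL" "$1" -C busy 2>&1 | grep '(276)' >/dev/null
}

# wait_for_shutdown DBPATH TIMEOUT INTERVAL
# Poll every INTERVAL seconds until the database is no longer reported
# busy. Returns 0 once it is down, 1 if it is still busy after about
# TIMEOUT seconds.
wait_for_shutdown() {
    waited=0
    while db_in_use "$1"; do
        if [ "$waited" -ge "$2" ]; then
            return 1
        fi
        sleep "$3"
        waited=$((waited + $3))
    done
    return 0
}
```

Something like `proshut /p_dtr/p_dbs/common -by ; wait_for_shutdown /p_dtr/p_dbs/common 60 5` would then give the db up to a minute to go down before the script decides what to do next.
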
Other than that, although it won't help you, it surprises me how companies let their admins loose and things that run "somehow" are hanging by a thread for years - until problems arise ...

Maybe it would be a good idea to have an experienced consultant onsite who will develop a disaster recovery strategy with you and help you implement it.

I also urge you to have a look at dbappraise.com which will also provide useful information.


Heavy Regards, RealHeavyDude.
 

matt_

New Member
Thanks everyone for your recommendations. Unfortunately, I have limited resources (ridiculous, I know) at the moment, and hope to eventually replace my backup scheme with something more solid.

As a follow-up, I have come up with a script with functionality for both online and offline backups, but the script only actually implements the online backups. Comments, cursing, and criticism welcome, as always.

Code:
#!/bin/ksh
#----------------------------------
TERM="vt100-80"
export TERM
# Global Environment Settings
PATH=$HOME:/:/etc:/bin:/sbin:/usr/sbin:/progress/startup:/progress/dlc/bin:
export PATH ENV EDITOR FCEDIT PS1

# Set Progress Variables
DLC=/progress/dlc
PROMSGS=/progress/dlc/promsgs
PROPATH=/progress/dlc:/progress/dlc/bin:/p_dtr:/p_dtr/p_dbs:/p_dtr/p_dotr:
PROCFG=/progress/dlc/progress.cfg
PROTERMCAP=/progress/startup/mod_proterm
PROEXE=${PROEXE-$DLC/bin/_progres}
export DLC PROMSGS PROPATH PROCFG PROTERMCAP PROEXE

fatal(){
    # Something went wrong:
    # print an error message naming the failed step if one was
    # provided, mail the same message to root, then exit with
    # "error" status.

    if [[ -n "$1" ]] ; then
        msg="Something went wrong with $1"
    else
        msg="Something went wrong (Somewhere?)"
    fi
    print "$msg"

    # Notify someone, and say which step failed.
    print "$msg" | mailx -s "Fatal: Backup script" root
    exit 1
}


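# Main sequence. The offline steps are commented out; only the online
# backup and the tape backup actually run.
# Note: ksh must have read a function's definition before it can call it,
# so this block needs to come after the function definitions below when
# the script is run.
#prepare
#if [[ $? -ne 0 ]] ; then    fatal prepare ; fi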

rmoldbackups
if [[ $? -ne 0 ]] ; then    fatal rmoldbackups ; fi

#shutdowndb
#if [[ $? -ne 0 ]] ; then    fatal shutdowndb ; fi

#pause

#copyfiles
#if [[ $? -ne 0 ]] ; then    fatal copyfiles ; fi

#startdbs
#if [[ $? -ne 0 ]] ; then    fatal startdbs ; fi

#startprocs
#if [[ $? -ne 0 ]] ; then    fatal startprocs ; fi


onlinebackup
if [[ $? -ne 0 ]] ; then    fatal onlinebackup ; fi

tapebackup
if [[ $? -ne 0 ]] ; then    fatal tapebackup ; fi

finished



prepare(){
    # Warn users 5 minutes
    echo '\nWarning users 5 minutes\n'
    date
    w
    echo '** SYSTEM SHUTDOWN in 5 Minutes\nPlease logoff!' | wall

    # Shutdown inventory processors
    echo '\nShutdown Inventory processors\n'
    date
    ps -ef | grep imsu | grep -v grep | awk '{print $2}' | xargs -n1 kill
    sleep 240

    # Warn users 1 minute
    echo '\nWarning users 1 minutes\n'
    date
    w
    echo '** SYSTEM SHUTDOWN in 1 Minutes\nPlease logoff!' | wall
    sleep 60
}

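rmoldbackups(){
    # delete old backup
    echo '\nDeleting Old Backup\n'
    date
    #rm -R /backup/p_dbs #what is this used for??
    # -f: a directory that is already gone (e.g. on the first run) is not
    # treated as an error, so the caller does not abort for that reason.
    rm -Rf /backup/bi
    rm -Rf /dbdump/backup/p_dbs
}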

shutdowndb(){
    # shutdown databases
    echo '\nShutting down databases\n'
    date
    #/progress/startup/shutdown-all-cui #uses _mprshut

    dblist="tmm10 tmm15 common tmmtmp10 tmmtmp15 custom10 custom15"

    for dbname in $dblist ; do
        checkshut $dbname
    done

    sleep 60
}

checkshut(){
    #
    # Check if busy, and then shutdown. If we can't, then force a shutdown.
    # If that fails, then fail the backup.
    #
    print "!!! Attempting shutdown of database $1"
    proshut /p_dtr/p_dbs/$1 -by -b     # Attempt a shutdown
    sleep 5
    proutil /p_dtr/p_dbs/$1 -C busy | grep 276 >/dev/null # Check if the db is still busy
    if [[ $? -eq 0 ]] ; then
        print "!!! Database still running: $1"
        print "!!! Attempting forced shutdown..."
        proshut /p_dtr/p_dbs/$1 -by -F
        sleep 5
        proutil /p_dtr/p_dbs/$1 -C busy | grep 276 >/dev/null
        if [[ $? -eq 0 ]] ; then
            print "!!! Forced shutdown failed !!!"
            fatal checkshut_forced_"$1"
            exit 1
        fi
    else
        print "!!! Shutdown $1 successful !!!"
    fi
}

pause(){
    # Show users
    echo '\nUsers logged in\n'
    date
    w
    sleep 10
}

onlinebackup(){
    #
    # online backup to /backup partition
    #
    print "Initiate online backup..."
    date
    mkdir -p /dbdump/backup/p_dbs
    mkdir -p /backup/bi

    dblist="tmm10 common tmmtmp10 custom10 custom15"

    for dbname in $dblist ; do
        ob $dbname
    done
}

ob(){
    # need to put some in one partition, and some in another:
    # pick the target directory first, then run a single probkup.
    case $1 in
        tmm10|tmmtmp10)
            target=/backup
            ;;
        common|custom10|custom15)
            target=/dbdump/backup/p_dbs
            ;;
        *)
            print "Error in ob: missing or unknown database name: $1"
            exit 1
            ;;
    esac

    probkup online /p_dtr/p_dbs/$1 $target/$1.pbk -com
    if [[ $? -ne 0 ]] ; then
        print "Backup of $1 failed!"
        fatal ob_"$1"    # fatal mails root and exits
    fi
}

copyfiles(){
    #
    # backup data to disk
    #
    echo '\nBackup data to disk\n'
    date
    mkdir -p /dbdump/backup/p_dbs
    mkdir -p /backup/bi
    cp -R /p_dtr/p_dbs/k* /backup # k.k file
    cp -R /p_dtr/p_dbs/n* /backup # nohup.out file
    cp -R /p_dtr/p_dbs/p* /backup # protrace.* files
    cp -R /p_dtr/p_dbs/tmm10* /backup # tmm10 database
    cp -R /p_dtr/p_dbs/tmmtmp10* /backup # tmmtmp10 database
    cp -R /p_dtr/p_dbs/c* /dbdump/backup/p_dbs # common, custom10, custom15 databases and client.mon
    cp -R /bi/tmm10* /backup/bi # tmm10.b1
}

startdbs(){
    #
    # start up databases again
    #
    echo '\nStart databases\n'
    date
    #/progress/startup/start-all-cui

    nohup proserve /p_dtr/p_dbs/tmm10 -b -pf /progress/startup/tmm-tmm.pf -S tmm10 &
    nohup proserve /p_dtr/p_dbs/common -b -pf /progress/startup/tmm-com.pf -S common &
    nohup proserve /p_dtr/p_dbs/tmmtmp10 -b -pf /progress/startup/tmm-tmp.pf -S tmmtmp10 &
    nohup proserve /p_dtr/p_dbs/custom10 -b -pf /progress/startup/tmm-cus.pf -S custom10 &

    sleep 5

    proapw /p_dtr/p_dbs/tmm10
    proapw /p_dtr/p_dbs/tmm10
    probiw /p_dtr/p_dbs/tmm10
    nohup prowdog /p_dtr/p_dbs/tmm10 &
    proapw /p_dtr/p_dbs/common
    probiw /p_dtr/p_dbs/common
    nohup prowdog /p_dtr/p_dbs/common &
    proapw /p_dtr/p_dbs/tmmtmp10
    probiw /p_dtr/p_dbs/tmmtmp10
    nohup prowdog /p_dtr/p_dbs/tmmtmp10 &
    proapw /p_dtr/p_dbs/custom10
    probiw /p_dtr/p_dbs/custom10
    nohup prowdog /p_dtr/p_dbs/custom10 &

    nohup proserve /p_dtr/p_dbs/tmm15 -b -pf /progress/startup/tmm-tmp-15.pf -S tmm15 &
    nohup proserve /p_dtr/p_dbs/tmmtmp15 -b -pf /progress/startup/tmm-tmp-15.pf -S tmmtmp15 &
    nohup proserve /p_dtr/p_dbs/custom15 -b -pf /progress/startup/tmm-cus-15.pf -S custom15 &

}

startprocs(){
    #
    # Startup inventory processors
    #
    echo '\nStartup Inventory processors\n'
    date
    /tmm-users/ims/start_processors.sh
}

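tapebackup(){
    #
    # backup data to tape
    #
    echo '\nBacking up data to tape\n'
    date

    # The first two dumps go to the no-rewind device (nrmt0h) so the
    # savesets follow each other on the tape; the last one goes to the
    # rewind device (rmt0h). Return at the first failure so the caller
    # sees a non-zero status instead of only that of the last vdump.
    /sbin/vdump -0 -u -v -f /dev/nrmt0h /backup || return 1
    /sbin/vdump -0 -u -v -f /dev/nrmt0h /dbdump || return 1
    /sbin/vdump -0uvf /dev/rmt0h /progress || return 1
}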

finished(){
    #
    # finished with unatt backup
    #
    echo '\nFinished\n'
    date
}
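
One caveat on the layout of the script above: ksh only knows a function once its definition has been read, so the calling sequence has to come after the definitions, not before them. A stripped-down sketch of that layout follows; step_one and step_two are placeholder names standing in for rmoldbackups, onlinebackup and so on, and main is just one possible way to keep the sequence in a single place.

```sh
# 1. Helpers first, so everything below can call them.
fatal() {
    echo "Something went wrong with $1"
    exit 1
}

# 2. The individual steps (placeholders for rmoldbackups, onlinebackup, ...).
step_one() {
    echo "step one done"
}

step_two() {
    echo "step two done"
}

# 3. The calling sequence, wrapped in a function so it can be invoked
#    with a single line at the bottom of the file.
main() {
    step_one || fatal step_one
    step_two || fatal step_two
    echo "finished"
}

# 4. Only now, with every function defined, run the sequence.
main
```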
 

RealHeavyDude

Well-Known Member
Having a quick look at your script two things come to my mind:

  1. You should never delete previous backups - you should archive them. The only good backup is a tested one. You never know, maybe the previous backup's predecessor is the one you are finally able to restore. From experience I can tell you that, when things go wrong, they go really wrong ...
  2. Using a forced shutdown from a script does not make sense. It should only be used manually by the dba, in very rare cases. When you shut down the database and there are clients connected that have uncommitted transactions open, the graceful shutdown process gives them time to finish the transaction. The forced shutdown just kills the clients and, again, unless you coded the software, you'll never know what this can do to your data integrity. Therefore you should give the database a reasonable amount of time to shut down. 5 seconds seems like a rush to me - I would go for at least 30 seconds or a minute.
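
Point 1 could be sketched roughly as follows: instead of rm'ing the previous backup, move it into a dated directory and only prune the oldest generations. archive_backup and all of its arguments are made-up names; the sketch assumes the archive lives on the same filesystem as the backup directory (so mv is a rename), that there is disk space for more than one generation, and that the stamps sort chronologically (e.g. 20130804).

```sh
# archive_backup SRC ARCHIVE_ROOT STAMP KEEP
# Move the backup directory SRC to ARCHIVE_ROOT/STAMP, then delete the
# oldest generations so that at most KEEP remain.
# Hypothetical helper; assumes ARCHIVE_ROOT/STAMP does not exist yet.
archive_backup() {
    src=$1 root=$2 stamp=$3 keep=$4

    [ -d "$src" ] || return 0           # nothing to archive yet
    mkdir -p "$root" || return 1
    mv "$src" "$root/$stamp" || return 1

    # Count the generations and remove all but the newest $keep,
    # oldest first.
    total=$(ls "$root" | wc -l)
    excess=$((total - keep))
    if [ "$excess" -gt 0 ]; then
        ls "$root" | sort | head -n "$excess" | while read -r old; do
            rm -rf "$root/$old"
        done
    fi
}
```

A call such as `archive_backup /dbdump/backup/p_dbs /dbdump/archive $(date +%Y%m%d) 3` (paths and count chosen only as an example) would then take the place of the rm in rmoldbackups.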
Heavy Regards, RealHeavyDude.
 

rajeev.babu

Member
Dear Tom,

I saw your comment about the Watchdog error in the log file:

And, later, after a restart:
01:01:58 WDOG 4: SYSTEM ERROR: User 6 died during microtransaction. (2256)

I recently ran into an issue where one of my customers' production environments went down with this error all of a sudden. There had been no errors for the last 2 years and no recent changes to our startup/shutdown scripts. The environment was working normally and then, all of a sudden, the production database went down, leaving the error below in the .lg file. Could you please help me understand why this happened?

· The Proddb database went down because the watchdog killed a process during a microtransaction.

[2013/08/04@03:03:59.189+0200] P-16611 T--938902608 I WDOG 14: (2256) SYSTEM ERROR: User 79 died during microtransaction.

[2013/08/04@03:03:59.189+0200] P-16611 T--938902608 I WDOG 14: (-----) Sending signal 12 to user 0

[2013/08/04@03:03:59.189+0200] P-16611 T--938902608 I WDOG 14: (-----) Sending signal 12 to user 1

Progress 10.2B and QAD 2012.1 EE. Kindly help me to find the root cause.
 

TomBascom

Curmudgeon
Watchdog didn't kill anything. The message is telling you that watchdog has noticed that the process died and that it was holding a latch when it died.

Because the program that died was holding that latch nobody else is able to access the resource protected by the latch. Watchdog cannot just release the latch - it has no idea how to undo whatever the dead process was doing because that dead process took all of its state with it when it died (the information necessary to undo or complete the task was in local memory).

Thus watchdog does the only reasonable thing and shuts down the db.

There is no information shown about what happened to the original process. It was probably killed via "kill -9", which is untrappable and leaves no (direct) trace. There is only indirect evidence, such as what we are discussing here.
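
The difference between a trappable signal and "kill -9" can be seen with a small shell experiment. This is plain shell and has nothing to do with Progress itself; signal_demo and the marker file are made up for the illustration. The background process writes the marker file when it is asked to stop. With TERM it gets the chance to do so; with KILL it simply disappears.

```sh
# signal_demo SIGNAL MARKER
# Start a background process that writes MARKER when it is asked to stop,
# send it SIGNAL, and report whether it got the chance to clean up.
# Hypothetical helper for illustration only.
signal_demo() {
    sig=$1 marker=$2
    rm -f "$marker"

    (
        trap 'echo cleaned-up > "$marker"; exit 0' TERM
        while :; do sleep 1; done
    ) >/dev/null 2>&1 &
    pid=$!

    sleep 1                       # give the child time to install its trap
    kill -"$sig" "$pid"
    wait "$pid" 2>/dev/null || :  # a killed child has a non-zero status

    if [ -f "$marker" ]; then
        echo "$sig: handler ran"
    else
        echo "$sig: no trace"
    fi
}
```

Running `signal_demo TERM /tmp/marker` and then `signal_demo KILL /tmp/marker` shows the two cases side by side.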
 