Happy Saturday all,
As part of our ongoing effort to improve the pitiful performance of our progress solution, we have recently had to turn our attentions to our overnight process. This set of processes perform a bunch of actions against our DB in order to get us ready for the next days trading. Over time this review process has degraded in performance to the point that if there is a particularly large number of accounts to review, or if there is a large report running against the live DB, or if there is a strong northerly wind, or if Jupiter is not in the correct alignment, then the process overruns, clashes with other timed processes and all hell breaks loose.
We have received some recommendations, and after trialling a couple of them in a test environment we have also started the process of testing them against our live environment (with backout plans in place should something go awry).
As a quick overview so you have a little background:
9:30pm
12:01am
So, we have several issues here. The foremost is our rapidly degrading overall performance, but the critical issue right now is that this process must be completed correctly before the next days trading can begin.
In order to speed up the process, we have been looking at ways to speed up the probkup portion of the schedule, in the hopes that completing this sooner will subsequently speed up the account reviews that start after midnight (as the system won't also be running an online backup while the reviews are trying to run). After reading documents, forums and receiving some advice, we decided to try two things. The first is to add the -com switch to our probkup string, this has been successfully tested in our test environment, and the plan was to test it's effects on the live environment tonight. The second thing was to change the Default Configuration of or main DB, increasing the "Blocks in DB buffer" value from it's current 300000 to a new value of 400000, increasing the quantity of memory allocated to the DB as a whole, and hopefully increasing performance as a result. This change was put in place last night, and this is what happened:
Probkup process started, and continued running correctly, this was still running at midnight.
At 12:01am our "overnight.p" script attempted to run, but hung, no processing took place (I also received no log files telling me of failure, it just appeared not to run at all).
I manually tried to run the overnight.p file, when running manually I received a windows error popup stating: "Unable to attach shared memory Global\sharemem.e.vendor.db.live.pinsys.11, error 0. (1720)".
Unable to run anything required on the DB, I manually took the DB offline, and tried to start it back up (hoping that this would clear the cached memory. I then received: "Error in startup of database localhost:20931insysLive. Message: 07:36:48 BROKER : ** This process terminated with exit code 1. (8619) (JUNMsg024)"
At this point I rebooted the entire server, and on startup I changed the "Blocks in DB Buffer" value back to 300000 and started the DB, after which the DB was correctly running again and I was able to manually run the overnight process.
If you guys are able to help, I can provide any logs you might need. Warning in advance, I am a sysadmin, not a DB admin, and unfortunately I know very little about Progress, so I apologise in advance if I have to ask dumb questions to understand what I need to do.
Any suggestions or simply letting me know that I'm an idiot and the steps I've taken on this are stupid are greatly appreciated!
As part of our ongoing effort to improve the pitiful performance of our progress solution, we have recently had to turn our attentions to our overnight process. This set of processes perform a bunch of actions against our DB in order to get us ready for the next days trading. Over time this review process has degraded in performance to the point that if there is a particularly large number of accounts to review, or if there is a large report running against the live DB, or if there is a strong northerly wind, or if Jupiter is not in the correct alignment, then the process overruns, clashes with other timed processes and all hell breaks loose.
We have received some recommendations, and after trialling a couple of them in a test environment we have also started the process of testing them against our live environment (with backout plans in place should something go awry).
As a quick overview so you have a little background:
- Server is a VMWare VM.
- Windows Server 2003
- 12gb RAM.
- 4 Cores (2 processors x2 cores each) @ 2.99ghz
- Live DB is running on 6x SAS 15k 250gb drives in RAID 1/0 on 4gb Fibrechannel backbone.
- VM has secondary drives that are carved from the standard VM SAN allocation.
9:30pm
- Probkup online backup is scheduled. The resulting DB size is around 98.5gb.
- Backup runs to a second drive on the server (one of the VMFS drives on the standard VM SAN).
- This backup usually takes until around 2:30-3am, but can take longer (recently has slowed right down and can take until 5am, we are unsure why as yet).
- This same backup on our test server takes around 3 hours and 30 minutes. This is not a true comparison as reports and other DB accesses are always hitting our live DB, but at this time of night it shouldn't be enough to cause the difference we are seeing here.
12:01am
- Morning batch processing begins.
- The command that is called (by scheduled task) to run this process is: ""C:\program files\progress\bin\prowin32.exe" -db D:\documents -db D:\database -BL 200000 10000 -d dmy -crc -Bt 2500 -TM 31 -TB 32 -b -q -v6q -o LPT1: -ininame D:\ini\server.ini -p D:\nextday.p -Wa -wpp"
- Previous days accounts are reviewed, DB changes based on the previous days trading occur.
- This process usually completes at around 3am, but recently has been taking longer, to about 4:30am or even so long that it clashes with the next days daystart.
So, we have several issues here. The foremost is our rapidly degrading overall performance, but the critical issue right now is that this process must be completed correctly before the next days trading can begin.
In order to speed up the process, we have been looking at ways to speed up the probkup portion of the schedule, in the hopes that completing this sooner will subsequently speed up the account reviews that start after midnight (as the system won't also be running an online backup while the reviews are trying to run). After reading documents, forums and receiving some advice, we decided to try two things. The first is to add the -com switch to our probkup string, this has been successfully tested in our test environment, and the plan was to test it's effects on the live environment tonight. The second thing was to change the Default Configuration of or main DB, increasing the "Blocks in DB buffer" value from it's current 300000 to a new value of 400000, increasing the quantity of memory allocated to the DB as a whole, and hopefully increasing performance as a result. This change was put in place last night, and this is what happened:
Probkup process started, and continued running correctly, this was still running at midnight.
At 12:01am our "overnight.p" script attempted to run, but hung, no processing took place (I also received no log files telling me of failure, it just appeared not to run at all).
I manually tried to run the overnight.p file, when running manually I received a windows error popup stating: "Unable to attach shared memory Global\sharemem.e.vendor.db.live.pinsys.11, error 0. (1720)".
Unable to run anything required on the DB, I manually took the DB offline, and tried to start it back up (hoping that this would clear the cached memory. I then received: "Error in startup of database localhost:20931insysLive. Message: 07:36:48 BROKER : ** This process terminated with exit code 1. (8619) (JUNMsg024)"
At this point I rebooted the entire server, and on startup I changed the "Blocks in DB Buffer" value back to 300000 and started the DB, after which the DB was correctly running again and I was able to manually run the overnight process.
If you guys are able to help, I can provide any logs you might need. Warning in advance, I am a sysadmin, not a DB admin, and unfortunately I know very little about Progress, so I apologise in advance if I have to ask dumb questions to understand what I need to do.
Any suggestions or simply letting me know that I'm an idiot and the steps I've taken on this are stupid are greatly appreciated!