Roll forward delay on AI file

mattk

Member
Hi there,

We have a Progress 9.1D database on an MS Windows 2003 platform. We have AI enabled on that database and use PHP scripts to do a roll forward onto a target database every 15 minutes. There are 3 AI files and they cycle around quite happily during our working hours.
Last night we experienced an issue on the master server which crashed the databases and rebooted the server. Upon restart the databases were fine, and index checks and dbscans have shown no problems. Our AI solution was restarted; however, the roll forward onto the target database, which normally takes 1 - 2 minutes, took just short of an hour to complete.

Is there any reason why this would occur? Does Progress recognise that the database was not shut down smoothly and then perform an integrity check on roll forward?
All subsequent cycles have taken the usual time; it was just that first one.

Any ideas would be most welcome.

Thanks and regards.
 

TomBascom

Curmudgeon
Progress always does an integrity check on roll-forward. There shouldn't have been anything special in that regard about the particular extent rolled forward.

However... if you crashed while lots of transactions were active those transactions would have been backed out on restart and that unusual level of activity may have been reflected in the ai file. That might be what you saw.

Another possible contributing factor might also be that because of the reboot your disk caches were all cleared and you therefore had slower than usual IO.

Or it may just be that there was a lot of other (external to the ai roll forward) post crash recovery activity that swamped the system and generally slowed down the roll forward.

Other notes:

9.1D? :eek: Ancient, unsupported. There are also some nasty AI related bugs. You really ought to upgrade. If you must stick with v9, 9.1E service pack 4 was the very last release and it fixes lots of bugs between 9.1D and 9.1E. It is also noticeably faster for many tasks. Of course OpenEdge 10 is even better :)

It sounds like you are rolling forward against a db on the same server as the source db. Kudos for having a warm spare (or "verified backup") but you can greatly improve robustness by having that db on a discrete server that is in another time zone.

3 ai extents "works" but it is a very small number. 4 is usually considered the safe minimum -- 1 active, 1 full and ready to be archived, 1 empty and ready to switch to and then 1 empty spare in case an online backup or other action forces a switch when you are in the process of archiving. Personally I like 8 as a nice comfortable number.
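
To make the extent arithmetic concrete, here is a minimal Python sketch. It is only an illustration: the status labels ("busy", "full", "empty") and both function names are assumptions made for this example, not output from any Progress utility. It checks whether a set of extents still leaves one empty extent to switch into plus one empty spare.

```python
from collections import Counter

def spare_extents(statuses):
    """Count the extents that are empty and therefore available to switch into."""
    # The labels are hypothetical; adapt them to whatever your tooling reports.
    return Counter(s.lower() for s in statuses)["empty"]

def has_headroom(statuses, min_empty=2):
    """True when at least `min_empty` extents are empty (one to switch to, one spare)."""
    return spare_extents(statuses) >= min_empty

three_extents = ["busy", "full", "empty"]
four_extents = ["busy", "full", "empty", "empty"]

print(has_headroom(three_extents))  # False: no spare if a switch is forced mid-archive
print(has_headroom(four_extents))   # True: the "safe minimum" layout described above
```

With three extents the single empty one is used up the moment a switch is forced while the full extent is still being archived, which is the case the fourth extent is there to cover.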
 

mattk

Member
Hi Tom,

Thanks for the reply. The target database that we are writing to is in fact on a separate machine on a separate domain over 30 miles away.

On the night of this event, the server that carries the target database was untouched; it was just the source that keeled over. Could the transaction backout events that you have described still have caused the delay in this situation? There would have been upwards of 220 users on the system at the time, all of whom would have been kicked out. The AI file itself was only 4 MB in size, so tiny in comparison to the normal files of 15 to 20 MB created every 15 minutes.

We are looking at Progress 9.1E; however, it is down to our vendor's discretion.

Thanks,

Matt
 

TomBascom

Curmudgeon
With such a tiny ai file being rolled forward I doubt that it had anything to do with backing out a lot of work. So scratch that theory.

And since it is a discrete and non-local server (I'm glad that I was wrong about that!) the IO load theories go down the tubes too.

So you're left with "ai roll forward added a mysterious delay", which, unfortunately, is hardly unprecedented :( Sorry.

One thing you might want to do is to double check the bi file size on the target. Is it inexplicably large?

As for 9.1E and your vendor's discretion... If they say anything other than "yes sir! right away sir!" they are jerking you around.
 

mattk

Member
Thanks Tom,

I have checked the bi file, which is at a rather large 5 GB. I don't think it has been truncated since we performed our last failover test. Would the delay seen on the first AI file not be seen on all AI files if it was the bi file interrupting things? Every subsequent AI roll forward has taken the rather more agreeable time of 5 or so minutes.

Am I right in thinking that these types of delays are not unheard of and often go without explanations? The fact that the situation is currently stable and replication is working correctly at present leaves me a lot happier. I guess our major concern was if the data in the AI file was corrupt.
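
One way to notice a repeat of this kind of delay is to record how long each roll forward takes and flag the outliers. The following is a minimal Python sketch under stated assumptions: the function name, the factor of 5 and the sample durations are all made up for illustration, and the actual timing of the roll forward step is left to whatever script runs it.

```python
from statistics import median

def is_slow(duration_s, history_s, factor=5.0):
    """Flag a run that took more than `factor` times the median of earlier runs."""
    if not history_s:
        # Nothing to compare against yet, so do not raise an alarm.
        return False
    return duration_s > factor * median(history_s)

# Illustrative durations in seconds for recent 15-minute cycles (not real data).
history = [90, 110, 75, 120, 100]

print(is_slow(3500, history))  # True: close to an hour against a median of 100 s
print(is_slow(300, history))   # False: slower than usual but under the threshold
```

Using the median rather than the mean keeps one bad cycle from shifting the baseline for the cycles that follow.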
 

TomBascom

Curmudgeon
For reasons that remain mysterious AI roll forward sometimes decides that it needs to scan the entire bi file. This sounds like what has happened to you.

If you're lucky it only happens once.

If you are not so lucky roll forward starts making a habit out of it.

5GB is way, way too big for a normal bi file. Something bad happened somewhere along the line. I'd truncate that sucker to prevent future unhappiness at inconvenient moments.
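
A size check on the target's bi file can catch this before it bites. The Python sketch below is an illustration only: the 1 GiB limit and the function names are arbitrary assumptions, and the command it builds follows the usual `proutil <db> -C truncate bi` form, which is assumed here to be run with the database offline. The sketch only builds the command list; it does not execute anything.

```python
import os

ONE_GIB = 1024 ** 3

def bi_needs_truncate(bi_path, limit_bytes=ONE_GIB):
    """True when the bi file on disk is larger than the chosen limit."""
    # The 1 GiB default is an arbitrary illustration, not a Progress recommendation.
    return os.path.getsize(bi_path) > limit_bytes

def truncate_command(db_name):
    """Build (but do not run) the truncate command for the given database."""
    # Assumed form: proutil <db> -C truncate bi, run while the database is offline.
    return ["proutil", db_name, "-C", "truncate", "bi"]

print(truncate_command("target"))
```

A scheduled job could call `bi_needs_truncate` on the target's bi extent and send a warning, leaving the truncate itself as a deliberate manual step.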
 

mattk

Member
Thanks for all the information Tom. I will schedule some time in to truncate the bi file; I agree 5 GB is far too big. I expect that could be due to the fact that the database has remained untouched with nothing but roll forwards occurring for the best part of a year!

Thanks again.

Matt
 