Question: BI growth

Hi All,

We are performing a bulk load and the BI file is growing in an uncontrolled way.
Is there any way to restrict the BI growth?

Thanks and Regards,
Surya
 

Cringer

ProgressTalk.com Moderator
Staff member
What method are you using for the load? You must be doing it online. The BI is a log of notes for all running transactions etc. so that in the event of a crash they can be undone systematically to ensure that DB integrity is maintained. You don't want to prevent this from happening in my opinion. On the other hand, if you load the data offline (probably more sensible anyway IMO) then the BI won't grow IIRC.
Alternatively, examine your load code to make sure it is only performing small transactions rather than one big transaction that stays outstanding for the life of the load. That way the BI should not grow out of control.
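To illustrate, here is a minimal ABL sketch of the small-transaction approach (the dump file name and the Customer table are hypothetical, and the file is assumed to match the table's field order):
Code:
/* Hypothetical load procedure: one record per transaction, so no
   transaction stays open for the life of the load. */
INPUT FROM "customer.d".   /* hypothetical dump file */
REPEAT TRANSACTION:        /* each iteration is its own transaction */
    CREATE Customer.
    IMPORT Customer.       /* ENDKEY at end of file ends the REPEAT */
END.
INPUT CLOSE.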
 

RealHeavyDude

Well-Known Member
The BI growth is determined by two factors:
  1. Transaction load
  2. Long-running transactions
In almost all cases I have seen, the second factor - long-running transactions - was the root cause of the growth. Long-running transactions which prevent BI clusters from being re-used are always caused by the software utilizing transaction scopes which are much larger than they need to be. Most of the time I've seen this it was not on purpose but by accident.

The only thing you can do is to fix the transaction scope so that the transactions are as small as possible, allowing BI clusters to be re-used much faster.

You should inspect the software you are using to do the load and compile it using the listing option to see the transaction scopes.
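For example (the procedure name is hypothetical), a compile listing can be generated like this; the listing shows where each transaction block begins and ends, which makes an unintentionally large transaction scope easy to spot:
Code:
/* Produce a compile listing for a hypothetical load procedure. */
COMPILE load_customers.p LISTING "load_customers.lis".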

There is nothing you can do on the database to prevent such transactions. All you can do to prevent the database from shutting down abnormally is to use the -bistall parameter (possibly in conjunction with the -bithold parameter).

Heavy Regards, RealHeavyDude.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
If it is possible to take a maintenance window where other users cannot access the database then you should do so. Then you can restart the database with startup parameters and BI configuration tuned for a bulk load (and for preventing access by business users). If the load is very large and time-consuming, I use the maximum BI block size and cluster size during the load to improve performance (although that does nothing to change the volume of BI notes).

If it is a very large and time-consuming load I may also start the database with no-integrity (-i) which also improves performance and does decrease the volume of BI notes. Warning: do not use -i unless you clearly understand its implications and the very limited cases where it is acceptable for use. This parameter prevents crash recovery, so it is only acceptable for use when there is no normal forward processing (business activity) happening in your database. You must have a good backup of the database taken just before the database was started. If you crash, your only means of recovery is restoring that backup. Remember to shut down the DB again and back up again immediately after the load finishes. Never use this parameter in production.

As RHD said, your problem may well be a combination of two factors: an old active transaction (not necessarily related to your bulk load session) that has pinned the earliest BI cluster and prevented its re-use, combined with the high volume of BI notes caused by the bulk load that causes additional clusters to be added to the BI file. But without more information from you we can only speculate. I suggest you review your bulk-loading code to see if its transaction scope is too large. It is possible that the developer tried to use a "transaction batching" technique to purposely increase the transaction scope as a means of improving throughput. This can work but the trade-off is BI file growth. You can review manually and also run COMPILE LISTING to see your transaction scope in each procedure.
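For reference, here is a hedged sketch of what such a "transaction batching" pattern often looks like in ABL (the file name, batch size, and Customer table are hypothetical); the larger the batch, the longer each transaction stays open and the more BI space it can pin:
Code:
/* Hypothetical batched load: up to 100 records per transaction.
   Larger batches may improve throughput, but an open batch keeps
   its BI cluster(s) from being reused until it commits. */
DEFINE VARIABLE iCount AS INTEGER NO-UNDO.
DEFINE VARIABLE lDone  AS LOGICAL NO-UNDO.

INPUT FROM "customer.d".        /* hypothetical dump file */
DO WHILE NOT lDone:
    DO TRANSACTION:             /* one transaction per batch */
        DO iCount = 1 TO 100 ON ENDKEY UNDO, LEAVE:
            CREATE Customer.
            lDone = TRUE.       /* assume end of file ...              */
            IMPORT Customer.    /* ... ENDKEY here leaves lDone = TRUE */
            lDone = FALSE.      /* a record was read, so not done yet  */
        END.
    END.                        /* the batch commits here */
END.
INPUT CLOSE.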

I also suggest you run your bulk-loading code on a test database where you are the only user. Monitor the database in promon or ProTop. How many transactions (commit in promon) do you see? How often does your transaction ID change? Are the numbers what you expect? Do you see BI growth?
 
Hi Cringer/RealHeavyDude/Fitz,

Thanks a lot for your information.

We are indeed performing huge load transactions. We ran a series of tests and the conclusion is that the BI clusters were not being reused because the very first cluster was still active. It was interesting to see that after that first transaction was killed, the BI stopped growing because the clusters were being reused.

BTW, is there any version in which any of the clusters can be reused, rather than waiting for the first one to be free?

Thanks and Regards,

Surya
 

Cringer

ProgressTalk.com Moderator
Staff member
I suspect your understanding of how the BI works is a little off. The basic answer to your question is that the workings of the BI have not changed in years and are unlikely to, because it works very well. Your issue isn't the BI, it's the code you're using.
For large data loads, binary dump and load is far more efficient.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Any active transaction for which before-image notes have been written will prevent re-use of the BI cluster(s) that contain those notes. The clusters cannot be reused because that transaction might be rolled back, either because of the client's action or because the database or system crashed and the subsequent crash recovery at startup rolled it back. Reusing the BI cluster would prevent transaction processing from working properly.

So when doing large data loads, you don't want any transaction activity from clients other than the one(s) doing the load, regardless of how you're loading (ABL, SQL, dictionary, or proutil load). The presence of a long-lived transaction during a load that creates many BI notes will prevent BI cluster re-use and force new BI clusters to be added, i.e. it will result in BI growth.
 
I suspect your understanding of how the BI works is a little off. The basic answer to your question is that the workings of the BI have not changed in years and are unlikely to, because it works very well. Your issue isn't the BI, it's the code you're using.
For large data loads, binary dump and load is far more efficient.
Hi Cringer,

You are correct. I have just started learning more about Progress. I am an Oracle DBA and am taking up a Progress DBA role as well. This is the first time I have ever gone into detail about how BI clusters are assigned and reused.

BTW, we are using 10.2A3 on RHEL 5.8. Recently we performed a delete/archive activity. This time we tried to perform the Transaction History loading after the delete/archive; the application team changed the approach to save time.

Just a question: suppose I have an active transaction in BI cluster 1. Then I perform a mass load in one transaction and the next 499 clusters get created, bringing the total to 500. Now, if the transaction holding cluster 1 is still active and I start the next load, will a new BI cluster 501 be created, or will cluster 2, cluster 3, and so on be reused?

Thanks and Regards,
Surya
 

Cringer

ProgressTalk.com Moderator
Staff member
I think it would probably be worth your while going on a DBA course to be honest. I did the DBA Bootcamp last week and it was invaluable.
To those who know more about this than me, please forgive my probable use of incorrect terminology!
Think of the BI as a ring of clusters. As one gets filled, the next one comes into action. The currently used one is marked as busy, and the previous one is marked as full. It will remain full until all active transactions in that cluster are complete, at which point it will be marked as empty again. If Progress has filled up all the current clusters, though, and all of them still have active transactional information in them, then Progress will create a new cluster and slip it into the list. This will continue until one of the full clusters can be emptied because its transactions are complete. Each time a new cluster is added to the ring, the size of the BI file is increased. This has the impact you are seeing, and it also has the impact of slowing things down as that space has to be allocated to the BI.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Just a question: suppose I have an active transaction in BI cluster 1. Then I perform a mass load in one transaction and the next 499 clusters get created, bringing the total to 500. Now, if the transaction holding cluster 1 is still active and I start the next load, will a new BI cluster 501 be created, or will cluster 2, cluster 3, and so on be reused?

Think of the BI clusters as logically being in a ring (linked list):
Code:
--> cluster 0 --> cluster 1 --> cluster 2 --> cluster 3 --> cluster 4 --> ... --> cluster n --
|                                                                                            |
----------------------------------------------------------------------------------------------

Notes are only added, in chronological order, to the current cluster. When it fills, the current cluster is closed and new writes will go to the "next" cluster in the ring. Which one is next? Imagine that cluster 2 is the currently-open BI cluster. BI notes are written to it until it fills. Then a decision is required: can we write new notes to cluster 3, the next one in the ring? If cluster 3 contains no notes that are related to active transactions or to changes that have not yet been committed to disk in the database then the notes in cluster 3 are no longer needed and its space can be re-used. New BI notes are written to cluster 3. But if cluster 3 cannot be re-used, for the reasons given above, then a new cluster must be formatted and inserted into the ring (logically, not physically) between 2 and 3 and new BI notes are written into it. When it fills, the process above repeats itself. I have simplified somewhat but that is the basic process. Gus does a very good job of explaining the process (and a bunch of other interesting DB internals stuff) in this video.
 

Cringer

ProgressTalk.com Moderator
Staff member
Think of the BI clusters as logically being in a ring (linked list):
Code:
--> cluster 0 --> cluster 1 --> cluster 2 --> cluster 3 --> cluster 4 --> ... --> cluster n --
|                                                                                            |
----------------------------------------------------------------------------------------------

Notes are only added, in chronological order, to the current cluster. When it fills, the current cluster is closed and new writes will go to the "next" cluster in the ring. Which one is next? Imagine that cluster 2 is the currently-open BI cluster. BI notes are written to it until it fills. Then a decision is required: can we write new notes to cluster 3, the next one in the ring? If cluster 3 contains no notes that are related to active transactions or to changes that have not yet been committed to disk in the database then the notes in cluster 3 are no longer needed and its space can be re-used. New BI notes are written to cluster 3. But if cluster 3 cannot be re-used, for the reasons given above, then a new cluster must be formatted and inserted into the ring (logically, not physically) between 2 and 3 and new BI notes are written into it. When it fills, the process above repeats itself. I have simplified somewhat but that is the basic process. Gus does a very good job of explaining the process (and a bunch of other interesting DB internals stuff) in this video.
You see, this is why I shouldn't post much on here because that's exactly what I was trying to say, but in a much better way. lol
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
I think we basically said the same thing. :) I recommend the video; Gus has a knack for providing simple explanations for complicated things.
 
I think it would probably be worth your while going on a DBA course to be honest. I did the DBA Bootcamp last week and it was invaluable.
To those who know more about this than me, please forgive my probable use of incorrect terminology!
Think of the BI as a ring of clusters. As one gets filled, the next one comes into action. The currently used one is marked as busy, and the previous one is marked as full. It will remain full until all active transactions in that cluster are complete, at which point it will be marked as empty again. If Progress has filled up all the current clusters, though, and all of them still have active transactional information in them, then Progress will create a new cluster and slip it into the list. This will continue until one of the full clusters can be emptied because its transactions are complete. Each time a new cluster is added to the ring, the size of the BI file is increased. This has the impact you are seeing, and it also has the impact of slowing things down as that space has to be allocated to the BI.
Hi Cringer,

I am already in the process of arranging Progress DBA training, but there is no facility in Asia....

Thanks a lot for your explanation...

Thanks and Regards,
Surya
 

TheMadDBA

Active Member
Not to threadjack this too much but... I can understand your frustration/confusion coming from an Oracle background. When I started using Oracle after using Progress for 20+ years I was amazed at how many things just worked differently (and imo better) in Oracle than in Progress.

On topic: The documentation is actually pretty decent (in modern releases of Progress) at explaining a lot of the core DBA concepts. Dan Foreman at BravePoint has a book (Database Administration Guide) that would probably be very helpful for you. It covers a broad range of topics.
 
Not to threadjack this too much but... I can understand your frustration/confusion coming from an Oracle background. When I started using Oracle after using Progress for 20+ years I was amazed at how many things just worked differently (and imo better) in Oracle than in Progress.

On topic: The documentation is actually pretty decent (in modern releases of Progress) at explaining a lot of the core DBA concepts. Dan Foreman at BravePoint has a book (Database Administration Guide) that would probably be very helpful for you. It covers a broad range of topics.
Hi TheMadDBA,

Thanks a lot. I will see if I can acquire the book. I am also watching the video now and have gone through Gus's explanation. I am doing better.

Surya
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
I am already in the process of arranging Progress DBA training, but there is no facility in Asia....
There is also the option of Progress' web-based training. You get access to that for free if you have a PSDN subscription.
https://wbt.progress.com

And I second the recommendation for Dan Foreman's books. I have the DBA Guide, Performance Tuning Guide, DBA Jumpstart, and VST book. Great stuff. Available here:
http://bravepoint.com/products-publications.shtml
 