Question: Segregate Tables & Indexes in Type II Storage Areas

Hello Everyone,
We have a DB with 900+ tables and approximately 1.1 TB of data. I have a couple of quick questions:

1. How do we segregate tables into different areas? I am sure it shouldn't be based on functionality. Can we have one table per area, with that table's indexes grouped in another area, which would lead to 1,800 areas?
2. Or should I group all the small tables in one area and keep each fast-growing table in its own area?

What are the preferred guidelines for segregating tables and indexes? Is there a specific article or piece of documentation? Or, based on your experience, please let me know the best practices.

Regards,
Saravanakumar B
 

Cringer

ProgressTalk.com Moderator
Staff member
There are no black-and-white guidelines, as people have different opinions.
What is important is to segregate data and indexes. That is certainly beyond debate, IMO.
One area per table and another for its indexes is a serious level of overkill and is unmaintainable. You may find that some large, fast-growing tables benefit from their own areas, but the vast majority will cohabit happily. Remember, Type II storage already ensures that blocks do not contain data from different tables, which was the main problem with Type I areas in terms of fragmentation etc.
There's quite a lot of talk on the matter in here already which a quick search should throw up, and no doubt others will be along with their 2 cents... :)
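To make that concrete, here is a minimal sketch of what separate Type II data and index areas look like in a structure (.st) file. The area names, numbers, and paths are made up; the values after the colon are the area number and records per block, and the value after the semicolon is blocks per cluster (8, 64, or 512 makes the area Type II):

  # hypothetical .st excerpt: data and indexes in separate Type II areas
  d "Order_Data":10,64;512 /db/prod/data
  d "Order_Index":11,128;8 /db/prod/index

Both extents above are variable-length, since no fixed size follows the path.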
 

RealHeavyDude

Well-Known Member
As Cringer already said, there are different opinions:

I mostly use 2.5 criteria to group tables into storage areas:
  1. Size of the table - large tables may merit their own storage area.
  2. Transaction load - tables with a high transaction load may also merit their own storage area.
  3. ( Mean record length - tables with very large records may also merit their own storage area. )
But this, as always, is a compromise and subject to debate. Lately I don't focus that much on mean record length. I guess you will benefit mostly from having the tables in Type II storage areas. Grouping them by mean record length might give some benefit here and there, but it may not be worth the effort.
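If you need to gather those inputs, a tabanalys report gives you record counts, total sizes, and mean record lengths per table, and the _TableStat VSTs (or promon) show create/update/delete activity. A minimal sketch, with a made-up database name:

  # per-table record counts, sizes, and mean record lengths
  proutil mydb -C tabanalys > mydb.tabanalys.txt

Note that the _TableStat VSTs only cover tables within the range set by the -tablerangesize startup parameter, so check that it is large enough for your 900+ tables.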

Heavy Regards, RealHeavyDude.
 

cj_brandt

Active Member
I like to keep tables with word indexes in their own area, and those tables' indexes in another area of their own.
If you decide to enable auditing, I like separate areas for the auditing tables and indexes.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
From: Different Progress Database Structures

  • Always use Type II storage areas (areas with 8, 64, or 512 blocks per cluster are Type II areas).
  • Never put any objects (tables, indexes, or LOB columns) in the Schema Area.
  • Never put objects of different types (e.g. tables and indexes) in a single storage area.
  • Create a storage area for each very large table.
  • For each of those, create a storage area for the indexes of that table.
  • Create separate storage areas for LOBs, if any.
  • Create a storage area for tables with word indexes, if any (this will yield benefits for future index rebuilds; ensures that data scans can be multi-threaded).
  • Ensure that you understand your choices for storage area settings for records per block and blocks per cluster before defining your structure. Choose values appropriate for your data. RPB doesn't matter for index areas (provided they only contain indexes). For those I use RPB 128 just in case someone accidentally adds a table to an index area in the future.
  • Split tables into different areas based on the physical characteristics of the data (e.g. large tables, small tables, empty tables, tables with small mean record size, tables with large mean record size) rather than the functional characteristics of the data (e.g. accounting tables, sales tables, tax tables, etc.).
  • Re fixed versus variable extents: unless you have very high transaction volume or your storage areas are very large (hundreds of GB), you should be fine using all variable extents in your structure. That, combined with enabling large file support, will reduce the maintenance and monitoring required. Variable extents can grow to a maximum of 1 TB. Without large file support the maximum extent size is 2 GB. (See the command sketch after this list.)
  • When creating a new DB, choose the 8 KB database block size. Don't ever use a file system block size that is larger than your database block size, as performance will suffer.
  • In production, always use after imaging (and monitor its status!). Using variable-length AI extents will make your life easier.
  • If possible, locate the after image extents on separate physical storage from the rest of the database. That way if your database becomes corrupted or is lost due to media failure, you can recreate it with your last backup and your AI files since that backup. Obviously that won't be an option if your AI files were also stored on the failed media.
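Here is a hedged sketch of the commands behind those last few points; the database and file names are placeholders, and it assumes your .st file already defines AI extents (the 'a' lines) on separate storage:

  # create the DB with an 8 KB block size from a structure file
  prostrct create proddb proddb.st -blocksize 8192

  # enable large file support so variable extents can grow past 2 GB
  proutil proddb -C EnableLargeFiles

  # take a full backup, then enable after imaging
  probkup proddb /backup/proddb.bak
  rfutil proddb -C aimage begin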
Related stuff:
  • Get to know your storage subsystem. What storage is available to you? Is it on a SAN? NAS? Direct-attached storage? HDD or SSD? RAID level? Are there different tiers of storage available to you?
  • Don't put database files on parity-based RAID storage. Examples: RAID 5/6/50/60/DP, etc. RAID 10 is a good choice.
  • Use quality enterprise storage. Don't put database files on a NAS or on consumer-grade disks.
Good guidance from Tom Bascom of White Star Software/DBAppraise:
Storage Optimization Strategies
video: http://download.psdn.com/media/exchange_online_2010/1004TomBascom.wmv
slides: http://dbappraise.com/ppt/sos.pptx
 
@Rob Fitzpatrick - Sorry for the delayed response. I was working on classifying the tables into different areas based on the information and advice from this thread. One quick question: you mentioned that you would put very large tables into their own areas. I have tables ranging from a few MB to 170 GB. At what size would you recommend segregating a table into a separate area, say any table with more than 100 GB of data? Or is there another recommended value? Please advise.
 

TomBascom

Curmudgeon
If you chart them there is usually a "knee" in the relative table sizes. IOW, a few are generally much, much larger than the rest. It might be one or two, or it might be a dozen; every application is a bit different, but it probably isn't 30 tables ;) I usually just get a list of table sizes and eyeball it.
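If you would rather script the eyeballing: assuming you have pulled "table,size-in-MB" pairs out of a dbanalys report into a CSV (the file name and format here are assumptions), a quick sort makes the knee easy to spot:

  # largest tables first; the knee is where the sizes drop off sharply
  sort -t, -k2,2 -rn table_sizes.csv | head -25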
 
Thanks Tom, I took the list of 900 tables into Excel along with their record counts. As you pointed out, there are around six tables that are > 100 GB, and the rest are less than 50 GB. I guess I can put those tables into separate areas.

I have also categorized them based on the CRUD operations report, mean record length, and word indexes.
 
Our DBA team has proposed a new approach and I wanted to see whether it is a recommended one. As per this thread, we have classified tables based on the parameters below:

1. Record Size
2. Transaction Operations (CRUD Report)
3. Mean Record Size

Considering these parameters, I can classify 23 tables as Large; when I say Large, a table can be Large based on any one of the above parameters. I can classify 90 tables as Medium and 800+ tables as Small (of which 300 tables have 0 to 100 records, and the rest fewer than 2,000 records each).

Currently the areas are classified based on functionality, which is incorrect. We got a recommendation saying:
1. The 23 Large tables will each get their own data area and index area
2. The 800+ Small tables will be put into a single pair of areas, xxx_small_01_data and xxx_small_01_index, where 'xxx' is the database name
3. The other 90 Medium tables will be grouped based on CRUD operations and record size, and the areas will be named xxx_medium_01_data, xxx_medium_01_index, xxx_medium_02_data, xxx_medium_02_index, and so on (there could be multiple data and index areas)

We have around 230 word indexes. Do we have to put each index in a separate area? Or group word indexes based on table size?

Please advise.
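For reference, once areas like those are added, a table and its indexes can be relocated with prostrct add and proutil tablemove. A minimal sketch, with made-up database, table, and area names:

  # add the new areas from an incremental .st file
  prostrct add proddb add.st

  # move one medium table and its indexes into the new areas
  proutil proddb -C tablemove pub.ordhist xxx_medium_01_data xxx_medium_01_index

Be aware that tablemove is slow and lock-intensive on big tables; for the very large ones, a dump and load into the new structure is usually the more practical route.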
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
We have around 230 word indexes. Do we have to put each index in a separate area? Or group word indexes based on table size?
The justification for different treatment of tables with word indexes is to optimize index rebuild. On OpenEdge 10.2B06+ and 11.1+, the data-scan portion of index rebuild, where RM blocks are scanned to extract key data, can be multi-threaded when certain conditions are met for the data area currently being scanned. The conditions are:
  • you have indicated that you have enough space for disk sorting;
  • the area being scanned doesn't contain any of the indexes that are being rebuilt;
  • the area being scanned doesn't contain any tables that have word indexes;
  • the area being scanned is a Type II area.
When the conditions are met, the scan can be multi-threaded using the -datascanthreads option on proutil idxbuild. As you can see above, what matters here is not where the word indexes are located. What matters is the location of the tables that have word indexes. So if the tables that have those 230 word indexes are all fairly small, put them all in a single storage area so that they don't needlessly slow down the key scans of other tables that don't have word indexes. The word indexes themselves could go into a common index area containing other non-word indexes. The exception would be if one of those tables is very large, in which case that table would be segregated into its own area anyway, and its indexes would have their own area.
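For example, a multi-threaded rebuild might be launched like this; the database name and tuning values are illustrative only, and you must confirm that you have enough disk space for sorting (or pre-specify sort space) for the data scan to be threaded:

  # rebuild all indexes, with multi-threaded data scan and merge
  proutil proddb -C idxbuild all -TB 64 -TM 32 -SG 64 -B 1024 -datascanthreads 4 -mergethreads 2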
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Got it, thanks @Rob Fitzpatrick. How about the rest, I mean the table classifications? Are you fine with the tables being classified this way?
I certainly agree with putting each very large table in its own area and creating an index area for each of these tables. I agree with putting all the small/empty tables in a single area, with another single area for all of their indexes.

I sometimes tune area RPBs based on mean record size, but I don't subdivide that by CRUD. I'm not aware of a reason to segregate heavily-read tables from lightly-read ones. There are other tools for dealing with heavy reads, like -lruskips and Alternate Buffer Pool. If there are high numbers of creates then those will be fast-growing (and large, or soon to be large) tables for which you already have a rule.

Can you expand on exactly what is meant by "grouped based on crud operations and record size"?
 
When I say Record Size, I mean Mean Record Size. Consider the example below:

Table Name - # of Records, Table Size, Mean Record Size
T1 - 120,969,962, 28.8 GB, 98 [~121.0 million records]
T2 - 35,389,652, 40.2 GB, 315 [~35.4 million records]

In the above example, even though T2 has far fewer records, its high mean record size pushes the table size up. Hence I would classify it as a Large table and not a Medium one. Is that fine?

In the same thread there is a statement saying "Transaction load - tables with a high transaction load may also merit their own storage area," and that's why I got the CRUD details for all the tables; if a table has a high create/update rate, I am moving it to a different area.

Please advise.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
You may want to tune storage areas for medium-size tables to match the table mean record size to the area records per block (RPB). But remember that mean record size can change based on application changes. Even a fairly large change in average record size might not show up in an average of many years of historical data if the change was recent.

In general you don't want RPB to be too high as this can cause record fragmentation if records grow beyond their initial size when they are updated. And you don't want RPB to be too low for the record size, assuming a reasonable distribution of record sizes around the mean, as you will tend to have blocks that are partially empty because the row directory filled before the free space in the block was used up. This reduces caching efficiency and increases the ratio of physical to logical I/O.
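As a rough worked example using the tables above, with 8 KB blocks and an assumed per-record overhead of about 20 bytes (the usual rule of thumb is to divide block space by mean record size plus overhead, then round to a power of two):

  T2: 8192 / (315 + 20) ≈ 24 records per block -> RPB 32
  T1: 8192 / (98 + 20)  ≈ 69 records per block -> RPB 64

Whether you round up (tighter packing) or down (more room for records to grow in place) depends on how much your records grow after create.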

Segregating two or more tables into separate areas based on a high rate of updates might help in the sense that it spreads out file contention. I can't say; my databases tend to be small so I don't see I/O rates where that becomes a problem. But others may have more informed opinions than me.
 
Segregating two or more tables into separate areas based on a high rate of updates might help in the sense that it spreads out file contention. I can't say; my databases tend to be small so I don't see I/O rates where that becomes a problem. But others may have more informed opinions than me.

FYI - our application falls into the records management domain, where we see very high volumes of CRUD operations on tables.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Regarding the R in CRUD:
  • use -lruskips 100 on your primary broker;
  • assign heavily-read tables and indexes that are small and static (and the Schema Area) to the alternate buffer pool to keep them memory-resident, and set -B2 to an appropriate value;
  • use -lru2skips 100, if also using -B2;
  • use -omsize n, where n is greater than the number of records in _StorageObject, unless that number is less than the default of 1024;
  • set -B as large as you can without starving the OS or other processes on the machine for memory.
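Pulled together, those might look like the following in a broker startup parameter (.pf) file; all values are placeholders to size against your own memory and _StorageObject count:

  # broker .pf excerpt (illustrative values)
  -B 1000000       # primary buffer pool: ~8 GB at 8 KB blocks
  -B2 100000       # alternate buffer pool: ~800 MB
  -lruskips 100
  -lru2skips 100
  -omsize 2048     # > number of _StorageObject records

Area-level assignment to the alternate buffer pool can then be done with, e.g., proutil proddb -C enableB2 "Static_Data" (the area name is hypothetical); individual tables and indexes can also be assigned via the Data Administration tool.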
 