Rebuild of db

jozef

New Member
Hi there.

I'm kinda new in a dba role and, having a look at what I've inherited, I want to plan on doing a rebuild of our db's.

Progress 10.2A
Redhat Linux

A bit of background - we have a number of db's on a number of servers on a WAN. These db's vary in size from 300MB up to 1.5GB. The larger are multi-volume while the smaller are single volume.

The previous dba did a straight convert from v9 to v10 and as a consequence, the db blocksize is still 1k with type I areas.

I want to go to type 2 and have a standard .st file across all db's. I've run a tabanalys and get a mean record size of around 200. I figure I could do something like 'd "Data Area":10,64;512' and split the schema and index areas out on their own. With disk space not being an issue, I was going to standardise with a 1GB fixed extent and then a variable.

After that I'm stumped. I'm wondering what size I should make the fixed bi extent (or whether to bother at all and just leave it variable). Also wondering what RPB and blocks/cluster I should make the Index & Schema areas.

Any suggestions would be appreciated.

Thanks,

Jo
 

TomBascom

Curmudgeon
Hi there.

I'm kinda new in a dba role and, having a look at what I've inherited, I want to plan on doing a rebuild of our db's.

Progress 10.2A
Redhat Linux

A bit of background - we have a number of db's on a number of servers on a WAN. These db's vary in size from 300MB up to 1.5GB. The larger are multi-volume while the smaller are single volume.

If it is a version 10 database then it is multi-volume. I suspect that you are really saying that there is just one storage area. Or possibly that there is only one data extent currently defined.

The previous dba did a straight convert from v9 to v10 and as a consequence, the db blocksize is still 1k with type I areas.

Yuck.

I want to go to type 2

Excellent!

and have a standard .st file across all db's. I've run a tabanalys and get a mean record size of around 200. I figure I could do something like 'd "Data Area":10,64;512' and split the schema and index areas out on their own. With disk space not being an issue, I was going to standardise with a 1GB fixed extent and then a variable.

After that I'm stumped. I'm wondering what size I should make the fixed bi extent (or whether to bother at all and just leave it variable). Also wondering what RPB and blocks/cluster I should make the Index & Schema areas.

Any suggestions would be appreciated.

Thanks,

Jo

0) Use either a 4k or an 8k db block. I prefer 8k unless detailed testing shows that 4k is clearly better. (It hardly ever does.)

1) The schema area is what it is. You cannot set rows per block or cluster size. When you are done it should have no data and no indexes in it. Just the meta schema.

2) Average row size is meaningful on a per table basis. Don't aggregate the average row sizes for the whole db (which is what it sounds like you have done).

3) To estimate the optimum rows per block for a table use the next higher power of 2 after blockSize / ( averageRowSize + 20 ), e.g.:

8192 / (90 + 20) = 74, next power of 2 = 128

4) Remember that the calculation above is an estimate and that it assumes that record sizes in the table follow a Gaussian distribution ("normal curve") and that the record is created at close to its eventual size -- IOW it does not grow substantially over its lifetime. If it does grow substantially then you should probably choose the next lower power of 2.

5) Group tables together into storage areas based on the rows per block settings determined above. You will probably end up with half a dozen or so data areas. Create an index area to parallel each data area (no special reason, I just like the symmetry).

6) Indexes are usually smaller than the table data and don't grow as rapidly. A cluster size of 8 might be the most sensible. (Sometimes a table has a mix of both very large and very small indexes. In that case it may make sense to have 2 index areas... but that probably only applies for very large systems.) Rows per block for index areas doesn't matter so much. In some sense "1" is best as it has a slight advantage for large systems. But if you accidentally put data in an index area "1" is not so good ;) (yes, I've done it). So you might go with 128 if you want a "just in case" value.

7) These are pretty small databases. A cluster size of 512 is probably larger than you really need. You might also consider using only variable extents rather than fixed extents. If you enable large files this makes maintenance fairly painless.

8) Going with a variable length bi file could be quite sensible as well. Or create a 1GB fixed extent and a variable overflow.

9) You don't mention after-imaging. If your data has any value at all you should enable after-imaging. It is foolhardy and irresponsible not to. You should have 4 to 8 variable after image extents. A rough sample .st pulling these points together is below.
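
To make that concrete, here is a rough sketch of the kind of .st the above could lead to. The area names, numbers, paths and the rpb/cluster values are just placeholders for illustration -- yours should come out of your own tabanalys numbers:

Code:
# create the db with an 8k block size (point 0):
#   prostrct create mydb mydb.st -blocksize 8192
#
# bi: a 1GB fixed extent plus a variable overflow (point 8)
b ./bi f 1048576
b ./bi
#
# schema area: no rpb or cluster size, nothing but the meta schema (point 1)
d "Schema Area":6 .
#
# data areas grouped by calculated rows per block (points 3 and 5),
# each with a 1GB fixed extent and a variable overflow;
# cluster size 64 used here since 512 may be more than these dbs need (point 7)
d "Data064":10,64;64 ./data f 1048576
d "Data064":10,64;64 ./data
d "Data128":11,128;64 ./data f 1048576
d "Data128":11,128;64 ./data
#
# matching index areas, cluster size 8, rpb 128 "just in case" (point 6)
d "Index064":20,128;8 ./idx
d "Index128":21,128;8 ./idx
#
# after-image extents (point 9)
a ./ai
a ./ai
a ./ai
a ./ai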
 

Casper

ProgressTalk.com Moderator
Staff member
Some more on the cluster size:

If you have many tables then it might be worthwhile to put all tables with more than 4MB of data in clustersize 512 areas while you put the other tables in clustersize 8 areas.

Once you have reloaded, you can use a dbanalys taken right after the data load and index rebuild as a baseline to look at possible fragmentation: compare the number of records created vs the number of fragments created after the reload. If it turns out that for some tables the fragmentation seems pretty high, you can move them to a lower rpb area of the same clustersize.
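
For example (the database name and output file names here are just placeholders):

Code:
# baseline right after the load and index rebuild
proutil mydb -C dbanalys > dbanalys.baseline

# some time later, take another one and compare
proutil mydb -C dbanalys > dbanalys.now

In both outputs, compare the Records column against the Fragments Count column per table: if Fragments is noticeably higher than Records for a table, that table is fragmenting and is a candidate for a lower rpb area of the same clustersize.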

Casper.
 

LJA

New Member
I might join in here as well, it's sure going to help with my planning. On the bright side, just about anything I do will be a VAST improvement on what the users now have but I still want to get it right, just to make it easier for me to look after and anyone who follows me in the position.

I've loaded up ProTop but am a little confused about some of the information it provides. I'd much rather use the GUI version but unfortunately we don't have a Client license.

Just as a matter of interest, are any of you guys using a SAN? How does that affect your planning in terms of having data, recovery & index areas on different disks/volumes?
 

jozef

New Member
Thanks guys - that's very informative and most helpful. Tom, I did neglect to say we would be using AI files. I'm plumping for 8k just because that seems to be the general rule-of-thumb.

v10 & multi-volume (showing my inexperience :blush1:)

With the Schema area, were I to do something like 'd "Schema Area":6,32', you're saying that the 32 is irrelevant?

I can see that the assigning of storage areas is going to require a lot more thought than I was initially going to give it!!

In your calculation of RPB you use a + 20. I notice that that is in the 10.1 documentation but in the 10.2 it shows + 2 (Database Essentials - Administrative Planning). Is this a typo?

I guess an RPB of 128 for the index area isn't going to be a huge overhead, so I'll probably go with that. Are you saying that a clustersize of 256 is a better fit? (BTW our largest table has around 5.5m records with a mean size of 86).

Enablelargefiles isn't an option, unfortunately, as we don't have an Enterprise licence.

Casper, that seems a pretty drastic drop from 512 to 8. Is that really the best?

Cheers guys.
 

TomBascom

Curmudgeon
These are smallish databases so be careful not to over analyze this problem ;)

Define the schema area in the .st file just like any other. You have no choice with regards to the rows per block or cluster size so just omit them.
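
For example, something along these lines (the "." is just a placeholder path):

Code:
d "Schema Area":6 .

Area 6 is always the schema area, and since you can't influence its rows per block or cluster size there is nothing useful to add after the area number.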

The "+ 20" roughly accounts for block overhead. It's a kind of crude approximation. I'm not sure where you are looking but "2" sounds like a typo to me. I don't think anything changed in 10.2 that would justify "2" vs "20".

Cluster size is mostly a matter of managing how much space you're going to be wasting -- on average you will waste half a cluster per object (tables & indexes). So you don't want to use a cluster size of 512 for a table that only has 12k of data (look in dbanalys to see how big tables and indexes are).
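
As a rough worked example with 8k blocks:

Code:
512 blocks/cluster x 8 KB/block = 4096 KB per cluster  -> ~2 MB average waste per object
  8 blocks/cluster x 8 KB/block =   64 KB per cluster  -> ~32 KB average waste per object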

SANs are routinely used. There is no point to splitting data and indexes onto distinct LUNs -- let the SAN manage striping and balancing IO. Try to avoid RAID5 if at all possible (although these are small databases so it probably isn't the end of the world if you do get stuck with it). You do generally want the after-image logs on a discrete set of physical drives for reliability reasons -- you don't want to lose the ai logs in the same disk failure that takes out the database.
 

jozef

New Member
That's very handy to know, thanks Tom. Would there be somewhere that all these nuggets of information are gathered together (in an easy to read format) or is it a case of trawling through these threads?

In regard to the +2, it's on page 76 (section 4-8) of the Database Essentials pdf.

Just another thing, when calculating the RPB we use the mean, right? How does a large aberration (e.g. Min: 83 Max: 607 Mean: 119) go? In this case, the RPB would be (8192 / (119 + 20))=58.9, so 64. The block with the max record would have a bit more wasted space?
 

TomBascom

Curmudgeon
Nope, there is no consolidated source of such nuggets :( I do do PUG presentations from time to time though... And you can always hire me to do training, mentoring db review etc, etc ;)
 

jozef

New Member
Nope, there is no consolidated source of such nuggets :( I do do PUG presentations from time to time though... And you can always hire me to do training, mentoring db review etc, etc ;)

Gee, thanks Tom, that's very generous of you :D but seeing as how we don't even live in the same hemisphere, I don't reckon I can scrape together the funds :eek:
 

jozef

New Member
And while I'm unearthing these nuggets (and ignoring your suggestion not to over-analyse), can you give me some insight as to what you would do in these scenarios?

Our RPB fits into these bands: 64, 128 & 256
Our cluster sizes fit into : 8, 512


  1. large record size, small amount of data (<1MB): RPB 32, csize 8
  2. small record size, larger amount of data (~60MB): RPB 64, csize 512
  3. somewhere in the middle: RPB 256, csize 512
  4. somewhere in the middle: RPB 256, csize 8
Don't really want to have a data area for each combination & permutation, so how would you assign precedence here? Move 1. into 2.? What's the ramification of this? Is that what you call "page tearing" (i.e. "forcing" a 32 RPB into a 64 RPB)?

So would it be better to move larger RPB's down to smaller values (just to reduce the number of combinations)? Or should I look at the size as well?

Or group all the 512's together and use the lowest RPB or vice-versa?

Do I sound confused????????
 

Casper

ProgressTalk.com Moderator
Staff member
Casper, that seems a pretty drastic drop from 512 to 8. Is that really the best?

Well it probably isn't the best. But I tend to split up data into small and large clusters with the criteria I already mentioned, so I think in terms of 512 and 8 :) I never use clustersize 64.

You must realize that just by using type II storage areas you already gain a performance boost, and I doubt that it would be measurable in your database to see the difference between using 512, 64 and 8 as clustersize.

Your database is really small, so don't overdo it and make your structure more complex than it needs to be.
Maybe in your case it proves enough to just make 4 areas (sketched as a .st fragment below):
- rpb 128, cluster 64 (data, bigger tables)
- rpb 128, cluster 8 (data, smaller tables)
- rpb 1, cluster 8 --> indexes belonging to the 64-cluster data area
- rpb 1, cluster 8 --> indexes belonging to the 8-cluster data area

(I like symmetry :))
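
In .st terms that might look something like this (area names, numbers and paths are just placeholders):

Code:
d "DataLarge":10,128;64 ./data
d "DataSmall":11,128;8 ./data
d "IdxLarge":20,1;8 ./idx
d "IdxSmall":21,1;8 ./idx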

And then if, after the reload, the blocks are overall reasonably filled (80% or more, to put a number on it), and after using the database for a while no tables with relatively large amounts of data are fragmenting a lot, then it is fine.

If some of the larger tables tend to fragment a lot, you can then decide to use tablemove to put them in an area with a lower rpb.

If the database blocks are overall poorly filled, then you may need to move the bigger tables with the small records one rpb step higher.
Tables of that size are easy to move around with tablemove, and new areas are easily created.

Casper.
 

TomBascom

Curmudgeon
I wouldn't hesitate to have 20 or 30 storage areas.

After that I might start looking for opportunities to consolidate them.
 

jozef

New Member
Hi - I'm back.

Just finished setting up a test bed and playing around with different RPB/Cluster size settings and wonder if anyone can explain the results I'm getting with a dbanalys?

8k blocksize

d "Data Area",128;512 ./data f 1048576
Code:
                                       -Record Size (B)- ---Fragments--- Scatter
Table                Records    Size   Min   Max  Mean      Count Factor  Factor
PUB.table001            1508  166.1K    72   181   112       1508    1.0     1.0
PUB.table002              44    2.2K    45    60    51         44    1.0     1.0
PUB.table003              44    2.9K    53    83    67         44    1.0     1.0
PUB.table004              17  993.0B    45    72    58         17    1.0     1.0
PUB.table005           39595    5.0M   104   164   132      39595    1.0     1.0
disk size of the data area = 1,246,424 1k blocks


d "Data Area",64;8 ./data f 1048576
Code:
                                       -Record Size (B)- ---Fragments--- Scatter
Table                Records    Size   Min   Max  Mean      Count Factor  Factor
PUB.table001            1508  166.1K    72   181   112       1508    1.0     1.0
PUB.table002              44    2.2K    45    60    51         44    1.0     1.0
PUB.table003              44    2.9K    53    83    67         44    1.0     1.0
PUB.table004              17  993.0B    45    72    58         17    1.0     1.0
PUB.table005           39595    5.0M   104   164   132      39595    1.0     1.0
disk size of the data area = 1,049,757 1k blocks


I would have thought there would be a marked difference in the Fragments count. Or should I just look at the factor?
 

Delo123

New Member
SANs are routinely used. There is no point to splitting data and indexes onto distinct LUNs -- let the SAN manage striping and balancing IO. Try to avoid RAID5 if at all possible (although these are small databases so it probably isn't the end of the world if you do get stuck with it).

Hi Tom,

Why avoid RAID 5? What do you advise? RAID 10?
 

TomBascom

Curmudgeon
Hi - I'm back.

Just finished setting up a test bed and playing around with different RPB/Cluster size settings and wonder if anyone can explain the results I'm getting with a dbanalys?

...I would have thought there would be a marked difference in the Fragments count. Or should I just look at the factor?

If I understand correctly all you have done is dump & load into 2 different configurations?

That is not what creates fragmentation. Fragmentation occurs when records are added at a small size and then grow larger over time. If the RPB setting is "too high" there is room in the block to create the record, but when it comes time to expand it there is not enough room for the updated record. The original stub of the record has to remain where it is (Progress won't change the RECID on an update), so the additional data fragment goes somewhere else -- thus 2 or more fragments make up a record.

This won't normally happen with a dump & load because the record's full size is known when it is created and it either fits or it doesn't fit. (You could still get fragments during a dump & load if the record size is larger than the block size -- but that is unavoidable).
 

TomBascom

Curmudgeon
Hi Tom,

Why avoid RAID 5? What do you advise? RAID 10?

Yes, I advise RAID 10 whenever possible.

I recommend avoiding RAID 5 and its fellow-traveler parity-based RAIDs because the performance sucks. (One might say that parity based RAID provides a parody of performance....)

This sucky performance can be largely hidden, but not completely eliminated, by the provisioning of a large RAM cache. But unless the cache is larger than the data the cache can always be saturated by high activity. And when a disk failure occurs the performance degradation as the RAID reconstructs the data is horrific.

None the less... if you have a small or lightly used database you will probably "get away with it". You are likely to only notice if you have a disk failure or if you need to do something unusual like a dump & load.
 

jozef

New Member
If I understand correctly all you have done is dump & load into 2 different configurations?

Oh I see! Yep, that's all I've done. So it's a trade off between fragmentation and wasting space? And seeing as how storage is relatively inexpensive, that's not as big a deal.
 

TomBascom

Curmudgeon
Fragmentation isn't a foregone conclusion -- it is actually fairly rare. Progress does a pretty good job of avoiding it.

The "wasted space" trade-off isn't so straight-forward either. At first blush, sure disk is cheap, don't worry about it. But the most expensive resource on your server is an IO operation. You want to pack as much data into each IO operation as you possibly can. Not because space is expensive but because IO operations are sloooooow.

Which is why I try to pack data as tightly as I can and I'm willing to take some chances on potential fragmentation.
 