Hash Table Entries and the -hash startup parameter

Chris Kelleher

Administrator
Staff member
The -hash Startup Parameter

The database buffer pool is a collection of buffer headers and buffers.
Buffer headers contain descriptions of buffer contents. Buffers are
used to keep copies of recently used database blocks in memory. In
order to quickly determine whether or not a desired database block is
present in the buffer pool without time consuming searches, a "hash
table" is used. The hash table is sized in proportion to the number
of buffers allocated and contains pointers to linked lists of buffer
headers.

To locate a specific buffer, the desired database block number is
first divided by the size of the hash table. The remainder is then
used to select a corresponding hash table entry. Note that there is
the possibility that several database blocks may be hashed to the same
table entry. When this happens, the buffer headers are connected
together in a "hash chain" or linked list.

Once the correct hash table entry has been determined, that entry is
examined. If it contains a zero, then the desired buffer is not
present in the buffer pool.

If the entry is nonzero, it is a pointer to a buffer header. The
buffer header is then examined to determine if it is for the desired
buffer. If it is, then the buffer has been located.

If the buffer header does not match the desired block, then the
header's "next buffer in the hash chain" pointer is examined to find
another buffer header with the same hash value, which is then examined
as desribed above. The process is repeated until either the desired
buffer is found or it is determined that the desired buffer is not
present. When the "next" pointer is zero, or a buffer header that refers
to a higher block number than the desired one is found, then the
desired block is not present.

The default value for -hash is chosen to be a prime number
approximately equal to one fourth of the number of database
buffers (specified with the -B startup parameter). The exact
size of the hash table is not important (see below), but its
size should be a prime number because that will produce the most
even distribution of buffer headers across hash chains.

The default hash table size provides for an average hash chain length
of approximately 4. If the distribution of blocks across hash chains
is even, a search will require examining at most 2 buffer headers on
the average.

The default hash table size is found from a small prime number table.
It has 30 entries, with values increasing in a geometric progression
using the approximate ratio 1 to 1.4. The largest number in the table
is 98407. The prime number table is identical in versions 6.3, 7, and
8.

As mentioned above, the average hash chain length will be the number
of buffers (the value of -B) divided by the hash prime actually used.
The current upper limit for -B is 500,000 buffers.

With 500,000 buffers, the average chain length will be approximately
500,000 / 98407 or 5.08 entries. To find a specific block thus
requires examining 2.54 hash chain entries, on the average. Since the
chains are ordered by block number, searches may often be shorter.

Given the facts mentioned in the foregoing, performance gains that
could be obtained by manually setting the value of -hash are likely
to be very small or nonexistent.

In short, I don't think -hash is worth bothering with. I'm sorry I
made it adjustable.

--
regards,
gus
****************************************************************
Gus Bjorklund, Progress Software Corporation, Bedford MA.
Purveyors of the finest rdbms on the third planet from the sun.

"In theory, there is no difference between theory and practice,
but in practice, there is." - Jan L.A. van de Snepscheut
****************************************************************
 
Top