Bug 735464 - dhash can corrupt memory if initial table size is large
Summary: dhash can corrupt memory if initial table size is large
Alias: None
Product: Fedora
Classification: Fedora
Component: ding-libs
Version: 15
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: John Dennis
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks: 736074
Reported: 2011-09-02 17:47 UTC by John Dennis
Modified: 2011-09-29 01:39 UTC
CC: 2 users

Fixed In Version: ding-libs-0.1.3-5.fc15
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 736074 (view as bug list)
Last Closed: 2011-09-29 01:39:44 UTC

Attachments
Fix to prevent overwriting allocated array, plus other clean-up (18.86 KB, patch)
2011-09-06 14:58 UTC, John Dennis
no flags
Fix to prevent overwriting allocated array, plus other clean-up (22.66 KB, patch)
2011-09-06 18:42 UTC, John Dennis
sgallagh: review+

Description John Dennis 2011-09-02 17:47:02 UTC
This problem was originally reported by Brent E Stephens <bestephe@us.ibm.com> in a private email; I'm opening this bug on behalf of Brent.

Brent observed that calling hash_create() with an initial table size of 65536 corrupts memory. Under the default parameters the corruption can occur whenever the initial table size is greater than 1024.

This occurs because line 568 of dhash.c contains this loop:

for (i = 0; i < count; i++) {
    table->directory[i] = (segment_t *)halloc(table,
                                              table->segment_size * sizeof(segment_t));
    /* ... */
}

The directory array is allocated earlier, and the count variable is not bounded by the size of the directory array.

Briefly, the count variable is meant to be the next power of two less than two to the power of the number of bits allocated to a segment (sorry, that's obscure).

The number of buckets in the hash table is 2**(directory_bits + segment_bits). By default both directory_bits and segment_bits are 5; therefore in the default case the directory size is 32 and the segment size is 32, yielding 1024 buckets.

The value of count above will exceed 32 (the default length of the directory) when the initial table size is greater than 1024.

There are two problems with the directory initialization loop:

1) It was not bounded by the directory size.

2) It was a dubious optimization. The idea was to pre-populate every conceivable bucket based on the initial size. But the whole point of a dynamic hash table is to allow its internal data structure to grow or contract based on the current number of items in the table, striking a balance between memory usage and performance. If one pre-allocates based on an initial size, the memory advantage is lost. The disadvantage is further compounded by the fact that after the first item is entered into the table, the table will examine its memory usage based on having just one item and may elect to contract the table to conserve memory. The table should be initialized to its minimum size and allowed to grow naturally.

The correct optimization based on the suggested table size is to adjust the number of directory_bits and segment_bits to accommodate the suggested table size. The directory_bits and segment_bits are fixed at table creation time and previously assumed default values of 5 unless overridden via parameters in hash_create_ex(). It's the directory_bits and segment_bits which potentially have more impact on table performance, so they should have been influenced by the suggested table size (because those values influence the length of the bucket chain which must be traversed on a hash collision).

The bottom line is that optimal performance is a function of the number of hash collisions vs. memory utilization, which is an extremely hard thing to predict a priori. But adjusting the memory utilization via the size of the directory_bits and segment_bits based on the anticipated table size is probably as close as we can get. The ideal is to have the number of buckets equal the table size and for every item in the table to hash to a unique bucket. We want to avoid buckets with long chains or buckets with no chains.

Since we can't predict the hash collisions a priori, our next best solution is to allow the table to grow to where the number of buckets matches the number of entries. This can be done by choosing directory_bits and segment_bits based on the table size. The number of collisions will determine how many of the buckets have more than one entry and how many buckets are unpopulated. If the number of buckets >= table size and there are no collisions, we've reached optimal performance in terms of the number of processor cycles used to locate an item in the table (of course that has to be balanced against processor cache misses and virtual memory paging if we use too much memory).

Sorry for the long rant, but to fix the bug I had to take a long hard look at how memory was utilized and to refer back to the ACM paper the original implementation was based on. Also, the original implementation did not have the ability to contract the table to conserve memory, so there was an advantage to pre-allocating based on the suggested table size. That pre-allocation survived the later addition of the contraction logic. As far as I can tell the original implementation always contained the bug of not bounding the pre-allocation loop to the directory size.

Comment 1 John Dennis 2011-09-06 14:58:52 UTC
Created attachment 521682 [details]
Fix to prevent over writing allocated array, plus other clean-up

* Resolves: bug #735464 Fix the loop limit used to initialize the table
  directory, was based on count, now limited to segment_count.

* Do not pre-allocate all buckets based on the requested table size. This
  defeats the point of a dynamic hash table, which adjusts its memory
  usage depending upon the number of items it holds. Pre-allocation
  also runs afoul of the table contraction logic: if the table is
  pre-allocated to a large size, the contraction logic will subsequently
  try to compact it, negating the pre-allocation. Now the initial
  allocation is restricted to one directory segment
  (i.e. table->segment_count == 1).

* If the caller did not specify the directory_bits and segment_bits
  then compute them from the requested table size. The directory_size
  and segment_size must be powers of 2 to accommodate fast
  arithmetic. An optimal maximum number of buckets would be equal to
  the anticipated table size. If there were no collisions, each item
  would be located in its own bucket whose chain length is 1, and the
  item would be found immediately upon indexing into the bucket table.

* There is a requirement that there be at least one directory
  segment; however, contract_table() failed to enforce this
  requirement. Add a check to enforce it.

* If hash_create() or hash_create_ex() failed it left the tbl
  parameter uninitialized but returned an error code. Now upon failure
  tbl is initialized to NULL as well as returning an error code.

* Clean up how the directory and segment sizes are computed. The code
  had been using a loop, shifting 1 bit at a time; now it shifts all
  the bits in one operation and ensures at least one directory and
  segment are allocated.

* In the event of an early exit from hash_create_ex() due to an error,
  make sure all previously allocated resources are cleaned up by
  calling hash_destroy(). There was only one place this was
  missing. hash_destroy() blindly assumed the table had a directory;
  normally it would, unless hash_destroy() was called due to an early
  exit in hash_create_ex(). Modify hash_destroy() to check for the
  existence of the directory before cleaning it up.

* Update the function documentation in dhash.h to explain each
  parameter of hash_create() and hash_create_ex().

* Enhance dhash_test.c
  - To output the random seed used to initialize the random number
    generator and add command line option to set the random seed. This
    permits recreating a failed test run.
  - Add command line parameter to set the initial table size.
  - Use proper string to long conversion routines when reading command
    line parameters (e.g. strtoul() instead of atoi())
  - Add logic to make sure each test value is unique.

* Add additional information to the debug output to show the computed
  directory_bits, segment_bits, sizes, etc.

* Adjust the debug_level conditionals to be less verbose.

Comment 2 John Dennis 2011-09-06 18:42:59 UTC
Created attachment 521734 [details]
Fix to prevent overwriting allocated array, plus other clean-up

Identical patch as previous patch except using git format.

Comment 3 Stephen Gallagher 2011-09-06 19:13:56 UTC
Patch is acked and committed upstream. I will do a new upstream release with this fix in the next couple of days.

Comment 4 Fedora Update System 2011-09-16 13:26:10 UTC
ding-libs-0.1.3-5.fc15 has been submitted as an update for Fedora 15.

Comment 5 Fedora Update System 2011-09-18 01:01:27 UTC
Package ding-libs-0.1.3-5.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing ding-libs-0.1.3-5.fc15'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).

Comment 6 Fedora Update System 2011-09-29 01:39:39 UTC
ding-libs-0.1.3-5.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.
