Bug 1390314 - GFS2: Make dir hash table contiguous
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: gfs2-maint
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2016-10-31 16:42 UTC by Steve Whitehouse
Modified: 2016-10-31 16:43 UTC (History)

Type: Bug


Description Steve Whitehouse 2016-10-31 16:42:03 UTC
Each time the hash table grows, it doubles in size. While it is in the inode, and when it is first promoted to be external to the inode, it is a single block and therefore contiguous. After that point it will, in most cases, no longer be contiguous: it is allocated new blocks at various points in time, and the new files that live in the directory end up interleaved with the hash table blocks.

We would like to have the dir hash table as a contiguous run of blocks on disk, so that we can read it all in one go, whatever size it might happen to be. In order to do that, we need to be able to determine, from the information in the dir inode alone (i.e. without reading the indirect blocks), the location of the initial block of the hash table and the size of the dir hash table.
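As a rough sketch of why those two pieces of information are enough: assuming the hash table holds 2^depth pointers of 8 bytes each (with depth playing the role of di_depth), the whole table can be described as one extent. The struct and helper names below are hypothetical, and this is plain userspace C rather than kernel code.

```c
#include <stdint.h>

/* Hypothetical description of a contiguous dir hash table on disk. */
struct dir_hash_extent {
	uint64_t start_block;  /* first block of the table (assumed stored in the dinode) */
	uint64_t nr_blocks;    /* length of the contiguous run, in blocks */
};

/* Size of one hash table pointer (assumption: a 64-bit block number). */
#define HASH_PTR_SIZE 8u

/*
 * Number of filesystem blocks needed for a table of 2^depth pointers.
 * Rounds up, so a table smaller than one block still occupies one block.
 */
static uint64_t hash_table_blocks(unsigned int depth, uint32_t block_size)
{
	uint64_t bytes = ((uint64_t)1 << depth) * HASH_PTR_SIZE;

	return (bytes + block_size - 1) / block_size;
}

/* Describe the single read that would fetch the whole table. */
static struct dir_hash_extent hash_table_extent(uint64_t start_block,
						unsigned int depth,
						uint32_t block_size)
{
	struct dir_hash_extent ext = {
		.start_block = start_block,
		.nr_blocks = hash_table_blocks(depth, block_size),
	};

	return ext;
}
```

With 4096-byte blocks, a table of depth 9 fits in exactly one block, and each further doubling of the table doubles the length of the run.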

We can use di_payload_format to indicate both whether the directory hash table is contiguous and its length (the field is 32 bits long, so there is room for both). The advantage of using this field is that any older node that makes a change to the directory will reset it to GFS2_FORMAT_DE, thus giving us full backwards compatibility.
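One possible encoding, purely as an illustration: no bit layout has been decided, so the flag bit and length mask below are assumptions, as is the numeric value used for GFS2_FORMAT_DE. The only property that matters is that the value an older node writes back must decode as "not contiguous". Endianness conversion of the on-disk field is ignored here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value older nodes write back (assumed here to be 1200, as in gfs2_ondisk.h). */
#define GFS2_FORMAT_DE 1200u

/*
 * Hypothetical layout: the top bit marks a contiguous table and the
 * low 31 bits hold its length in blocks.
 */
#define DIR_HT_CONTIG   0x80000000u
#define DIR_HT_LEN_MASK 0x7fffffffu

/* Build the field value for a contiguous table of nr_blocks blocks. */
static uint32_t payload_format_encode(uint32_t nr_blocks)
{
	return DIR_HT_CONTIG | (nr_blocks & DIR_HT_LEN_MASK);
}

/* True only if the field still carries the contiguous marker. */
static bool payload_format_is_contig(uint32_t fmt)
{
	return (fmt & DIR_HT_CONTIG) != 0;
}

/* Length in blocks, or 0 if the table is not marked contiguous. */
static uint32_t payload_format_len(uint32_t fmt)
{
	return payload_format_is_contig(fmt) ? (fmt & DIR_HT_LEN_MASK) : 0;
}
```

Because GFS2_FORMAT_DE does not have the top bit set under this assumed layout, a directory touched by an older node would simply fall back to the existing non-contiguous path.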

We then only need to figure out where to put the details of the first block of the hash table. We could use either part of __pad4 or part of the 44 reserved bytes for that. In fact, if we did use part of __pad4, then we could also add cr_time as the other part of __pad4 and __pad2 - something that is also overdue.

The only other issue then is what to do with the holes that are created as we move the hash table when we expand it. I think the sensible thing to do is to try to fill them with the directory indirect blocks. Since we will no longer need to look at those directly, they are only there for backwards compatibility, so it doesn't matter if they are fragmented. Otherwise we'll have to come up with a different plan in due course.

As a later exercise we could also look at whether we can play some similar games with the leaf blocks at some stage, but that will be more tricky, I think. Let's do the easier bit first.

