Red Hat Bugzilla – Bug 500484
GFS2: libgfs2's buffering system needs to be overhauled
Last modified: 2010-02-25 13:27:37 EST
Description of problem:
The gfs2-utils package has long suffered from an annoying "feature"
of libgfs2's buffering system. It needs to be overhauled. The
convoluted logic in the current system is often to blame for problems
where blocks are written to disk when they shouldn't be. There have
been a number of fsck problems, for example, where fsck.gfs2 with the
"-n" parameter (do not write to disk) still results in writes. These
have been largely kludged around, but problems still remain. For
example, see comments #1 and #2 in bug #455300.
Today's system uses a set of kernel-like functions to fetch buffers
from disk, mark them dirty and write them back out. The syntax is
convenient so I'd like to keep it mostly the same.
Today's syntax uses: bread(), bget() and
The problem is, fsck.gfs2 is so convoluted that it's nearly
impossible for a given function to determine whether an underlying
function actually changed a buffer and therefore needs to be written
back to disk.
I propose getting rid of the "updated" parameter from the brelse
function in libgfs2/buf.c and replacing it with a new tiny (possibly
inline) function that emulates the kernel's mark_buffer_dirty().
Then, all of the gfs2-utils functions will only mark a buffer dirty
if they actually change the contents. This will eliminate all the
guesswork from fsck and force us to be consistent.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a bunch of files in a gfs2 file system
3. Run fsck.gfs2 -y /dev/your/device
4. Run seekwatcher to see if any writes occur
or set a gdb stop in the function that writes a block to disk.
Blocks are written to the file system.
The file system should be "clean" and therefore fsck.gfs2 should not
write to the file system.
The fix for bug #455300 is the first step in disentangling the
Keeping Eric Sandeen in the loop, mainly because bug #455300 was his baby.
I think probably that eliminating it entirely would be a better plan. If we want to read the fs, then we could just mmap() it. That would allow us to have a r/o mapping in the case that we were not intending to modify the blocks and hence catch any incorrect code. Also it means that the blocks are backed by the fs and this allows writeback when it is required in a more sensible order than with the existing scheme.
We would also be able to drop hints with madvise() and msync() to help speed things up. We'd need to ensure that we map things in batches due to the limited amount of virtual memory available on 32 bit arches (we couldn't be certain to map a complete rgrp at once, for example)
In fact a good plan, would be to always mmap() read-only, and use remap_file_pages() to make them read-write only when we need to write to them. Think of it as building in ElectricFence as part of the design :-)
I know at some time in the past, we did consider looking into changing the existing scheme to use mmap()ed buffers and that there was some reason why the change wasn't trivial. I forget what that was now...
The real question I guess, is whether its better to start from scratch (I think it may well be) or to try and force this into the existing tools.
I scraped my previous patch for this and reworked everything as
part of bug #455300, for improving the speed of fsck.gfs2. The
replacement buf.c is a tiny shell of its former self. It's merely
a convenient way to read, write and store data with really no
buffering and no linked lists as before. So I'm closing this as
a duplicate of that one. That fix is nearly complete. At least
fsck.gfs2 seems to work mostly, but I need to do a lot of testing
for all gfs2 user space tools.
*** This bug has been marked as a duplicate of bug 455300 ***