500484 – GFS2: libgfs2's buffering system needs to be overhauled

Bug 500484 - GFS2: libgfs2's buffering system needs to be overhauled

Summary: GFS2: libgfs2's buffering system needs to be overhauled

Keywords:
Status:	CLOSED DUPLICATE of bug 455300
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	gfs2-utils
Sub Component:
Version:	5.3
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Robert Peterson
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:	455300 568446
Blocks:
TreeView+	depends on / blocked

Reported:	2009-05-12 21:36 UTC by Robert Peterson
Modified:	2010-02-25 18:27 UTC (History)
CC List:	4 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-11-19 14:37:46 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description Robert Peterson 2009-05-12 21:36:31 UTC

Description of problem:
The gfs2-utils package has long suffered from an annoying "feature"
of libgfs2's buffering system.  It needs to be overhauled.  The
convoluted logic in the current system is often to blame for problems
where blocks are written to disk when they shouldn't be.  There have
been a number of fsck problems, for example, where fsck.gfs2 with the
"-n" parameter (do not write to disk) still results in writes.  These
have been largely kludged around, but problems still remain.  For
example, see comments #1 and #2 in bug #455300.

Today's system uses a set of kernel-like functions to fetch buffers
from disk, mark them dirty and write them back out.  The syntax is
convenient so I'd like to keep it mostly the same.

Today's syntax uses: bread(), bget() and
brelse(bh, updated|not_updated);

The problem is, fsck.gfs2 is so convoluted that it's nearly
impossible for a given function to determine whether an underlying
function actually changed a buffer and therefore needs to be written
back to disk.

I propose getting rid of the "updated" parameter from the brelse
function in libgfs2/buf.c and replacing it with a new tiny (possibly
inline) function that emulates the kernel's mark_buffer_dirty().
Then, all of the gfs2-utils functions will only mark a buffer dirty
if they actually change the contents.  This will eliminate all the
guesswork from fsck and force us to be consistent.

Version-Release number of selected component (if applicable):
RHEL5

How reproducible:
Always

Steps to Reproduce:
1. Create a bunch of files in a gfs2 file system
2. unmount
3. Run fsck.gfs2 -y /dev/your/device
4. Run seekwatcher to see if any writes occur
   or set a gdb stop in the function that writes a block to disk.
  
Actual results:
Blocks are written to the file system.

Expected results:
The file system should be "clean" and therefore fsck.gfs2 should not
write to the file system.

Additional info:
The fix for bug #455300 is the first step in disentangling the
buf.c web.

Comment 1 Robert Peterson 2009-05-12 22:00:42 UTC

Keeping Eric Sandeen in the loop, mainly because bug #455300 was his baby.

Comment 2 Steve Whitehouse 2009-05-13 09:20:44 UTC

I think probably that eliminating it entirely would be a better plan. If we want to read the fs, then we could just mmap() it. That would allow us to have a r/o mapping in the case that we were not intending to modify the blocks and hence catch any incorrect code. Also it means that the blocks are backed by the fs and this allows writeback when it is required in a more sensible order than with the existing scheme.

We would also be able to drop hints with madvise() and msync() to help speed things up. We'd need to ensure that we map things in batches due to the limited amount of virtual memory available on 32 bit arches (we couldn't be certain to map a complete rgrp at once, for example)

In fact a good plan, would be to always mmap() read-only, and use remap_file_pages() to make them read-write only when we need to write to them. Think of it as building in ElectricFence as part of the design :-)

I know at some time in the past, we did consider looking into changing the existing scheme to use mmap()ed buffers and that there was some reason why the change wasn't trivial. I forget what that was now...

The real question I guess, is whether its better to start from scratch (I think it may well be) or to try and force this into the existing tools.

Comment 3 Robert Peterson 2009-11-19 14:37:46 UTC

I scraped my previous patch for this and reworked everything as
part of bug #455300, for improving the speed of fsck.gfs2.  The
replacement buf.c is a tiny shell of its former self.  It's merely
a convenient way to read, write and store data with really no
buffering and no linked lists as before.  So I'm closing this as
a duplicate of that one.  That fix is nearly complete.  At least
fsck.gfs2 seems to work mostly, but I need to do a lot of testing
for all gfs2 user space tools.

*** This bug has been marked as a duplicate of bug 455300 ***

Note You need to log in before you can comment on or make changes to this bug.