Bug 494889 - GFS2: Turning statfs_slow on and off doesn't re-sync statfs info
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Hardware: All Linux
Priority: low, Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Ben Marzinski
QA Contact: Cluster QE
Depends On:
Blocks: 533192
Reported: 2009-04-08 10:40 EDT by Nate Straz
Modified: 2010-04-20 09:09 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 539337
Last Closed: 2010-04-20 09:09:59 EDT

Attachments
Proposed patch (5.97 KB, patch)
2009-11-19 16:14 EST, Robert Peterson
Description Nate Straz 2009-04-08 10:40:49 EDT
Description of problem:

After hitting bug 494885, turning statfs_slow on and off doesn't resync the file system size according to statfs().

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
Actual results:

Expected results:

Additional info:
Comment 2 Steve Whitehouse 2009-07-31 09:15:07 EDT
Ben, is this a dup of something that you've already fixed, or is this still an issue?
Comment 3 Ben Marzinski 2009-07-31 19:07:32 EDT
I don't exactly know what to call this. The only way I know to see the bug is with the statfs grow bug (bz494885), which is now fixed. That being said, without the 494885 fix, once statfs is messed up, switching to statfs_slow seems like it should fix the issue, but it doesn't. It does change the df output, but not to the correct numbers. If you unmount and then remount, statfs_fast still gives the wrong numbers, but statfs_slow now gives you the correct numbers.
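
One way to compare the numbers before and after toggling the tunable is to read the same statfs() data that df reports. The following is only a minimal sketch: the helper name fs_usage is hypothetical, and the path passed to it is whatever mount point is being checked.

```python
import os

def fs_usage(path):
    """Return (total, free, available) in bytes as reported by statvfs().

    fs_usage is a hypothetical helper name used for illustration only.
    """
    st = os.statvfs(path)
    unit = st.f_frsize  # fragment size: the unit of the f_blocks/f_bfree/f_bavail counts
    total = st.f_blocks * unit      # size of the filesystem
    free = st.f_bfree * unit        # free space, including root-reserved blocks
    available = st.f_bavail * unit  # free space available to unprivileged users
    return total, free, available
```

Calling fs_usage on the GFS2 mount point once with statfs_slow off and once with it on, and comparing the tuples, shows whether the two paths agree.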
Comment 4 Ben Marzinski 2009-11-18 17:29:32 EST
The more I think about this, the less possible it seems.  Things work fine with some nodes running statfs_slow, and the others running fast statfs. The problem we're trying to fix here is that if the master statfs file gets messed up, people would like turning on statfs_slow to fix it.

When we do a statfs sync, we know the calculated statfs info for the filesystem, and we know what we have changed locally. We can probably lock the statfs master file, and know what the current master file values are.  But we still don't know (and can't know as far as I can see) what part of the difference between the calculated values and current master file values is due to a bug, and what part is due to unsynced changes from other nodes.

Unless there is an unfixed statfs bug in GFS2, the difference between the calculated statfs info and the current master file values will always be due to unsynced changes from other nodes, which could get synced at any time.
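
The ambiguity described above can be illustrated with a toy model. This is a simplified sketch, not GFS2 code: every name in it (Observation, observe, unexplained_gap) is hypothetical, and it tracks only a single free-block counter. It shows two different hidden states, one with a healthy master file and one with a corrupted one, that look identical to the node doing the sync.

```python
from typing import List, NamedTuple

class Observation(NamedTuple):
    """Everything a syncing node can see (hypothetical model, not GFS2 structures)."""
    calculated_free: int  # free blocks computed from the filesystem itself (slow path)
    master_free: int      # free blocks currently recorded in the master statfs file
    own_delta: int        # this node's local change not yet synced to the master file

def observe(correct_master_free: int, master_error: int,
            own_delta: int, other_deltas: List[int]) -> Observation:
    """Build one node's view; master_error and other_deltas are hidden from it."""
    master_free = correct_master_free + master_error
    calculated_free = correct_master_free + own_delta + sum(other_deltas)
    return Observation(calculated_free, master_free, own_delta)

def unexplained_gap(obs: Observation) -> int:
    """The part of the difference that the node cannot attribute to itself."""
    return obs.calculated_free - (obs.master_free + obs.own_delta)

# Case 1: master file is correct; another node holds an unsynced change of -50.
healthy = observe(correct_master_free=1000, master_error=0,
                  own_delta=-10, other_deltas=[-50])

# Case 2: the other node has already synced, but the master file is off by +50.
corrupted = observe(correct_master_free=950, master_error=50,
                    own_delta=-10, other_deltas=[])
```

In this model both cases yield the same observation, with the same unexplained gap of -50 blocks, so the syncing node has no basis for deciding whether overwriting the master file would repair corruption or double-count a change that another node is about to sync.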

It seems to me that the only good way to fix corruption of the master statfs file is offline with fsck, just like we fix other forms of filesystem corruption. If I'm missing something, please speak up, or I'm going to mark this WONTFIX.
Comment 5 Robert Peterson 2009-11-19 16:14:46 EST
Created attachment 372340
Proposed patch

Perhaps we should use this bugzilla record to fix fsck.gfs2 so that
it detects and fixes statfs file problems.  This patch does just that.
It was tested on system roth-01.  If this sounds reasonable, I will
take the bug and use it to push this fix to RHEL5.5.  If not, I can
open a new bugzilla for that effort.  I thought Aneesh was going to
open a bugzilla and attach his IT record to it, but that hasn't
happened yet, so I'll add him to the cc list.
Comment 7 Robert Peterson 2009-12-14 12:36:03 EST
The patch that makes fsck.gfs2 check and fix the statfs file has
been implemented via bug #539337.
Comment 10 Robert Peterson 2010-04-20 09:09:59 EDT
As I stated in comment #7, the latest, greatest fsck.gfs2 for 5.5
checks and fixes the statfs file.  As Ben stated in comment #4,
having fsck.gfs2 do the work is much less painful than anything
we could implement in the kernel code.  Therefore, I'm closing
this as WONTFIX as per Steve's suggestion in comment #9.
