Bug 494889

Summary: GFS2: Turning statfs_slow on and off doesn't re-sync statfs info
Product: Red Hat Enterprise Linux 5 Reporter: Nate Straz <nstraz>
Component: kernel Assignee: Ben Marzinski <bmarzins>
Status: CLOSED WONTFIX QA Contact: Cluster QE <mspqa-list>
Severity: medium Docs Contact:
Priority: low    
Version: 5.3 CC: adas, cluster-maint, mmahudha, pdemauro, rpeterso, swhiteho
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 539337 (view as bug list) Environment:
Last Closed: 2010-04-20 13:09:59 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 533192    
Attachments:
Description Flags
Proposed patch none

Description Nate Straz 2009-04-08 14:40:49 UTC
Description of problem:

After hitting bug 494885, turning statfs_slow on and off doesn't resync the file system size according to statfs().


Version-Release number of selected component (if applicable):
gfs2-utils-0.1.53-1.el5_3.2
kernel-2.6.18-128.el5



Comment 2 Steve Whitehouse 2009-07-31 13:15:07 UTC
Ben, is this a dup of something that you've already fixed, or is this still an issue?

Comment 3 Ben Marzinski 2009-07-31 23:07:32 UTC
I don't exactly know what to call this. The only way I know to see the bug is with the statfs grow bug (bz494885), which is now fixed. That being said, without the 494885 fix, once statfs is messed up, switching to statfs_slow seems like it should fix the issue, but it doesn't. It does change the df output, but not to the correct numbers. If you unmount and then remount, statfs_fast still gives the wrong numbers, but statfs_slow now gives you the correct numbers.
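
A minimal sketch for comparing the numbers before and after a tunable change or a remount, assuming Python is available on the node. It reads the same values df reports through statvfs(); the helper name fs_usage and the "/" path are placeholders, not anything from this bug.

```python
import os


def fs_usage(path):
    """Return (total, free, used) in bytes as reported by statvfs() for path."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize   # size of the file system
    free = st.f_bfree * st.f_frsize     # free space, including reserved blocks
    return total, free, total - free


if __name__ == "__main__":
    # "/" is only a placeholder; substitute the GFS2 mount point.
    total, free, used = fs_usage("/")
    print(f"total={total} free={free} used={used}")
```

Recording this output on each node before and after the switch makes it easier to see which mode reports which numbers.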

Comment 4 Ben Marzinski 2009-11-18 22:29:32 UTC
The more I think about this, the less possible it seems.  Things work fine with some nodes running statfs_slow, and the others running fast statfs. The problem we're trying to fix here is that if the master statfs file gets messed up, people would like turning on statfs_slow to fix it.

When we do a statfs sync, we know the calculated statfs info for the filesystem, and we know what we have changed locally. We can probably lock the statfs master file, and know what the current master file values are.  But we still don't know (and can't know as far as I can see) what part of the difference between the calculated values and current master file values is due to a bug, and what part is due to unsynced changes from other nodes.

Unless there is an unfixed statfs bug in GFS2, the difference between the calculated statfs info and the current master file values will always be due to unsynced changes from other nodes, which could get synced at any time.
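
The ambiguity described above can be shown with a toy model. This is only a sketch under simplifying assumptions: the Statfs triple, the observe() helper, and all numbers are hypothetical, and the model treats the master file as the true totals minus every node's unsynced changes, plus any corruption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statfs:
    """Hypothetical (total, free, dinodes) triple, in blocks."""
    total: int = 0
    free: int = 0
    dinodes: int = 0

    def __add__(self, other):
        return Statfs(self.total + other.total,
                      self.free + other.free,
                      self.dinodes + other.dinodes)

    def __sub__(self, other):
        return Statfs(self.total - other.total,
                      self.free - other.free,
                      self.dinodes - other.dinodes)


def observe(true_totals, local_pending, remote_pending, master_error):
    """Return what the syncing node can see: (calculated, master, local)."""
    master = true_totals - local_pending - remote_pending + master_error
    return true_totals, master, local_pending


true_totals = Statfs(total=1000, free=400, dinodes=10)
local = Statfs(free=-20, dinodes=2)

# Case A: master file is intact, another node holds an unsynced change.
case_a = observe(true_totals, local, Statfs(free=-50, dinodes=1), Statfs())

# Case B: no other node has pending changes, but the master file is off
# by exactly the amount of case A's pending change.
case_b = observe(true_totals, local, Statfs(), Statfs(free=50, dinodes=-1))

print(case_a == case_b)  # prints True
```

Both cases give the syncing node identical inputs, so in this model no computation on those inputs can separate corruption from another node's pending changes.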

It seems to me that the only good way to fix corruption of the master statfs file is offline with fsck, just like we fix other forms of filesystem corruption. If I'm missing something, please speak up, or I'm going to mark this WONTFIX.

Comment 5 Robert Peterson 2009-11-19 21:14:46 UTC
Created attachment 372340
Proposed patch

Perhaps we should use this bugzilla record to fix fsck.gfs2 so that
it detects and fixes statfs file problems.  This patch does just that.
It was tested on system roth-01.  If this sounds reasonable, I will
take the bug and use it to push this fix to RHEL5.5.  If not, I can
open a new bugzilla for that effort.  I thought Aneesh was going to
open a bugzilla and attach his IT record to it, but that hasn't
happened yet, so I'll add him to the cc list.
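
A rough sketch of what an offline statfs check could look like, not the attached patch: with the file system unmounted there are no unsynced per-node changes, so the master record can simply be compared against totals recomputed from the resource groups. The names RgrpCounts, StatfsRecord, recompute_statfs, and check_statfs are hypothetical, and the numbers are made up.

```python
from typing import Iterable, NamedTuple, Tuple


class RgrpCounts(NamedTuple):
    """Hypothetical per-resource-group counters, in blocks."""
    data: int
    free: int
    dinodes: int


class StatfsRecord(NamedTuple):
    """Hypothetical master statfs record: file-system-wide totals."""
    total: int
    free: int
    dinodes: int


def recompute_statfs(rgrps: Iterable[RgrpCounts]) -> StatfsRecord:
    """Sum the per-resource-group counters into file-system-wide totals."""
    rgrps = list(rgrps)
    return StatfsRecord(
        total=sum(r.data for r in rgrps),
        free=sum(r.free for r in rgrps),
        dinodes=sum(r.dinodes for r in rgrps),
    )


def check_statfs(master: StatfsRecord,
                 rgrps: Iterable[RgrpCounts]) -> Tuple[bool, StatfsRecord]:
    """Return (master_is_consistent, record_that_should_be_written)."""
    expected = recompute_statfs(rgrps)
    return master == expected, expected


rgrps = [RgrpCounts(500, 200, 4), RgrpCounts(500, 270, 3)]
ok, fixed = check_statfs(StatfsRecord(1000, 400, 10), rgrps)
print(ok, fixed)
```

Here the stored record disagrees with the recomputed totals, so the check reports a mismatch and returns the record a repair pass would write back.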

Comment 7 Robert Peterson 2009-12-14 17:36:03 UTC
The patch that makes fsck.gfs2 check and fix the statfs file has
been implemented via bug #539337.

Comment 10 Robert Peterson 2010-04-20 13:09:59 UTC
As I stated in comment #7, the latest, greatest fsck.gfs2 for 5.5
checks and fixes the statfs file.  As Ben stated in comment #4,
having fsck.gfs2 do the work is much less painful than anything
we could implement in the kernel code.  Therefore, I'm closing
this as WONTFIX as per Steve's suggestion in comment #9.