Bug 494889
Summary: | GFS2: Turning statfs_slow on and off doesn't re-sync statfs info |
---|---|
Product: | Red Hat Enterprise Linux 5 |
Component: | kernel |
Version: | 5.3 |
Status: | CLOSED WONTFIX |
Severity: | medium |
Priority: | low |
Reporter: | Nate Straz <nstraz> |
Assignee: | Ben Marzinski <bmarzins> |
QA Contact: | Cluster QE <mspqa-list> |
CC: | adas, cluster-maint, mmahudha, pdemauro, rpeterso, swhiteho |
Target Milestone: | rc |
Hardware: | All |
OS: | Linux |
Doc Type: | Bug Fix |
Clone Of: | |
: | 539337 |
Last Closed: | 2010-04-20 13:09:59 UTC |
Bug Blocks: | 533192 |

Description
Nate Straz
2009-04-08 14:40:49 UTC
Ben, is this a dup of something that you've already fixed, or is this still an issue?

I don't exactly know what to call this. The only way I know to see the bug is with the statfs grow bug (bz494885), which is now fixed. That being said, without the 494885 fix, once statfs is messed up, switching to statfs_slow seems like it should fix the issue, but it doesn't. It does change the df output, but not to the correct numbers. If you unmount and then remount, statfs_fast still gives the wrong numbers, but statfs_slow now gives you the correct numbers.

The more I think about this, the less possible it seems. Things work fine with some nodes running statfs_slow, and the others running fast statfs. The problem we're trying to fix here is that if the master statfs file gets messed up, people would like turning on statfs_slow to fix it. When we do a statfs sync, we know the calculated statfs info for the filesystem, and we know what we have changed locally. We can probably lock the statfs master file, and know what the current master file values are. But we still don't know (and can't know as far as I can see) what part of the difference between the calculated values and current master file values is due to a bug, and what part is due to unsynced changes from other nodes. Unless there is an unfixed statfs bug in GFS2, the difference between the calculated statfs info and the current master file values will always be due to unsynced changes from other nodes, which could get synced at any time. It seems to me that the only good way to fix corruption of the master statfs file is offline with fsck, just like we fix other forms of filesystem corruption. If I'm missing something, please speak up, or I'm going to mark this WONTFIX.

Created attachment 372340
Proposed patch
Perhaps we should use this bugzilla record to fix fsck.gfs2 so that
it detects and fixes statfs file problems. This patch does just that.
It was tested on system roth-01. If this sounds reasonable, I will
take the bug and use it to push this fix to RHEL5.5. If not, I can
open a new bugzilla for that effort. I thought Aneesh was going to
open a bugzilla and attach his IT record to it, but that hasn't
happened yet, so I'll add him to the cc list.

The patch that makes fsck.gfs2 check and fix the statfs file has been implemented via bug #539337.

As I stated in comment #7, the latest, greatest fsck.gfs2 for 5.5 checks and fixes the statfs file. As Ben stated in comment #4, having fsck.gfs2 do the work is much less painful than anything we could implement in the kernel code. Therefore, I'm closing this as WONTFIX as per Steve's suggestion in comment #9.
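
For anyone reading this later, the ambiguity described in the discussion above (a node cannot tell master-file corruption apart from unsynced changes on other nodes) can be illustrated with a toy model. This is only a sketch, not GFS2 kernel code: the names `Observed`, `observe`, and `naive_resync` are hypothetical, only a single free-block counter is modelled, and the numbers are made up.

```python
from typing import NamedTuple


class Observed(NamedTuple):
    """What a single node can see when it runs a statfs sync."""
    calculated_free: int  # free blocks found by scanning (the statfs_slow answer)
    master_free: int      # free blocks recorded in the master statfs file
    local_delta: int      # this node's own unsynced change


def observe(synced_free, local_delta, other_deltas, corruption):
    """Build the node's view; other_deltas and corruption are invisible to it."""
    master_free = synced_free + corruption
    calculated_free = synced_free + local_delta + other_deltas
    return Observed(calculated_free, master_free, local_delta)


def naive_resync(obs, other_deltas):
    """Overwrite the master with the calculated value, then let other nodes sync."""
    master_free = obs.calculated_free  # local delta is folded in and cleared
    return master_free + other_deltas  # other nodes later add their pending deltas


# Case A: the master file is correct; another node holds an unsynced -100.
healthy = observe(synced_free=1000, local_delta=-10, other_deltas=-100, corruption=0)
# Case B: the master file is off by +100 because of a bug; nothing else is pending.
corrupted = observe(synced_free=900, local_delta=-10, other_deltas=0, corruption=100)

print(healthy == corrupted)          # do the two cases look the same to the node?
print(naive_resync(healthy, -100))   # master value once the other node syncs, case A
print(naive_resync(corrupted, 0))    # master value once the other node syncs, case B
```

In both cases the true free count is 890 and the node sees exactly the same three numbers. Overwriting the master file repairs case B, but in case A the other node's pending -100 is applied a second time and the master file ends up at 790. In this model that is why an online resync cannot be made safe, and why the repair was left to fsck.gfs2 on an unmounted filesystem.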