Bug 231239

Summary: GFS2: statfs code clean-up
Product: Red Hat Enterprise Linux 5 Reporter: Wendy Cheng <nobody+wcheng>
Component: gfs2-utilsAssignee: Wendy Cheng <nobody+wcheng>
Status: CLOSED WORKSFORME QA Contact: GFS Bugs <gfs-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.1CC: kanderso, rkenna, swhiteho
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-04-13 18:01:56 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 204760, 208514    

Description Wendy Cheng 2007-03-06 22:14:25 UTC
Description of problem:

In the middle of porting GFS2 fast statfs implementation back into 
GFS1. While working thru the code, there are obvious issues found 
in the existing GFS2 code. Open this bugzilla to track them so it
won't get lost. 

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Wendy Cheng 2007-03-07 04:43:31 UTC
* GFS2 stores statfs statistics info in two files. Upon node crash,  
  the data left behind in page cache could get lost (?). Can't find 
  the logic where these data can be recovered. Need to look into this. 
* There is a "slow" path statfs call (identical to GFS1 behavior) but
  doesn't seem to have external interfaces to invoke it (?). With the 
  above recovery issue, we may need this slow path to re-calculate 
  the disk utilization data and correct the master file accordingly.
* The fields in struct gfs_statfs_change_host is defined as __u64 
  (unsigned long long) but is used as s64 (signed long long) thru out
  the code. Not sure how compiler handles this but some of the math
  looks very wrong to me (gfs2_statfs_i(), for example).
* There are probably more - will update the bugzilla when I find them.

Comment 2 Wendy Cheng 2007-03-07 04:58:47 UTC
* Some of the fields in struct kstatfs (buf) is not properly set in
  gfs2_statfs().

Comment 3 Wendy Cheng 2007-03-07 21:33:04 UTC
This is not a problem (maybe it is ?) but can well describe the difficult
aspect of this approach. The statfs per node delta is synced to master 
file every 30 (default) seconds. This implies if anyone floods huge amount 
of data into GFS2 partitions quickly from one node, the "df" output from 
other node will be far-off from the actual value. We need to improve this 
a little bit (say if the delta is above 10%, it should get unconditionally
flushed).



Comment 4 Steve Whitehouse 2007-03-12 15:21:49 UTC
I must admit that I thought that it was using the same algorithm as the quota
system of increasing the frequency of syncing as we near the limit. Have you
come to any conclusions about what needs to be done in GFS2 yet?

Probably the reason that statfs_change_host has a __u64 is that it was copied
directly from the statfs_change on-disk structure which would be a __be64 and
thus sign isn't really taken into account. So long as all the manipulations
are done using the s64, then there should be no problem. It is a long term goal
to rid ourselves of the _host structures, and I think that should be trivial in
this case just by passing the struct kstatfs to gfs2_statfs_slow and
gfs2_statfs_i instead.

I presume the two fields you are referring to in comment #2 are f_fsid and
f_frsize ?


Comment 5 Wendy Cheng 2007-03-12 15:57:18 UTC
Not sure how RHEL5 compiler handles this - but on RHEL4, I got funny data.
When blocks count should be subtracted, it immediately becomes a huge number
(apparently the compiler does an "add-using-unsigned", instead of two signed
long long math). However, it is possible that I have header file issues 
though. 

And yes, the f_frsize, etc are not initialized. 

These are not critical issues - so don't worry about them. Will fix them
when I'm done with RHEl4 port (and using the same test suites to unit test
GFS2).