Red Hat Bugzilla – Bug 498469
GFS2: Improvements to statfs_fast
Last modified: 2009-12-16 17:11:16 EST
Description of problem:
I've been working on a bunch of changes to the statfs_fast code in
gfs for bug #488318. It would be good to port these to GFS2.
Here are some notes from a conversation I had about it with Steve
Whitehouse on IRC:
1. We should replace the gfs2_tool settune <mnt> statfs_slow 0/1 with
a mount option. E.g. mount -o statfs=[lazy|normal]
2. Why not keep a record of when we last had the rgrp lock and insist
on locking it if we've not seen it for N seconds?
3. Right now, gfs uses the quota daemon to re-sync the rgrp glocks.
I know we've ripped out some gfs2 daemons, so I have to look into
whether this is the best way to implement this in gfs2.
4. Steve had a concern that re-syncing every 5 seconds is too aggressive.
We may want another mount option to adjust that, and/or make the
default looser. We probably need to investigate.
I have a feeling that I did not read the details directly. The details were about the quota code, and there was an assumption that the fast statfs code would work the same way. In other words, it would be bounded by time and the rate of using up the quota/free blocks to give a bounded error.
See for example:
Even if that is not the case, it would seem a not unreasonable solution to the issue. The change, as I understand it, would be to move from insisting that we reread all the rgrps if our cached values were too old, to rereading only those individual rgrps which we'd not seen for some time.
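To make the per-rgrp idea concrete, here is a minimal user-space sketch, not GFS2 code. The names `RgrpCache`, `stale_rgrps`, `statfs_free` and the `read_rgrp` callback are all hypothetical, and the callback stands in for "take the rgrp glock and read the header". The point is only that a statfs call rereads the rgrps it has not seen within the age limit and trusts the cached counts for the rest.

```python
class RgrpCache:
    """Cached free-block counts per resource group, with last-seen times.

    Hypothetical model for illustration; the real code would hang this
    state off the in-core rgrp structure.
    """

    def __init__(self, max_age_secs):
        self.max_age_secs = max_age_secs
        self.free = {}       # rgrp id -> cached free block count
        self.last_seen = {}  # rgrp id -> time we last held the lock

    def record(self, rgrp_id, free_blocks, now):
        """Store a fresh count taken while holding the rgrp lock."""
        self.free[rgrp_id] = free_blocks
        self.last_seen[rgrp_id] = now

    def stale_rgrps(self, now):
        """Return ids of rgrps not seen for more than max_age_secs."""
        return sorted(
            r for r, seen in self.last_seen.items()
            if now - seen > self.max_age_secs
        )


def statfs_free(cache, read_rgrp, now):
    """Reread only the stale rgrps, then sum the cached free counts."""
    for rgrp_id in cache.stale_rgrps(now):
        cache.record(rgrp_id, read_rgrp(rgrp_id), now)
    return sum(cache.free.values())
```

With an age limit of 0 every rgrp seen at an earlier time is reread, which in this model is the same as the old slow behaviour.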
Also to comment on some of the other messages I've seen on this topic, GL_SKIP does indeed mean to skip reading in whatever object the glock covers. The users of rgrp glocks do not work if GL_SKIP is set, since the code is written based on the assumption that if the rgrp is locked, its content will be up to date.
I would like to change that assumption for gfs2. It is the main barrier to keeping summary information in LVBs in order to speed up the statfs process.
We also need to resolve some of the other issues. Currently the block allocation algorithm only issues try locks for rgrps based on the assumption that if an rgrp is locked by another node, then it is in use and must be skipped. Consider what will happen to block allocation when we have at least one node in the cluster running the slow version of statfs from time to time... the net result will be that allocations will lose all locality and get scattered all over the disk.
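The locality problem can be shown with a toy model, which is an assumption-laden sketch and not the actual GFS2 allocator. `pick_rgrp` and `held_elsewhere` are hypothetical names: the function starts at the preferred (goal) rgrp and takes the first one whose try lock would succeed, treating any rgrp held by another node as busy.

```python
def pick_rgrp(goal, num_rgrps, held_elsewhere):
    """Return the first rgrp at or after `goal` whose try lock succeeds.

    `held_elsewhere` is the set of rgrp ids currently locked by other
    nodes. A try lock on those fails, so the allocator moves on rather
    than waiting. Returns None if every rgrp is held by another node.
    """
    for offset in range(num_rgrps):
        candidate = (goal + offset) % num_rgrps
        if candidate not in held_elsewhere:
            return candidate
    return None
```

If a slow statfs on another node happens to hold rgrps 3 to 5 as it walks the list, `pick_rgrp(3, 8, {3, 4, 5})` lands in rgrp 6 even though rgrp 3 is not really in use for allocation. Repeated over many allocations, this is how blocks end up scattered.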
I would like to see a common enough solution to fast statfs and quotas that we can have a similar set of mount options to control them. Having a time limit and a local usage limit seems a good plan in both cases. The old "slow" setting would just mean reducing the time limit to 0.
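As a rough illustration of how the two limits might combine, here is a sketch of the sync decision. The function and parameter names are hypothetical and do not correspond to existing mount options or kernel symbols. A node resyncs when either its cached values are older than the time limit or its unsynced local change has reached the usage limit.

```python
def needs_sync(secs_since_sync, local_change_blocks,
               time_limit_secs, usage_limit_blocks):
    """Decide whether cached statfs (or quota) data must be resynced.

    Sync when the cached values are at least `time_limit_secs` old, or
    when the unsynced local change (allocations or frees, so the sign
    is ignored) has reached `usage_limit_blocks`.
    """
    if secs_since_sync >= time_limit_secs:
        return True
    return abs(local_change_blocks) >= usage_limit_blocks
```

Setting the time limit to 0 makes the first test always true, so every call syncs, which matches the old "slow" setting.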
I do not like the idea of removing the statfs files from the metafs. We will have to continue to update them in order to maintain backwards compatibility for older nodes. We can probably get away with not updating the per_node statfs files, but we will still have to retain them in the metafs at this stage.
Ben, Bob, can we close this one now?
Steve, not all the improvements you mentioned in comment #2 are currently in the code. We have mount option time limits, and a local usage limit, but the allocation issues and the lack of LVBs remain. We can keep this open to track those, or we can just dup this closed.
I'm not too worried about the allocation issues. Using LVBs would be good, so perhaps close this one and open a Fedora rawhide bz for LVBs and statfs?
We can move this to upstream development now we've fixed the main items here I think.
*** This bug has been marked as a duplicate of bug 529796 ***