Red Hat Bugzilla – Bug 221237
GFS2 rename panic using anaconda
Last modified: 2007-11-30 17:07:39 EST
Description of problem:
The stand-alone test program used in bz 215088 apparently doesn't catch
all the possible panic cases with gfs2_rename() running under Anaconda.
After pulling in anaconda fix from bugzilla 220465 (dup of 220021), we're
finally able to resume a full set of gfs2 testing with anaconda. Few more
rename panics are found:
1. Instead of plain file rename (that was fixed in bz 215088), there is
another gfs2_change_nlink() under directory rename that can cause
2. The fix added via 215088 ignores the cases where the files/directories
involved may have different rg groups.
3. There are few other gfs2_change_nlink() hidden layers under other calls
(gfs2_rmdiri() for example) in rename code path that can cause similar
It should be noted that these panics are a result of trying to return the
disk blocks back to rg group inside gfs2_change_nlink() function. More
specifically, this is a locking issues - the gfs2_change_nlink() needs to
get various rg locks that can easily conflict with the existing rename locks.
Version-Release number of selected component (if applicable):
Each time and every time
Steps to Reproduce:
Install RHEL5 partitions on GFS2 filesystem
Keep patching gfs2_rename (that already has more than 10 required locks
involved) is really not a good idea - maybe we should re-examine
gfs2_change_nlink() instead ? As noted above, the whole issue is with
trying to return meta-data blocks back to RG and:
1. By patching rename, we'll need to add more "if"s into the code that
will clutter the logic so much - hard to maintain.
2. We already has a GFS1 bugzilla that complains about rename performance
and GFS1 doesn't even have this returning-to-RG path - performance.
So I'm thinking to chain these inodes into a linked list and then piggy
back the return-to-rg logic (yank it out of current gfs2_change_ulink())
into one of the daemons ?
Steve, comment ?
There are a lot of issues here, but I think you are going in broadly the right
direction. Changing gfs2_change_nlink() shouldn't be a problem I think and I'd
be happy to see the end of the +1, -1 argument in particular (maybe split it
into two functions at the same time?). Most of the complexity is in the -1 case,
so splitting out the +1 case from it would seem sensible at least.
On the other hand, I'm not at all keen on changing the "return to rg" logic.
There are two parts to this in fact: gfs2_change_nlink() should only mark the
inode as being unlinked and nothing more. The real "return to rg" logic only
runs when the inode is finally closed and the blocks are marked free.
You'll also need to take into account the lock ordering between the various RGs
(both those used for allocation and that in which the inode is being unlinked)
to ensure that its deadlock free.
I don't think that moving the work to a daemon is a good idea. Its doesn't make
it take any less time, it just changes the place where the work is being done.
If its slow then I'd first start looking at whether it takes us a long time to
look up rgrps. Ext2/3 stores the block group of the inode in the inode itself so
that its very quick to find the block group on delete, where as we have to
search a (potentially very long) list to try and find the correct rgrp. In fact,
worse than that we apparently do it twice, once in gfs2_change_nlink() in order
to find the right rgrp to lock and again in rgblk_free() called via
gfs2_unlink_di() and at least one of those two lookups should be eliminated. The
other place which might be causing us perfomance problems is the journaling
code, although I think thats much better than it used to be.
Created attachment 144885 [details]
patch to address this issue
Installer ran to completion on x86_64 single CPU test machine this
afternoon with this patch, plus Steve's bz220117 fix. Need to move
to an SMP test box to make sure we don't have new show stopper(s).
Unfortunately, after the installation completed, the first reboot
panics since VFS can't find the root partition (which is a gfs2) -
most likely the new initrd (compiled by installer) doesn't include
gfs2 and its lock modles.
Will continue tomorrow.
The patch looks good. Let me know when you've got something that I can apply to
the upstream kernel.
Devel ACK - waiting on final unit testing on the proposed patch.
Built into 2.6.18-1.3002.el5.
A package has been built which should help the problem described in
this bug report. This report is therefore being closed with a resolution
of CURRENTRELEASE. You may reopen this bug report if the solution does
not work for you.