Bug 221237 - GFS2 rename panic using anaconda
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Wendy Cheng
GFS Bugs
Depends On:
Reported: 2007-01-03 00:44 EST by Wendy Cheng
Modified: 2007-11-30 17:07 EST (History)
6 users

See Also:
Fixed In Version: RC
Doc Type: Bug Fix
Last Closed: 2007-02-07 21:06:42 EST

Attachments
patch to address this issue (3.23 KB, patch)
2007-01-04 23:45 EST, Wendy Cheng

Description Wendy Cheng 2007-01-03 00:44:17 EST
Description of problem:

The stand-alone test program used in bz 215088 apparently doesn't catch 
all the possible panic cases with gfs2_rename() running under Anaconda. 
After pulling in the anaconda fix from bugzilla 220465 (a dup of 220021), we're
finally able to resume a full set of gfs2 testing with anaconda. A few more 
rename panics were found:

1. Instead of a plain file rename (that was fixed in bz 215088), there is 
   another gfs2_change_nlink() under directory rename that can cause a 
   similar panic.
2. The fix added via 215088 ignores the cases where the files/directories
   involved may belong to different rg groups.
3. There are a few other gfs2_change_nlink() calls hidden in layers under other 
   functions (gfs2_rmdiri() for example) in the rename code path that can cause 
   similar panics.

It should be noted that these panics are a result of trying to return the
disk blocks back to the rg group inside the gfs2_change_nlink() function. More
specifically, this is a locking issue - gfs2_change_nlink() needs to 
take various rg locks that can easily conflict with the existing rename locks.
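The conflict can be modeled outside the kernel. Here is a minimal C sketch (all names invented, not the real GFS2 glock API) of the usual rank-order discipline: locks carry a rank, and taking a lock of equal or lower rank than one already held is flagged, which is exactly the pattern that makes gfs2_change_nlink()'s rg locks collide with locks already held by the rename path:

```c
#include <stddef.h>

/* Hypothetical lock ranks: locks must be acquired in strictly
 * increasing rank, otherwise two paths taking them in opposite
 * orders can deadlock (ABBA). */
enum lock_rank { RANK_RENAME = 1, RANK_RG = 2 };

#define MAX_HELD 16
static int held[MAX_HELD];
static size_t n_held;

/* Returns 0 on success, -1 if the acquisition violates the rank
 * order (a lock of equal or lower rank is requested while a higher-
 * or equal-ranked lock is already held). */
int acquire_checked(int rank)
{
    if (n_held > 0 && held[n_held - 1] >= rank)
        return -1;              /* ordering violation: potential deadlock */
    if (n_held >= MAX_HELD)
        return -1;
    held[n_held++] = rank;
    return 0;
}

/* Drop every held lock (simplified: real code releases one at a time). */
void release_all(void)
{
    n_held = 0;
}
```

In this model a second rg lock taken while one is already held is rejected, which is the shape of the conflict described above, just without the kernel machinery around it.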

Version-Release number of selected component (if applicable):

How reproducible:
Each time and every time

Steps to Reproduce:
Install RHEL5 partitions on a GFS2 filesystem

Actual results:

Expected results:
no panic

Additional info:
Comment 1 Wendy Cheng 2007-01-03 01:24:07 EST
Continuing to patch gfs2_rename (which already has more than 10 required locks 
involved) is really not a good idea - maybe we should re-examine 
gfs2_change_nlink() instead? As noted above, the whole issue is with 
trying to return meta-data blocks back to the RG, and:

1. By patching rename, we'll need to add more "if"s into the code, which 
   will clutter the logic so much that it becomes hard to maintain.
2. We already have a GFS1 bugzilla that complains about rename performance, 
   and GFS1 doesn't even have this returning-to-RG path - so this is a 
   performance concern too.

So I'm thinking of chaining these inodes into a linked list and then piggy-
backing the return-to-RG logic (yanked out of the current gfs2_change_nlink()) 
onto one of the daemons?
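Roughly, the chaining idea reduces to something like the following userspace sketch (the struct and function names here are invented for illustration, not the real glock or daemon code): the rename path only queues the inode, in O(1) and with no rg locks, and a daemon-side drain does the deferred return-to-rg work later, outside the rename lock set:

```c
#include <stdlib.h>

/* Hypothetical deferred-work node: one per inode whose meta-data
 * blocks still need to be returned to their resource group. */
struct deferred_inode {
    unsigned long ino;
    struct deferred_inode *next;
};

static struct deferred_inode *pending;   /* head of the chain */

/* Called from the rename path: just queue, take no rg locks.
 * Returns 0 on success, -1 on allocation failure. */
int defer_return_to_rg(unsigned long ino)
{
    struct deferred_inode *n = malloc(sizeof(*n));
    if (!n)
        return -1;
    n->ino = ino;
    n->next = pending;
    pending = n;
    return 0;
}

/* Called from the daemon: drain the chain, doing the rg work
 * (stubbed out here) with no rename locks held. Returns the
 * number of inodes processed. */
int drain_deferred(void)
{
    int count = 0;
    while (pending) {
        struct deferred_inode *n = pending;
        pending = n->next;
        /* ... take rg locks and return the blocks to the rg ... */
        free(n);
        count++;
    }
    return count;
}
```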

Steve, comment ? 
Comment 2 Steve Whitehouse 2007-01-03 04:44:01 EST
There are a lot of issues here, but I think you are going in broadly the right
direction. Changing gfs2_change_nlink() shouldn't be a problem I think and I'd
be happy to see the end of the +1, -1 argument in particular (maybe split it
into two functions at the same time?). Most of the complexity is in the -1 case,
so splitting out the +1 case from it would seem sensible at least.
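Splitting the +1/-1 argument out might look roughly like this (a userspace sketch with a stand-in inode struct and invented names, not the real gfs2 API); the point is that the trivial increment case no longer carries any of the unlink complexity, and per the comment above, the decrement case only marks the inode unlinked rather than returning blocks immediately:

```c
/* Stand-in for the inode fields this sketch touches. */
struct sketch_inode {
    unsigned int nlink;
    int unlinked;       /* set when the link count reaches zero */
};

/* The +1 case: trivial, no rg involvement at all. */
void sketch_inc_nlink(struct sketch_inode *ip)
{
    ip->nlink++;
}

/* The -1 case: all the complexity lives here. It only marks the
 * inode unlinked; the real return-to-rg work happens later, when
 * the inode is finally closed. Returns -1 on underflow. */
int sketch_dec_nlink(struct sketch_inode *ip)
{
    if (ip->nlink == 0)
        return -1;
    if (--ip->nlink == 0)
        ip->unlinked = 1;   /* deferred: blocks freed at last close */
    return 0;
}
```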

On the other hand, I'm not at all keen on changing the "return to rg" logic.
There are two parts to this in fact: gfs2_change_nlink() should only mark the
inode as being unlinked and nothing more. The real "return to rg" logic only
runs when the inode is finally closed and the blocks are marked free.

You'll also need to take into account the lock ordering between the various RGs
(both those used for allocation and the one in which the inode is being unlinked)
to ensure that it's deadlock-free.

I don't think that moving the work to a daemon is a good idea. It doesn't make
the work take any less time, it just changes the place where it is being done.
If it's slow then I'd first start looking at whether it takes us a long time to
look up rgrps. Ext2/3 stores the block group of the inode in the inode itself so
that it's very quick to find the block group on delete, whereas we have to
search a (potentially very long) list to try and find the correct rgrp. In fact,
worse than that, we apparently do it twice: once in gfs2_change_nlink() in order
to find the right rgrp to lock, and again in rgblk_free() called via 
gfs2_unlink_di(). At least one of those two lookups should be eliminated. The
other place which might be causing us performance problems is the journaling
code, although I think that's much better than it used to be.
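The double-lookup cost is easy to see in miniature. A hypothetical sketch (fixed three-entry table, invented names, nothing from the real rgrp code) of the linear walk, plus the obvious fix of caching the result so a second caller on the same path can skip the walk entirely:

```c
#include <stddef.h>

/* Hypothetical resource-group table: each entry covers a block range. */
struct sketch_rgrp {
    unsigned long first_block;
    unsigned long nr_blocks;
};

static const struct sketch_rgrp rgrps[] = {
    { 0,    1000 },
    { 1000, 1000 },
    { 2000, 1000 },
};
#define NR_RGRPS (sizeof(rgrps) / sizeof(rgrps[0]))

/* The potentially long walk: O(n) in the number of rgrps.
 * Returns the rgrp index, or -1 if no rgrp covers the block. */
int rgrp_lookup(unsigned long block)
{
    for (size_t i = 0; i < NR_RGRPS; i++)
        if (block >= rgrps[i].first_block &&
            block < rgrps[i].first_block + rgrps[i].nr_blocks)
            return (int)i;
    return -1;
}

/* Caching the answer (in the spirit of ext2/3 keeping the block
 * group with the inode) makes the second lookup O(1).
 * *cached_rgrp < 0 means "not yet looked up". */
int rgrp_lookup_cached(unsigned long block, int *cached_rgrp)
{
    if (*cached_rgrp < 0)
        *cached_rgrp = rgrp_lookup(block);
    return *cached_rgrp;
}
```

With the cache, the gfs2_change_nlink() lookup and the later rgblk_free() lookup would collapse into one walk of the list, which is the elimination suggested above.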
Comment 3 Wendy Cheng 2007-01-04 23:45:39 EST
Created attachment 144885 [details]
patch to address this issue

Installer ran to completion on x86_64 single CPU test machine this 
afternoon with this patch, plus Steve's bz220117 fix. Need to move 
to an SMP test box to make sure we don't have new show stopper(s). 

Unfortunately, after the installation completed, the first reboot 
panics since VFS can't find the root partition (which is a gfs2) - 
most likely the new initrd (built by the installer) doesn't include
gfs2 and its lock modules. 

Will continue tomorrow.
Comment 5 Steve Whitehouse 2007-01-05 04:05:49 EST
The patch looks good. Let me know when you've got something that I can apply to
the upstream kernel.
Comment 6 Kiersten (Kerri) Anderson 2007-01-05 16:13:49 EST
Devel ACK - waiting on final unit testing on the proposed patch.
Comment 11 Jay Turner 2007-01-10 10:51:13 EST
Built into 2.6.18-1.3002.el5.
Comment 13 Don Zickus 2007-01-10 18:54:44 EST
in 2.6.18-1.3002.el5
Comment 14 RHEL Product and Program Management 2007-02-07 21:06:42 EST
A package has been built which should help the problem described in 
this bug report. This report is therefore being closed with a resolution 
of CURRENTRELEASE. You may reopen this bug report if the solution does 
not work for you.
