Bug 221237 - GFS2 rename panic using anaconda
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Wendy Cheng
GFS Bugs
Depends On:
Reported: 2007-01-03 00:44 EST by Wendy Cheng
Modified: 2007-11-30 17:07 EST (History)
6 users

See Also:
Fixed In Version: RC
Doc Type: Bug Fix
Last Closed: 2007-02-07 21:06:42 EST

Attachments
patch to address this issue (3.23 KB, patch)
2007-01-04 23:45 EST, Wendy Cheng

Description Wendy Cheng 2007-01-03 00:44:17 EST
Description of problem:

The stand-alone test program used in bz 215088 apparently doesn't catch 
all the possible panic cases with gfs2_rename() running under Anaconda. 
After pulling in the anaconda fix from bugzilla 220465 (a dup of 220021), we're
finally able to resume a full set of gfs2 testing with anaconda. A few more 
rename panics were found:

1. Instead of a plain file rename (that was fixed in bz 215088), there is 
   another gfs2_change_nlink() under directory rename that can cause a 
   similar panic.
2. The fix added via 215088 ignores the cases where the files/directories
   involved may belong to different rg groups.
3. There are a few other gfs2_change_nlink() calls hidden in layers under other 
   functions (gfs2_rmdiri() for example) in the rename code path that can cause 
   similar panics.

It should be noted that these panics are a result of trying to return the
disk blocks back to the rg group inside the gfs2_change_nlink() function. More
specifically, this is a locking issue - gfs2_change_nlink() needs to 
take various rg locks that can easily conflict with the existing rename locks.
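The conflict can be modeled outside the kernel. Here is a minimal C sketch (all names invented, not the real GFS2 glock API) of the usual rank-order discipline: locks carry a rank, and taking a lock of equal or lower rank than one already held is flagged, which is exactly the pattern that makes gfs2_change_nlink()'s rg locks collide with locks already held by the rename path:

```c
#include <stddef.h>

/* Hypothetical lock ranks: locks must be acquired in strictly
 * increasing rank, otherwise two paths taking them in opposite
 * orders can deadlock (ABBA). */
enum lock_rank { RANK_RENAME = 1, RANK_RG = 2 };

#define MAX_HELD 16
static int held[MAX_HELD];
static size_t n_held;

/* Returns 0 on success, -1 if the acquisition violates the rank
 * order (a lock of equal or lower rank is requested while a higher-
 * or equal-ranked lock is already held). */
int acquire_checked(int rank)
{
    if (n_held > 0 && held[n_held - 1] >= rank)
        return -1;              /* ordering violation: potential deadlock */
    if (n_held >= MAX_HELD)
        return -1;
    held[n_held++] = rank;
    return 0;
}

/* Drop every held lock (simplified: real code releases one at a time). */
void release_all(void)
{
    n_held = 0;
}
```

In this model a second rg lock taken while one is already held is rejected, which is the shape of the conflict described above, just without the kernel machinery around it.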

Version-Release number of selected component (if applicable):

How reproducible:
Each time and every time

Steps to Reproduce:
Install RHEL5 partitions on a GFS2 filesystem

Actual results:

Expected results:
no panic

Additional info:
Comment 1 Wendy Cheng 2007-01-03 01:24:07 EST
Continuing to patch gfs2_rename (which already has more than 10 required locks 
involved) is really not a good idea - maybe we should re-examine 
gfs2_change_nlink() instead? As noted above, the whole issue is with 
trying to return meta-data blocks back to the RG, and:

1. By patching rename, we'll need to add more "if"s into the code, which 
   will clutter the logic so much that it becomes hard to maintain.
2. We already have a GFS1 bugzilla that complains about rename performance, 
   and GFS1 doesn't even have this returning-to-RG path - so this is a 
   performance concern too.

So I'm thinking of chaining these inodes into a linked list and then piggy-
backing the return-to-RG logic (yanked out of the current gfs2_change_nlink()) 
onto one of the daemons?
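Roughly, the chaining idea reduces to something like the following userspace sketch (the struct and function names here are invented for illustration, not the real glock or daemon code): the rename path only queues the inode, in O(1) and with no rg locks, and a daemon-side drain does the deferred return-to-rg work later, outside the rename lock set:

```c
#include <stdlib.h>

/* Hypothetical deferred-work node: one per inode whose meta-data
 * blocks still need to be returned to their resource group. */
struct deferred_inode {
    unsigned long ino;
    struct deferred_inode *next;
};

static struct deferred_inode *pending;   /* head of the chain */

/* Called from the rename path: just queue, take no rg locks.
 * Returns 0 on success, -1 on allocation failure. */
int defer_return_to_rg(unsigned long ino)
{
    struct deferred_inode *n = malloc(sizeof(*n));
    if (!n)
        return -1;
    n->ino = ino;
    n->next = pending;
    pending = n;
    return 0;
}

/* Called from the daemon: drain the chain, doing the rg work
 * (stubbed out here) with no rename locks held. Returns the
 * number of inodes processed. */
int drain_deferred(void)
{
    int count = 0;
    while (pending) {
        struct deferred_inode *n = pending;
        pending = n->next;
        /* ... take rg locks and return the blocks to the rg ... */
        free(n);
        count++;
    }
    return count;
}
```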

Steve, comment ? 
Comment 2 Steve Whitehouse 2007-01-03 04:44:01 EST
There are a lot of issues here, but I think you are going in broadly the right
direction. Changing gfs2_change_nlink() shouldn't be a problem I think and I'd
be happy to see the end of the +1, -1 argument in particular (maybe split it
into two functions at the same time?). Most of the complexity is in the -1 case,
so splitting out the +1 case from it would seem sensible at least.
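Splitting the +1/-1 argument out might look roughly like this (a userspace sketch with a stand-in inode struct and invented names, not the real gfs2 API); the point is that the trivial increment case no longer carries any of the unlink complexity, and per the comment above, the decrement case only marks the inode unlinked rather than returning blocks immediately:

```c
/* Stand-in for the inode fields this sketch touches. */
struct sketch_inode {
    unsigned int nlink;
    int unlinked;       /* set when the link count reaches zero */
};

/* The +1 case: trivial, no rg involvement at all. */
void sketch_inc_nlink(struct sketch_inode *ip)
{
    ip->nlink++;
}

/* The -1 case: all the complexity lives here. It only marks the
 * inode unlinked; the real return-to-rg work happens later, when
 * the inode is finally closed. Returns -1 on underflow. */
int sketch_dec_nlink(struct sketch_inode *ip)
{
    if (ip->nlink == 0)
        return -1;
    if (--ip->nlink == 0)
        ip->unlinked = 1;   /* deferred: blocks freed at last close */
    return 0;
}
```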

On the other hand, I'm not at all keen on changing the "return to rg" logic.
There are two parts to this in fact: gfs2_change_nlink() should only mark the
inode as being unlinked and nothing more. The real "return to rg" logic only
runs when the inode is finally closed and the blocks are marked free.

You'll also need to take into account the lock ordering between the various RGs
(both those used for allocation and the one in which the inode is being unlinked)
to ensure that it's deadlock-free.

I don't think that moving the work to a daemon is a good idea. It doesn't make
the work take any less time, it just changes the place where it is being done.
If it's slow then I'd first start looking at whether it takes us a long time to
look up rgrps. Ext2/3 stores the block group of the inode in the inode itself so
that it's very quick to find the block group on delete, whereas we have to
search a (potentially very long) list to try and find the correct rgrp. In fact,
worse than that, we apparently do it twice: once in gfs2_change_nlink() in order
to find the right rgrp to lock, and again in rgblk_free() called via 
gfs2_unlink_di(). At least one of those two lookups should be eliminated. The
other place which might be causing us performance problems is the journaling
code, although I think that's much better than it used to be.
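The double-lookup cost is easy to see in miniature. A hypothetical sketch (fixed three-entry table, invented names, nothing from the real rgrp code) of the linear walk, plus the obvious fix of caching the result so a second caller on the same path can skip the walk entirely:

```c
#include <stddef.h>

/* Hypothetical resource-group table: each entry covers a block range. */
struct sketch_rgrp {
    unsigned long first_block;
    unsigned long nr_blocks;
};

static const struct sketch_rgrp rgrps[] = {
    { 0,    1000 },
    { 1000, 1000 },
    { 2000, 1000 },
};
#define NR_RGRPS (sizeof(rgrps) / sizeof(rgrps[0]))

/* The potentially long walk: O(n) in the number of rgrps.
 * Returns the rgrp index, or -1 if no rgrp covers the block. */
int rgrp_lookup(unsigned long block)
{
    for (size_t i = 0; i < NR_RGRPS; i++)
        if (block >= rgrps[i].first_block &&
            block < rgrps[i].first_block + rgrps[i].nr_blocks)
            return (int)i;
    return -1;
}

/* Caching the answer (in the spirit of ext2/3 keeping the block
 * group with the inode) makes the second lookup O(1).
 * *cached_rgrp < 0 means "not yet looked up". */
int rgrp_lookup_cached(unsigned long block, int *cached_rgrp)
{
    if (*cached_rgrp < 0)
        *cached_rgrp = rgrp_lookup(block);
    return *cached_rgrp;
}
```

With the cache, the gfs2_change_nlink() lookup and the later rgblk_free() lookup would collapse into one walk of the list, which is the elimination suggested above.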
Comment 3 Wendy Cheng 2007-01-04 23:45:39 EST
Created attachment 144885 [details]
patch to address this issue

Installer ran to completion on x86_64 single CPU test machine this 
afternoon with this patch, plus Steve's bz220117 fix. Need to move 
to an SMP test box to make sure we don't have new show stopper(s). 

Unfortunately, after the installation completed, the first reboot 
panics since VFS can't find the root partition (which is a gfs2) - 
most likely the new initrd (built by the installer) doesn't include
gfs2 and its lock modules. 

Will continue tomorrow.
Comment 5 Steve Whitehouse 2007-01-05 04:05:49 EST
The patch looks good. Let me know when you've got something that I can apply to
the upstream kernel.
Comment 6 Kiersten (Kerri) Anderson 2007-01-05 16:13:49 EST
Devel ACK - waiting on final unit testing on the proposed patch.
Comment 11 Jay Turner 2007-01-10 10:51:13 EST
Built into 2.6.18-1.3002.el5.
Comment 13 Don Zickus 2007-01-10 18:54:44 EST
in 2.6.18-1.3002.el5
Comment 14 RHEL Product and Program Management 2007-02-07 21:06:42 EST
A package has been built which should help the problem described in 
this bug report. This report is therefore being closed with a resolution 
of CURRENTRELEASE. You may reopen this bug report if the solution does 
not work for you.
