Bug 458289
Summary: | GFS2: rm on multiple nodes causes panic | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Nate Straz <nstraz> | ||||||||||
Component: | kernel | Assignee: | Robert Peterson <rpeterso> | ||||||||||
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> | ||||||||||
Severity: | medium | Docs Contact: | |||||||||||
Priority: | medium | ||||||||||||
Version: | 5.3 | CC: | adas, bmarzins, bstevens, edamato, lhh, lwang, rpeterso, swhiteho | ||||||||||
Target Milestone: | rc | ||||||||||||
Target Release: | --- | ||||||||||||
Hardware: | All | ||||||||||||
OS: | Linux | ||||||||||||
Whiteboard: | |||||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||||
Doc Text: | Story Points: | --- | |||||||||||
Clone Of: | Environment: | ||||||||||||
Last Closed: | 2009-01-20 20:19:17 UTC | Type: | --- | ||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||
Documentation: | --- | CRM: | |||||||||||
Verified Versions: | Category: | --- | |||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
Embargoed: | |||||||||||||
Attachments: |
|
Description
Nate Straz
2008-08-07 14:28:33 UTC
We've got at least two problems here, but I've got one solved and the other shouldn't be tough. I'll be posting a patch for the first one shortly. Created attachment 314011 [details]
Patch to to fix the first problem
This patch fixes the first problem, which has multiple symptoms.
The problem was that a glock was being enqueued on the holders list,
then, when the error is discovered (because the other node did the
delete so this node couldn't), the glock was not deallocated off the
list, but when the function exits, the memory for the holder gets
reused for a multitude of purposes. Often, the holder is reused for
the same purpose (another holder) which causes the holders list to
get hosed: the list pointing to itself or the list pointers getting
zeroed out, etc. Much badness.
We need to fix this in RHEL5. This patch allows the code to go
further without all the weird symptoms. However, there is still a
glock hang that occurs when you attempt simultaneous deletes, so this
should not be considered a "complete" solution at this time.
If you apply the patch from comment #2 and try to recreate the problem, you will likely get into a glock hang. The problem should not be too difficult to get sorted out. Basically, there are two nodes, each of which is waiting for the other. These snippets from the glock dumps explain it all: roth-01: G: s:UN n:3/340598 f:l t:EX d:EX/0 l:0 a:0 r:4 H: s:EX f:W e:0 p:2934 [rm] gfs2_unlink+0xa4/0x199 [gfs2] ffff81004fa13de8 G: s:EX n:2/3449f5 f:D t:EX d:UN/204737000 l:0 a:0 r:4 H: s:EX f:H e:0 p:2934 [rm] gfs2_unlink+0x64/0x199 [gfs2] ffff81004fa13d78 I: n:3291/3426805 t:4 f:0x00000010 roth-03: G: s:EX n:3/340598 f:Dy t:EX d:UN/204130000 l:0 a:0 r:4 H: s:EX f:H e:0 p:3049 [rm] gfs2_rmdir+0x93/0x182 [gfs2] ffff81006b3c1df8 R: n:3409304 G: s:UN n:2/3449f5 f:l t:EX d:EX/0 l:0 a:0 r:4 H: s:EX f:W e:0 p:3049 [rm] gfs2_rmdir+0x6f/0x182 [gfs2] ffff81006b3c1dc0 So apparently gfs2_unlink grabs a inode glock and waits for a rgrp glock, but gfs2_rmdir apparently grabs that rgrp glock and wait for that inode glock. I would hope it would be easy to fix with a lock ordering switch, but I haven't dug through it. Requesting flags for inclusion in RHEL5.3. We absolutely need to do this fix. I think the problem is that gfs2_rmdir uses nq_m_sync, which sorts the holders by address, but gfs2_unlink doesn't do that sorting; it does dip followed by ip, followed by rgrpd. So I guess I'll let Steve decide which it should be. We either need to make gfs2_unlink use gfs2_glock_nq_m or rmdir do them individually. Since there are a bunch of other places that use gfs2_glock_nq_m, my personal preference is to make gfs2_unlink use gfs2_glock_nq_m as well. Perhaps I'll give it a try; it would make my previous patch obsolete and make the code less messy as well. If it works, I'll attach another patch. Created attachment 314040 [details] Patch to fix both problems This patch fixes it the "right" way. By using gfs2_glock_nq_m rather than individually, it avoids deadlocks and avoids the problem described in comment #2. I'll submit this to cluster-devel for upstream. I submitted by revised patch to cluster-devel for scrutiny. If Steve likes it and pushes it upstream, I can submit it for inclusion in the RHEL5 kernel. One more note before I forget: I noticed that function gfs2_glock_dq_m does this: for (x = 0; x < num_gh; x++) gfs2_glock_dq(&ghs[x]); It seems like we could get into trouble here. If the original sort order of the locks is different, can't we get into a situation where it does: (1) lock(a) (2) lock(b); ... (3) unlock(a); (4) unlock(b); And doesn't that break locking? Can't someone sneak in and grab a glock for (b) then wait for (a) between steps 3 and 4? If so, we have many other potential hangs. It just seems like if gfs2_glock_nq_m sorts the locks, then gfs2_glock_dq_m should sort them in the same fashion and unlock them in reverse order, otherwise we're exposed. I need to speak with Steve about this. Please also fix gfs2_link which appears to have the same problem. gfs2_glock_nq_m doesn't know all the information required to sort locks in the correct order. The ordering is explained in the Documentation/filesystems/gfs2-glock.txt file. I'd like to see as little use of gfs2_glock_nq_m as possible. Ideally it would only ever be used for rgrps but I guess we still need it in some cases in rename for inodes. Also, regarding comment #7, the unlock order makes no difference. *** Bug 458303 has been marked as a duplicate of this bug. *** Created attachment 314124 [details]
Revised patch
This RHEL5 patch gets rid of the gfs2_glock_nq_m calls in favor of
the more favored individual calls. There are potential hangs in all
the code that calls gfs2_glock_nq_m except for rgrps, so I've now
changed link, unlink, rmdir and rename. This version has gotten more
testing than the last one did, and I couldn't confuse it, hang it or
cause it to panic.
Created attachment 314125 [details]
Revised patch (upstream version)
This is an upstream version of the revised patch. The revised patch
doesn't apply cleanly to the upstream GFS2, so I made this one.
RHEL5 patch was tested on the roth-0{1,3} cluster, and accepted into the upstream NMW git tree. I posted it to rhkernel-list too. Reassigning to Don Zickus for inclusion in the RHEL5.3 kernel. *** Bug 459843 has been marked as a duplicate of this bug. *** in kernel-2.6.18-107.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-0225.html |