Bug 455940
Summary: GFS2: hang running consistency test.
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.2
Status: CLOSED DUPLICATE
Severity: low
Priority: high
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Reporter: Ben Marzinski <bmarzins>
Assignee: Steve Whitehouse <swhiteho>
QA Contact: Cluster QE <mspqa-list>
CC: adas, cluster-maint, edamato, rpeterso
Doc Type: Bug Fix
Last Closed: 2008-09-17 14:57:44 UTC
Description
Ben Marzinski
2008-07-18 21:58:04 UTC
Looking at the lock dumps for this, in a case where marathon-01, marathon-03, and marathon-05 hung out of all five nodes, I see the following.

On marathon-01:
-------------------------------
G:  s:UN n:2/a6ac f:l t:EX d:UN/0 l:0 a:0 r:4
 H: s:EX f:aW e:0 p:3635 [d_doio] gfs2_prepare_write+0xc7/0x360 [gfs2]

Resource ffff81006d463480 Name (len=24) " 2 a6ac"
Master Copy
Granted Queue
01ce0001 PR Remote: 4 01ad0001
Conversion Queue
00660001 NL (EX) Remote: 5 01b70001
021c0002 NL (EX) Remote: 3 03340001
02a10001 NL (EX)
Waiting Queue

On marathon-03:
----------------------------
G:  s:UN n:2/a6ac f:l t:EX d:UN/0 l:0 a:0 r:8
 H: s:EX f:aW e:0 p:3445 [d_doio] gfs2_prepare_write+0xc7/0x360 [gfs2]
 H: s:SH f:aW e:0 p:3444 [d_doio] gfs2_readpage+0x6a/0x18a [gfs2]
 H: s:SH f:aW e:0 p:3447 [d_doio] gfs2_readpage+0x6a/0x18a [gfs2]
 H: s:SH f:aW e:0 p:3443 [d_doio] gfs2_readpage+0x6a/0x18a [gfs2]
 H: s:SH f:aW e:0 p:3446 [d_doio] gfs2_readpage+0x6a/0x18a [gfs2]

Resource ffff81006b8dc280 Name (len=24) " 2 a6ac"
Local Copy, Master is node 1
Granted Queue
Conversion Queue
03340001 NL (EX) Master: 021c0002
Waiting Queue

On marathon-05:
----------------------------
G:  s:UN n:2/a6ac f:l t:EX d:UN/0 l:0 a:0 r:8
 H: s:EX f:aW e:0 p:3426 [d_doio] gfs2_prepare_write+0xc7/0x360 [gfs2]
 H: s:SH f:aW e:0 p:3422 [d_doio] gfs2_readpage+0x6a/0x18a [gfs2]
 H: s:SH f:aW e:0 p:3425 [d_doio] gfs2_readpage+0x6a/0x18a [gfs2]
 H: s:SH f:aW e:0 p:3423 [d_doio] gfs2_readpage+0x6a/0x18a [gfs2]
 H: s:SH f:aW e:0 p:3424 [d_doio] gfs2_readpage+0x6a/0x18a [gfs2]

Resource ffff81006cba35c0 Name (len=24) " 2 a6ac"
Local Copy, Master is node 1
Granted Queue
Conversion Queue
01b70001 NL (EX) Master: 00660001
Waiting Queue

On marathon-04:
-----------------------------
G:  s:SH n:2/a6ac f: t:SH d:EX/0 l:0 a:0 r:3
 I: n:1048619/42668 t:8 f:0x00000010

Resource ffff81006c391680 Name (len=24) " 2 a6ac"
Local Copy, Master is node 1
Granted Queue
01ad0001 PR Master: 01ce0001
Conversion Queue
Waiting Queue

marathon-04 has the lock in
the shared state, but there's nothing running on marathon-04 to keep the lock open. All the tests have completed on it.

/usr/tests/sts-rhel5.2/gfs/bin/coherency -h
Usage: /usr/tests/sts-rhel5.2/gfs/bin/coherency [options]
  -i <iterations>     Run each test case for n iterations. (Default is 120 seconds, 0 runs forever)
  -j                  Jdata is use, do not do mmap.
  -l <STS_LROOT>      If not given, use what is in env
  -m <mountpoint>     Start IO on fs mounted on mountpoint.
  -r <STS_RROOT>      If not given, use what is in env
  -s <random seed>    Seed tests with this seed (Default is PID)
  -u <user>           Run IO as user. (Default is root)
  -w <N>              Run N test case at once. (Default is 1)
  -O <result dir>     Directory where results and temporary files will be placed.
  -R <resource_file>  Get the list of nodes to use from resource_file.
  -W <N>              Run N processes on each node. (Default is 2)

I ran with all the defaults on the marathon cluster. It doesn't hang in the same test case each time it hangs. I've seen it stuck in pwrite/pread and write/mmread cases the last two times. (That would be pwrite writing the data, pread verifying the data, and write writing with a mmread verifying.) So I can't give you a single test case which always hangs.

FWIW -- I hit this hang again over the weekend. The test case was sync-mmwrite-read. Two nodes hung: marathon-03 and marathon-04. 27 hours later the OOM killer got both of them, so I'm afraid there is nothing of use aside from the fact that they were leaking memory somehow. I'll restart.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Created attachment 312277 [details]
Patch which might fix the hang
After this patch, there will be only one place in the code which resets the
GLF_DEMOTE flag. The plan is to try to catch any code paths which might
otherwise reset it. I believe that, due to recent changes, the case where
demote_state == state in the glock workqueue function should no longer happen.
On that basis I've put a GLOCK_BUG_ON() in there to be sure of that.
Also, when we have GLF_DEMOTE_PENDING set, we should not try to promote a lock,
as that might lead to deadlocks and lost wakeups, so we add an extra test for
that as well.
Longer term, we need better debugging in this area, as it's been a source of a
number of problems in the past. I'm not sure what the best way to tackle it is
at the moment.
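As a rough userspace sketch of the idea described above (not the actual gfs2 patch; the struct and helper names here are invented for illustration, while the flag and state names come from the comments in this bug), the demote flag is cleared in exactly one helper, the workqueue-style path asserts that demote_state never equals the current state, and promotion is refused while a demote is pending:

```c
#include <assert.h>

/* Stand-ins for the glock states and flags discussed above. */
enum lm_state { LM_ST_UNLOCKED, LM_ST_SHARED, LM_ST_EXCLUSIVE };
#define GLF_DEMOTE          0x01
#define GLF_DEMOTE_PENDING  0x02

struct fake_glock {
    unsigned int flags;
    enum lm_state state;
    enum lm_state demote_state;
};

/* The single place allowed to reset GLF_DEMOTE. */
static void clear_demote(struct fake_glock *gl)
{
    gl->flags &= ~GLF_DEMOTE;
}

/* Workqueue-style handler: per the comment above, demote_state == state
 * should no longer happen, so assert it (analogue of GLOCK_BUG_ON). */
static void run_demote(struct fake_glock *gl)
{
    assert(gl->demote_state != gl->state);
    gl->state = gl->demote_state;
    clear_demote(gl);
}

/* Promotion is refused while a demote is queued or pending, to avoid
 * the deadlock/lost-wakeup case described above. */
static int try_promote(struct fake_glock *gl, enum lm_state target)
{
    if (gl->flags & (GLF_DEMOTE | GLF_DEMOTE_PENDING))
        return 0;               /* caller must wait */
    gl->state = target;
    return 1;
}
```

The point of the single `clear_demote()` helper is that a breakpoint or trace there catches every path that resets the flag, which is exactly what the patch is trying to achieve.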
Created attachment 312369 [details]
A second attempt at a test patch.
Created attachment 312492 [details]
A third attempt at a test patch
Created attachment 312577 [details]
Another attempt at trying to work out what's going on
So this patch changes the demote flags. We get rid of GLF_DEMOTE and
GLF_PENDING_DEMOTE and replace them with GLF_LOCAL_DEMOTE and GLF_REMOTE_DEMOTE
in order to make the logic a bit clearer. Also, since GLF_LOCAL_DEMOTE
is always a demote to LM_ST_UNLOCKED, we can keep gl_demote_state for use only
with remote demotes.
This should make the glock dumps more meaningful... or that's what I'm hoping,
anyway.
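The flag split can be sketched as follows (a minimal userspace illustration, not the real patch; `fake_glock` and the helper names are invented). Because a local demote is always to LM_ST_UNLOCKED, the demote target can be derived from the flags alone and the stored target is only needed for remote demotes:

```c
#include <assert.h>

enum lm_state { LM_ST_UNLOCKED, LM_ST_SHARED, LM_ST_EXCLUSIVE };
#define GLF_LOCAL_DEMOTE   0x01  /* always a demote to LM_ST_UNLOCKED */
#define GLF_REMOTE_DEMOTE  0x02  /* target kept in demote_state */

struct fake_glock {
    unsigned int flags;
    enum lm_state state;
    enum lm_state demote_state;  /* meaningful only for remote demotes */
};

static void request_local_demote(struct fake_glock *gl)
{
    gl->flags |= GLF_LOCAL_DEMOTE;
}

static void request_remote_demote(struct fake_glock *gl, enum lm_state target)
{
    gl->flags |= GLF_REMOTE_DEMOTE;
    gl->demote_state = target;
}

/* Local demotes always go to LM_ST_UNLOCKED, so the target can be
 * computed from the flags; demote_state is consulted only for
 * remote demotes. */
static enum lm_state demote_target(const struct fake_glock *gl)
{
    if (gl->flags & GLF_LOCAL_DEMOTE)
        return LM_ST_UNLOCKED;
    return gl->demote_state;
}
```

A glock dump that prints the two flags separately then shows at a glance whether a pending demote came from the local node or from a remote callback, which is the "more meaningful" part.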
Created attachment 312596 [details]
Updated patch
Created attachment 312646 [details]
Back to first principals
So we know that when the hang occurs, we have "lost" a demote request
somewhere. Assuming that the request was actually delivered, there are
only two places where this might happen. This patch adds a flag which is set in
one of those places and reset in the other. That way we'll know which of them
was passed most recently when we see the hang.
Going back to comment #1, the lock which is causing the problem is being cached in the SH state. Let's assume, given the lack of demote flags, that the last operation was a demote (since that's the most likely way that the flags got cancelled). There are several possible paths that could have been taken:

EX -> SH (demote)
DF -> SH (demote) - v. unlikely as no waiting DF holders
DF -> UN -> SH (demote) - also v. unlikely due to above reason

So EX -> SH is the most likely previous operation for the glock in question. If we assume that the previous operation was a promote, then we have the following possible paths:

UN -> SH (promote)
DF -> SH (promote)

If the 'T' flag shows up when we run the patch from comment #10, then we can reduce the possible previous state changes down to just one of those sets.

So the other thing I'm thinking of doing is adding a prev_state variable to each glock so that we can see what that was. That should be just enough to tell us which of the possible state changes is happening to our glock. I've been through the code for all of the above this morning and have not spotted anything that's obviously wrong, so far.

Created attachment 312657 [details]
Updated test patch
This one will record the previous lock state as well as the current state.
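Recording the previous state is straightforward if every transition goes through one setter, as in this minimal sketch (invented struct and function names, not the attached patch). With prev_state in the dump, EX -> SH can be distinguished from UN -> SH, narrowing down the candidate paths listed above:

```c
#include <assert.h>

enum lm_state { LM_ST_UNLOCKED, LM_ST_DEFERRED, LM_ST_SHARED, LM_ST_EXCLUSIVE };

struct fake_glock {
    enum lm_state state;
    enum lm_state prev_state;  /* recorded on every transition */
};

/* All state changes funnel through here so a later glock dump can show
 * both the current and the previous state. */
static void set_state(struct fake_glock *gl, enum lm_state new_state)
{
    gl->prev_state = gl->state;
    gl->state = new_state;
}
```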
Looks like a possible dup of #458684. Cannot reproduce at the moment. Will look at this again when we have a fix for #458684.

*** This bug has been marked as a duplicate of bug 458684 ***