Bug 432108
| Summary: | GFS2: Two gfs2 nodes in four node cluster hung in ls. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Dean Jansa <djansa> |
| Component: | kernel | Assignee: | Steve Whitehouse <swhiteho> |
| Status: | CLOSED DUPLICATE | QA Contact: | GFS Bugs <gfs-bugs> |
| Severity: | low | Docs Contact: | |
| Priority: | low | ||
| Version: | 5.2 | CC: | cluster-maint, edamato, rpeterso |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | ia64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2008-07-09 15:44:07 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Here is what I'm seeing: link-13 and link-14 look like they're both
waiting for glocks that they already hold. They both have "trylock"
holders for the file /mnt/gfs2/writev-read. Here are the pertinent
lockdump sections from all nodes:
link-13:------------------------------------------------------------
Glock 0xe000000002585b48 (2, 0x220efe3)
gl_flags = 1
gl_ref = 6
gl_state = 0
gl_owner = pid 6112 (d_doio)
gl_ip = 11529215054675295824
req_gh = yes
lvb_count = 0
object = yes
le = no
reclaim = no
aspace = 0xe000000003dc8180 nrpages = 0
ail = 0
Request
owner = 6112 (d_doio)
gh_state = 3
gh_flags = 1 9
error = 0
gh_iflags = 1 10
initialized at: gfs2_readpage+0xc0/0x280 [gfs2]
Waiter3
owner = 6112 (d_doio)
gh_state = 3
gh_flags = 1 9
error = 0
gh_iflags = 1 10
initialized at: gfs2_readpage+0xc0/0x280 [gfs2]
Waiter3
owner = 6133 (ls)
gh_state = 3
gh_flags = 3
error = 0
gh_iflags = 1 10
initialized at: gfs2_getattr+0x150/0x240 [gfs2]
Inode: busy
link-14:------------------------------------------------------------
Glock 0xe00000002ca17b98 (2, 0x220efe3)
gl_flags = 1
gl_ref = 6
gl_state = 0
gl_owner = pid 5929 (d_doio)
gl_ip = 11529215054675295824
req_gh = yes
lvb_count = 0
object = yes
le = no
reclaim = no
aspace = 0xe00000002ca26da0 nrpages = 0
ail = 0
Request
owner = 5929 (d_doio)
gh_state = 3
gh_flags = 1 9
error = 0
gh_iflags = 1 10
initialized at: gfs2_readpage+0xc0/0x280 [gfs2]
Waiter3
owner = 5929 (d_doio)
gh_state = 3
gh_flags = 1 9
error = 0
gh_iflags = 1 10
initialized at: gfs2_readpage+0xc0/0x280 [gfs2]
Waiter3
owner = 5949 (ls)
gh_state = 3
gh_flags = 3
error = 0
gh_iflags = 1 10
initialized at: gfs2_getattr+0x150/0x240 [gfs2]
Inode: busy
link-15:------------------------------------------------------------
Glock 0xe000000028ffd858 (2, 0x220efe3)
gl_flags = (unlocked)
gl_ref = 3
gl_state = 3
gl_owner = 5840 (ended)
gl_ip = 11529215054675263056
req_gh = no
lvb_count = 0
object = yes
le = no
reclaim = no
aspace = 0xe00000002814c430 nrpages = 1
ail = 0
Inode:
num = 25/35712995
type = 8
i_flags =
link-16:------------------------------------------------------------
Glock 0xe0000040657e19d0 (2, 0x220efe3)
gl_flags = (unlocked)
gl_ref = 3
gl_state = 3
gl_owner = 7416 (ended)
gl_ip = 11529215054678753552
req_gh = no
req_bh = no
lvb_count = 0
object = yes
le = no
reclaim = no
aspace = 0xe000004063480448 nrpages = 1
ail = 0
Inode:
num = 25/35712995
type = 8
i_flags =
Okay, Steve, here is my theory. Does this make sense?
The glock is in the unlocked state, but it has a holder. So it was
once locked and now isn't. However, the holder didn't get taken off
the owner's list (gl_req_gh). This is hard to explain, but my theory
is that this sequence of events might be occurring:
1. The glock is locked, so the holder is put on the gl_req_gh list.
2. Function drop_bh is called to drop the holder.
3. drop_bh initializes "*gh = gl->gl_req_gh;" without the gl_spin lock.
4. drop_bh calls state_change
5. drop_bh calls go_inval
6. In the meantime, a holder gets put onto the gl_req_gh list.
7. go_inval finishes and returns back to drop_bh
8. Now we've got a holder queued, but gh is still NULL, so it never deletes
the holder off the list (see the sketch below).
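To make the window concrete, here is a rough sketch of the ordering I believe
drop_bh() currently has (reconstructed from memory of the 2.6.18-era glock
code, so details may differ slightly); the point is only that gh is read
before go_inval() runs and without gl_spin held:

static void drop_bh(struct gfs2_glock *gl, unsigned int ret)
{
	const struct gfs2_glock_operations *glops = gl->gl_ops;
	struct gfs2_holder *gh = gl->gl_req_gh;	/* step 3: read early, no gl_spin */
	...
	state_change(gl, LM_ST_UNLOCKED);	/* step 4 */
	if (glops->go_inval)
		glops->go_inval(gl, DIO_METADATA);	/* step 5; step 6 can race in here */
	/* step 8: if a request was queued while go_inval ran,
	   gh is still NULL and that holder is never completed */
	if (gh) {
		spin_lock(&gl->gl_spin);
		list_del_init(&gh->gh_list);
		gh->gh_error = 0;
		spin_unlock(&gl->gl_spin);
	}
	...
}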
In other words, shouldn't function drop_bh() protect its setting of
gh = gl->gl_req_gh with the gl_spin held, and perhaps only do it
after the return from go_inval()? Perhaps drop_bh should look more
like this:
struct gfs2_holder *gh;
...
state_change(gl, LM_ST_UNLOCKED);

if (glops->go_inval)
	glops->go_inval(gl, DIO_METADATA);

/* only read gl_req_gh after go_inval() has returned, and under gl_spin */
spin_lock(&gl->gl_spin);
gh = gl->gl_req_gh;
if (gh) {
	list_del_init(&gh->gh_list);
	gh->gh_error = 0;
}
spin_unlock(&gl->gl_spin);
Since there's a spin_lock immediately after, we can eliminate the
unlock/relock from the code too.
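Something like this, assuming (sketch only, from memory) that the code which
currently follows the gh handling in drop_bh() immediately re-takes gl_spin:

	spin_lock(&gl->gl_spin);
	gh = gl->gl_req_gh;
	if (gh) {
		list_del_init(&gh->gh_list);
		gh->gh_error = 0;
	}
	/* no spin_unlock()/spin_lock() pair here; just carry on with
	   whatever drop_bh() already does under gl_spin (clearing
	   gl_req_gh, GLF_LOCK, etc., if memory serves) and drop the
	   lock once at the end */
	...
	spin_unlock(&gl->gl_spin);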
I don't know if this theory holds any water...
Function xmote_bh does the same kind of thing.

gl_req_gh isn't a list, it's a pointer to the lock request which caused the
current state change, and it is only valid during a demote or promote which
involved some remote operation.

These are potentially duplicates of bz 432370. Marking NEEDINFO to have QE
try to recreate.

This looks exactly like #432057, so unless this can still be recreated I'll
mark it as a dup of that bug.
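For reference on Steve's point above about gl_req_gh: in the glock structure
of this era it is declared roughly as follows (from memory, sketch only); it
is a single pointer, not a list head:

struct gfs2_glock {
	...
	struct gfs2_holder *gl_req_gh;	/* the one request driving the current state change */
	...
};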
Description of problem:
Four node ia64 cluster, after starting a test run I noted two of the nodes
were not making any progress. I attempted to run ls in the working directory
to see if the test files were created and two of the four nodes hung in the
ls.

Version-Release number of selected component (if applicable):
link-13,14,15 running 2.6.18-79.el5
link-16 running 2.6.18-76.el5

How reproducible:
Unsure

Steps to Reproduce:
mount fs, create file, ls on all nodes

Actual results:
Two nodes (each running -79) hung in ls.

link-13:
ls D a0000002010170c0 0 6133 6084 (NOTLB)
Call Trace:
 [<a00000010062d170>] schedule+0x1db0/0x20a0 sp=e0000000168bfc20 bsp=e0000000168b9378
 [<a0000002010170c0>] just_schedule+0x20/0x40 [gfs2] sp=e0000000168bfcb0 bsp=e0000000168b9360
 [<a00000010062ebf0>] __wait_on_bit+0xd0/0x180 sp=e0000000168bfcb0 bsp=e0000000168b9310
 [<a00000010062ed60>] out_of_line_wait_on_bit+0xc0/0xe0 sp=e0000000168bfcb0 bsp=e0000000168b92c8
 [<a000000201017060>] wait_on_holder+0x60/0xa0 [gfs2] sp=e0000000168bfcf0 bsp=e0000000168b92a8
 [<a00000020101ac30>] glock_wait_internal+0x350/0x700 [gfs2] sp=e0000000168bfcf0 bsp=e0000000168b9260
 [<a00000020101b580>] gfs2_glock_nq+0x5a0/0x640 [gfs2] sp=e0000000168bfcf0 bsp=e0000000168b9210
 [<a0000002010399a0>] gfs2_getattr+0x160/0x240 [gfs2] sp=e0000000168bfcf0 bsp=e0000000168b91d0
 [<a00000010017bfe0>] vfs_getattr+0x100/0x220 sp=e0000000168bfd30 bsp=e0000000168b9198
 [<a00000010017c170>] vfs_lstat_fd+0x70/0xa0 sp=e0000000168bfd30 bsp=e0000000168b9168
 [<a00000010017c6f0>] sys_newlstat+0x30/0x80 sp=e0000000168bfdc0 bsp=e0000000168b9108
 [<a00000010000bdb0>] __ia64_trace_syscall+0xd0/0x110 sp=e0000000168bfe30 bsp=e0000000168b9108
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400 sp=e0000000168c0000 bsp=e0000000168b9108

link-14:
ls D a0000002010170c0 0 5949 5902 (NOTLB)
Call Trace:
 [<a00000010062d170>] schedule+0x1db0/0x20a0 sp=e00000002caefc20 bsp=e00000002cae9378
 [<a0000002010170c0>] just_schedule+0x20/0x40 [gfs2] sp=e00000002caefcb0 bsp=e00000002cae9360
 [<a00000010062ebf0>] __wait_on_bit+0xd0/0x180 sp=e00000002caefcb0 bsp=e00000002cae9310
 [<a00000010062ed60>] out_of_line_wait_on_bit+0xc0/0xe0 sp=e00000002caefcb0 bsp=e00000002cae92c8
 [<a000000201017060>] wait_on_holder+0x60/0xa0 [gfs2] sp=e00000002caefcf0 bsp=e00000002cae92a8
 [<a00000020101ac30>] glock_wait_internal+0x350/0x700 [gfs2] sp=e00000002caefcf0 bsp=e00000002cae9260
 [<a00000020101b580>] gfs2_glock_nq+0x5a0/0x640 [gfs2] sp=e00000002caefcf0 bsp=e00000002cae9210
 [<a0000002010399a0>] gfs2_getattr+0x160/0x240 [gfs2] sp=e00000002caefcf0 bsp=e00000002cae91d0
 [<a00000010017bfe0>] vfs_getattr+0x100/0x220 sp=e00000002caefd30 bsp=e00000002cae9198
 [<a00000010017c170>] vfs_lstat_fd+0x70/0xa0 sp=e00000002caefd30 bsp=e00000002cae9168
 [<a00000010017c6f0>] sys_newlstat+0x30/0x80 sp=e00000002caefdc0 bsp=e00000002cae9108
 [<a00000010000bdb0>] __ia64_trace_syscall+0xd0/0x110 sp=e00000002caefe30 bsp=e00000002cae9108
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400 sp=e00000002caf0000 bsp=e00000002cae9108