Bug 432108 - GFS2: Two gfs2 nodes in four node cluster hung in ls.
Status: CLOSED DUPLICATE of bug 432057
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: ia64 Linux
Priority: low   Severity: low
Target Milestone: rc
Assigned To: Steve Whitehouse
QA Contact: GFS Bugs
Reported: 2008-02-08 16:14 EST by Dean Jansa
Modified: 2009-05-27 23:39 EDT
CC: 3 users

Doc Type: Bug Fix
Last Closed: 2008-07-09 11:44:07 EDT

Attachments: None
Description Dean Jansa 2008-02-08 16:14:20 EST
Description of problem:

On a four-node ia64 cluster, after starting a test run I noticed that two of
the nodes were not making any progress.  I ran ls in the working directory to
see whether the test files had been created, and the ls hung on two of the
four nodes.


Version-Release number of selected component (if applicable):
link-13,14,15 running 2.6.18-79.el5
link-16 running 2.6.18-76.el5



How reproducible:

Unsure


Steps to Reproduce:
1. Mount the fs
2. Create a file
3. Run ls on all nodes

  
Actual results:

Two nodes (each running -79) hung in ls:

link-13:

ls            D a0000002010170c0     0  6133   6084                     (NOTLB)

Call Trace:
 [<a00000010062d170>] schedule+0x1db0/0x20a0
                                sp=e0000000168bfc20 bsp=e0000000168b9378
 [<a0000002010170c0>] just_schedule+0x20/0x40 [gfs2]
                                sp=e0000000168bfcb0 bsp=e0000000168b9360
 [<a00000010062ebf0>] __wait_on_bit+0xd0/0x180
                                sp=e0000000168bfcb0 bsp=e0000000168b9310
 [<a00000010062ed60>] out_of_line_wait_on_bit+0xc0/0xe0
                                sp=e0000000168bfcb0 bsp=e0000000168b92c8
 [<a000000201017060>] wait_on_holder+0x60/0xa0 [gfs2]
                                sp=e0000000168bfcf0 bsp=e0000000168b92a8
 [<a00000020101ac30>] glock_wait_internal+0x350/0x700 [gfs2]
                                sp=e0000000168bfcf0 bsp=e0000000168b9260
 [<a00000020101b580>] gfs2_glock_nq+0x5a0/0x640 [gfs2]
                                sp=e0000000168bfcf0 bsp=e0000000168b9210
 [<a0000002010399a0>] gfs2_getattr+0x160/0x240 [gfs2]
                                sp=e0000000168bfcf0 bsp=e0000000168b91d0
 [<a00000010017bfe0>] vfs_getattr+0x100/0x220
                                sp=e0000000168bfd30 bsp=e0000000168b9198
 [<a00000010017c170>] vfs_lstat_fd+0x70/0xa0
                                sp=e0000000168bfd30 bsp=e0000000168b9168
 [<a00000010017c6f0>] sys_newlstat+0x30/0x80
                                sp=e0000000168bfdc0 bsp=e0000000168b9108
 [<a00000010000bdb0>] __ia64_trace_syscall+0xd0/0x110
                                sp=e0000000168bfe30 bsp=e0000000168b9108
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e0000000168c0000 bsp=e0000000168b9108



link-14:
ls            D a0000002010170c0     0  5949   5902                     (NOTLB)

Call Trace:
 [<a00000010062d170>] schedule+0x1db0/0x20a0
                                sp=e00000002caefc20 bsp=e00000002cae9378
 [<a0000002010170c0>] just_schedule+0x20/0x40 [gfs2]
                                sp=e00000002caefcb0 bsp=e00000002cae9360
 [<a00000010062ebf0>] __wait_on_bit+0xd0/0x180
                                sp=e00000002caefcb0 bsp=e00000002cae9310
 [<a00000010062ed60>] out_of_line_wait_on_bit+0xc0/0xe0
                                sp=e00000002caefcb0 bsp=e00000002cae92c8
 [<a000000201017060>] wait_on_holder+0x60/0xa0 [gfs2]
                                sp=e00000002caefcf0 bsp=e00000002cae92a8
 [<a00000020101ac30>] glock_wait_internal+0x350/0x700 [gfs2]
                                sp=e00000002caefcf0 bsp=e00000002cae9260
 [<a00000020101b580>] gfs2_glock_nq+0x5a0/0x640 [gfs2]
                                sp=e00000002caefcf0 bsp=e00000002cae9210
 [<a0000002010399a0>] gfs2_getattr+0x160/0x240 [gfs2]
                                sp=e00000002caefcf0 bsp=e00000002cae91d0
 [<a00000010017bfe0>] vfs_getattr+0x100/0x220
                                sp=e00000002caefd30 bsp=e00000002cae9198
 [<a00000010017c170>] vfs_lstat_fd+0x70/0xa0
                                sp=e00000002caefd30 bsp=e00000002cae9168
 [<a00000010017c6f0>] sys_newlstat+0x30/0x80
                                sp=e00000002caefdc0 bsp=e00000002cae9108
 [<a00000010000bdb0>] __ia64_trace_syscall+0xd0/0x110
                                sp=e00000002caefe30 bsp=e00000002cae9108
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e00000002caf0000 bsp=e00000002cae9108
Comment 1 Robert Peterson 2008-02-08 16:19:11 EST
Here is what I'm seeing:  link-13 and link-14 look like they're both
waiting for glocks that they already hold.  They both have "trylock"
holders for the file /mnt/gfs2/writev-read.  Here are the pertinent
lockdump sections from all nodes:

link-13:------------------------------------------------------------
Glock 0xe000000002585b48 (2, 0x220efe3)
  gl_flags = 1 
  gl_ref = 6
  gl_state = 0
  gl_owner = pid 6112 (d_doio)
  gl_ip = 11529215054675295824
  req_gh = yes
  lvb_count = 0
  object = yes
  le = no
  reclaim = no
  aspace = 0xe000000003dc8180 nrpages = 0
  ail = 0
  Request
    owner = 6112 (d_doio)
    gh_state = 3
    gh_flags = 1 9 
    error = 0
    gh_iflags = 1 10 
    initialized at: gfs2_readpage+0xc0/0x280 [gfs2]
  Waiter3
    owner = 6112 (d_doio)
    gh_state = 3
    gh_flags = 1 9 
    error = 0
    gh_iflags = 1 10 
    initialized at: gfs2_readpage+0xc0/0x280 [gfs2]
  Waiter3
    owner = 6133 (ls)
    gh_state = 3
    gh_flags = 3 
    error = 0
    gh_iflags = 1 10 
    initialized at: gfs2_getattr+0x150/0x240 [gfs2]
  Inode: busy

link-14:------------------------------------------------------------
Glock 0xe00000002ca17b98 (2, 0x220efe3)
  gl_flags = 1 
  gl_ref = 6
  gl_state = 0
  gl_owner = pid 5929 (d_doio)
  gl_ip = 11529215054675295824
  req_gh = yes
  lvb_count = 0
  object = yes
  le = no
  reclaim = no
  aspace = 0xe00000002ca26da0 nrpages = 0
  ail = 0
  Request
    owner = 5929 (d_doio)
    gh_state = 3
    gh_flags = 1 9 
    error = 0
    gh_iflags = 1 10 
    initialized at: gfs2_readpage+0xc0/0x280 [gfs2]
  Waiter3
    owner = 5929 (d_doio)
    gh_state = 3
    gh_flags = 1 9 
    error = 0
    gh_iflags = 1 10 
    initialized at: gfs2_readpage+0xc0/0x280 [gfs2]
  Waiter3
    owner = 5949 (ls)
    gh_state = 3
    gh_flags = 3 
    error = 0
    gh_iflags = 1 10 
    initialized at: gfs2_getattr+0x150/0x240 [gfs2]
  Inode: busy

link-15:------------------------------------------------------------
Glock 0xe000000028ffd858 (2, 0x220efe3)
  gl_flags = (unlocked) 
  gl_ref = 3
  gl_state = 3
  gl_owner = 5840 (ended)
  gl_ip = 11529215054675263056
  req_gh = no
  lvb_count = 0
  object = yes
  le = no
  reclaim = no
  aspace = 0xe00000002814c430 nrpages = 1
  ail = 0
  Inode:
    num = 25/35712995
    type = 8
    i_flags = 

link-16:------------------------------------------------------------
Glock 0xe0000040657e19d0 (2, 0x220efe3)
  gl_flags = (unlocked) 
  gl_ref = 3
  gl_state = 3
  gl_owner = 7416 (ended)
  gl_ip = 11529215054678753552
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = yes
  le = no
  reclaim = no
  aspace = 0xe000004063480448 nrpages = 1
  ail = 0
  Inode:
    num = 25/35712995
    type = 8
    i_flags = 
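For reference when reading these dumps: the pair after "Glock" is (lock type,
lock number).  Assuming the usual LM_TYPE_*/LM_ST_* constants from
lm_interface.h of this era (worth verifying against the exact RHEL 5.2
source), type 2 is the inode glock, and the state numbers decode roughly as:

	/* Reference sketch only; verify against lm_interface.h */
	#define LM_ST_UNLOCKED	0	/* link-13/14: gl_state = 0 */
	#define LM_ST_EXCLUSIVE	1
	#define LM_ST_DEFERRED	2
	#define LM_ST_SHARED	3	/* link-15/16: gl_state = 3 */

The gh_state = 3 entries mean every queued request is asking for a shared
hold.  So link-15 and link-16 hold the inode glock shared, while on link-13
and link-14 the glock has gone back to unlocked yet still shows a pending
request (req_gh = yes) and queued waiters.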

Comment 2 Robert Peterson 2008-02-08 17:34:08 EST
Okay, Steve, here is my theory.  Does this make sense?

The glock is in the unlocked state, but it has a holder.  So it was
once locked and now isn't.  However, the holder didn't get taken off
the owner's list (gl_req_gh).  This is hard to explain, but my theory
is that this sequence of events might be occurring:

1. The glock is locked, so the holder is put on the gl_req_gh list.
2. Function drop_bh is called to drop the holder.
3. drop_bh initializes "gh = gl->gl_req_gh;" without the gl_spin lock.
4. drop_bh calls state_change
5. drop_bh calls go_inval
6. In the meantime, a holder gets put onto the gl_req_gh list.
7. go_inval finishes and returns back to drop_bh
8. Now we've got a holder, but gh is still NULL, so the holder never gets
   removed from the list (see the sketch below).
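
A condensed sketch of the drop_bh() flow being described, assuming it matches
the upstream shape of that function (not the verbatim RHEL source; assertions
and wakeup handling are omitted):

	static void drop_bh(struct gfs2_glock *gl, unsigned int ret)
	{
		const struct gfs2_glock_operations *glops = gl->gl_ops;
		struct gfs2_holder *gh = gl->gl_req_gh;	/* step 3: snapshot taken
							   without gl_spin held */

		state_change(gl, LM_ST_UNLOCKED);	/* step 4 */

		if (glops->go_inval)			/* steps 5-7: while go_inval */
			glops->go_inval(gl, DIO_METADATA); /* runs, a holder can be
							   queued (step 6) */

		if (gh) {	/* step 8: gh is the stale NULL snapshot, so a
				   holder queued during the window is never removed */
			spin_lock(&gl->gl_spin);
			list_del_init(&gh->gh_list);
			gh->gh_error = 0;
			spin_unlock(&gl->gl_spin);
		}
	}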

In other words, shouldn't function drop_bh() protect its setting of
gh = gl->gl_req_gh with the gl_spin held, and perhaps only do it
after the return from go_inval()?  Perhaps drop_bh should look more
like this:

	struct gfs2_holder *gh;
...
	state_change(gl, LM_ST_UNLOCKED);

	if (glops->go_inval)
		glops->go_inval(gl, DIO_METADATA);

	spin_lock(&gl->gl_spin);
	gh = gl->gl_req_gh;
	if (gh) {
		list_del_init(&gh->gh_list);
		gh->gh_error = 0;
	}
	spin_unlock(&gl->gl_spin);

Since there's a spin_lock immediately after, we can eliminate the
unlock/relock from the code too.

I don't know if this theory holds any water...
Comment 3 Robert Peterson 2008-02-08 17:40:58 EST
Function xmote_bh does the same kind of thing.
Comment 4 Steve Whitehouse 2008-02-11 11:44:37 EST
gl_req_gh isn't a list, it's a pointer to the lock request which caused the
current state change, and it is only valid during a demote or promote which
involved some remote operation.
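
For context, in the glock structure that's a single pointer field, roughly (a
sketch from memory of the incore.h of this era, not verbatim):

	struct gfs2_glock {
		...
		struct gfs2_holder *gl_req_gh;	/* request driving the current
						   promote/demote, if any */
		...
	};

so there is no holder list hanging off gl_req_gh; it is just the one pointer,
valid only while a remote promote or demote is in flight.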
Comment 5 Kiersten (Kerri) Anderson 2008-02-13 12:08:52 EST
These are potentially duplicates of bz 432370.  Marking NEEDINFO to have QE try
to recreate.
Comment 6 Steve Whitehouse 2008-06-02 06:32:20 EDT
This looks exactly like #432057, so unless this can still be recreated, I'll
mark it as a dup of that bug.
Comment 8 Steve Whitehouse 2008-07-09 11:44:07 EDT

*** This bug has been marked as a duplicate of 432057 ***
