Red Hat Bugzilla – Bug 432108
GFS2: Two gfs2 nodes in four node cluster hung in ls.
Last modified: 2009-05-27 23:39:21 EDT
Description of problem:
Four-node ia64 cluster. After starting a test run I noted that two of the nodes were not making any progress. I attempted to run ls in the working directory to see if the test files were created, and two of the four nodes hung in the ls.

Version-Release number of selected component (if applicable):
link-13,14,15 running 2.6.18-79.el5
link-16 running 2.6.18-76.el5

How reproducible:
Unsure

Steps to Reproduce:
1. mount fs
2. create file
3. ls on all nodes

Actual results:
Two nodes (each running -79) hung in ls.

link-13:
ls            D a0000002010170c0     0  6133   6084 (NOTLB)
Call Trace:
 [<a00000010062d170>] schedule+0x1db0/0x20a0
                        sp=e0000000168bfc20 bsp=e0000000168b9378
 [<a0000002010170c0>] just_schedule+0x20/0x40 [gfs2]
                        sp=e0000000168bfcb0 bsp=e0000000168b9360
 [<a00000010062ebf0>] __wait_on_bit+0xd0/0x180
                        sp=e0000000168bfcb0 bsp=e0000000168b9310
 [<a00000010062ed60>] out_of_line_wait_on_bit+0xc0/0xe0
                        sp=e0000000168bfcb0 bsp=e0000000168b92c8
 [<a000000201017060>] wait_on_holder+0x60/0xa0 [gfs2]
                        sp=e0000000168bfcf0 bsp=e0000000168b92a8
 [<a00000020101ac30>] glock_wait_internal+0x350/0x700 [gfs2]
                        sp=e0000000168bfcf0 bsp=e0000000168b9260
 [<a00000020101b580>] gfs2_glock_nq+0x5a0/0x640 [gfs2]
                        sp=e0000000168bfcf0 bsp=e0000000168b9210
 [<a0000002010399a0>] gfs2_getattr+0x160/0x240 [gfs2]
                        sp=e0000000168bfcf0 bsp=e0000000168b91d0
 [<a00000010017bfe0>] vfs_getattr+0x100/0x220
                        sp=e0000000168bfd30 bsp=e0000000168b9198
 [<a00000010017c170>] vfs_lstat_fd+0x70/0xa0
                        sp=e0000000168bfd30 bsp=e0000000168b9168
 [<a00000010017c6f0>] sys_newlstat+0x30/0x80
                        sp=e0000000168bfdc0 bsp=e0000000168b9108
 [<a00000010000bdb0>] __ia64_trace_syscall+0xd0/0x110
                        sp=e0000000168bfe30 bsp=e0000000168b9108
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                        sp=e0000000168c0000 bsp=e0000000168b9108

link-14:
ls            D a0000002010170c0     0  5949   5902 (NOTLB)
Call Trace:
 [<a00000010062d170>] schedule+0x1db0/0x20a0
                        sp=e00000002caefc20 bsp=e00000002cae9378
 [<a0000002010170c0>] just_schedule+0x20/0x40 [gfs2]
                        sp=e00000002caefcb0 bsp=e00000002cae9360
 [<a00000010062ebf0>] __wait_on_bit+0xd0/0x180
                        sp=e00000002caefcb0 bsp=e00000002cae9310
 [<a00000010062ed60>] out_of_line_wait_on_bit+0xc0/0xe0
                        sp=e00000002caefcb0 bsp=e00000002cae92c8
 [<a000000201017060>] wait_on_holder+0x60/0xa0 [gfs2]
                        sp=e00000002caefcf0 bsp=e00000002cae92a8
 [<a00000020101ac30>] glock_wait_internal+0x350/0x700 [gfs2]
                        sp=e00000002caefcf0 bsp=e00000002cae9260
 [<a00000020101b580>] gfs2_glock_nq+0x5a0/0x640 [gfs2]
                        sp=e00000002caefcf0 bsp=e00000002cae9210
 [<a0000002010399a0>] gfs2_getattr+0x160/0x240 [gfs2]
                        sp=e00000002caefcf0 bsp=e00000002cae91d0
 [<a00000010017bfe0>] vfs_getattr+0x100/0x220
                        sp=e00000002caefd30 bsp=e00000002cae9198
 [<a00000010017c170>] vfs_lstat_fd+0x70/0xa0
                        sp=e00000002caefd30 bsp=e00000002cae9168
 [<a00000010017c6f0>] sys_newlstat+0x30/0x80
                        sp=e00000002caefdc0 bsp=e00000002cae9108
 [<a00000010000bdb0>] __ia64_trace_syscall+0xd0/0x110
                        sp=e00000002caefe30 bsp=e00000002cae9108
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                        sp=e00000002caf0000 bsp=e00000002cae9108
Here is what I'm seeing: link-13 and link-14 look like they're both waiting for glocks that they already hold. They both have "trylock" holders for the file /mnt/gfs2/writev-read. Here are the pertinent lockdump sections from all nodes:

link-13:------------------------------------------------------------
Glock 0xe000000002585b48 (2, 0x220efe3)
  gl_flags = 1
  gl_ref = 6
  gl_state = 0
  gl_owner = pid 6112 (d_doio)
  gl_ip = 11529215054675295824
  req_gh = yes
  lvb_count = 0
  object = yes
  le = no
  reclaim = no
  aspace = 0xe000000003dc8180
  nrpages = 0
  ail = 0
  Request
    owner = 6112 (d_doio)
    gh_state = 3
    gh_flags = 1 9
    error = 0
    gh_iflags = 1 10
    initialized at: gfs2_readpage+0xc0/0x280 [gfs2]
  Waiter3
    owner = 6112 (d_doio)
    gh_state = 3
    gh_flags = 1 9
    error = 0
    gh_iflags = 1 10
    initialized at: gfs2_readpage+0xc0/0x280 [gfs2]
  Waiter3
    owner = 6133 (ls)
    gh_state = 3
    gh_flags = 3
    error = 0
    gh_iflags = 1 10
    initialized at: gfs2_getattr+0x150/0x240 [gfs2]
  Inode: busy

link-14:------------------------------------------------------------
Glock 0xe00000002ca17b98 (2, 0x220efe3)
  gl_flags = 1
  gl_ref = 6
  gl_state = 0
  gl_owner = pid 5929 (d_doio)
  gl_ip = 11529215054675295824
  req_gh = yes
  lvb_count = 0
  object = yes
  le = no
  reclaim = no
  aspace = 0xe00000002ca26da0
  nrpages = 0
  ail = 0
  Request
    owner = 5929 (d_doio)
    gh_state = 3
    gh_flags = 1 9
    error = 0
    gh_iflags = 1 10
    initialized at: gfs2_readpage+0xc0/0x280 [gfs2]
  Waiter3
    owner = 5929 (d_doio)
    gh_state = 3
    gh_flags = 1 9
    error = 0
    gh_iflags = 1 10
    initialized at: gfs2_readpage+0xc0/0x280 [gfs2]
  Waiter3
    owner = 5949 (ls)
    gh_state = 3
    gh_flags = 3
    error = 0
    gh_iflags = 1 10
    initialized at: gfs2_getattr+0x150/0x240 [gfs2]
  Inode: busy

link-15:------------------------------------------------------------
Glock 0xe000000028ffd858 (2, 0x220efe3)
  gl_flags = (unlocked)
  gl_ref = 3
  gl_state = 3
  gl_owner = 5840 (ended)
  gl_ip = 11529215054675263056
  req_gh = no
  lvb_count = 0
  object = yes
  le = no
  reclaim = no
  aspace = 0xe00000002814c430
  nrpages = 1
  ail = 0
  Inode:
    num = 25/35712995
    type = 8
    i_flags =

link-16:------------------------------------------------------------
Glock 0xe0000040657e19d0 (2, 0x220efe3)
  gl_flags = (unlocked)
  gl_ref = 3
  gl_state = 3
  gl_owner = 7416 (ended)
  gl_ip = 11529215054678753552
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = yes
  le = no
  reclaim = no
  aspace = 0xe000004063480448
  nrpages = 1
  ail = 0
  Inode:
    num = 25/35712995
    type = 8
    i_flags =
Okay, Steve, here is my theory. Does this make sense? The glock is in the unlocked state, but it has a holder, so it was once locked and now isn't. However, the holder didn't get taken off the owner's list (gl_req_gh). This is hard to explain, but my theory is that this sequence of events might be occurring:

1. The glock is locked, so the holder is put on the gl_req_gh list.
2. Function drop_bh is called to drop the holder.
3. drop_bh initializes "gh = gl->gl_req_gh;" without the gl_spin lock held.
4. drop_bh calls state_change.
5. drop_bh calls go_inval.
6. In the meantime, a holder gets put onto the gl_req_gh list.
7. go_inval finishes and returns to drop_bh.
8. Now we've got a holder, but gh is still NULL, so drop_bh never deletes the holder from the list.

In other words, shouldn't function drop_bh() protect its setting of gh = gl->gl_req_gh with gl_spin held, and perhaps only do it after the return from go_inval()? Perhaps drop_bh should look more like this:

	struct gfs2_holder *gh;
	...
	state_change(gl, LM_ST_UNLOCKED);

	if (glops->go_inval)
		glops->go_inval(gl, DIO_METADATA);

	spin_lock(&gl->gl_spin);
	gh = gl->gl_req_gh;
	if (gh) {
		list_del_init(&gh->gh_list);
		gh->gh_error = 0;
	}
	spin_unlock(&gl->gl_spin);

Since there's a spin_lock immediately after, we can eliminate the unlock/relock from the code too. I don't know if this theory holds any water...
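For contrast, here is a condensed sketch of the ordering the theory assumes the current code has. This is illustrative only, reconstructed from steps 1-8 above rather than quoted from the source tree:

	/* Sketch of the suspected ordering, reconstructed from the theory
	 * above; not the verbatim drop_bh() source. */
	struct gfs2_holder *gh = gl->gl_req_gh;	/* step 3: sampled without gl_spin */
	...
	state_change(gl, LM_ST_UNLOCKED);	/* step 4 */

	if (glops->go_inval)
		glops->go_inval(gl, DIO_METADATA);	/* step 5: while this runs, another
							 * thread can set gl_req_gh (step 6) */

	if (gh) {	/* step 8: gh still holds the stale NULL sampled in step 3,
			 * so the newly queued holder is never removed and the
			 * process waiting on it (ls) sits in wait_on_holder(),
			 * as in the backtraces above */
		spin_lock(&gl->gl_spin);
		list_del_init(&gh->gh_list);
		gh->gh_error = 0;
		spin_unlock(&gl->gl_spin);
	}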
Function xmote_bh does the same kind of thing.
gl_req_gh isn't a list; it's a pointer to the lock request which caused the current state change, and it is only valid during a demote or promote which involved some remote operation.
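Roughly (for illustration only, not the exact struct gfs2_glock definition from the header):

	struct gfs2_glock {
		...
		spinlock_t gl_spin;
		/* Not a list head: a single pointer to the holder whose request
		 * triggered the in-flight promote/demote; only meaningful while
		 * a remote lock operation is outstanding. */
		struct gfs2_holder *gl_req_gh;
		...
	};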
These are potentially duplicates of bz 432370. Marking NEEDINFO to have QE try to recreate.
This looks exactly like bug #432057, so unless this can still be recreated, I'll mark it as a duplicate of that bug.
*** This bug has been marked as a duplicate of 432057 ***