Description of problem:
While running distributed I/O tests, the test d_rwrevdirectlarge hung on one node in a RHCS 4.6 cluster.

Version-Release number of selected component (if applicable):
kernel-hugemem-2.6.9-62.EL
GFS-kernel-hugemem-2.6.9-75.6
cman-1.0.17-0
GFS-6.1.15-1

How reproducible:
unknown

Steps to Reproduce:
1.
2.
3.

Actual results:
Stack traces of the hung processes on morph-04:

d_doio        D 000000D0  1948 17022  17021  17023       (NOTLB)
4effadd8 00000002 00000286 000000d0 7baae924 8071d268 02155fe2 0000793b
03fce3a0 7fe4f080 00000000 04018de0 00000000 00000000 abce4c40 000f4eb9
0231fa80 74c510b0 74c5121c 00000000 00000002 4effaf2c 4effaf28 4effadf4
Call Trace:
 [<02155fe2>] rw_vm+0x28e/0x29c
 [<022ce47a>] wait_for_completion+0x94/0xcb
 [<0211e8b0>] default_wake_function+0x0/0xc
 [<02143074>] buffered_rmqueue+0x17d/0x1a5
 [<0211e8b0>] default_wake_function+0x0/0xc
 [<02143150>] __alloc_pages+0xb4/0x2a6
 [<82c99d4e>] glock_wait_internal+0xaa/0x1ef [gfs]
 [<82c9a1f1>] gfs_glock_nq+0xe3/0x116 [gfs]
 [<82cae17e>] do_write_direct+0xf3/0x3cc [gfs]
 [<022ce348>] schedule+0x850/0x8ee
 [<022ce378>] schedule+0x880/0x8ee
 [<0214de89>] handle_mm_fault+0xdc/0x193
 [<82cad8ec>] walk_vm+0xd7/0x100 [gfs]
 [<82caeaa9>] __gfs_write+0xa1/0xbb [gfs]
 [<82cae08b>] do_write_direct+0x0/0x3cc [gfs]
 [<82caeace>] gfs_write+0xb/0xe [gfs]
 [<0215af9e>] vfs_write+0xb6/0xe2
 [<0215b068>] sys_write+0x3c/0x62

d_doio        D 00000020  1948 17023  17021  17024 17022 (NOTLB)
79320e20 00000002 00000026 00000020 1c19ece0 7d5208b0 00000019 04018de0
7d5208b0 00000000 04021740 04020de0 00000001 000f4240 ac57a080 000f4eb9
7d5208b0 04791430 0479159c 00000001 000000d2 ffff0001 2d255b78 2d255af0
Call Trace:
 [<022cee3e>] rwsem_down_read_failed+0x143/0x162
 [<82caf813>] .text.lock.ops_file+0x51/0xca [gfs]
 [<82cad8ec>] walk_vm+0xd7/0x100 [gfs]
 [<02155fe2>] rw_vm+0x28e/0x29c
 [<82cadd34>] __gfs_read+0x43/0x4a [gfs]
 [<82cad9e4>] do_read_direct+0x0/0x1fb [gfs]
 [<82cadd46>] gfs_read+0xb/0xe [gfs]
 [<0215b30a>] do_readv_writev+0x19d/0x1f7
 [<82cadd3b>] gfs_read+0x0/0xe [gfs]
 [<0214f513>] do_mmap_pgoff+0x3ef/0x688
 [<0214f6ae>] do_mmap_pgoff+0x58a/0x688
 [<0215b39f>] vfs_readv+0x3b/0x40
 [<0215b423>] sys_readv+0x3c/0x62

d_doio        D 040181C0  1904 17024  17021  17025 17023 (NOTLB)
5304ff1c 00000002 00000021 040181c0 3a59b4f8 2b416330 00000019 04020de0
2b416330 00000000 04019740 04018de0 00000000 000f4240 ac1a9780 000f4eb9
2b416330 7ff54230 7ff5439c 00000002 5304ff6c 2d255b60 00000246 2d255b68
Call Trace:
 [<022cd925>] __down+0x81/0xdb
 [<0211e8b0>] default_wake_function+0x0/0xc
 [<022cda9c>] __down_failed+0x8/0xc
 [<82caf853>] .text.lock.ops_file+0x91/0xca [gfs]
 [<82caeace>] gfs_write+0xb/0xe [gfs]
 [<0215af9e>] vfs_write+0xb6/0xe2
 [<0215b068>] sys_write+0x3c/0x62

d_doio        D 040181C0  1948 17025  17021  17026 17024 (NOTLB)
3fec5f1c 00000002 0989a000 040181c0 48a434f8 00000246 0989a000 0989c000
63c01268 040181c0 0214c136 04018de0 00000000 000f4240 ac29d9c0 000f4eb9
0231fa80 2b416330 2b41649c 00000000 00000002 2d255b60 00000246 2d255b68
Call Trace:
 [<0214c136>] zap_pmd_range+0x59/0x7c
 [<022cd925>] __down+0x81/0xdb
 [<0211e8b0>] default_wake_function+0x0/0xc
 [<022cda9c>] __down_failed+0x8/0xc
 [<82caf853>] .text.lock.ops_file+0x91/0xca [gfs]
 [<82caeace>] gfs_write+0xb/0xe [gfs]
 [<0215af9e>] vfs_write+0xb6/0xe2
 [<0215b068>] sys_write+0x3c/0x62

d_doio        D 00000000  1948 17026  17021         17025 (NOTLB)
6fd31e80 00000002 00000000 00000000 0370d3a0 00000000 59d93067 00000000
3a69da40 0214d9c8 5bd93000 04020de0 00000001 000f4240 ac66e2c0 000f4eb9
81eb10b0 7d5208b0 7d520a1c f6f48000 00000001 fffe0001 2d255b78 2d255af0
Call Trace:
 [<0214d9c8>] do_anonymous_page+0x1c9/0x1db
 [<022cee3e>] rwsem_down_read_failed+0x143/0x162
 [<82caf813>] .text.lock.ops_file+0x51/0xca [gfs]
 [<82cad8ec>] walk_vm+0xd7/0x100 [gfs]
 [<0214f037>] vma_merge+0x156/0x165
 [<0214f513>] do_mmap_pgoff+0x3ef/0x688
 [<82cadd34>] __gfs_read+0x43/0x4a [gfs]
 [<82cad9e4>] do_read_direct+0x0/0x1fb [gfs]
 [<82cadd46>] gfs_read+0xb/0xe [gfs]
 [<0215adf1>] vfs_read+0xb6/0xe2
 [<0215b006>] sys_read+0x3c/0x62

glock we are stuck on from morph-04:

Glock (2, 36574638)
  gl_flags = 1
  gl_count = 6
  gl_state = 1
  req_gh = yes
  req_bh = yes
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 3
  ail_bufs = no
  Request
    owner = -1
    gh_state = 0
    gh_flags = 0
    error = 0
    gh_iflags = 2 4 5
  Waiter2
    owner = -1
    gh_state = 0
    gh_flags = 0
    error = 0
    gh_iflags = 2 4 5
  Waiter3
    owner = 17022
    gh_state = 2
    gh_flags =
    error = 0
    gh_iflags = 1
  Inode: busy

DLM lock on master (morph-02, node 3):

Resource 60e7bb8c (parent 00000000). Name (len=24) " 2 22e15ae"
Master Copy
LVB: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Granted Queue
Conversion Queue
Waiting Queue

glock on morph-02:

Glock (2, 36574638)
  gl_flags =
  gl_count = 2
  gl_state = 0
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 0
  ail_bufs = no
  Inode:
    num = 36574638/36574638
    type = 1
    i_count = 1
    i_flags =
    vnode = yes

Expected results:

Additional info:
The above processes on morph-04 are waiting for a LM_ST_DEFERRED lock on an inode glock. The glock seems to be in an EXCLUSIVE state, but there is apparently no holder on the Request queue, which is odd. Since it's on Waiters3, the lock is waiting to be promoted, but since it's already in exclusive, isn't that a demote? I wonder if the glock was held in a lower mode (LM_ST_SHARED) and then a requester got promoted to exclusive, leaving this promote request behind or something.

It makes me think back on Josef's fix to gfs2 where he made this comment: "If a glock is in the exclusive state and a request for demote to deferred has been received, then further requests for demote to shared are being ignored. This patch fixes that by ensuring that we demote to unlocked in that case." Maybe the same can be said not only for shared, but for deferred as well. Of course, the code is all different in gfs1, but I might as well add him to the cc list too in case he has thoughts on it. Maybe his fix needs to be ported to gfs1. I'll check that out tomorrow.

There was a weird syslog message on morph-02 (the dlm master for the glock in question). It said:

dlm: unlock ungranted 3

I don't know what that means. Does it mean that someone tried to unlock said glock and was denied?

There were also weird syslog messages on morph-04, the node that's hung:

Oct 22 14:40:09 morph-04 qarshd[17019]: Running cmdline: cd /mnt/vedder && /usr/tests/sts-rhel4.6/bin/d_doio -P try -I 19942 -a -m 1000 -r 100 -n 5
Oct 22 14:40:18 morph-04 kernel: dlm: vedder: cancel reply ret 0
Oct 22 14:40:18 morph-04 kernel: lock_dlm: unlock sb_status 0 2,22e15ae flags 0
Oct 22 14:40:18 morph-04 kernel: dlm: vedder: process_lockqueue_reply id ff03cf state 0

I'd like to see what Dave T. can tell me about these messages, so I'm adding him to the cc list.
According to Dave, the "dlm: unlock ungranted 3" means: "...a simple case of unlocking a lock that's not been granted which should never happen, of course...The problem is that gfs is unlocking a lock that it presumes is granted, but the dlm thinks the lock is being converted when it gets the unlock from gfs."
Created attachment 240911 [details]
lockdump with glock history showing failure

I couldn't recreate the problem on my trin-{09,10,11} cluster, so I put debug code on the morphs. With the debug code on, it took several runs throughout the weekend before it finally recreated. But it did. This tgz contains "gfs_tool lockdump" output from all the morphs at the time of failure.

The glock.c code has a "glock history" feature I added, which I've used before to analyze problems like this in gfs2. It remembers the last 100 significant events for every glock. Although I haven't gone through the glock history yet, I can tell this: the system showing the problem is morph-04 and the glock having the problem is (2, 66860). The problem will hopefully be apparent in the history of that glock; I just need to go through it.
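For anyone unfamiliar with the debug patch, the "glock history" is essentially a per-glock ring buffer of recent events. Below is a minimal, self-contained C sketch of that idea; the structure and function names are invented for illustration and are not the actual patch.

#include <stdio.h>

#define GLOCK_HIST_LEN 100          /* "last 100 significant events" */

struct glock_hist_entry {
        unsigned long seq;          /* monotonically increasing event number */
        char label[32];             /* e.g. "gfs_glock_dq(start)" */
        int gl_state;               /* LM_ST_* at the time of the event */
        int ref;                    /* reference count at the time of the event */
        unsigned int flags;         /* waiter-queue summary flags */
};

struct glock_hist {
        unsigned long next_seq;
        unsigned int head;                           /* next slot to overwrite */
        struct glock_hist_entry ent[GLOCK_HIST_LEN];
};

/* Record one event, overwriting the oldest entry once the buffer is full. */
static void glock_hist_add(struct glock_hist *h, const char *label,
                           int gl_state, int ref, unsigned int flags)
{
        struct glock_hist_entry *e = &h->ent[h->head];

        e->seq = h->next_seq++;
        snprintf(e->label, sizeof(e->label), "%s", label);
        e->gl_state = gl_state;
        e->ref = ref;
        e->flags = flags;
        h->head = (h->head + 1) % GLOCK_HIST_LEN;
}

/* Dump the history oldest-first, the way the annotated traces below read. */
static void glock_hist_dump(const struct glock_hist *h)
{
        for (unsigned int i = 0; i < GLOCK_HIST_LEN; i++) {
                const struct glock_hist_entry *e =
                        &h->ent[(h->head + i) % GLOCK_HIST_LEN];
                if (e->label[0])
                        printf("%lu: %s: state=%d, ref=%d, flags=%x\n",
                               e->seq, e->label, e->gl_state, e->ref, e->flags);
        }
}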
I'm testing a fix now. With debug code, my fix has gone through sixteen iterations of dd_io successfully. Now I'm testing it without the debug code, so just the bare minimum fix. So far it's gone three iterations successfully. (Each iteration takes roughly 15 minutes).
Created attachment 243361 [details]
Annotated glock history of the problem--first version

This is a condensed version of the glock dump from one failure scenario on morph-04. It contains only the glock in question, plus some of my notes. I'm now running a newer version that should give me more detail in the glock history about how the glock got into that state, provided I can recreate the problem again.
Regarding comment #4: I ran with my patch for five iterations and finally hit the problem again. Without the patch, it was usually failing on the first iteration. So the patch definitely closes a timing window, but the problem remains (I had a suspicion that was the case). Now I'm running with my improved glock history patch again, but like before, it's not recreating. Yesterday it ran for 35 iterations without failure. This morning the test stopped around 6:00 a.m., after running overnight, but I'm not sure whether it was a recreation or because my VPN dropped. I took some information and restarted it.
As per Dave's suggestion, I tried running with the 4.5 gfs. So far it has gone eight iterations without a problem. Before doing this exercise, I narrowed the problem down to function gfs_glock_dq(), which is where my patch from comment #4 is located. It is also the location of the only significant change to gfs between 4.5 and today. That change was done for bug 191222. I believe that fix may have introduced a regression. I'm adding Abhi to the cc list to get him in the loop.
I'm confused. It now looks like glock.c is identical between RHEL4.5 and today. As a matter of fact, it looks like there are no significant differences in the gfs kernel code AT ALL except for the gfs_reset_readahead code, which should just lie dormant until activated. Still, I suspect that function is the one with the problem, and I suspect that code change is what did it. The 191222 patch seems reasonable at first glance. My patch from comment #4 has to do with what I'm seeing in the glock history, which is this: while one process is inside that function, gfs_glock_dq(), another process comes in and messes with the glock. Then the assumptions made at the beginning of gfs_glock_dq() are not necessarily true at the end.
With the latest (almost 4.6) code I ran 97 iterations of JUST d_rwrevdirectlarge last night without any failure. When I ran the entire dd_io, it recreated on the 9th attempt. So apparently it's not enough to run just d_rwrevdirectlarge to get the problem to recreate. Now I'm running a new patch to see if it will fix the problem. Only time will tell. The problem is that each run lasts about 17 minutes. If I were able to run just d_rwrevdirectlarge, it would take only somewhere between 30 seconds and 5 minutes.
Just a quick update to let everyone know what's happening here. For some reason, the problem seems to be getting harder to recreate. The hang is ALWAYS accompanied by the dlm message: "dlm: unlock ungranted". I've used lots of instrumentation in gfs to determine that gfs SEEMS to be doing the right thing. Here is my latest annotated, instrumented glock history showing what's happening:

Glock (2, 145829642)
  gl_flags = GLF_LOCK
  gl_count = 9
  gl_state = 1 (LM_ST_EXCLUSIVE)
  req_gh = yes
  req_bh = yes
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 9
  ail_bufs = no
  Request
    owner = -1
    gh_state = 0
    gh_flags = 0
    error = 0
    gh_iflags = 2 4 5
  Waiter2
    owner = -1
    gh_state = 0
    gh_flags = 0
    error = 0
    gh_iflags = 2 4 5
  Waiter3
    owner = 9120
    gh_state = 2
    gh_flags =
    error = 0
    gh_iflags = 1
  Waiter3
    owner = 9118
    gh_state = 2
    gh_flags =
    error = 0
    gh_iflags = 1
  Waiter3
    owner = 9121
    gh_state = 3
    gh_flags = 3
    error = 0
    gh_iflags = 1
  Waiter3
    owner = 9117
    gh_state = 2
    gh_flags =
    error = 0
    gh_iflags = 1
  Inode: busy

Glock history:
1055365: RQ_PROMOTE4: LM_ST_DEFERRED, ref = 4, flags=1
1055366: gfs_glock_dq(start): LM_ST_DEFERRED, ref = 4, flags=1
1055367: gfs_holder_init: LM_ST_DEFERRED, ref = 5, flags=1
1055368: gfs_glock_dq(b4runq): LM_ST_DEFERRED, ref = 4, flags=0
1055369: gfs_glock_dq(end): LM_ST_DEFERRED, ref = 4, flags=0
1055370: gfs_holder_init: LM_ST_DEFERRED, ref = 4, flags=0
1055371: GFS_GLOCK_NQ: LM_ST_DEFERRED, ref = 4, flags=0
1055372: RQ_PROMOTE4: LM_ST_DEFERRED, ref = 4, flags=1
1055373: gfs_glock_dq(start): LM_ST_DEFERRED, ref = 4, flags=1
1055374: gfs_holder_init: LM_ST_DEFERRED, ref = 5, flags=1
1055375: gfs_glock_dq(b4runq): LM_ST_DEFERRED, ref = 4, flags=0
1055376: gfs_glock_dq(end): LM_ST_DEFERRED, ref = 4, flags=0
1055377: gfs_holder_init: LM_ST_DEFERRED, ref = 4, flags=0
1055378: GFS_GLOCK_NQ: LM_ST_DEFERRED, ref = 4, flags=0
1055379: RQ_PROMOTE4: LM_ST_DEFERRED, ref = 4, flags=1
1055380: gfs_glock_dq(start): LM_ST_DEFERRED, ref = 4, flags=1
1055381: gfs_holder_init: LM_ST_DEFERRED, ref = 5, flags=1
1055382: gfs_glock_dq(b4runq): LM_ST_DEFERRED, ref = 4, flags=0
1055383: gfs_glock_dq(end): LM_ST_DEFERRED, ref = 4, flags=0
1055384: gfs_holder_init: LM_ST_DEFERRED, ref = 4, flags=0
1055385: GFS_GLOCK_NQ: LM_ST_DEFERRED, ref = 4, flags=0
1055386: RQ_PROMOTE4: LM_ST_DEFERRED, ref = 4, flags=1
1055387: gfs_glock_dq(start): LM_ST_DEFERRED, ref = 4, flags=1
1055388: gfs_holder_init: LM_ST_DEFERRED, ref = 5, flags=1
1055389: gfs_glock_dq(b4runq): LM_ST_DEFERRED, ref = 4, flags=0
1055390: gfs_glock_dq(end): LM_ST_DEFERRED, ref = 4, flags=0
1055391: gfs_holder_init: LM_ST_DEFERRED, ref = 4, flags=0
1055392: GFS_GLOCK_NQ: LM_ST_DEFERRED, ref = 4, flags=0
1055393: RQ_PROMOTE4: LM_ST_DEFERRED, ref = 4, flags=1
1055394: gfs_glock_dq(start): LM_ST_DEFERRED, ref = 4, flags=1
1055395: gfs_holder_init: LM_ST_DEFERRED, ref = 5, flags=1
1055396: gfs_glock_dq(b4runq): LM_ST_DEFERRED, ref = 4, flags=0
1055397: gfs_glock_dq(end): LM_ST_DEFERRED, ref = 4, flags=0
1055398: gfs_holder_init: LM_ST_DEFERRED, ref = 4, flags=0
1055399: GFS_GLOCK_NQ: LM_ST_DEFERRED, ref = 4, flags=0
1055400: RQ_PROMOTE4: LM_ST_DEFERRED, ref = 4, flags=1
1055401: gfs_holder_init: LM_ST_DEFERRED, ref = 5, flags=1
1055402: GFS_GLOCK_NQ: LM_ST_DEFERRED, ref = 5, flags=1
1055403: RQ_PROMOTE4: LM_ST_DEFERRED, ref = 5, flags=1
1055404: gfs_holder_init: LM_ST_DEFERRED, ref = 6, flags=1
1055405: GFS_GLOCK_NQ: LM_ST_DEFERRED, ref = 6, flags=1
1055406: RQ_PROMOTE4: LM_ST_DEFERRED, ref = 6, flags=1
1055407: gfs_glock_dq(start): LM_ST_DEFERRED, ref = 6, flags=1
1055408: gfs_holder_init: LM_ST_DEFERRED, ref = 7, flags=1
1055409: gfs_glock_dq(b4runq): LM_ST_DEFERRED, ref = 6, flags=1
1055410: gfs_glock_dq(end): LM_ST_DEFERRED, ref = 6, flags=1
1055411: gfs_glock_dq(start): LM_ST_DEFERRED, ref = 5, flags=1
1055412: gfs_holder_init: LM_ST_DEFERRED, ref = 6, flags=1
1055413: gfs_glock_dq(b4runq): LM_ST_DEFERRED, ref = 5, flags=1
1055414: gfs_glock_dq(end): LM_ST_DEFERRED, ref = 5, flags=1
1055415: gfs_holder_init: LM_ST_DEFERRED, ref = 5, flags=1
1055416: GFS_GLOCK_NQ: LM_ST_DEFERRED, ref = 5, flags=1
1055417: RQ_PROMOTE4: LM_ST_DEFERRED, ref = 5, flags=1
1055418: gfs_glock_dq(start): LM_ST_DEFERRED, ref = 5, flags=1
1055419: gfs_holder_init: LM_ST_DEFERRED, ref = 6, flags=1
1055420: gfs_glock_dq(b4runq): LM_ST_DEFERRED, ref = 5, flags=1
1055421: gfs_glock_dq(end): LM_ST_DEFERRED, ref = 5, flags=1
1055422: gfs_holder_init: LM_ST_DEFERRED, ref = 5, flags=1
1055423: GFS_GLOCK_NQ: LM_ST_DEFERRED, ref = 5, flags=1
1055424: RQ_PROMOTE4: LM_ST_DEFERRED, ref = 5, flags=1
1055425: gfs_glock_dq(start): LM_ST_DEFERRED, ref = 5, flags=1
1055426: gfs_holder_init: LM_ST_DEFERRED, ref = 6, flags=1
1055427: gfs_glock_dq(b4runq): LM_ST_DEFERRED, ref = 5, flags=1
1055428: gfs_glock_dq(end): LM_ST_DEFERRED, ref = 5, flags=1
1055429: gfs_glock_dq(start): LM_ST_DEFERRED, ref = 4, flags=1
1055430: gfs_holder_init: LM_ST_DEFERRED, ref = 5, flags=1
1055431: gfs_glock_dq(b4runq): LM_ST_DEFERRED, ref = 4, flags=0
1055432: gfs_glock_dq(end): LM_ST_DEFERRED, ref = 4, flags=0
1055433: gfs_holder_init: LM_ST_DEFERRED, ref = 4, flags=0
1055434: GFS_GLOCK_NQ: LM_ST_DEFERRED, ref = 4, flags=0
1055435: RQ_PROMOTE4: LM_ST_DEFERRED, ref = 4, flags=1
1055436: gfs_glock_dq(start): LM_ST_DEFERRED, ref = 4, flags=1
1055437: gfs_holder_init: LM_ST_DEFERRED, ref = 5, flags=1
1055438: gfs_glock_dq(b4runq): LM_ST_DEFERRED, ref = 4, flags=0
1055439: gfs_glock_dq(end): LM_ST_DEFERRED, ref = 4, flags=0
1055440: gfs_holder_init: LM_ST_DEFERRED, ref = 4, flags=0
1055441: GFS_GLOCK_NQ: LM_ST_DEFERRED, ref = 4, flags=0
1055442: gfs_glock_xmote_th: LM_ST_DEFERRED, ref = 5, flags=1000
1055443: RQ_PROMOTE: LM_ST_DEFERRED, ref = 5, flags=1000
1055444: search_bucket: LM_ST_DEFERRED, ref = 6, flags=1000
1055445: state_change: LM_ST_EXCLUSIVE, ref = 6, flags=1000
1055446: LM_CB_ASYNC: LM_ST_EXCLUSIVE, ref = 4, flags=1
1055447: search_bucket: LM_ST_EXCLUSIVE, ref = 5, flags=1
1055448: HANDLE_CALLBACK(strt): LM_ST_EXCLUSIVE, ref = 5, flags=1
1055449: gfs_holder_init: LM_ST_EXCLUSIVE, ref = 6, flags=1
1055450: HANDLE_CALLBACK(end): LM_ST_EXCLUSIVE, ref = 6, flags=101
1055451: gfs_glock_dq(start): LM_ST_EXCLUSIVE, ref = 5, flags=101
1055452: gfs_holder_init: LM_ST_EXCLUSIVE, ref = 6, flags=101
1055453: gfs_glock_dq(b4runq): LM_ST_EXCLUSIVE, ref = 5, flags=100
1055454: RQ_DEMOTE3: LM_ST_EXCLUSIVE, ref = 5, flags=100
1055455: gfs_glock_drop_th: LM_ST_EXCLUSIVE, ref = 6, flags=100
1055456: gfs_glock_dq(end): LM_ST_EXCLUSIVE, ref = 6, flags=100
1055457: gfs_holder_init: LM_ST_EXCLUSIVE, ref = 6, flags=100
1055458: GFS_GLOCK_NQ: LM_ST_EXCLUSIVE, ref = 6, flags=100
1055459: gfs_holder_init: LM_ST_EXCLUSIVE, ref = 7, flags=1100
1055460: GFS_GLOCK_NQ: LM_ST_EXCLUSIVE, ref = 7, flags=1100
1055461: gfs_holder_init: LM_ST_EXCLUSIVE, ref = 8, flags=1100
1055462: GFS_GLOCK_NQ: LM_ST_EXCLUSIVE, ref = 8, flags=1100
1055463: gfs_holder_init: LM_ST_EXCLUSIVE, ref = 9, flags=1100
1055464: GFS_GLOCK_NQ: LM_ST_EXCLUSIVE, ref = 9, flags=1100

On the line marked 1055446, we see the async callback from the lock manager, giving us the lock in question. It should be ours to promote, demote or unlock as we see fit. On the line marked 1055455, we see the call to gfs_glock_drop_th, which called the lock module to unlock that glock. However, all indications are that it got a non-zero return code from the lock module and therefore bypassed calling drop_bh. Likewise, we never get any kind of callback indicating the unlock was done. Instead, we get errors from dlm. Here are the dlm messages I got on my latest recreation:

dlm: vedder: process_lockqueue_reply id 150224 state 0
dlm: vedder: process_lockqueue_reply id 1c02ef state 0
dlm: vedder: process_lockqueue_reply id a0038e state 0
dlm: vedder: process_lockqueue_reply id 8f0151 state 0
dlm: vedder: process_lockqueue_reply id aa023d state 0
dlm: unlock ungranted 3
 [<f8bd1598>] dlm_unlock_stage2+0x148/0x210 [dlm]
 [<f8bd3c81>] process_cluster_request+0x81f/0x8e4 [dlm]
 [<02146b29>] cache_grow+0x116/0x138
 [<f8bd6f61>] midcomms_process_incoming_buffer+0x1ce/0x1f8 [dlm]
 [<02143074>] buffered_rmqueue+0x17d/0x1a5
 [<02120511>] autoremove_wake_function+0x0/0x2d
 [<02143150>] __alloc_pages+0xb4/0x2a6
 [<f8bd518c>] receive_from_sock+0x181/0x254 [dlm]
 [<f8bd607b>] dlm_recvd+0x0/0x95 [dlm]
 [<f8bd5f25>] process_sockets+0x56/0x91 [dlm]
 [<f8bd6100>] dlm_recvd+0x85/0x95 [dlm]
 [<02133271>] kthread+0x73/0x9b
 [<021331fe>] kthread+0x0/0x9b
 [<021041f5>] kernel_thread_helper+0x5/0xb

On the other node:

dlm: vedder: process_lockqueue_reply id 190116 state 0
dlm: vedder: convert from 3 ungranted
dlm: request rh_cmd 4 rh_lkid f008f remlkid 1c02ba flags 44 status 0 rqmode 2
dlm: lkb id 1c02ba remid f008f flags 40000 status 3 rqmode 5 grmode 2 nodeid 3 lqstate 0 lqflags 44
dlm: rsb name " 2 8b12f0a" nodeid 0 flags 2 ref 5
...
dlm: vedder: process_lockqueue_reply id 330330 state 0
dlm: vedder: process_lockqueue_reply id 370264 state 0
dlm: vedder: process_lockqueue_reply id 35004f state 0
dlm: vedder: process_lockqueue_reply id 31019b state 0
dlm: vedder: process_lockqueue_reply id 3102f8 state 0
dlm: vedder: process_lockqueue_reply id 390236 state 0
dlm: vedder: process_lockqueue_reply id 3f0282 state 0
dlm: vedder: process_lockqueue_reply id 420102 state 0
dlm: vedder: process_lockqueue_reply id 310290 state 0
dlm: vedder: process_lockqueue_reply id 4102b8 state 0
dlm: vedder: process_lockqueue_reply id 4303e2 state 0
dlm: vedder: process_lockqueue_reply id 38016e state 0
dlm: vedder: process_lockqueue_reply id 4a006d state 0
dlm: vedder: cancel reply ret 0
lock_dlm: unlock sb_status 0 2,8b12f0a flags 0
dlm: vedder: process_lockqueue_reply id 4902aa state 0

I'm now in the process of trying to recreate the failure with some instrumentation recommended by Dave Teigland to dump the lkb and rsb when we get any of the three errors shown above. I'm also working on new instrumentation in the gfs kernel to show whether it's sending the proper things to the lock module.
Here are the messages from the instrumented DLM at the time of failure:

dlm: vedder: process_lockqueue_reply id 1103c5 state 0
dlm: vedder: process_lockqueue_reply id 1100d7 state 0
dlm: unlock ungranted 3
dlm: lkb id 14037a remid f03ed flags 440000 status 0 rqmode 5 grmode 0 nodeid 4 lqstate 0 lqflags 44
dlm: rsb name " 2 74" nodeid 0 flags 2 ref 5
 [<82bd1597>] dlm_unlock_stage2+0x147/0x21e [dlm]
 [<82bd3ca6>] process_cluster_request+0x81f/0x8e4 [dlm]
 [<02146b29>] cache_grow+0x116/0x138
 [<02117472>] smp_apic_timer_interrupt+0x9a/0x9c
 [<82bd6f85>] midcomms_process_incoming_buffer+0x1ce/0x1f8 [dlm]
 [<02143074>] buffered_rmqueue+0x17d/0x1a5
 [<02120511>] autoremove_wake_function+0x0/0x2d
 [<02143150>] __alloc_pages+0xb4/0x2a6
 [<82bd51b0>] receive_from_sock+0x181/0x254 [dlm]
 [<82bd609f>] dlm_recvd+0x0/0x95 [dlm]
 [<82bd5f49>] process_sockets+0x56/0x91 [dlm]
 [<82bd6124>] dlm_recvd+0x85/0x95 [dlm]
 [<02133271>] kthread+0x73/0x9b
 [<021331fe>] kthread+0x0/0x9b
 [<021041f5>] kernel_thread_helper+0x5/0xb

--------
Other node:

dlm: vedder: process_lockqueue_reply id 17036a state 0
dlm: vedder: process_lockqueue_reply id 150034 state 0
dlm: vedder: cancel reply ret 0
dlm: lkb id f03ed remid 14037a flags 10080000 status 0 rqmode -1 grmode 2 nodeid 3 lqstate 0 lqflags 0
dlm: rsb name " 2 74" nodeid 3 flags 0 ref 1
dlm: reply rh_cmd 5 rh_lkid f03ed lockstate 3 nodeid 34892585 status 0 lkid 14037a
lock_dlm: unlock sb_status 0 2,74 flags 0
dlm: vedder: process_lockqueue_reply id f03ed state 0
Okay, just to clarify: The test is hung because it's waiting for a glock. The glock is stuck because it's got the GLF_LOCK gl_flag set. Therefore run_queue won't do anything to it. It's in that state because it never got its callback from a drop request to the lock manager. Here are the last few entries in the sequence of events from my glock history, with my comments AFTER the entries they describe.

This history starts with a holder, in LM_ST_DEFERRED state. The glock is dequeued:

201826: gfs_glock_dq(start): LM_ST_DEFERRED, LM_ST_DEFERRED, rc=1, flags=1
201827: gfs_glock_dq(b4runq): LM_ST_DEFERRED, LM_ST_DEFERRED, rc=1, flags=0
201828: gfs_glock_dq(end): LM_ST_DEFERRED, LM_ST_DEFERRED, rc=1, flags=0

When glock_dq is done, the holder has been taken off the queue, thus flags=0 (no holders on any of the waiters queues).

201829: gfs_glock_xmote_th: LM_ST_DEFERRED, LM_ST_EXCLUSIVE, rc=777, flags=1000
201830: gfs_glock_xmote_th: LM_ST_DEFERRED, LM_ST_EXCLUSIVE, rc=128, flags=1000

A new request came in to promote the lock from deferred to exclusive. The "777" is before the call to the lock manager, the "128" is right after. The "128" indicates the async bit is on as well. The flags=1000 indicates that the request went onto Waiters3.

201831: RQ_PROMOTE: LM_ST_DEFERRED, LM_ST_EXCLUSIVE, rc=1, flags=1000

The lock is being promoted to exclusive. This log entry implies that !relaxed_state_ok(). This is what called the previous gfs_glock_xmote_th.

201832: LM_CB_ASYNC: LM_ST_DEFERRED, LM_ST_DEFERRED, rc=1, flags=1000

Here we have the async callback from the promote. All is well.

201833: state_change: LM_ST_DEFERRED, LM_ST_EXCLUSIVE, rc=0, flags=1000

The state is therefore changed to exclusive. After this, the holder is moved from Waiters3 to gl_holders.

201834: gfs_glock_dq(start): LM_ST_EXCLUSIVE, LM_ST_EXCLUSIVE, rc=1, flags=101

A new request came in to unlock the glock. The rc=1 means one sole holder.

201835: gfs_glock_dq(b4runq): LM_ST_EXCLUSIVE, LM_ST_EXCLUSIVE, rc=1, flags=100

This is right before the run_queue at the bottom of gfs_glock_dq. The flags=100 indicates the previous unlock request was put on Waiters2, so it's now waiting to demote the lock.

201836: RQ_DEMOTE3: LM_ST_EXCLUSIVE, LM_ST_UNLOCKED, rc=1, flags=100

Since we're changing state, and since it's an unlock request, rq_demote calls go_drop_th, which calls gfs_glock_drop_th().

201837: gfs_glock_drop_th: LM_ST_EXCLUSIVE, LM_ST_EXCLUSIVE, rc=777, flags=100
201838: gfs_glock_drop_th: LM_ST_EXCLUSIVE, LM_ST_EXCLUSIVE, rc=128, flags=100

gfs_glock_drop_th called the lock module to do the drop. Since the async bit is still on, this request doesn't cause drop_bh to be called. Instead we wait for the callback.

201839: gfs_glock_dq(end): LM_ST_EXCLUSIVE, LM_ST_EXCLUSIVE, rc=1, flags=100

The callback never came. The debug version of dlm gave these messages:

dlm: vedder: cancel reply ret 0
dlm: lkb id 1f0196 remid 8102e8 flags 480000 status 0 rqmode -1 grmode 5 nodeid 5 lqstate 0 lqflags 0
dlm: rsb name " 2 22bae75" nodeid 5 flags 0 ref 1
dlm: reply rh_cmd 5 rh_lkid 1f0196 lockstate 3 nodeid 582 status 0 lkid 8102e8
lock_dlm: unlock sb_status 0 2,22bae75 flags 0
dlm: vedder: process_lockqueue_reply id 1f0196 state 0

So again, it seems like the cancel never caused the callback. I'm running the test again now with some more debug messages suggested by Dave Teigland.
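To make the hang mechanism easier to picture, here is a rough C model of the async-drop flow described above. This is not the real glock.c code; the names are simplified and heavily reduced, but it shows why a missing completion callback leaves GLF_LOCK set and run_queue permanently idle for that glock.

#include <stdio.h>

enum { LM_ST_UNLOCKED, LM_ST_SHARED, LM_ST_DEFERRED, LM_ST_EXCLUSIVE };

struct glock_model {
        int gl_state;
        int glf_lock;                /* models the GLF_LOCK gl_flag */
        int waiters;                 /* holders queued on the waiter lists */
};

/* rq_demote -> go_drop_th -> gfs_glock_drop_th: hand the unlock to the lock
 * module.  The request completes asynchronously, so GLF_LOCK stays set and
 * drop_bh is NOT called here; we wait for the lock module's callback. */
static void model_drop_th(struct glock_model *gl)
{
        gl->glf_lock = 1;
}

/* The completion callback is the only thing that clears GLF_LOCK and lets
 * run_queue service the waiters again. */
static void model_drop_bh(struct glock_model *gl)
{
        gl->gl_state = LM_ST_UNLOCKED;
        gl->glf_lock = 0;
}

static void model_run_queue(struct glock_model *gl)
{
        if (gl->glf_lock)
                return;              /* GLF_LOCK set: leave waiters queued */
        gl->waiters = 0;             /* ...promote/demote waiting holders... */
}

int main(void)
{
        struct glock_model gl = { LM_ST_EXCLUSIVE, 0, 4 };

        model_drop_th(&gl);          /* unlock handed to the lock module    */
        /* model_drop_bh(&gl); */    /* callback never arrives (the bug)    */
        model_run_queue(&gl);        /* does nothing: GLF_LOCK still set    */

        printf("state=%d glf_lock=%d waiters=%d -> hung\n",
               gl.gl_state, gl.glf_lock, gl.waiters);
        return 0;
}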
Output from Dave's patch:

dlm: vedder: cancel reply ret 0
dlm: lkb id 1c010d remid 61020a flags 80000 status 0 rqmode -1 grmode 5 nodeid 5 lqstate 0 lqflags 0
dlm: rsb name " 2 8b12f09" nodeid 5 flags 0 ref 1
dlm: reply rh_cmd 5 rh_lkid 1c010d lockstate 3 nodeid 5 status 0 lkid 61020a
dlm: queue_ast 1c010d status 0
dlm: lkb id 1c010d remid 61020a flags 80000 status 0 rqmode -1 grmode 5 nodeid 5 lqstate 0 lqflags 0
dlm: rsb name " 2 8b12f09" nodeid 5 flags 0 ref 1
lock_dlm: unlock sb_status 0 2,8b12f09 flags 0
dlm: vedder: process_lockqueue_reply id 1c010d state 0
Here is the output from Dave's latest patch:

From morph-01:
dlm: vedder: grant lock 1502dc on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 1502dc state 0
dlm: vedder: grant lock 140270 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 140270 state 0
dlm: vedder: grant lock 1f025d on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 1f025d state 0
dlm: vedder: grant lock 220172 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 220172 state 0
dlm: vedder: grant lock 3b00f0 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 3b00f0 state 0
dlm: vedder: grant lock 2a0195 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 2a0195 state 0
dlm: vedder: grant lock 2c0224 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 2c0224 state 0
dlm: vedder: grant lock 3101f6 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 3101f6 state 0
dlm: vedder: grant lock 2600b8 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 2600b8 state 0
dlm: vedder: grant lock 3c03cc on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 3c03cc state 0
dlm: vedder: grant lock 49016a on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 49016a state 0
dlm: vedder: grant lock 470086 on lockqueue 2 44 3 3 2 0
dlm: vedder: cancel reply ret 0
dlm: lkb id 470086 remid 2b024b flags 80000 status 0 rqmode -1 grmode 5 nodeid 4 lqstate 0 lqflags 0
dlm: rsb name " 2 68585d6" nodeid 4 flags 0 ref 1
dlm: reply rh_cmd 5 rh_lkid 470086 lockstate 3 nodeid 518 status 0 lkid 2b024b
dlm: queue_ast 470086 status 0
dlm: lkb id 470086 remid 2b024b flags 80000 status 0 rqmode -1 grmode 5 nodeid 4 lqstate 0 lqflags 0
dlm: rsb name " 2 68585d6" nodeid 4 flags 0 ref 1
Reproduced: offset=0x48
lock_dlm: unlock sb_status 0 2,68585d6 flags 0
00000000 01000100 05004800 86004700 09000001 [......H...G.....]
00000010 03000000 06020000 00000000 4B022B00 [............K.+.]
00000020 00000020 00000003 0040F678 9000EE81 [... .....@.x....]
00000030 06020000 46020000 00000000 80D1EE81 [....F...........]
00000040 D0000000 0010F478 00000000 00000000 [.......x........]
00000050 00000000 00000000 00000000 00000000 [................]
From morph-02:
dlm: vedder: grant lock e01f2 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id e01f2 state 0
dlm: vedder: grant lock 8029c on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 8029c state 0
dlm: vedder: grant lock 19022f on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 19022f state 0
dlm: unlock ungranted 3
dlm: lkb id 2b024b remid 470086 flags 440000 status 0 rqmode 5 grmode 0 nodeid 2 lqstate 0 lqflags 44
dlm: rsb name " 2 68585d6" nodeid 0 flags 2 ref 5
 [<82bd15fd>] dlm_unlock_stage2+0x147/0x21e [dlm]
 [<82bd3e2e>] process_cluster_request+0x8df/0x9af [dlm]
 [<82bd71c5>] midcomms_process_incoming_buffer+0x1ce/0x22a [dlm]
 [<02143074>] buffered_rmqueue+0x17d/0x1a5
 [<02120511>] autoremove_wake_function+0x0/0x2d
 [<02143150>] __alloc_pages+0xb4/0x2a6
 [<82bd5344>] receive_from_sock+0x181/0x254 [dlm]
 [<82bd6233>] dlm_recvd+0x0/0x95 [dlm]
 [<82bd60dd>] process_sockets+0x56/0x91 [dlm]
 [<82bd62b8>] dlm_recvd+0x85/0x95 [dlm]
 [<02133271>] kthread+0x73/0x9b
 [<021331fe>] kthread+0x0/0x9b
 [<021041f5>] kernel_thread_helper+0x5/0xb

From morph-03:
dlm: vedder: grant lock 600b3 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 600b3 state 0
dlm: vedder: grant lock f0028 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id f0028 state 0
dlm: vedder: grant lock 11021c on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 11021c state 0
dlm: vedder: grant lock 1102a5 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 1102a5 state 0
dlm: vedder: grant lock 1602d7 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 1602d7 state 0
dlm: vedder: grant lock 1100b1 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 1100b1 state 0
dlm: vedder: grant lock 1d03ce on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 1d03ce state 0
dlm: vedder: grant lock f02c5 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id f02c5 state 0
dlm: vedder: grant lock 150062 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 150062 state 0
dlm: vedder: grant lock 180091 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 180091 state 0
dlm: vedder: grant lock 11023b on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 11023b state 0

From morph-04:
dlm: vedder: grant lock 40205 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 40205 state 0
dlm: vedder: grant lock 40324 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 40324 state 0
dlm: vedder: grant lock 400ab on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 400ab state 0
dlm: vedder: grant lock 60318 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 60318 state 0
dlm: vedder: grant lock 802ee on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 802ee state 0
dlm: vedder: grant lock 601cf on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 601cf state 0
dlm: vedder: grant lock 80212 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 80212 state 0
dlm: vedder: grant lock e01f0 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id e01f0 state 0
dlm: vedder: grant lock 1003be on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 1003be state 0
dlm: vedder: grant lock 800a2 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 800a2 state 0
dlm: vedder: grant lock b01a6 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id b01a6 state 0
dlm: vedder: grant lock 1503dc on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 1503dc state 0
dlm: vedder: grant lock 1302d7 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 1302d7 state 0
dlm: vedder: grant lock 1303e9 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 1303e9 state 0
dlm: vedder: grant lock 11035e on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 11035e state 0
dlm: vedder: grant lock 170317 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 170317 state 0
dlm: vedder: grant lock 14032b on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 14032b state 0

From morph-05:
dlm: vedder: grant lock 3031b on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id 3031b state 0
dlm: vedder: grant lock f00fd on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id f00fd state 0
dlm: vedder: grant lock b00cb on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id b00cb state 0
dlm: vedder: grant lock f0152 on lockqueue 2 44 3 3 2 0
dlm: vedder: process_lockqueue_reply id f0152 state 0
With my new (only slightly) improved version of the test, I was able to recreate this hang without the code from bug 191222, so there are no more worries about that being defective. I was also able to recreate it with RHEL4.5's gfs, so apparently there are no worries about it being a regression from 4.5 to 4.6 at all. I might try going back to the RHEL4.4 version as well. However, it's looking more and more like a dlm issue right now. It might be more worthwhile to try the RHEL4.5 dlm instead.
For the entire life of the dlm, there's been an annoying issue that we've worked around and not "fixed" directly. It's the source of all these messages:

process_lockqueue_reply id 2c0224 state 0

The problem is that a lock master sends an async "granted" message for a convert request *before* actually sending the reply for the original convert. The work-around is that the requesting node just takes the granted message as an implicit reply to the conversion and ignores the convert reply when it arrives later (the message above is printed when it gets the out-of-order reply for its convert). Apart from the annoying messages, it's never been a problem. Now, I believe we've found a case where it's a real problem:

1. nodeA: send convert PR->CW to nodeB
   nodeB: send granted message to nodeA
   nodeB: send convert reply to nodeA
2. nodeA: receive granted message for conversion, complete request, sending ast to gfs
3. nodeA: send convert CW->EX to nodeB
4. nodeA: receive reply for convert in step 1, which we ordinarily ignore, but since another convert has been sent, we mistake this message as the reply for the convert in step 3, and complete the convert request which is *not* really completed yet
5. nodeA: send unlock to nodeB
   nodeB: complains about an unlock during a conversion
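Here is a small stand-alone simulation of the five steps above, written only to make the ordering bug easier to follow. It is not dlm code; the message names and matching logic are illustrative assumptions. The point is that once a stale convert reply can be matched against a newer outstanding convert, the requester completes the new convert too early and its follow-on unlock reaches the master while the conversion is still pending, which is exactly the "unlock ungranted" complaint.

#include <stdio.h>

/* The two messages that matter for the race described above. */
enum msg { GRANTED_AST, CONVERT_REPLY };

struct req_node {
        int convert_outstanding;   /* a convert has been sent, reply pending */
        int ignore_next_reply;     /* grant already consumed as implicit reply */
};

/* Original work-around: a granted message is taken as the implicit reply to
 * the outstanding convert; the real reply is simply discarded when it shows
 * up later... unless another convert has been sent in the meantime. */
static void receive(struct req_node *n, enum msg m, int *completed_early)
{
        switch (m) {
        case GRANTED_AST:
                n->convert_outstanding = 0;
                n->ignore_next_reply = 1;     /* expect a late reply to drop */
                break;
        case CONVERT_REPLY:
                if (n->ignore_next_reply && !n->convert_outstanding) {
                        n->ignore_next_reply = 0;    /* harmless, as designed */
                } else if (n->convert_outstanding) {
                        /* Stale reply mistaken for the reply to the NEW
                         * convert: the requester thinks that conversion is
                         * done and may go on to unlock an ungranted lock. */
                        n->convert_outstanding = 0;
                        *completed_early = 1;
                }
                break;
        }
}

int main(void)
{
        struct req_node nodeA = { 0, 0 };
        int completed_early = 0;

        nodeA.convert_outstanding = 1;                     /* 1. convert PR->CW sent */
        receive(&nodeA, GRANTED_AST, &completed_early);    /* 2. grant arrives first */
        nodeA.convert_outstanding = 1;                     /* 3. convert CW->EX sent */
        receive(&nodeA, CONVERT_REPLY, &completed_early);  /* 4. stale reply arrives */

        if (completed_early)                               /* 5. premature unlock     */
                printf("unlock sent while convert still pending -> "
                       "\"dlm: unlock ungranted\"\n");
        return 0;
}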
The first attempt to fix the problem described in comment 16 is to have nodeB not send a convert reply if it has already sent a granted message. (We already do this for cases where the conversion is granted when first processing it, but we don't in cases where the grant is done after processing the convert.)
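A minimal sketch of that master-side guard, under the same illustrative assumptions as the simulation above (the grant_sent_before_reply flag and the function names are invented for the example, not taken from the dlm source):

/* Master-side convert handling, simplified: suppress the explicit convert
 * reply whenever a granted message has already gone out for this conversion,
 * so the requester never sees an out-of-order stale reply. */
struct convert_state {
        int grant_sent_before_reply;   /* hypothetical bookkeeping flag */
};

void master_grant_conversion(struct convert_state *cs,
                             void (*send_granted)(void))
{
        send_granted();
        cs->grant_sent_before_reply = 1;
}

void master_finish_convert_request(struct convert_state *cs,
                                   void (*send_reply)(void))
{
        if (cs->grant_sent_before_reply) {
                cs->grant_sent_before_reply = 0;
                return;                 /* grant already acted as the reply */
        }
        send_reply();
}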
The patch Dave asked me to try seems to have worked. It ran the shortened (10 min) scenario overnight without a problem, 110 iterations so far. Based on that, I'm reassigning to Dave, but I'll help any way I can.

Here is how I was recreating the problem, in grueling detail:

1. Reboot all nodes:
   cssh -l root morph-0{1,2,3,4,5} &
   /sbin/reboot -fin

2. When they come back, restart the cluster software:
   cssh -l root morph-0{1,2,3,4,5} &
   On every node:
   service ccsd start
   service cman start
   service fenced start
   service clvmd start

3. On morph-01, reformat the file system:
   mkfs.gfs -O -p "lock_dlm" -t "morph-cluster:vedder" -j 5 /dev/linear_1/linear_10

4. On all nodes, mount the file system to /mnt/vedder and chmod it:
   mount -tgfs /dev/linear_1/linear_10 /mnt/vedder
   chmod 777 /mnt/vedder
   (necessary because the test doesn't run as root)

5. Log into "try" and su to nstraz (I was too lazy to change all the permissions on the files):
   ssh try
   su
   su nstraz

6. Go to the test case directory in tmp and start the test for 100 iterations:
   cd /tmp/vedder.REG.lock_dlm.morph-cluster.200710121519/3/06.gfs_vs/stripe_2_4/2.dd_io
   PATH=$PATH:/usr/tests/sts-rhel4.6/bin/
   for i in `seq 1 100` ; do echo $i ; rm -fR bob ; ./restart.sh bob ; done
   (The "bob" here is a directory that contains the output of the test.)

If you get a failure (hang), you should do this (in this order):
- Stop the test on "try" by holding down <ctrl-c> until it stops
- Reset all the nodes
- On "try" type "d_close -a"
- Restart from step 2 above

If you don't follow that order, the for loop will try to restart the test and leave programs running that will interfere with the next run.
The patch committed to CVS worked properly for me as well. I ran more than 30 iterations (several hours) of the dd_io test case yesterday before I stopped it to reboot the system I was running it from. Later, when I went to restart it, someone had deleted the /tmp/ directory I had been using to run it. Hopefully this fix won't impact performance, but I never tested that. Performance issues aside, this fix gets my blessing, and I think we need to get the QE team running their official dd_io test from scratch. I think you can change the status to MODIFIED, no?
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0995.html
*** Bug 228916 has been marked as a duplicate of this bug. ***