Bug 222335 - stuck transaction lock causes journal recovery to hang in gfs
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: gfs-kmod
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Robert Peterson
QA Contact: Cluster QE
Reported: 2007-01-11 13:53 EST by Corey Marthaler
Modified: 2011-01-24 08:16 EST

Doc Type: Bug Fix
Last Closed: 2011-01-24 08:16:08 EST


Attachments
lock_dlm patch for debugging (1.81 KB, text/plain) - 2007-01-11 14:22 EST, David Teigland
log from taft-01 (200.76 KB, text/plain) - 2007-08-27 17:15 EDT, Corey Marthaler
log from taft-02 (200.57 KB, text/plain) - 2007-08-27 17:16 EDT, Corey Marthaler
log from taft-03 (200.81 KB, text/plain) - 2007-08-27 17:17 EDT, Corey Marthaler
log from taft-04 (200.29 KB, text/plain) - 2007-08-27 17:18 EDT, Corey Marthaler
/var/log/messages from grant-03 (818.88 KB, application/octet-stream) - 2008-11-11 17:36 EST, David Teigland

Description Corey Marthaler 2007-01-11 13:53:39 EST
Description of problem:
Hit this running revolver on the x86_64 taft cluster.  According to Dave, it
appears that gfs on one machine is holding the transaction lock in PR and not
releasing it for another node that has requested it in EX.  The other cluster
services and everything else related to groups are fine; the only issue is
that one machine is blocked waiting for a lock that another node isn't
releasing, for some unknown reason.
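
For reference, the dlm grants competing requests according to the standard
lock-mode compatibility matrix: PR (protected read) is compatible with other
PR holders but not with EX (exclusive), so the EX request has to wait until
the PR holder demotes or releases.  A minimal, self-contained sketch of that
rule (illustrative only, not the dlm's actual code):

/* Illustration of the standard VMS/DLM lock-mode compatibility matrix.
 * A PR holder is compatible with other PR holders but not with EX, so a
 * node asking for the transaction lock in EX must wait until every PR
 * holder releases or demotes, which is the wait described above. */
#include <stdio.h>
#include <stdbool.h>

enum dlm_mode { NL, CR, CW, PR, PW, EX };
static const char *mode_name[] = { "NL", "CR", "CW", "PR", "PW", "EX" };

/* compat[held][requested]: true if the request can be granted now */
static const bool compat[6][6] = {
    /*            NL     CR     CW     PR     PW     EX   */
    /* NL */ {  true,  true,  true,  true,  true,  true },
    /* CR */ {  true,  true,  true,  true,  true, false },
    /* CW */ {  true,  true,  true, false, false, false },
    /* PR */ {  true,  true, false,  true, false, false },
    /* PW */ {  true,  true, false, false, false, false },
    /* EX */ {  true, false, false, false, false, false },
};

int main(void)
{
    /* one node holds the transaction lock in PR; another asks for EX */
    enum dlm_mode held = PR, wanted = EX;

    printf("%s held, %s requested: %s\n", mode_name[held], mode_name[wanted],
           compat[held][wanted] ? "grant" : "must wait for the holder");
    return 0;
}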


[root@taft-01 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M      4   2007-01-08 17:56:02  taft-01
   2   M     28   2007-01-09 15:23:16  taft-02
   3   M     12   2007-01-08 17:56:03  taft-03
   4   M     12   2007-01-08 17:56:03  taft-04



[root@taft-01 ~]# group_tool -v
type             level name     id       state node id local_done
fence            0     default  00010003 none
[1 2 3 4]
dlm              1     clvmd    00010004 none
[1 2 3 4]
dlm              1     gfs1     00020002 FAIL_START_WAIT 2 200030003 0
[1 3 4]
dlm              1     gfs2     00040002 FAIL_START_WAIT 2 200030003 0
[1 3 4]
dlm              1     gfs3     00020001 FAIL_START_WAIT 2 200030003 0
[1 3 4]
dlm              1     gfs4     00060002 FAIL_START_WAIT 2 200030003 0
[1 3 4]
gfs              2     gfs1     00010002 FAIL_START_WAIT 2 200030003 0
[1 3 4]
gfs              2     gfs2     00030002 FAIL_START_WAIT 2 200030003 0
[1 3 4]
gfs              2     gfs3     00010001 FAIL_START_WAIT 2 200030003 0
[1 3 4]
gfs              2     gfs4     00050002 FAIL_START_WAIT 2 200030003 0
[1 3 4]



[root@taft-02 ~]# group_tool -v
type             level name     id       state node id local_done
fence            0     default  00010003 none
[1 2 3 4]
dlm              1     clvmd    00010004 none
[1 2 3 4]



[root@taft-03 ~]# group_tool -v
type             level name     id       state node id local_done
fence            0     default  00010003 none
[1 2 3 4]
dlm              1     clvmd    00010004 none
[1 2 3 4]
dlm              1     gfs1     00020002 FAIL_START_WAIT 2 200030003 1
[1 3 4]
dlm              1     gfs2     00040002 FAIL_START_WAIT 2 200030003 1
[1 3 4]
dlm              1     gfs3     00020001 FAIL_START_WAIT 2 200030003 1
[1 3 4]
dlm              1     gfs4     00060002 FAIL_START_WAIT 2 200030003 1
[1 3 4]
gfs              2     gfs1     00010002 FAIL_START_WAIT 2 200030003 1
[1 3 4]
gfs              2     gfs2     00030002 FAIL_START_WAIT 2 200030003 1
[1 3 4]
gfs              2     gfs3     00010001 FAIL_START_WAIT 2 200030003 1
[1 3 4]
gfs              2     gfs4     00050002 FAIL_START_WAIT 2 200030003 1
[1 3 4]



[root@taft-04 ~]# group_tool -v
type             level name     id       state node id local_done
fence            0     default  00010003 none
[1 2 3 4]
dlm              1     clvmd    00010004 none
[1 2 3 4]
dlm              1     gfs1     00020002 FAIL_START_WAIT 2 200030003 1
[1 3 4]
dlm              1     gfs2     00040002 FAIL_START_WAIT 2 200030003 1
[1 3 4]
dlm              1     gfs3     00020001 FAIL_START_WAIT 2 200030003 1
[1 3 4]
dlm              1     gfs4     00060002 FAIL_START_WAIT 2 200030003 1
[1 3 4]
gfs              2     gfs1     00010002 FAIL_START_WAIT 2 200030003 1
[1 3 4]
gfs              2     gfs2     00030002 FAIL_START_WAIT 2 200030003 1
[1 3 4]
gfs              2     gfs3     00010001 FAIL_START_WAIT 2 200030003 1
[1 3 4]
gfs              2     gfs4     00050002 FAIL_START_WAIT 2 200030003 1
[1 3 4]


Version-Release number of selected component (if applicable):
2.6.18-1.2910.el5
cman-2.0.56-2.el5
Comment 1 David Teigland 2007-01-11 14:17:57 EST
There was a bug (now fixed) in the group_tool -v display code that
caused the FAIL_START_WAIT state to be reported for most of these groups
when in fact the state is "none" for all but group 2:gfs1.

2:gfs1 is still in FAIL_START_WAIT
nodes 3 and 4 have started 2:gfs1 (have 1 under local_done)
node 1 has _not_ started 2:gfs1 (has 0 under local_done)
node 1 has not started the group because gfs-kernel is blocked
trying to acquire the transaction lock (1,2) in EX mode.

The last gfs message related to fs gfs1 on node1:
GFS: fsid=TAFT_CLUSTER:gfs1.2: jid=0: Acquiring the transaction lock...

taft-01:
Resource ffff810214a495c0 Name (len=24) "       1               2"
Master Copy
Granted Queue
0001012a PR Remote:   4 00010232
Conversion Queue
00010357 PR (EX)
Waiting Queue

taft-04:
Resource ffff8102062d69c0 Name (len=24) "       1               2"
Local Copy, Master is node 1
Granted Queue
00010232 PR Master:     0001012a
Conversion Queue
Waiting Queue

What's unknown is why taft-04 is not releasing its PR lock on 1,2.
From studying this bug on other occasions, it doesn't appear that
taft-04 is using 1,2 for anything that would keep it from releasing it.
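
To make the dumps above easier to read: the Granted Queue holds locks that
are currently granted (taft-04's PR), and the Conversion Queue holds locks
granted in one mode that are waiting to convert to another (taft-01's PR
asking for EX).  A conversion can only be granted once every other granted
lock is compatible with the requested mode.  A simplified model of that
decision, using hypothetical types rather than the dlm's real structures:

/* Hypothetical model, not the dlm's data structures: one granted PR lock
 * (taft-04's 00010232) and one conversion request PR->EX (taft-01's
 * 00010357) on resource "1,2".  The conversion stays queued until every
 * other granted lock is compatible with EX. */
#include <stdio.h>
#include <stdbool.h>

enum { MODE_NL, MODE_PR, MODE_EX };

struct lkb { unsigned int id; int grmode; int rqmode; };

static bool modes_compat(int a, int b)
{
    /* reduced matrix: only the modes the transaction lock uses */
    if (a == MODE_NL || b == MODE_NL)
        return true;
    if (a == MODE_PR && b == MODE_PR)
        return true;
    return false;   /* anything involving EX conflicts */
}

static bool can_grant_conversion(const struct lkb *conv,
                                 const struct lkb *granted, int n)
{
    for (int i = 0; i < n; i++)
        if (granted[i].id != conv->id &&
            !modes_compat(granted[i].grmode, conv->rqmode))
            return false;
    return true;
}

int main(void)
{
    struct lkb granted[] = { { 0x00010232, MODE_PR, MODE_PR } }; /* taft-04 */
    struct lkb conv      = { 0x00010357, MODE_PR, MODE_EX };     /* taft-01 */

    printf("conversion PR->EX: %s\n",
           can_grant_conversion(&conv, granted, 1)
               ? "granted" : "blocked behind the remote PR holder");
    return 0;
}

Since EX conflicts with the remote PR, the conversion sits on the queue until
taft-04 demotes or drops its lock, which is exactly the step that never
happens here.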
Comment 2 David Teigland 2007-01-11 14:22:05 EST
Created attachment 145380 [details]
lock_dlm patch for debugging

Attached is a patch that prints info about the transaction lock in lock_dlm.
The goal is to figure out why it's not being released when we expect it to be.

This is the patch I sent to Bob to try on Dec 8 when we hit this bug
on the marathon cluster.  But it didn't happen again.
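
For context, the useful thing to record is when the blocking callback for the
transaction lock arrives and when the corresponding release actually happens;
a hang then shows up as a callback with no matching release.  A rough
userspace sketch of that idea with hypothetical names (this is not the
attached patch):

/* Hypothetical sketch, not the attached lock_dlm patch: record when a
 * blocking callback (bast) for the transaction lock arrives and when the
 * lock is actually dropped, so a stuck release shows up as a bast with no
 * matching "dropped" message. */
#include <stdio.h>
#include <time.h>

struct trans_lock_dbg {
    int cur_mode;       /* mode we currently hold, e.g. PR */
    int bast_mode;      /* mode the remote requester wants, e.g. EX */
    time_t bast_time;   /* when the blocking callback arrived */
};

static void dbg_bast(struct trans_lock_dbg *d, int requested_mode)
{
    d->bast_mode = requested_mode;
    d->bast_time = time(NULL);
    printf("trans lock: bast received, held=%d requested=%d\n",
           d->cur_mode, requested_mode);
}

static void dbg_dropped(const struct trans_lock_dbg *d)
{
    printf("trans lock: dropped %ld seconds after bast\n",
           (long)(time(NULL) - d->bast_time));
}

int main(void)
{
    struct trans_lock_dbg d = { .cur_mode = 3 /* PR */ };

    dbg_bast(&d, 5 /* EX */);   /* remote node asks for EX */
    dbg_dropped(&d);            /* should follow promptly; if it never does,
                                   we have this bug */
    return 0;
}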
Comment 3 David Teigland 2007-01-18 15:35:31 EST
This looks similar to bug 218511, could be related.
Comment 4 Corey Marthaler 2007-02-01 16:33:29 EST
Hit this again today.

2.6.18-8.el5
cman-2.0.60-1.el5
Comment 5 David Teigland 2007-03-28 11:47:33 EDT
It would be good to get the smoke cluster set up to try to reproduce this
one with the debugging patch above applied.
Comment 6 Kiersten (Kerri) Anderson 2007-04-23 13:39:54 EDT
Fixing product name.  Cluster Suite components were integrated into Enterprise
Linux for version 5.0.
Comment 7 RHEL Product and Program Management 2007-04-23 13:43:49 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 8 David Teigland 2007-04-27 10:22:38 EDT
There's a possibility that this was a case of conversion deadlock resolution
gone wrong, which was fixed in bug 234086.  If that's the case, there won't be
a clear resolution of this bug; it will just have to expire over time.

But, if this problem still exists, we'll need to reproduce it with the
debugging patch above.
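
For anyone unfamiliar with the term, a conversion deadlock happens when two
nodes both hold a resource in PR and both queue PR to EX conversions: each
request is then blocked by the other node's still-granted PR, and neither can
be granted unless the dlm steps in to resolve it.  A tiny illustration (not
the bug 234086 code):

/* Illustration only: a classic conversion deadlock on one resource.
 * If the dlm's resolution of this situation misfires, the symptom looks
 * just like this bug: an EX conversion parked forever behind a PR that
 * is never released. */
#include <stdio.h>
#include <stdbool.h>

enum { PR = 3, EX = 5 };    /* the two modes involved */

struct conv { const char *node; int grmode; int rqmode; };

static bool blocked_by(const struct conv *a, const struct conv *b)
{
    /* a's requested EX is incompatible with b's still-granted PR */
    return a->rqmode == EX && b->grmode == PR;
}

int main(void)
{
    struct conv n1 = { "node1", PR, EX };
    struct conv n2 = { "node2", PR, EX };

    if (blocked_by(&n1, &n2) && blocked_by(&n2, &n1))
        printf("conversion deadlock: both PR->EX requests wait on each other\n");
    return 0;
}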
Comment 9 Corey Marthaler 2007-06-05 12:10:21 EDT
Can't currently reproduce, moving off the release criteria.
Comment 10 David Teigland 2007-06-05 16:05:56 EDT
I was looking for a gfs bug before, but I think this is a dlm bug.
On taft-01 dlm recovery isn't finished either, so that would be what's
blocking the gfs recovery.  We need to enable dlm debugging to figure
out where dlm recovery is stuck.
Comment 11 David Teigland 2007-06-05 18:04:17 EDT
Comment 10 wasn't taking into account comment 1, where I discovered that
the group_tool output is incorrect, so I believe the diagnosis in comment 1
still stands.

Corey ran into another instance of this same bug today; I'll post the
relevant bits of the log tomorrow, but I believe we end up no further along
than we were before.  (Why did I get caught by the group_tool
misrepresentation again?  I thought I'd fixed that; or is the fixed version
still not on the taft cluster?)

Comment 12 David Teigland 2007-06-06 15:52:50 EDT
In yesterday's instance of this bug we have the same situation as before:

taft-01
Resource ffff810211377780 Name (len=24) "       1               2"
Master Copy
Granted Queue
00cd0001 PR Remote:   3 01610001
Conversion Queue
00160001 PR (EX)
Waiting Queue

taft-03
Resource ffff81020beeecc0 Name (len=24) "       1               2"
Local Copy, Master is node 1
Granted Queue
01610001 PR Master:     00cd0001
Conversion Queue
Waiting Queue

And gfs's view of 1,2 on taft-03 is:
Glock (1, 2)
  gl_flags = 2 
  gl_count = 2
  gl_state = 3
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = no
  new_le = no
  incore_le = no
  reclaim = no
  aspace = no
  ail_bufs = no

A couple of possible reasons why taft-03 isn't releasing its lock:
- the blocking callback is in-progress in gfs for the trans lock, but stuck
somewhere prior to the lock being released
- the blocking callback has been delivered to lock_dlm, but isn't being
picked up by a lock_dlm thread (are they both busy?) for delivery back to gfs

A sysrq-t the next time this happens might give us some more clues.
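
A toy model of the second possibility, a blocking callback sitting in the
queue because every lock_dlm delivery thread is tied up elsewhere
(hypothetical names and structure, not lock_dlm's code):

/* Hypothetical model: lock_dlm delivers blocking callbacks back to gfs
 * from a small fixed pool of threads.  If every thread is busy, a queued
 * demote request for the transaction lock just sits there and the remote
 * EX request never gets the lock. */
#include <stdio.h>

#define NUM_THREADS 2              /* small fixed delivery thread pool */

static int busy_threads;           /* threads stuck in other work */

static void deliver_callbacks(int pending)
{
    if (NUM_THREADS - busy_threads <= 0) {
        printf("%d callback(s) queued, no free thread: demote stalls\n",
               pending);
        return;
    }
    printf("delivering %d callback(s), trans lock can be demoted\n", pending);
}

int main(void)
{
    busy_threads = NUM_THREADS;    /* both threads busy elsewhere */
    deliver_callbacks(1);          /* the trans lock's blocking callback */
    return 0;
}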
Comment 13 Corey Marthaler 2007-08-27 17:14:00 EDT
We appear to have hit this bug again; attaching the sysrq-t dumps.
Comment 14 Corey Marthaler 2007-08-27 17:15:32 EDT
Created attachment 174441 [details]
log from taft-01
Comment 15 Corey Marthaler 2007-08-27 17:16:20 EDT
Created attachment 174461 [details]
log from taft-02
Comment 16 Corey Marthaler 2007-08-27 17:17:01 EDT
Created attachment 174501 [details]
log from taft-03
Comment 17 Corey Marthaler 2007-08-27 17:18:01 EDT
Created attachment 174521 [details]
log from taft-04
Comment 18 Corey Marthaler 2007-08-27 17:21:07 EDT
[root@taft-01 tmp]# group_tool -v
type             level name     id       state node id local_done
fence            0     default  00010004 none
[2 3 4 5]
dlm              1     clvmd    00020004 none
[2 3 4 5]
dlm              1     gfs1     00020002 none
[2 4 5]
dlm              1     gfs2     00040002 none
[2 4 5]
gfs              2     gfs1     00010002 none
[2 4 5]
gfs              2     gfs2     00030002 FAIL_START_WAIT 3 300030003 1
[2 4 5]


[root@taft-02 tmp]# group_tool -v
type             level name     id       state node id local_done
fence            0     default  00010004 none
[2 3 4 5]
dlm              1     clvmd    00020004 none
[2 3 4 5]
dlm              1     gfs1     00020002 none
[2 4 5]
dlm              1     gfs2     00040002 none
[2 4 5]
gfs              2     gfs1     00010002 none
[2 4 5]
gfs              2     gfs2     00030002 FAIL_START_WAIT 3 300030003 0
[2 4 5]


[root@taft-03 tmp]# group_tool -v
type             level name     id       state node id local_done
fence            0     default  00010004 none
[2 3 4 5]
dlm              1     clvmd    00020004 none
[2 3 4 5]


[root@taft-04 tmp]# group_tool -v
type             level name     id       state node id local_done
fence            0     default  00010004 none
[2 3 4 5]
dlm              1     clvmd    00020004 none
[2 3 4 5]
dlm              1     gfs1     00020002 none
[2 4 5]
dlm              1     gfs2     00040002 none
[2 4 5]
gfs              2     gfs1     00010002 none
[2 4 5]
gfs              2     gfs2     00030002 FAIL_START_WAIT 3 300030003 1
[2 4 5]
Comment 19 David Teigland 2008-06-24 13:33:50 EDT
Will close in August after a year of inactivity.
Comment 20 David Teigland 2008-11-11 17:31:47 EST
Corey reproduced this on grant-01,02,03.

grant-01
type             level name     id       state node id local_done
fence            0     default  00010001 none        
[1 2 3]
dlm              1     clvmd    00070001 none        
[1 3]
dlm              1     A        00090001 none        
[1 3]
dlm              1     B        00020002 none        
[1 3]
gfs              2     A        00080001 FAIL_START_WAIT 2 200020003 0
[1 3]
gfs              2     B        00010002 none        
[1 3]

grant-03
type             level name     id       state node id local_done
fence            0     default  00010001 none        
[1 2 3]
dlm              1     clvmd    00070001 none        
[1 3]
dlm              1     A        00090001 none        
[1 3]
dlm              1     B        00020002 none        
[1 3]
gfs              2     A        00080001 FAIL_START_WAIT 2 200020003 1
[1 3]
gfs              2     B        00010002 none        
[1 3]

So gfs recovery is stuck for fs A on grant-01 (local_done is 0).  This is
confirmed by GFS recovery messages:

grant-01
GFS: fsid=GRANT:A.0: jid=1: Trying to acquire journal lock...
GFS: fsid=GRANT:B.0: jid=2: Trying to acquire journal lock...
GFS: fsid=GRANT:A.0: jid=1: Looking at journal...
GFS: fsid=GRANT:B.0: jid=2: Looking at journal...
GFS: fsid=GRANT:A.0: jid=1: Acquiring the transaction lock...
GFS: fsid=GRANT:B.0: jid=2: Acquiring the transaction lock...
GFS: fsid=GRANT:B.0: jid=2: Replaying journal...
GFS: fsid=GRANT:B.0: jid=2: Replayed 1 of 3 blocks
GFS: fsid=GRANT:B.0: jid=2: replays = 1, skips = 1, sames = 1
GFS: fsid=GRANT:B.0: jid=2: Journal replayed in 1s
GFS: fsid=GRANT:B.0: jid=2: Done

grant-03
GFS: fsid=GRANT:A.2: jid=1: Trying to acquire journal lock...
GFS: fsid=GRANT:B.1: jid=2: Trying to acquire journal lock...
GFS: fsid=GRANT:B.1: jid=2: Busy
GFS: fsid=GRANT:A.2: jid=1: Busy

The last message for fs A on grant-01 is "Acquiring the transaction lock".
This is confirmed by ps ax -o pid,stat,cmd,wchan, which shows gfs_recoverd on
grant-01:

11325 D<   [gfs_recoverd]              glock_wait_internal

We know that the first lock gfs_controld tries to acquire is the transaction
lock.  Looking in the dlm and gfs lock dumps confirms that gfs_recoverd for
fs A on grant-01 is waiting for the transaction lock (1,2):

grant-01
Resource ffff8101154fd680 Name (len=24) "       1               2"  
Master Copy
Granted Queue
00d30001 PR Remote:   3 00ce0001
Conversion Queue
032c0001 PR (EX)
Waiting Queue

grant-03
Resource ffff81021c530a80 Name (len=24) "       1               2"  
Local Copy, Master is node 1
Granted Queue
00ce0001 PR Master:     00d30001
Conversion Queue
Waiting Queue

grant-01
Glock (1, 2)
  gl_flags = 1 2 
  gl_count = 4
  gl_state = 3
  req_gh = yes
  req_bh = yes
  lvb_count = 0
  object = no
  new_le = no
  incore_le = no
  reclaim = no
  aspace = no
  ail_bufs = no
  Request
    owner = 11325
    gh_state = 1
    gh_flags = 2 4 5 10 12 
    error = 0
    gh_iflags = 1 
  Waiter3
    owner = 11325
    gh_state = 1
    gh_flags = 2 4 5 10 12 
    error = 0
    gh_iflags = 1 

grant-03
Glock (1, 2)
  gl_flags = 1 2 
  gl_count = 4
  gl_state = 3
  req_gh = yes
  req_bh = no
  lvb_count = 0
  object = no
  new_le = no
  incore_le = no
  reclaim = no
  aspace = no
  ail_bufs = no
  Request
    owner = -1
    gh_state = 0
    gh_flags = 0 
    error = 0
    gh_iflags = 2 4 5 
  Waiter2
    owner = -1
    gh_state = 0
    gh_flags = 0 
    error = 0
    gh_iflags = 2 4 5 

So the reason that gfs_recoverd for fs A on grant-01 can't acquire the
transaction lock in EX is that gfs on grant-03 is still holding
it in PR.

What we didn't know before is whether grant-03 had received a
blocking callback for the transaction lock, and if so, why it wasn't
releasing it.  Now we have the following backtraces, which show that
grant-03 has gotten the callback and is trying to release it.
gfs's lock_dlm1 thread handles blocking callbacks from the dlm, and
it's in the process of dropping the transaction lock, but for some
reason is stuck trying to flush the log, which is completely
unfamiliar territory for me.
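
A simplified userspace model of the call chain the backtrace below shows,
with the gfs internals reduced to a single flag (this is only a sketch of
where it blocks, not the kernel code): the blocking callback demotes the
transaction glock, the demote requires a log flush, and the log flush has to
take the log lock, which something else is holding.

/* Sketch only, not gfs kernel code: model the demote path blocking on a
 * held log lock.  In the real backtrace the block is in
 * __down_write_nested() on the sd_log_lock rw_sem inside
 * log_flush_internal(). */
#include <stdio.h>
#include <stdbool.h>

static bool log_lock_held;         /* stand-in for the sd_log_lock rw_sem */

static bool log_flush(void)
{
    if (log_lock_held) {
        printf("log flush: blocked waiting for the log lock\n");
        return false;              /* the real code would sleep here */
    }
    printf("log flush: done\n");
    return true;
}

static bool trans_glock_demote(void)
{
    /* demoting the transaction glock must first sync and flush the log */
    return log_flush();
}

int main(void)
{
    log_lock_held = true;          /* e.g. a writer mid log flush, on I/O wait */

    if (!trans_glock_demote())
        printf("blocking callback: trans lock not released, "
               "remote recovery keeps waiting\n");
    return 0;
}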

grant-03

lock_dlm1     D ffff810123849980     0 11291     87         11292 11219
(L-TLB)
 ffff810110c9bd70 0000000000000046 ffff810110c9be2c 0000000280075ab9
 ffff810110c9be20 000000000000000a ffff81021d0d0860 ffff8101239cc0c0
 0000008181757811 00000000000016d8 ffff81021d0d0a48 0000000300000000
Call Trace:
 [<ffffffff800646ca>] __down_write_nested+0x7a/0x92
 [<ffffffff886197c5>] :gfs:log_flush_internal+0x16/0x26f
 [<ffffffff88602793>] :gfs:gfs_sync_meta+0x9/0x41
 [<ffffffff8860eac9>] :gfs:trans_go_drop_th+0x22/0x35
 [<ffffffff8860c480>] :gfs:run_queue+0x12e/0x35d
 [<ffffffff8860db32>] :gfs:blocking_cb+0x44/0x91
 [<ffffffff8861824c>] :gfs:gdlm_thread+0x609/0x668
 [<ffffffff8009ea08>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8009e7f0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff886182b2>] :gfs:gdlm_thread1+0x0/0xa
 [<ffffffff8009e7f0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800324f3>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009e7f0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800323f5>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

gfs_logd      D ffff810123841400     0 11310     87         11311 11309
(L-TLB)
 ffff810110c75de0 0000000000000046 ffff81011da78b80 ffff81011da78b80
 0000000010000042 000000000000000a ffff81021ffa67e0 ffff810123925100
 0000005ea977e062 00000000000004c9 ffff81021ffa69c8 000000010000000a
Call Trace:
 [<ffffffff800646ca>] __down_write_nested+0x7a/0x92
 [<ffffffff88619029>] :gfs:gfs_ail_empty+0x27/0x11a
 [<ffffffff8009e7f0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8860239d>] :gfs:gfs_logd+0x18/0xc2
 [<ffffffff88602385>] :gfs:gfs_logd+0x0/0xc2
 [<ffffffff800324f3>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009e7f0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800323f5>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

gfs_quotad    D ffff810123841400     0 11311     87         11312 11310
(L-TLB)
 ffff8101111ffe30 0000000000000046 ffff81011ced8c00 ffff81011ced8c00
 0000000010000042 000000000000000a ffff81021d2c2040 ffff810123925100
 0000006c5f559016 0000000000000a08 ffff81021d2c2228 000000010000000a
Call Trace:
 [<ffffffff800646ca>] __down_write_nested+0x7a/0x92
 [<ffffffff88626630>] :gfs:gfs_quota_sync+0xce/0x1c4
 [<ffffffff8009e7f0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff88602534>] :gfs:gfs_quotad+0xed/0x16d
 [<ffffffff88602447>] :gfs:gfs_quotad+0x0/0x16d
 [<ffffffff800324f3>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009e7f0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800323f5>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

gfs_inoded    D ffff81021d2c2040     0 11312     87         11327 11311
(L-TLB)
 ffff810110c17e40 0000000000000046 ffff81021d5e7040 ffff81021d5e7078
 ffff81011ced8c00 000000000000000a ffff81021d5e7040 ffff81021d2c2040
 00000061ea509c8a 00000000000008f4 ffff81021d5e7228 0000000136a54457
Call Trace:
 [<ffffffff8009e7f0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800646ca>] __down_write_nested+0x7a/0x92
 [<ffffffff8862ce87>] :gfs:unlinked_find+0x54/0xb3
 [<ffffffff886021e8>] :gfs:gfs_inoded+0x0/0x44
 [<ffffffff8862ceff>] :gfs:gfs_unlinked_dealloc+0x19/0xb3
 [<ffffffff886021e8>] :gfs:gfs_inoded+0x0/0x44
 [<ffffffff886021f6>] :gfs:gfs_inoded+0xe/0x44
 [<ffffffff800324f3>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009e7f0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800323f5>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

d_doio        D ffff81000100c980     0 11362      1               11337
(NOTLB)
 ffff810110f3ba78 0000000000000086 00000000000003e8 ffffffff880f43f2
 0000000000000000 0000000000000009 ffff81011c7df820 ffff810123960040
 0000005e70a722b3 000000000000fa87 ffff81011c7dfa08 0000000223ae6e00
Call Trace:
 [<ffffffff880f43f2>] :dm_mod:__map_bio+0x48/0x119
 [<ffffffff8006dd27>] do_gettimeofday+0x40/0x8f
 [<ffffffff8001517e>] sync_buffer+0x0/0x3f
 [<ffffffff800638a1>] io_schedule+0x3f/0x67
 [<ffffffff800151b9>] sync_buffer+0x3b/0x3f
 [<ffffffff80063acd>] __wait_on_bit+0x40/0x6e
 [<ffffffff8001517e>] sync_buffer+0x0/0x3f
 [<ffffffff80063b67>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009ea36>] wake_bit_function+0x0/0x23
 [<ffffffff88603ef6>] :gfs:gfs_logbh_wait+0x26/0x70
 [<ffffffff88618ea2>] :gfs:disk_commit+0x2eb/0x44b
 [<ffffffff88619939>] :gfs:log_flush_internal+0x18a/0x26f
 [<ffffffff8861f7d7>] :gfs:do_write_buf+0x52b/0x67e
 [<ffffffff8861ef04>] :gfs:walk_vm+0x10e/0x311
 [<ffffffff8861f2ac>] :gfs:do_write_buf+0x0/0x67e
 [<ffffffff8861f1b3>] :gfs:__gfs_write+0xac/0xc6
 [<ffffffff800dc9bb>] do_readv_writev+0x198/0x295
 [<ffffffff8861f1f6>] :gfs:gfs_write+0x0/0x8
 [<ffffffff8861fd2a>] :gfs:gfs_llseek+0x75/0x84
 [<ffffffff800b54f0>] audit_syscall_entry+0x16e/0x1a1
 [<ffffffff800dcb42>] sys_writev+0x45/0x93
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Comment 21 David Teigland 2008-11-11 17:34:14 EST
typo in comment 20:
"We know that the first lock gfs_controld tries to acquire" should be
"We know that the first lock gfs_recoverd tries to acquire"
Comment 22 David Teigland 2008-11-11 17:36:30 EST
Created attachment 323268 [details]
/var/log/messages from grant-03

This is the entire /var/log/messages file from grant-03 which
shows all backtraces from the sysrq-t.
Comment 23 David Teigland 2008-11-11 17:48:36 EST
The reproduction of this bug in comments 13-18 includes sysrq backtraces,
but they do not show the lock_dlm1 thread trying to drop the transaction
lock the way the latest backtrace does.  I don't have an explanation for
that.
Comment 24 Robert Peterson 2008-12-08 16:46:09 EST
The lock_dlm1 thread is waiting for the sd_log_lock rw_sem which
prevents simultaneous calls to log_flush_internal.  The rw_sem is
held by d_doio, which seems to be in an I/O wait, waiting for its
log flush disk-commit to complete.  So why is its I/O blocked?
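
Putting the pieces together, the whole wait chain can be written out like
this (just an illustrative summary of the analysis above, not code from gfs):

/* Illustrative wait-for chain: nothing above the stalled I/O can make
 * progress, which is why journal recovery on the other node hangs. */
#include <stdio.h>

struct waiter { const char *task; const char *waits_for; };

int main(void)
{
    static const struct waiter chain[] = {
        { "gfs_recoverd (grant-01)",    "transaction lock in EX" },
        { "transaction lock",           "demote by lock_dlm1 (grant-03)" },
        { "lock_dlm1 (grant-03)",       "sd_log_lock rw_sem" },
        { "d_doio (holds sd_log_lock)", "log flush disk commit I/O" },
        { "log flush disk commit I/O",  "completion that never comes" },
    };
    unsigned int i;

    for (i = 0; i < sizeof(chain) / sizeof(chain[0]); i++)
        printf("%s -> waits for %s\n", chain[i].task, chain[i].waits_for);
    return 0;
}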
Comment 25 Steve Whitehouse 2008-12-10 10:47:06 EST
What has happened to this bug?  It seems to have had a lot of work that just stops with no conclusion.  Is this still required, or can we close it?
Comment 26 Steve Whitehouse 2009-01-16 12:35:22 EST
It looks to me like the issue is that a wait on I/O has never returned, most likely because the I/O in question was never submitted.  If this is reproducible, it ought to be possible to use blktrace to see what the last bits of I/O were before it stops.

There are also some other things I spotted too:

In gfs_logbh_wait():
 if (!buffer_uptodate(bh) || buffer_dirty(bh)) {

this shouldn't be testing for buffer_dirty(); testing for buffer_error() would make more sense.

In gfs_logbh_init:
((unsigned long)data) & (PAGE_SIZE - 1) ought to be page_offset(data)

I can't see why there should be an issue with waiting for I/O though, so it might be more subtle.
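
To illustrate the gfs_logbh_wait() point with a userspace model (hypothetical
types, not the kernel buffer-head API): a buffer whose write completed
successfully can legitimately be dirty again by the time it is checked, so
testing dirtiness reports a false failure, while testing an I/O error flag
reports only real ones.

/* Hypothetical model of the check discussed above, not gfs code. */
#include <stdio.h>
#include <stdbool.h>

struct log_buf {
    bool uptodate;    /* write completed and data is valid */
    bool dirty;       /* modified again since the write */
    bool io_error;    /* the write itself failed */
};

static int wait_check_current(const struct log_buf *b)
{
    return (!b->uptodate || b->dirty) ? -1 : 0;     /* what the code does */
}

static int wait_check_suggested(const struct log_buf *b)
{
    return (!b->uptodate || b->io_error) ? -1 : 0;  /* error, not dirtiness */
}

int main(void)
{
    /* the write succeeded, but the buffer was redirtied afterwards */
    struct log_buf b = { .uptodate = true, .dirty = true, .io_error = false };

    printf("current check:   %s\n", wait_check_current(&b) ? "fail" : "ok");
    printf("suggested check: %s\n", wait_check_suggested(&b) ? "fail" : "ok");
    return 0;
}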
Comment 27 Steve Whitehouse 2009-04-01 06:16:32 EDT
I have recently created a patch to fix something very similar in gfs2.  I wondered why we'd never seen it before, and this bug makes me think that perhaps we have.  Anyway, the bz in question is #483541.
Comment 29 Robert Peterson 2010-03-05 13:06:28 EST
IMHO, the biggest part of the problem here is that the transaction
glock is being bounced around when, in fact, it should remain held.
As part of RHEL4 bug #455696 I prevented that from happening with
this code:

@@ -2667,6 +2666,11 @@ clear_glock(struct gfs_glock *gl, unsigned int *unused)
        struct gfs_sbd *sdp = gl->gl_sbd;
        struct gfs_gl_hash_bucket *bucket = gl->gl_bucket;
 
+       /* If this isn't shutdown, keep the transaction glock around. */
+       if (sdp->sd_glockd_num && gl == sdp->sd_trans_gl) {
+               glock_put(gl);   /* see examine_bucket() */
+               return;
+       }
        spin_lock(&sdp->sd_reclaim_lock);
        if (!list_empty(&gl->gl_reclaim)) {
                list_del_init(&gl->gl_reclaim);

As of this writing, this patch was never shipped in a RHEL5 release
but exists as a patch for the rhel5 crosswrite bug #491369.  So my
theory is that this bug is fixed by bug #491369 and the problem
should go away; I'm tempted to just close this as a duplicate.
Comment 30 Steve Whitehouse 2010-04-21 06:57:46 EDT
It isn't clear whether we still need this bug or not; either way, I'm clearing the devel ack until a solution is clearer.
Comment 31 Steve Whitehouse 2011-01-20 05:52:38 EST
Is it reasonable at this stage to close this as a dup of #491369?

The code from comment #29 appears to already be in RHEL5, so I wonder whether there is anything more to do here.
