Bug 672600 - GFS2: recovery stuck on transaction lock
Summary: GFS2: recovery stuck on transaction lock
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Robert Peterson
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 553803 570263 590878
Blocks:
 
Reported: 2011-01-25 17:19 UTC by Robert Peterson
Modified: 2011-05-23 20:38 UTC (History)
CC List: 9 users

Fixed In Version: kernel-2.6.32-112.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 553803
Environment:
Last Closed: 2011-05-23 20:38:58 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch to fix the problem (1.11 KB, patch)
2011-01-25 18:31 UTC, Robert Peterson
no flags


Links
System: Red Hat Product Errata
ID: RHSA-2011:0542
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update
Last Updated: 2011-05-19 11:58:07 UTC

Description Robert Peterson 2011-01-25 17:19:44 UTC
+++ This bug was initially created as a clone of Bug #553803 +++

Description of problem:

While running revolver with three GFS2 file systems I got into a state where recovery for the third file system would not complete.

[GFS2] [revolver] Scenario iteration 3.6 started at Fri Jan  8 12:40:25 CST 2010
[GFS2] [revolver] Sleeping 5 minute(s) to let the I/O get its lock count up...
[GFS2] [revolver] Senario: DLM lose cluster communication on Quorum
[GFS2] [revolver]          (will need to physically kill last node)
[GFS2] [revolver] 
[GFS2] [revolver] Those picked to face the revolver... morph-04 morph-01 
[GFS2] [revolver] Killing network interfaces on morph-04...
[GFS2] [revolver] Feeling lucky morph-01? Well do ya? Go'head make my day...
[GFS2] [revolver] Didn't receive heartbeat for 2 seconds
[GFS2] [revolver] 
[GFS2] [revolver] Verifying nodes were removed from cluster
[GFS2] [revolver] Verified morph-01 was removed on morph-03
[GFS2] [revolver] Verified morph-04 was removed on morph-03
[GFS2] [revolver] Verifying that the dueler(s) are alive
...
[GFS2] [revolver] All killed nodes are back up (able to be pinged), making sure they're qarshable...
[GFS2] [revolver] still not all qarshable, sleeping another 10 seconds
[GFS2] [revolver] All killed nodes are now qarshable
[GFS2] [revolver] 
[GFS2] [revolver] Verifying that recovery properly took place (on the nodes that stayed in the cluster)
[GFS2] [revolver] checking that all of the cluster nodes are now/still cman members...
[GFS2] [revolver] checking fence recovery (state of each service)...
[GFS2] [revolver] checking dlm recovery (state of each service)...
[GFS2] [revolver] checking gfs recovery (state of each service)...
[GFS2] [revolver] gfs recovery is hung
[GFS2] [revolver] appears to be hung in FAIL_START_WAIT
[GFS2] [revolver] FI_engine: recover() method failed

[root@morph-03 ~]# group_tool
type             level name            id       state
fence            0     default         00010001 none
[1 3 4]
dlm              1     clvmd           00020001 none
[1 3 4]
dlm              1     morph-cluster0  00040001 none
[3]
dlm              1     morph-cluster1  00060001 none
[3]
dlm              1     morph-cluster2  00080001 none
[3]
gfs              2     morph-cluster0  00030001 none
[3]
gfs              2     morph-cluster1  00050001 none
[3]
gfs              2     morph-cluster2  00070001 FAIL_START_WAIT
[3]

[root@morph-03 morph-cluster:morph-cluster2]# cat /tmp/glocks.morph-cluster2 
G:  s:UN n:2/635a f:l t:EX d:EX/0 l:0 a:0 r:4
 H: s:EX f:cW e:0 p:2973 [gfs2_quotad] gfs2_statfs_sync+0x46/0x17d [gfs2]
G:  s:SH n:5/18 f: t:SH d:EX/0 l:0 a:0 r:3
 H: s:SH f:EH e:0 p:2956 [(ended)] gfs2_inode_lookup+0x12d/0x1f0 [gfs2]
G:  s:SH n:5/202b f: t:SH d:EX/0 l:0 a:0 r:3
 H: s:SH f:EH e:0 p:2956 [(ended)] gfs2_inode_lookup+0x12d/0x1f0 [gfs2]
G:  s:SH n:5/19 f: t:SH d:EX/0 l:0 a:0 r:3
 H: s:SH f:EH e:0 p:2956 [(ended)] gfs2_inode_lookup+0x12d/0x1f0 [gfs2]
G:  s:EX n:9/1 f: t:EX d:EX/0 l:0 a:0 r:3
 H: s:EX f:eH e:0 p:2956 [(ended)] gfs2_glock_nq_num+0x31/0x54 [gfs2]
G:  s:SH n:5/6374 f: t:SH d:EX/0 l:0 a:0 r:3
 H: s:SH f:EH e:0 p:2956 [(ended)] gfs2_inode_lookup+0x12d/0x1f0 [gfs2]
G:  s:SH n:5/635b f: t:SH d:EX/0 l:0 a:0 r:3
 H: s:SH f:EH e:0 p:2956 [(ended)] gfs2_inode_lookup+0x12d/0x1f0 [gfs2]
G:  s:SH n:5/299e0fb f: t:SH d:EX/0 l:0 a:0 r:3
 H: s:SH f:EH e:0 p:2988 [d_doio] gfs2_inode_lookup+0x12d/0x1f0 [gfs2]
G:  s:SH n:2/202b f: t:SH d:EX/0 l:0 a:0 r:4
 H: s:SH f:eEcH e:0 p:2956 [(ended)] init_journal+0x1e7/0x3aa [gfs2]
 I: n:5/8235 t:8 f:0x11 d:0x00000200 s:33554432/33554432
G:  s:SH n:2/403d f: t:SH d:EX/0 l:0 a:0 r:4
 H: s:SH f:ecH e:0 p:2971 [gfs2_recoverd] gfs2_recover_journal+0xbf/0x814 [gfs2]
 I: n:6/16445 t:8 f:0x10 d:0x00000200 s:33554432/33554432
G:  s:SH n:5/17 f: t:SH d:EX/0 l:0 a:0 r:3
 H: s:SH f:EH e:0 p:2956 [(ended)] gfs2_inode_lookup+0x12d/0x1f0 [gfs2]
G:  s:EX n:2/6155 f: t:EX d:EX/0 l:0 a:0 r:4
 H: s:EX f:H e:0 p:2956 [(ended)] init_per_node+0x1f6/0x27d [gfs2]
 I: n:13/24917 t:8 f:0x10 d:0x00000200 s:1048576/1048576
G:  s:SH n:5/6155 f: t:SH d:EX/0 l:0 a:0 r:3
 H: s:SH f:EH e:0 p:2956 [(ended)] gfs2_inode_lookup+0x12d/0x1f0 [gfs2]
G:  s:SH n:4/0 f: t:SH d:EX/0 l:0 a:0 r:2
G:  s:EX n:9/2 f: t:EX d:EX/0 l:0 a:0 r:3
 H: s:EX f:tecH e:0 p:2971 [gfs2_recoverd] gfs2_glock_nq_num+0x31/0x54 [gfs2]
G:  s:UN n:2/6374 f: t:UN d:EX/0 l:0 a:0 r:2
G:  s:SH n:5/6154 f: t:SH d:EX/0 l:0 a:0 r:3
 H: s:SH f:EH e:0 p:2956 [(ended)] gfs2_inode_lookup+0x12d/0x1f0 [gfs2]
G:  s:EX n:2/6153 f: t:EX d:EX/0 l:0 a:0 r:4
 H: s:EX f:H e:0 p:2956 [(ended)] init_per_node+0x15d/0x27d [gfs2]
 I: n:11/24915 t:8 f:0x10 d:0x00000201 s:16/16
G:  s:UN n:2/6359 f: t:UN d:EX/0 l:0 a:0 r:2
G:  s:SH n:1/1 f: t:SH d:EX/0 l:0 a:0 r:3
 H: s:SH f:eEH e:0 p:2956 [(ended)] gfs2_glock_nq_num+0x31/0x54 [gfs2]
G:  s:SH n:2/16 f: t:SH d:EX/0 l:0 a:0 r:3
 I: n:1/22 t:4 f:0x10 d:0x00000001 s:3864/3864
G:  s:SH n:5/635a f: t:SH d:EX/0 l:0 a:0 r:3
 H: s:SH f:EH e:0 p:2956 [(ended)] gfs2_inode_lookup+0x12d/0x1f0 [gfs2]
G:  s:SH n:2/17 f: t:SH d:EX/0 l:0 a:0 r:3
 I: n:2/23 t:4 f:0x10 d:0x00000201 s:3864/3864
G:  s:UN n:1/2 f:l t:SH d:EX/0 l:0 a:0 r:5
 H: s:SH f:epcW e:0 p:2971 [gfs2_recoverd] gfs2_recover_journal+0x182/0x814 [gfs2]
 H: s:SH f:W e:0 p:2988 [d_doio] gfs2_do_trans_begin+0xae/0x119 [gfs2]
G:  s:EX n:2/299e0fb f: t:EX d:UN/0 l:0 a:0 r:5
 H: s:EX f:H e:0 p:2988 [d_doio] gfs2_write_begin+0x47/0x34f [gfs2]
 H: s:EX f:W e:0 p:188 [pdflush] gfs2_write_inode+0x55/0x15a [gfs2]
 I: n:21/43639035 t:8 f:0x10 d:0x00000000 s:28562/28562
G:  s:SH n:5/6359 f: t:SH d:EX/0 l:0 a:0 r:3
 H: s:SH f:EH e:0 p:2956 [(ended)] gfs2_inode_lookup+0x12d/0x1f0 [gfs2]
G:  s:UN n:2/19 f: t:UN d:EX/0 l:0 a:0 r:2
G:  s:SH n:5/6153 f: t:SH d:EX/0 l:0 a:0 r:3
 H: s:SH f:EH e:0 p:2956 [(ended)] gfs2_inode_lookup+0x12d/0x1f0 [gfs2]
G:  s:UN n:1/3 f: t:UN d:EX/0 l:0 a:0 r:2
G:  s:SH n:2/604f f: t:SH d:EX/0 l:0 a:0 r:2
G:  s:SH n:5/604f f: t:SH d:EX/0 l:0 a:0 r:2
G:  s:SH n:5/403d f: t:SH d:EX/0 l:0 a:0 r:3
 H: s:SH f:EH e:0 p:2956 [(ended)] gfs2_inode_lookup+0x12d/0x1f0 [gfs2]
G:  s:EX n:2/6154 f: t:EX d:EX/0 l:0 a:0 r:4
 H: s:EX f:H e:0 p:2956 [(ended)] init_per_node+0x1ab/0x27d [gfs2]
 I: n:12/24916 t:8 f:0x10 d:0x00000201 s:24/24
G:  s:SH n:2/18 f: t:SH d:EX/0 l:0 a:0 r:3
 I: n:3/24 t:4 f:0x10 d:0x00000201 s:3864/3864
G:  s:UN n:2/635b f: t:UN d:EX/0 l:0 a:0 r:2
G:  s:SH n:5/16 f: t:SH d:EX/0 l:0 a:0 r:3
 H: s:SH f:EH e:0 p:2956 [(ended)] gfs2_inode_lookup+0x12d/0x1f0 [gfs2]


crash> bt 2973
PID: 2973   TASK: f2567550  CPU: 0   COMMAND: "gfs2_quotad"
 #0 [f24f6e4c] schedule at c061c528
 #1 [f24f6ec4] just_schedule at f8f04119
 #2 [f24f6ec8] __wait_on_bit at c061cdb2
 #3 [f24f6ee0] out_of_line_wait_on_bit at c061ce37
 #4 [f24f6f18] gfs2_glock_wait at f8f0410d
 #5 [f24f6f28] gfs2_statfs_sync at f8f197bb
 #6 [f24f6f78] quotad_check_timeo at f8f137fc
 #7 [f24f6f88] gfs2_quotad at f8f15172
 #8 [f24f6fcc] kthread at c0435f11
 #9 [f24f6fe4] kernel_thread_helper at c0405c51
crash> bt 2971
PID: 2971   TASK: f319a550  CPU: 1   COMMAND: "gfs2_recoverd"
 #0 [f2076dd0] schedule at c061c528
 #1 [f2076e48] just_schedule at f8f04119
 #2 [f2076e4c] __wait_on_bit at c061cdb2
 #3 [f2076e64] out_of_line_wait_on_bit at c061ce37
 #4 [f2076e9c] gfs2_glock_wait at f8f0410d
 #5 [f2076eac] gfs2_recover_journal at f8f16014
 #6 [f2076fc0] gfs2_recoverd at f8f166bd
 #7 [f2076fcc] kthread at c0435f11
 #8 [f2076fe4] kernel_thread_helper at c0405c51
crash> bt 2988
PID: 2988   TASK: f2096000  CPU: 1   COMMAND: "d_doio"
 #0 [f2098cac] schedule at c061c528
 #1 [f2098d24] just_schedule at f8f04119
 #2 [f2098d28] __wait_on_bit at c061cdb2
 #3 [f2098d40] out_of_line_wait_on_bit at c061ce37
 #4 [f2098d78] gfs2_glock_wait at f8f0410d
 #5 [f2098d88] gfs2_do_trans_begin at f8f1a4c6
 #6 [f2098da4] gfs2_write_begin at f8f0cc56
 #7 [f2098dd8] gfs2_file_buffered_write at f8f0e2d9
 #8 [f2098e40] __gfs2_file_aio_write_nolock at f8f0e6e4
 #9 [f2098ec4] gfs2_file_write_nolock at f8f0e897
#10 [f2098f60] gfs2_file_write at f8f0e9c8
#11 [f2098f84] vfs_write at c0475fe5
#12 [f2098f9c] sys_write at c04765d4
#13 [f2098fb8] system_call at c0404f10
    EAX: ffffffda  EBX: 00000004  ECX: 08feac00  EDX: 00000941 
    DS:  007b      ESI: 0057eca0  ES:  007b      EDI: 08feab9f
    SS:  007b      ESP: bfd31294  EBP: bfd312c8
    CS:  0073      EIP: 00f1a402  ERR: 00000004  EFLAGS: 00000246 
crash> bt 188
PID: 188    TASK: f7fa2550  CPU: 0   COMMAND: "pdflush"
 #0 [f7f8adcc] schedule at c061c528
 #1 [f7f8ae44] just_schedule at f8f04119
 #2 [f7f8ae48] __wait_on_bit at c061cdb2
 #3 [f7f8ae60] out_of_line_wait_on_bit at c061ce37
 #4 [f7f8ae98] gfs2_glock_wait at f8f0410d
 #5 [f7f8aea8] gfs2_write_inode at f8f1367d
 #6 [f7f8aef8] __writeback_single_inode at c0494433
 #7 [f7f8af34] sync_sb_inodes at c049481f
 #8 [f7f8af54] writeback_inodes at c0494a6e
 #9 [f7f8af68] wb_kupdate at c045e00f
#10 [f7f8afa4] pdflush at c045e53c
#11 [f7f8afcc] kthread at c0435f11
#12 [f7f8afe4] kernel_thread_helper at c0405c51
crash> bt 2959
PID: 2959   TASK: f2096550  CPU: 0   COMMAND: "lock_dlm"
 #0 [f208bf30] schedule at c061c528
 #1 [f208bfa8] gdlm_thread at f8e80534
 #2 [f208bfcc] kthread at c0435f11
 #3 [f208bfe4] kernel_thread_helper at c0405c51

Version-Release number of selected component (if applicable):
kernel-2.6.18-183.el5

How reproducible:
Unknown; it happened on iteration 3.6 in revolver


Steps to Reproduce:
1. run revolver
  
Actual results:


Expected results:


Additional info:

--- Additional comment from swhiteho on 2010-01-11 08:17:27 EST ---

I can't see anything wrong with this:

G:  s:UN n:1/2 f:l t:SH d:EX/0 l:0 a:0 r:5
 H: s:SH f:epcW e:0 p:2971 [gfs2_recoverd] gfs2_recover_journal+0x182/0x814
[gfs2]
 H: s:SH f:W e:0 p:2988 [d_doio] gfs2_do_trans_begin+0xae/0x119 [gfs2]


The question is: what is holding the lock in a different mode? The only time we take the transaction lock in a mode other than shared is when freezing the fs. So it looks like another node has probably frozen the fs, which is what is preventing recovery from proceeding.
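
A minimal sketch of what that means in code terms (not the actual RHEL source; the example_ function names are stand-ins, and the freeze lock state used here is an assumption):

/*
 * Illustrative sketch only: how the per-superblock transaction glock
 * (shown as n:1/2 in the dump above) is used.  Ordinary transactions
 * ask for it SHARED, so every node can hold it at once; a freeze is
 * the one caller that asks for a conflicting state, which blocks the
 * shared holders.  The example_ names are hypothetical and the freeze
 * state below is an assumption, not a quote of the real code.
 */
#include "incore.h"
#include "glock.h"

static int example_trans_begin(struct gfs2_sbd *sdp, struct gfs2_holder *t_gh)
{
	/* normal transaction start: shared, compatible across nodes */
	return gfs2_glock_nq_init(sdp->sd_trans_gl, LM_ST_SHARED, 0, t_gh);
}

static int example_freeze_fs(struct gfs2_sbd *sdp, struct gfs2_holder *freeze_gh)
{
	/* freeze: the only non-shared request against sd_trans_gl */
	return gfs2_glock_nq_init(sdp->sd_trans_gl, LM_ST_DEFERRED,
				  GL_NOCACHE, freeze_gh);
}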

--- Additional comment from nstraz on 2010-01-11 09:34:55 EST ---

The other two nodes had been rebooted by revolver; morph-03 should be the only node left with valid locks.

--- Additional comment from swhiteho on 2010-01-11 09:39:08 EST ---

Well, GFS2 thinks something else is holding that lock, so I guess the next step in that case is to look at the dlm lock dumps to see what it thinks has happened.

--- Additional comment from nstraz on 2010-01-11 09:48:59 EST ---

Created attachment 382989 [details]
dlm lock dump from morph-03, morph-cluster2 lockspace

Here are the dlm locks still on morph-03.

--- Additional comment from swhiteho on 2010-01-11 10:13:30 EST ---


So this is the corresponding dlm lock:

Resource f1c2aac0 Name (len=24) "       1               2"
Master Copy
Granted Queue
012b0002 NL
Conversion Queue
Waiting Queue


and it claims that there is one NL mode lock, which seems to be local. There is nothing in the queue for some reason. I would expect to see at least the SH lock request. It's not at all obvious what is going on here.

--- Additional comment from swhiteho on 2010-04-21 08:58:32 EDT ---

This looks very similar to #570363.

--- Additional comment from swhiteho on 2010-05-11 06:34:21 EDT ---

Dave, can you take a look at this and see if you can figure out what's going wrong? We have a node stuck on the transaction lock. This lock is only ever taken in the SH (PR) mode except when the fs is being frozen. Since there is no fsfreeze going on here, we'd not expect to see anything except the following transitions:

new -> PR
PR -> NL
NL -> PR
NL -> unlocked

It looks like the stuck node has sent a request to the dlm, but for some strange reason it doesn't appear to show up in the dlm lock dump.

Bearing in mind that this report was prior to the recent dlm fix (rhel6/upstream), I suppose that it is just possible that this is the same thing. I'm not sure how we can prove that though.

--- Additional comment from teigland on 2010-05-11 17:20:57 EDT ---

I'd look at the NOEXP conditions in lock_dlm and the cases where locks granted during recovery need to be dropped by lock_dlm and reacquired.  There were definitely some subtle conditions and uncommon corner cases in those parts of lock_dlm -- I was pretty sure that stuff would break as a result of the lock_dlm rewrite/rework early in RHEL5 (which went along with me transferring the lock_dlm ownership to the gfs2 folks).

The main rule was that only locks used for doing journal recovery could be acquired until journal recovery was complete.  If the dlm granted a lock apart from this (which happens due to dlm recovery), lock_dlm needed to release it and attempt to reacquire it.
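
As a rough, self-contained illustration of that rule (hypothetical ex_ names throughout; this is not the real lock_dlm code):

/*
 * Hedged sketch of the rule described above.  Everything here is a
 * hypothetical stand-in for the real lock_dlm internals; only the
 * NOEXP idea corresponds to an actual flag (LM_FLAG_NOEXP).
 */
#define EX_NOEXP	0x1	/* stands in for LM_FLAG_NOEXP */

struct ex_lock {
	unsigned int flags;
};

static int ex_journal_recovery_in_progress;	/* set while a journal is replayed */

static void ex_drop_and_requeue(struct ex_lock *lp)	/* release, retry after recovery */
{
}

static void ex_deliver_to_gfs2(struct ex_lock *lp)	/* complete the request */
{
}

static void ex_handle_grant(struct ex_lock *lp)
{
	/*
	 * Only locks needed for journal recovery (NOEXP) may complete
	 * while recovery is running.  Anything else the dlm hands back,
	 * e.g. as a side effect of dlm recovery, must be dropped and
	 * reacquired once journal recovery has finished.
	 */
	if (ex_journal_recovery_in_progress && !(lp->flags & EX_NOEXP))
		ex_drop_and_requeue(lp);
	else
		ex_deliver_to_gfs2(lp);
}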

--- Additional comment from swhiteho on 2010-07-14 08:22:40 EDT ---

The rule is still the same, and the code in lock_dlm which deals with that is pretty much unchanged in rhel5. We've just fixed a similar (the same?) bug in rhel6 where the problem was that glocks with noexp that were queued after recovery had started were occasionally getting stuck. This was happening when the reply from the DLM arrived after recovery had started, but before the noexp request had been queued (i.e. there was a race with another lock-acquiring process).

That was relatively easy to fix, since the GLF_FROZEN flag indicates which glocks are frozen with pending replies. It will be more complicated to fix in rhel5, since most of the knowledge regarding frozen locks is within lock_dlm, but the place that needs to be updated in order to check for the particular condition found in rhel6 is in gfs2 (in the glock state machine).

So, assuming for the moment that we have the same problem here, the fix will have to be different, and I suspect more complicated.

--- Additional comment from swhiteho on 2010-12-01 06:51:38 EST ---

This might be related to #656032

--- Additional comment from swhiteho on 2011-01-24 08:08:12 EST ---

Bob, I'm wondering whether we should use this bz to apply the transaction lock bit of the #656032 patch. I'd prefer to use the upstream solution, which is the one-liner here:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=846f40455276617275284a4b76b89311b4aed0b9

There is no need to add the flush back in the umount path, because it is already there (just hidden in one of the functions that are already called in that code path).

That way we can get part of the #656032 patch in sooner rather than later, since we know that this does fix a real bug.

--- Additional comment from rpeterso on 2011-01-24 09:21:31 EST ---

Excellent idea.  Requesting ack flags to get this into a release.

--- Additional comment from rpeterso on 2011-01-25 12:06:23 EST ---

Created attachment 475208 [details]
Patch to fix the problem

Here is the RHEL5 crosswrite patch.

Comment 1 RHEL Program Management 2011-01-25 17:30:52 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 2 Robert Peterson 2011-01-25 18:31:06 UTC
Created attachment 475236 [details]
Patch to fix the problem

Here is the patch I posted to rhkernel-list

Comment 3 Robert Peterson 2011-01-25 18:32:26 UTC
The patch was posted to rhkernel-list.  Changing status to POST.

Comment 5 Nate Straz 2011-04-11 19:57:41 UTC
Made it through 40+ iterations of revolver this weekend without hitting this.  Calling it verified.

Comment 6 errata-xmlrpc 2011-05-23 20:38:58 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html

