Bug 290971

Summary: gfs umount deadlock gfs:glock_wait_internal
Product: Red Hat Enterprise Linux 5
Component: gfs-utils
Version: 5.0
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: low
Priority: medium
Reporter: Corey Marthaler <cmarthal>
Assignee: David Teigland <teigland>
QA Contact: GFS Bugs <gfs-bugs>
CC: sghosh
Doc Type: Bug Fix
Last Closed: 2009-03-11 03:46:59 UTC
Attachments:
  additional stack traces (attachment 195921, Comment 1)
  a mount/unmount test (attachment 245811, Comment 8)

Description Corey Marthaler 2007-09-14 15:04:23 UTC
Description of problem:
Hit this while running mount_stress with 50 filesystems. It ran for quite
a while and the cluster appeared to be fine except for the hung umount command.

####### itr=152 (Thu Sep 13 22:25:16 CDT 2007) #######
unmounting on taft-02.../mnt/16.../mnt/13.../mnt/40.../mnt/33.../mnt/32...     
        


[root@taft-02 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    680   2007-09-10 16:56:49  taft-01
   2   M    676   2007-09-10 16:56:49  taft-02
   3   M    684   2007-09-10 16:56:52  taft-03
   4   M    696   2007-09-11 08:45:19  taft-04
[root@taft-02 ~]# cman_tool services
type             level name     id       state
fence            0     default  00010001 none
[1 2 3 4]
dlm              1     clvmd    00030001 none
[1 2 3 4]
dlm              1     32       07730003 none
[2 3]
dlm              1     3        05ee0002 none
[2]
dlm              1     37       05f00002 none
[1 2]
dlm              1     46       07320003 none
[2 3]
dlm              1     29       075a0003 none
[1 2 3]
dlm              1     49       07640003 none
[1 2 3]
dlm              1     8        07750003 none
[1 2 3]
dlm              1     19       07440003 none
[1 2 3]
dlm              1     25       05f40002 none
[1 2]
dlm              1     42       076b0003 none
[1 2 3]
dlm              1     11       07460003 none
[1 2 3]
dlm              1     6        05f60002 none
[1 2]
dlm              1     39       07360003 none
[1 2 3]
dlm              1     18       076f0003 none
[2 3]
dlm              1     2        05f80002 none
[1 2]
dlm              1     45       076d0003 none
[1 2 3]
dlm              1     20       07580003 none
[2 3]
dlm              1     15       07560003 none
[2 3]
dlm              1     50       07300003 none
[1 2 3]
dlm              1     31       07600003 none
[2 3]
dlm              1     24       05fa0002 none
[1 2]
dlm              1     47       07690003 none
[2 3]
dlm              1     36       074a0003 none
[1 2 3]
dlm              1     22       07770003 none
[1 2 3]
dlm              1     43       05fc0002 none
[1 2]
dlm              1     35       07480003 none
[2 3]
dlm              1     10       05fe0002 none
[2]
dlm              1     28       074e0003 none
[1 2 3]
dlm              1     14       073a0003 none
[2 3]
dlm              1     30       073c0003 none
[2 3]
dlm              1     48       07400003 none
[2 3]
gfs              2     32       07720003 none
[2 3]
gfs              2     3        05ed0002 none
[2]
gfs              2     37       05ef0002 none
[1 2]
gfs              2     46       07310003 none
[2 3]
gfs              2     29       07590003 none
[1 2 3]
gfs              2     49       07630003 none
[1 2 3]
gfs              2     8        07740003 none
[1 2 3]
gfs              2     19       07430003 none
[1 2 3]
gfs              2     25       05f30002 none
[1 2]
gfs              2     42       076a0003 none
[1 2 3]
gfs              2     11       07450003 none
[1 2 3]
gfs              2     6        05f50002 none
[1 2]
gfs              2     39       07350003 none
[1 2 3]
gfs              2     18       076e0003 none
[2 3]
gfs              2     2        05f70002 none
[1 2]
gfs              2     45       076c0003 none
[1 2 3]
gfs              2     20       07570003 none
[2 3]
gfs              2     15       07550003 none
[2 3]
gfs              2     50       072f0003 none
[1 2 3]
gfs              2     31       075f0003 none
[2 3]
gfs              2     24       05f90002 none
[1 2]
gfs              2     47       07680003 none
[2 3]
gfs              2     36       07490003 none
[1 2 3]
gfs              2     22       07760003 none
[1 2 3]
gfs              2     43       05fb0002 none
[1 2]
gfs              2     35       07470003 none
[2 3]
gfs              2     10       05fd0002 none
[2]
gfs              2     28       074d0003 none
[1 2 3]
gfs              2     14       07390003 none
[2 3]
gfs              2     30       073b0003 none
[2 3]
gfs              2     48       073f0003 none
[2 3]



umount.gfs    D ffffffff801405e0     0 11426  11425                     (NOTLB)
 ffff8101d3f49c18 0000000000000086 ffff8101ddbd0800 ffffffff884ca776
 0000000000000007 ffff8101d33f1860 ffff8101fff15100 0000fd546c2d9664
 0000000000009b3d ffff8101d33f1a48 ffff810200000001 ffffffff8856a173
Call Trace:
 [<ffffffff884ca776>] :dlm:dlm_put_lockspace+0x10/0x1f
 [<ffffffff8856a173>] :lock_dlm:gdlm_ast+0x0/0x2
 [<ffffffff800610f7>] wait_for_completion+0x79/0xa2
 [<ffffffff800884a1>] default_wake_function+0x0/0xe
 [<ffffffff8858fa45>] :gfs:glock_wait_internal+0x156/0x2bc
 [<ffffffff8858ff40>] :gfs:gfs_glock_nq+0x395/0x3d6
 [<ffffffff8858ff97>] :gfs:gfs_glock_nq_init+0x16/0x2a
 [<ffffffff885abdb4>] :gfs:gfs_statfs_sync+0x31/0x175
 [<ffffffff885ac405>] :gfs:gfs_make_fs_ro+0x3c/0xae
 [<ffffffff885a5115>] :gfs:gfs_put_super+0xd5/0x1ca
 [<ffffffff800d8e34>] generic_shutdown_super+0x79/0xfb
 [<ffffffff800d8edc>] kill_block_super+0x26/0x3a
 [<ffffffff800d8faa>] deactivate_super+0x6a/0x82
 [<ffffffff800e1d14>] sys_umount+0x245/0x27b
 [<ffffffff800b279c>] audit_syscall_entry+0x14d/0x180
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0
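
For context on the trace above: gfs_glock_nq() ends up in
glock_wait_internal(), which sleeps in wait_for_completion() until the
DLM's reply to the lock request is delivered (via the lock_dlm callback,
gdlm_ast, also visible in the trace). If that reply never arrives, the
umount thread sleeps in D state indefinitely, which matches the hang seen
here. A minimal sketch of that completion pattern, with illustrative
names, not the actual GFS/lock_dlm source:

#include <linux/completion.h>

static DECLARE_COMPLETION(lock_reply);

/* Called by the DLM when the lock request completes. */
static void example_ast(void *astarg)
{
        complete(&lock_reply);
}

/* Caller side, as in glock_wait_internal(): submit the request,
 * then block until the callback fires.  If the reply is lost or
 * never sent, this task sleeps in D state forever -- the hang in
 * the umount trace above. */
static void example_wait(void)
{
        /* ... submit the lock request to the DLM here ... */
        wait_for_completion(&lock_reply);
}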



Version-Release number of selected component (if applicable):
2.6.18-45.el5

Comment 1 Corey Marthaler 2007-09-14 15:07:25 UTC
Created attachment 195921 [details]
additional stack traces

Comment 2 David Teigland 2007-09-14 19:00:36 UTC
The stack trace from umount looks like the typical mix of good/bad info
(without reliable frame pointers, the x86_64 stack dump includes stale
addresses found on the stack), so I'm not sure it really tells us anything
useful. An equally interesting but nonsensical trace:

kernel: dlm_recoverd  S ffffffff801405e0     0 11097     87         11104 11096
(L-TLB)     
kernel:  ffff8101d2f23ea0 0000000000000046 ffff8101d336f800 0000000000000000
kernel:  0000000000000008 ffff8101d5e30100 ffff8101d36367a0 0000fd5028b168af
kernel:  0000000000005638 ffff8101d5e302e8 0000000000000001 ffffffff884c371b
kernel: Call Trace:
kernel:  [<ffffffff884c371b>] :dlm:dlm_clear_free_entries+0x15/0x4c
kernel:  [<ffffffff884cf10b>] :dlm:dlm_recover_status+0x10/0x22
kernel:  [<ffffffff884cebd0>] :dlm:dlm_rcom_status+0x32/0x17a
kernel:  [<ffffffff80061aa4>] mutex_lock+0xd/0x1d
kernel:  [<ffffffff80061aa4>] mutex_lock+0xd/0x1d
kernel:  [<ffffffff8009b26b>] keventd_create_kthread+0x0/0x61
kernel:  [<ffffffff884cfd40>] :dlm:dlm_recoverd+0x56/0x467
kernel:  [<ffffffff884cfcea>] :dlm:dlm_recoverd+0x0/0x467
kernel:  [<ffffffff80032163>] kthread+0xfe/0x132
kernel:  [<ffffffff8005bfb1>] child_rip+0xa/0x11
kernel:  [<ffffffff8009b26b>] keventd_create_kthread+0x0/0x61
kernel:  [<ffffffff80032065>] kthread+0x0/0x132
kernel:  [<ffffffff8005bfa7>] child_rip+0x0/0x11

Perhaps I'll be able to reproduce this on a machine with kdb installed.
I'll be doing some mount/unmount stress testing soon related to some other
code changes.


Comment 3 Corey Marthaler 2007-09-17 13:40:10 UTC
Just a note that I reproduced this over the weekend.

Comment 8 David Teigland 2007-11-01 16:17:26 UTC
Created attachment 245811 [details]
a mount/unmount test

Ran the attached herd file
  collie -f bull-299601-comment52.h2 -e -A -i 0

for around 24 hours with current (and pending) upstream
code (both kernel and user) with no failures.
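
For reference, the shape of the stress test is just a tight mount/unmount
loop per filesystem. A rough userspace equivalent of one such loop
(device, mount point, and iteration count are illustrative; the real
harness is the attached herd file, and a GFS mount needs the cluster
stack already running):

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
        /* Illustrative stand-in for one mount_stress worker; the
         * device and mount point below are made up. */
        const char *dev = "/dev/taft/lv0";
        const char *mnt = "/mnt/0";
        int i;

        for (i = 0; i < 1000; i++) {
                if (mount(dev, mnt, "gfs", 0, NULL)) {
                        perror("mount");
                        return 1;
                }
                /* The bug: this umount(2) never returned. */
                if (umount(mnt)) {
                        perror("umount");
                        return 1;
                }
        }
        return 0;
}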

Comment 10 David Teigland 2007-12-04 17:48:39 UTC
I think this is fixed based on my own recent mount/unmount testing.
If it's not seen the next time the same mount_stress test is done,
then it should be closed.


Comment 11 Corey Marthaler 2007-12-19 16:26:33 UTC
I ran this test case overnight and verified this bug is fixed.

2.6.18-53.1.4.el5

Comment 12 RHEL Program Management 2008-03-11 19:41:00 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 13 Subhendu Ghosh 2009-03-11 03:46:59 UTC
Closed based on comment #11