Bug 290971
| Summary: | gfs umount deadlock gfs:glock_wait_internal | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Corey Marthaler <cmarthal> |
| Component: | gfs-utils | Assignee: | David Teigland <teigland> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | GFS Bugs <gfs-bugs> |
| Severity: | low | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5.0 | CC: | sghosh |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-03-11 03:46:59 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Created attachment 195921
additional stack traces

The stack trace from umount looks like the typical mix of good/bad info, so I'm not sure it really tells us anything useful. An equally interesting but nonsensical trace:

kernel: dlm_recoverd S ffffffff801405e0 0 11097 87 11104 11096 (L-TLB)
kernel: ffff8101d2f23ea0 0000000000000046 ffff8101d336f800 0000000000000000
kernel: 0000000000000008 ffff8101d5e30100 ffff8101d36367a0 0000fd5028b168af
kernel: 0000000000005638 ffff8101d5e302e8 0000000000000001 ffffffff884c371b
kernel: Call Trace:
kernel: [<ffffffff884c371b>] :dlm:dlm_clear_free_entries+0x15/0x4c
kernel: [<ffffffff884cf10b>] :dlm:dlm_recover_status+0x10/0x22
kernel: [<ffffffff884cebd0>] :dlm:dlm_rcom_status+0x32/0x17a
kernel: [<ffffffff80061aa4>] mutex_lock+0xd/0x1d
kernel: [<ffffffff80061aa4>] mutex_lock+0xd/0x1d
kernel: [<ffffffff8009b26b>] keventd_create_kthread+0x0/0x61
kernel: [<ffffffff884cfd40>] :dlm:dlm_recoverd+0x56/0x467
kernel: [<ffffffff884cfcea>] :dlm:dlm_recoverd+0x0/0x467
kernel: [<ffffffff80032163>] kthread+0xfe/0x132
kernel: [<ffffffff8005bfb1>] child_rip+0xa/0x11
kernel: [<ffffffff8009b26b>] keventd_create_kthread+0x0/0x61
kernel: [<ffffffff80032065>] kthread+0x0/0x132
kernel: [<ffffffff8005bfa7>] child_rip+0x0/0x11

Perhaps I'll be able to reproduce this on a machine with kdb installed. I'll be doing some mount/unmount stress testing soon related to some other code changes.

Just a note that I reproduced this over the weekend.

Created attachment 245811
a mount/unmount test

Ran the attached herd file
collie -f bull-299601-comment52.h2 -e -A -i 0
for around 24 hours with current (and pending) upstream code (both kernel and user) with no failures.

I think this is fixed based on my own recent mount/unmount testing. If it's not seen the next time the same mount_stress test is done, then it should be closed.

I ran this test case overnight and verified this bug is fixed.
2.6.18-53.1.4.el5

This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.

Closed based on comment #11
Description of problem:
Hit this while running mount_stress with 50 filesystems. It ran for quite awhile and the cluster appeared to be fine except for the hung umount cmd.

####### itr=152 (Thu Sep 13 22:25:16 CDT 2007) #######
unmounting on taft-02.../mnt/16.../mnt/13.../mnt/40.../mnt/33.../mnt/32...

[root@taft-02 ~]# cman_tool nodes
Node  Sts  Inc  Joined               Name
   1  M    680  2007-09-10 16:56:49  taft-01
   2  M    676  2007-09-10 16:56:49  taft-02
   3  M    684  2007-09-10 16:56:52  taft-03
   4  M    696  2007-09-11 08:45:19  taft-04

[root@taft-02 ~]# cman_tool services
type   level  name     id        state  members
fence  0      default  00010001  none   [1 2 3 4]
dlm    1      clvmd    00030001  none   [1 2 3 4]
dlm    1      32       07730003  none   [2 3]
dlm    1      3        05ee0002  none   [2]
dlm    1      37       05f00002  none   [1 2]
dlm    1      46       07320003  none   [2 3]
dlm    1      29       075a0003  none   [1 2 3]
dlm    1      49       07640003  none   [1 2 3]
dlm    1      8        07750003  none   [1 2 3]
dlm    1      19       07440003  none   [1 2 3]
dlm    1      25       05f40002  none   [1 2]
dlm    1      42       076b0003  none   [1 2 3]
dlm    1      11       07460003  none   [1 2 3]
dlm    1      6        05f60002  none   [1 2]
dlm    1      39       07360003  none   [1 2 3]
dlm    1      18       076f0003  none   [2 3]
dlm    1      2        05f80002  none   [1 2]
dlm    1      45       076d0003  none   [1 2 3]
dlm    1      20       07580003  none   [2 3]
dlm    1      15       07560003  none   [2 3]
dlm    1      50       07300003  none   [1 2 3]
dlm    1      31       07600003  none   [2 3]
dlm    1      24       05fa0002  none   [1 2]
dlm    1      47       07690003  none   [2 3]
dlm    1      36       074a0003  none   [1 2 3]
dlm    1      22       07770003  none   [1 2 3]
dlm    1      43       05fc0002  none   [1 2]
dlm    1      35       07480003  none   [2 3]
dlm    1      10       05fe0002  none   [2]
dlm    1      28       074e0003  none   [1 2 3]
dlm    1      14       073a0003  none   [2 3]
dlm    1      30       073c0003  none   [2 3]
dlm    1      48       07400003  none   [2 3]
gfs    2      32       07720003  none   [2 3]
gfs    2      3        05ed0002  none   [2]
gfs    2      37       05ef0002  none   [1 2]
gfs    2      46       07310003  none   [2 3]
gfs    2      29       07590003  none   [1 2 3]
gfs    2      49       07630003  none   [1 2 3]
gfs    2      8        07740003  none   [1 2 3]
gfs    2      19       07430003  none   [1 2 3]
gfs    2      25       05f30002  none   [1 2]
gfs    2      42       076a0003  none   [1 2 3]
gfs    2      11       07450003  none   [1 2 3]
gfs    2      6        05f50002  none   [1 2]
gfs    2      39       07350003  none   [1 2 3]
gfs    2      18       076e0003  none   [2 3]
gfs    2      2        05f70002  none   [1 2]
gfs    2      45       076c0003  none   [1 2 3]
gfs    2      20       07570003  none   [2 3]
gfs    2      15       07550003  none   [2 3]
gfs    2      50       072f0003  none   [1 2 3]
gfs    2      31       075f0003  none   [2 3]
gfs    2      24       05f90002  none   [1 2]
gfs    2      47       07680003  none   [2 3]
gfs    2      36       07490003  none   [1 2 3]
gfs    2      22       07760003  none   [1 2 3]
gfs    2      43       05fb0002  none   [1 2]
gfs    2      35       07470003  none   [2 3]
gfs    2      10       05fd0002  none   [2]
gfs    2      28       074d0003  none   [1 2 3]
gfs    2      14       07390003  none   [2 3]
gfs    2      30       073b0003  none   [2 3]
gfs    2      48       073f0003  none   [2 3]

umount.gfs D ffffffff801405e0 0 11426 11425 (NOTLB)
ffff8101d3f49c18 0000000000000086 ffff8101ddbd0800 ffffffff884ca776
0000000000000007 ffff8101d33f1860 ffff8101fff15100 0000fd546c2d9664
0000000000009b3d ffff8101d33f1a48 ffff810200000001 ffffffff8856a173
Call Trace:
[<ffffffff884ca776>] :dlm:dlm_put_lockspace+0x10/0x1f
[<ffffffff8856a173>] :lock_dlm:gdlm_ast+0x0/0x2
[<ffffffff800610f7>] wait_for_completion+0x79/0xa2
[<ffffffff800884a1>] default_wake_function+0x0/0xe
[<ffffffff8858fa45>] :gfs:glock_wait_internal+0x156/0x2bc
[<ffffffff8858ff40>] :gfs:gfs_glock_nq+0x395/0x3d6
[<ffffffff8858ff97>] :gfs:gfs_glock_nq_init+0x16/0x2a
[<ffffffff885abdb4>] :gfs:gfs_statfs_sync+0x31/0x175
[<ffffffff885ac405>] :gfs:gfs_make_fs_ro+0x3c/0xae
[<ffffffff885a5115>] :gfs:gfs_put_super+0xd5/0x1ca
[<ffffffff800d8e34>] generic_shutdown_super+0x79/0xfb
[<ffffffff800d8edc>] kill_block_super+0x26/0x3a
[<ffffffff800d8faa>] deactivate_super+0x6a/0x82
[<ffffffff800e1d14>] sys_umount+0x245/0x27b
[<ffffffff800b279c>] audit_syscall_entry+0x14d/0x180
[<ffffffff8005b28d>] tracesys+0xd5/0xe0

Version-Release number of selected component (if applicable):
2.6.18-45.el5
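
For readers comparing the umount.gfs and dlm_recoverd traces in this report, the following is a minimal sketch (not part of the original bug, and not a Red Hat tool) of how the `[<address>] :module:symbol+0xoffset/0xsize` frames could be pulled apart with a script. The names `FRAME_RE` and `parse_frames` are hypothetical, and labelling frames without a `:module:` prefix as "kernel" is an assumption made only for this illustration.

```python
import re

# Matches one frame such as
#   [<ffffffff8858fa45>] :gfs:glock_wait_internal+0x156/0x2bc
# The ":module:" prefix is optional; core kernel symbols carry none.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(text):
    """Return one dict per stack frame found in text, in order of appearance."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "addr": int(m.group("addr"), 16),
            # Assumption for this sketch: no module prefix means core kernel.
            "module": m.group("module") or "kernel",
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "size": int(m.group("size"), 16),
        })
    return frames


# A few frames copied from the umount.gfs trace in the description.
sample = """
[<ffffffff800610f7>] wait_for_completion+0x79/0xa2
[<ffffffff8858fa45>] :gfs:glock_wait_internal+0x156/0x2bc
[<ffffffff8858ff40>] :gfs:gfs_glock_nq+0x395/0x3d6
[<ffffffff885abdb4>] :gfs:gfs_statfs_sync+0x31/0x175
"""

# List the gfs frames only, outermost wait first.
for frame in parse_frames(sample):
    if frame["module"] == "gfs":
        print(frame["symbol"], hex(frame["offset"]))
```

Because the pattern uses `finditer`, it also works on the syslog form where every line carries a `kernel:` prefix, and on traces flattened onto a single line. As noted in the comments on this bug, such traces can contain stale stack entries, so a parsed frame list is only a starting point for reading the call chain.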