Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
For bugs related to the Red Hat Enterprise Linux 5 product line. The current stable release is 5.10. For Red Hat Enterprise Linux 6 and above, please report new issues via Red Hat JIRA: https://issues.redhat.com/secure/CreateIssue!default.jspa?pid=12332745

Bug 508876

Summary: umount.gfs2 hangs eating CPU
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Target Milestone: rc
Target Release: ---
Reporter: Jaroslav Kortus <jkortus>
Assignee: Steve Whitehouse <swhiteho>
QA Contact: Cluster QE <mspqa-list>
CC: adas, bmarzins, cluster-maint, djansa, dzickus, edamato, lwang, phan, rpeterso, swhiteho
Doc Type: Bug Fix
Last Closed: 2009-09-02 08:12:37 UTC
Bug Blocks: 514700
Attachments:
glocks from a3
Proposed fix

Description Jaroslav Kortus 2009-06-30 11:26:21 UTC
Created attachment 349941 [details]
glocks from a3

Description of problem:
umount.gfs2 hangs and does not finish unmounting the filesystem

Version-Release number of selected component (if applicable):
gfs2-utils-0.1.58-1.el5
kernel-2.6.18-154.el5


How reproducible:
75%

Steps to Reproduce:
1. create GFS2 FS, mount it
2. generate load
3. umount
  
Actual results:
umount hangs on one of the nodes, consuming all CPU

Expected results:
umount unmounts the filesystem

Additional info:
This happens during our brawl tests. After some load is generated and the test ends, umount is called. However, this call quite often seems to hang. I've seen it happen on different nodes of the cluster, even after different kinds of load. The only thing they have in common is the umount call; the other circumstances differ.


Glocks from a3 are attached. On a1 and a2 the FS is already unmounted.


Additional info, thread dumps and kernel dumps follow:

PID: 18744  TASK: e000000121b60000  CPU: 1   COMMAND: "dlm_astd"
 #0 [BSP:e000000121b61228] schedule at a000000100663b00
 #1 [BSP:e000000121b611d8] rwsem_down_read_failed at a0000001006678a0
 #2 [BSP:e000000121b611b8] down_read at a0000001000b7cd0
 #3 [BSP:e000000121b61188] gfs2_glock_cb at a000000203013640
 #4 [BSP:e000000121b61148] gdlm_ast at a000000202e09c70
 #5 [BSP:e000000121b61110] dlm_astd at a000000202e8c500
 #6 [BSP:e000000121b610c8] kthread at a0000001000b0550
 #7 [BSP:e000000121b610a0] kernel_thread_helper at a000000100012210
 #8 [BSP:e000000121b610a0] start_kernel_thread at a0000001000090c0

PID: 4325   TASK: e00000404cc00000  CPU: 0   COMMAND: "umount.gfs2"
(active)
crash>

[root@a3 ~]# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot/efi type vfat (rw)
tmpfs on /dev/shm type tmpfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
none on /sys/kernel/config type configfs (rw)
/dev/mapper/brawl-brawl0 on /mnt/brawl type gfs2 (rw,hostdata=jid=2:id=589825:first=0,debug)


[brawl] [gfs2-1k-blk] [setup] <start name="mkfs" pid="10810" time="Mon Jun 29 13:27:45 2009" type="cmd" />
[brawl] [gfs2-1k-blk] [setup] [mkfs] Device:                    /dev/brawl/brawl0
[brawl] [gfs2-1k-blk] [setup] [mkfs] Blocksize:                 1024
[brawl] [gfs2-1k-blk] [setup] [mkfs] Device Size                500.00 GB (524288000 blocks)
[brawl] [gfs2-1k-blk] [setup] [mkfs] Filesystem Size:           500.00 GB (524287999 blocks)
[brawl] [gfs2-1k-blk] [setup] [mkfs] Journals:                  3
[brawl] [gfs2-1k-blk] [setup] [mkfs] Resource Groups:           2000
[brawl] [gfs2-1k-blk] [setup] [mkfs] Locking Protocol:          "lock_dlm"
[brawl] [gfs2-1k-blk] [setup] [mkfs] Lock Table:                "a_cluster:brawl0"
[brawl] [gfs2-1k-blk] [setup] [mkfs] UUID:                      436B43C0-C920-9E34-040F-D3CF4B6E9A76

[brawl] [gfs2-1k-blk] [post] <start name="umount" pid="16307" time="Mon Jun 29 16:33:40 2009" type="cmd" />
[brawl] [gfs2-1k-blk] <start name="post" pid="16306" time="Mon Jun 29 16:33:40 2009" type="cmd" />






Jun 30 05:51:50 a3 kernel: SysRq : Show CPUs
Jun 30 05:51:50 a3 kernel: BUG: warning at arch/ia64/kernel/smp.c:455/smp_call_function() (Tainted: G     )
Jun 30 05:51:50 a3 kernel:
Jun 30 05:51:50 a3 kernel: Call Trace:
Jun 30 05:51:50 a3 kernel:  [<a000000100013b40>] show_stack+0x40/0xa0
Jun 30 05:51:50 a3 kernel:                                 sp=e00000010e7efc30 bsp=e00000010e7e9460
Jun 30 05:51:50 a3 kernel:  [<a000000100013bd0>] dump_stack+0x30/0x60
Jun 30 05:51:50 a3 kernel:                                 sp=e00000010e7efe00 bsp=e00000010e7e9448
Jun 30 05:51:50 a3 kernel:  [<a000000100056400>] smp_call_function+0x120/0x400
Jun 30 05:51:50 a3 kernel:                                 sp=e00000010e7efe00 bsp=e00000010e7e9400
Jun 30 05:51:50 a3 kernel:  [<a00000010008a820>] on_each_cpu+0x40/0xe0
Jun 30 05:51:50 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e93b8
Jun 30 05:51:50 a3 kernel:  [<a0000001003db810>] sysrq_handle_showcpus+0x30/0x60
Jun 30 05:51:50 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e93a0
Jun 30 05:51:50 a3 kernel:  [<a0000001003db9a0>] __handle_sysrq+0x160/0x300
Jun 30 05:51:50 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e9350
Jun 30 05:51:50 a3 kernel:  [<a00000010020be10>] write_sysrq_trigger+0xb0/0xe0
Jun 30 05:51:50 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e9320
Jun 30 05:51:50 a3 kernel:  [<a0000001001779e0>] vfs_write+0x200/0x3a0
Jun 30 05:51:51 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e92d0
Jun 30 05:51:51 a3 kernel:  [<a000000100178530>] sys_write+0x70/0xe0
Jun 30 05:51:51 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e9258
Jun 30 05:51:51 a3 kernel:  [<a00000010000bd70>] __ia64_trace_syscall+0xd0/0x110
Jun 30 05:51:51 a3 kernel:                                 sp=e00000010e7efe30 bsp=e00000010e7e9258
Jun 30 05:51:51 a3 kernel:  [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
Jun 30 05:51:51 a3 kernel:                                 sp=e00000010e7f0000 bsp=e00000010e7e9258


Jun 30 05:51:51 a3 kernel: CPU1:
Jun 30 05:51:51 a3 kernel: 
Jun 30 05:51:51 a3 kernel: Call Trace:
Jun 30 05:51:51 a3 kernel:  [<a000000100013b40>] show_stack+0x40/0xa0
Jun 30 05:51:51 a3 kernel:                                 sp=e00000404cc079e0 bsp=e00000404cc01490
Jun 30 05:51:51 a3 kernel:  [<a0000001003dbc70>] showacpu+0x50/0x80
Jun 30 05:51:51 a3 kernel:                                 sp=e00000404cc07bb0 bsp=e00000404cc01470
Jun 30 05:51:51 a3 kernel:  [<a000000100055ee0>] handle_IPI+0x160/0x380
Jun 30 05:51:51 a3 kernel:                                 sp=e00000404cc07bb0 bsp=e00000404cc01438
Jun 30 05:51:51 a3 kernel:  [<a0000001000ef1b0>] handle_IRQ_event+0x130/0x260
Jun 30 05:51:51 a3 kernel:                                 sp=e00000404cc07bb0 bsp=e00000404cc013f0
Jun 30 05:51:51 a3 kernel:  [<a0000001000ef410>] __do_IRQ+0x130/0x420
Jun 30 05:51:51 a3 kernel:                                 sp=e00000404cc07bb0 bsp=e00000404cc013a8
Jun 30 05:51:51 a3 kernel:  [<a0000001000117c0>] ia64_handle_irq+0x160/0x200
Jun 30 05:51:51 a3 kernel:                                 sp=e00000404cc07bb0 bsp=e00000404cc01378
Jun 30 05:51:51 a3 kernel:  [<a00000010000bfe0>] __ia64_leave_kernel+0x0/0x280
Jun 30 05:51:51 a3 kernel:                                 sp=e00000404cc07bb0 bsp=e00000404cc01378
Jun 30 05:51:51 a3 kernel:  [<a0000001001bb610>] invalidate_inodes+0x130/0x220
Jun 30 05:51:51 a3 kernel:                                 sp=e00000404cc07d80 bsp=e00000404cc01320
Jun 30 05:51:51 a3 kernel:  [<a000000203013320>] gfs2_gl_hash_clear+0x4c0/0x520 [gfs2]
Jun 30 05:51:51 a3 kernel:                                 sp=e00000404cc07d90 bsp=e00000404cc012b8
Jun 30 05:51:51 a3 kernel:  [<a00000020303bd60>] gfs2_put_super+0x3c0/0x400 [gfs2]
Jun 30 05:51:51 a3 kernel:                                 sp=e00000404cc07d90 bsp=e00000404cc01288
Jun 30 05:51:51 a3 kernel:  [<a000000100189d10>] generic_shutdown_super+0x1b0/0x2e0
Jun 30 05:51:51 a3 kernel:                                 sp=e00000404cc07d90 bsp=e00000404cc01258
Jun 30 05:51:51 a3 kernel:  [<a000000100189e80>] kill_block_super+0x40/0x80
Jun 30 05:51:51 a3 kernel:                                 sp=e00000404cc07d90 bsp=e00000404cc01230
Jun 30 05:51:51 a3 kernel:  [<a000000203032860>] gfs2_kill_sb+0xe0/0x120 [gfs2]
Jun 30 05:51:51 a3 kernel:                                 sp=e00000404cc07d90 bsp=e00000404cc011f0
Jun 30 05:51:51 a3 kernel:  [<a00000010018a130>] deactivate_super+0x170/0x1c0
Jun 30 05:51:51 a3 kernel:                                 sp=e00000404cc07d90 bsp=e00000404cc011c0
Jun 30 05:51:51 a3 kernel:  [<a0000001001c1be0>] mntput_no_expire+0xc0/0x200
Jun 30 05:51:51 a3 kernel:                                 sp=e00000404cc07d90 bsp=e00000404cc01190
Jun 30 05:51:51 a3 kernel:  [<a00000010019aac0>] path_release_on_umount+0x40/0x60
Jun 30 05:51:51 a3 kernel:                                 sp=e00000404cc07d90 bsp=e00000404cc01170
Jun 30 05:51:52 a3 kernel:  [<a0000001001c5420>] sys_umount+0x620/0x6e0
Jun 30 05:51:52 a3 kernel:                                 sp=e00000404cc07d90 bsp=e00000404cc010f8
Jun 30 05:51:52 a3 kernel:  [<a00000010000bd70>] __ia64_trace_syscall+0xd0/0x110
Jun 30 05:51:52 a3 kernel:                                 sp=e00000404cc07e30 bsp=e00000404cc010f8
Jun 30 05:51:52 a3 kernel:  [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
Jun 30 05:51:52 a3 kernel:                                 sp=e00000404cc08000 bsp=e00000404cc010f8

Jun 30 05:51:52 a3 kernel: CPU0:
Jun 30 05:51:52 a3 kernel: 
Jun 30 05:51:52 a3 kernel: Call Trace:
Jun 30 05:51:52 a3 kernel:  [<a000000100013b40>] show_stack+0x40/0xa0
Jun 30 05:51:52 a3 kernel:                                 sp=e00000010e7efc50 bsp=e00000010e7e9420
Jun 30 05:51:52 a3 kernel:  [<a0000001003dbc70>] showacpu+0x50/0x80
Jun 30 05:51:52 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e9400
Jun 30 05:51:52 a3 kernel:  [<a00000010008a870>] on_each_cpu+0x90/0xe0
Jun 30 05:51:52 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e93b8
Jun 30 05:51:52 a3 kernel:  [<a0000001003db810>] sysrq_handle_showcpus+0x30/0x60
Jun 30 05:51:52 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e93a0
Jun 30 05:51:52 a3 kernel:  [<a0000001003db9a0>] __handle_sysrq+0x160/0x300
Jun 30 05:51:52 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e9350
Jun 30 05:51:52 a3 kernel:  [<a00000010020be10>] write_sysrq_trigger+0xb0/0xe0
Jun 30 05:51:52 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e9320
Jun 30 05:51:52 a3 kernel:  [<a0000001001779e0>] vfs_write+0x200/0x3a0
Jun 30 05:51:52 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e92d0
Jun 30 05:51:52 a3 kernel:  [<a000000100178530>] sys_write+0x70/0xe0
Jun 30 05:51:52 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e9258
Jun 30 05:51:52 a3 kernel:  [<a00000010000bd70>] __ia64_trace_syscall+0xd0/0x110
Jun 30 05:51:52 a3 kernel:                                 sp=e00000010e7efe30 bsp=e00000010e7e9258
Jun 30 05:51:52 a3 kernel:  [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
Jun 30 05:51:52 a3 kernel:                                 sp=e00000010e7f0000 bsp=e00000010e7e9258











Jun 30 05:51:15 a3 kernel: SysRq : Show CPUs
Jun 30 05:51:15 a3 kernel: BUG: warning at arch/ia64/kernel/smp.c:455/smp_call_function() (Tainted: G     )
Jun 30 05:51:15 a3 kernel:
Jun 30 05:51:15 a3 kernel: Call Trace:
Jun 30 05:51:15 a3 kernel:  [<a000000100013b40>] show_stack+0x40/0xa0
Jun 30 05:51:15 a3 kernel:                                 sp=e00000010e7efc30 bsp=e00000010e7e9460
Jun 30 05:51:15 a3 kernel:  [<a000000100013bd0>] dump_stack+0x30/0x60
Jun 30 05:51:15 a3 kernel:                                 sp=e00000010e7efe00 bsp=e00000010e7e9448
Jun 30 05:51:15 a3 kernel:  [<a000000100056400>] smp_call_function+0x120/0x400
Jun 30 05:51:15 a3 kernel:                                 sp=e00000010e7efe00 bsp=e00000010e7e9400
Jun 30 05:51:15 a3 kernel:  [<a00000010008a820>] on_each_cpu+0x40/0xe0
Jun 30 05:51:15 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e93b8
Jun 30 05:51:15 a3 kernel:  [<a0000001003db810>] sysrq_handle_showcpus+0x30/0x60
Jun 30 05:51:16 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e93a0
Jun 30 05:51:16 a3 kernel:  [<a0000001003db9a0>] __handle_sysrq+0x160/0x300
Jun 30 05:51:16 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e9350
Jun 30 05:51:16 a3 kernel:  [<a00000010020be10>] write_sysrq_trigger+0xb0/0xe0
Jun 30 05:51:16 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e9320
Jun 30 05:51:16 a3 kernel:  [<a0000001001779e0>] vfs_write+0x200/0x3a0
Jun 30 05:51:16 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e92d0
Jun 30 05:51:16 a3 kernel:  [<a000000100178530>] sys_write+0x70/0xe0
Jun 30 05:51:16 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e9258
Jun 30 05:51:16 a3 kernel:  [<a00000010000bd70>] __ia64_trace_syscall+0xd0/0x110
Jun 30 05:51:16 a3 kernel:                                 sp=e00000010e7efe30 bsp=e00000010e7e9258
Jun 30 05:51:16 a3 kernel:  [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
Jun 30 05:51:16 a3 kernel:                                 sp=e00000010e7f0000 bsp=e00000010e7e9258
Jun 30 05:51:16 a3 kernel: CPU1:
Jun 30 05:51:16 a3 kernel:
Jun 30 05:51:16 a3 kernel: Call Trace:
Jun 30 05:51:16 a3 kernel:  [<a000000100013b40>] show_stack+0x40/0xa0
Jun 30 05:51:16 a3 kernel:                                 sp=e0000040ffaa79f0 bsp=e0000040ffaa12b0
Jun 30 05:51:16 a3 kernel:  [<a0000001003dbc70>] showacpu+0x50/0x80
Jun 30 05:51:16 a3 kernel:                                 sp=e0000040ffaa7bc0 bsp=e0000040ffaa1290
Jun 30 05:51:16 a3 kernel:  [<a000000100055ee0>] handle_IPI+0x160/0x380
Jun 30 05:51:16 a3 kernel:                                 sp=e0000040ffaa7bc0 bsp=e0000040ffaa1258
Jun 30 05:51:16 a3 kernel:  [<a0000001000ef1b0>] handle_IRQ_event+0x130/0x260
Jun 30 05:51:17 a3 kernel:                                 sp=e0000040ffaa7bc0 bsp=e0000040ffaa1218
Jun 30 05:51:17 a3 kernel:  [<a0000001000ef410>] __do_IRQ+0x130/0x420
Jun 30 05:51:17 a3 kernel:                                 sp=e0000040ffaa7bc0 bsp=e0000040ffaa11c8
Jun 30 05:51:17 a3 kernel:  [<a0000001000117c0>] ia64_handle_irq+0x160/0x200
Jun 30 05:51:18 a3 kernel:                                 sp=e0000040ffaa7bc0 bsp=e0000040ffaa1198
Jun 30 05:51:18 a3 kernel:  [<a00000010000bfe0>] __ia64_leave_kernel+0x0/0x280
Jun 30 05:51:18 a3 kernel:                                 sp=e0000040ffaa7bc0 bsp=e0000040ffaa1198
Jun 30 05:51:18 a3 kernel:  [<a000000100011bf0>] __ia64_pal_call_static+0x90/0xc0
Jun 30 05:51:18 a3 kernel:                                 sp=e0000040ffaa7d90 bsp=e0000040ffaa1148
Jun 30 05:51:18 a3 kernel:  [<a000000100014bb0>] default_idle+0x90/0x160
Jun 30 05:51:18 a3 kernel:                                 sp=e0000040ffaa7d90 bsp=e0000040ffaa1128
Jun 30 05:51:18 a3 kernel:  [<a000000100013810>] cpu_idle+0x1f0/0x400
Jun 30 05:51:18 a3 kernel:                                 sp=e0000040ffaa7e30 bsp=e0000040ffaa10e8
Jun 30 05:51:19 a3 kernel:  [<a0000001000580f0>] start_secondary+0x3f0/0x420
Jun 30 05:51:19 a3 kernel:                                 sp=e0000040ffaa7e30 bsp=e0000040ffaa10a0
Jun 30 05:51:19 a3 kernel:  [<a0000001000085e0>] __end_ivt_text+0x6c0/0x6f0
Jun 30 05:51:19 a3 kernel:                                 sp=e0000040ffaa7e30 bsp=e0000040ffaa10a0


Jun 30 05:51:19 a3 kernel: CPU0:
Jun 30 05:51:19 a3 kernel:
Jun 30 05:51:19 a3 kernel: Call Trace:
Jun 30 05:51:19 a3 kernel:  [<a000000100013b40>] show_stack+0x40/0xa0
Jun 30 05:51:19 a3 kernel:                                 sp=e00000010e7efc50 bsp=e00000010e7e9420
Jun 30 05:51:20 a3 kernel:  [<a0000001003dbc70>] showacpu+0x50/0x80
Jun 30 05:51:20 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e9400
Jun 30 05:51:20 a3 kernel:  [<a00000010008a870>] on_each_cpu+0x90/0xe0
Jun 30 05:51:20 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e93b8
Jun 30 05:51:20 a3 kernel:  [<a0000001003db810>] sysrq_handle_showcpus+0x30/0x60
Jun 30 05:51:20 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e93a0
Jun 30 05:51:20 a3 kernel:  [<a0000001003db9a0>] __handle_sysrq+0x160/0x300
Jun 30 05:51:20 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e9350
Jun 30 05:51:21 a3 kernel:  [<a00000010020be10>] write_sysrq_trigger+0xb0/0xe0
Jun 30 05:51:21 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e9320
Jun 30 05:51:21 a3 kernel:  [<a0000001001779e0>] vfs_write+0x200/0x3a0
Jun 30 05:51:21 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e92d0
Jun 30 05:51:21 a3 kernel:  [<a000000100178530>] sys_write+0x70/0xe0
Jun 30 05:51:21 a3 kernel:                                 sp=e00000010e7efe20 bsp=e00000010e7e9258
Jun 30 05:51:21 a3 kernel:  [<a00000010000bd70>] __ia64_trace_syscall+0xd0/0x110
Jun 30 05:51:22 a3 kernel:                                 sp=e00000010e7efe30 bsp=e00000010e7e9258
Jun 30 05:51:22 a3 kernel:  [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
Jun 30 05:51:22 a3 kernel:                                 sp=e00000010e7f0000 bsp=e00000010e7e9258

Comment 1 Steve Whitehouse 2009-06-30 14:07:38 UTC
umount.gfs2 shouldn't exist, as unmounting is handled by gfs_controld and uevents. Does it still happen if you remove umount.gfs2 and try again?

Comment 2 Steve Whitehouse 2009-06-30 14:11:03 UTC
The glocks in the dump are all in the process of being demoted, but no reply has been received from the dlm to say that the demotion has actually occurred.

Comment 3 Steve Whitehouse 2009-07-01 12:22:35 UTC
I'd forgotten (comment #1) just how old the code in RHEL5 is. It seems that it does still have the umount helper :( Still, it looks from the glock dump that it's not a problem in the helper anyway.

Comment 4 Steve Whitehouse 2009-07-02 16:22:09 UTC
It looks like we missed an upstream change that should have gone into RHEL. The gfs2_umount_flush_sem needs to be moved so that it is around run_queue rather than around the dlm reply code.

Comment 5 Robert Peterson 2009-07-02 16:36:20 UTC
Here are links to the two upstream patches that fix this problem:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a228df6339e0d385b8149c860d81b6007f5e9c81
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d8348de06f704fc34d24ec068546ecb1045fc11a

Not all of the content is applicable to the RHEL5 kernel, so the current plan is for Steve to combine the two patches into a simpler fix for RHEL. Reassigning to Steve. Let me know if I can help in any way.

Comment 6 Steve Whitehouse 2009-07-03 10:35:44 UTC
Created attachment 350404 [details]
Proposed fix

This is the fix from upstream. It moves the semaphore from around the code which receives messages from the dlm to around the code which processes the particular messages that cause the race during umount. This means that other messages from the dlm can continue to be processed during that time.

Since the write side of the lock is held for all of the invalidate_inodes() routine, the read side acts like a barrier during that process.

This is entirely due to the extra set of inodes that we are carrying around, and once we are rid of those, the need for this rw semaphore will go away.

Comment 7 Steve Whitehouse 2009-07-03 10:41:10 UTC
I guess to be clear I should have said "this is the RHEL5 port of the fix which is in upstream"

Comment 8 Robert Peterson 2009-07-07 02:59:41 UTC
Waiting for ack flags.

Comment 9 RHEL Program Management 2009-07-07 14:23:22 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 11 Abhijith Das 2009-07-10 18:44:01 UTC
Posted the patch from comment #6 to rhkernel-list.

Comment 12 Don Zickus 2009-07-14 20:57:55 UTC
in kernel-2.6.18-158.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so. However, feel free
to provide a comment indicating that this fix has been verified.

Comment 14 Han Pingtian 2009-07-24 02:31:24 UTC
Hi Jaroslav,

Could you please try to verify this fix with kernel-2.6.18-159.el5? Since I don't have a cluster environment, I will code-review the fix first.

Thanks.

Comment 16 Jaroslav Kortus 2009-08-05 09:49:44 UTC
Tested with 2.6.18-160.el5 #1 SMP Mon Jul 27 17:32:15 EDT 2009 ia64 ia64 ia64 GNU/Linux (RHEL5.4 snap 5). No hang after several GFS suite runs.

Comment 18 errata-xmlrpc 2009-09-02 08:12:37 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html