Bug 570530
Summary: | cman: gfs_controld dm suspend hangs withdrawn GFS file system
---|---
Product: | Red Hat Enterprise Linux 5
Component: | cman
Version: | 5.4
Hardware: | All
OS: | Linux
Status: | CLOSED ERRATA
Severity: | high
Priority: | urgent
Reporter: | Robert Peterson <rpeterso>
Assignee: | Robert Peterson <rpeterso>
QA Contact: | Cluster QE <mspqa-list>
CC: | adas, bmarzins, ccaulfie, cluster-maint, edamato, jkortus, swhiteho, teigland
Target Milestone: | rc
Fixed In Version: | cman-2.0.115-33.el5
Doc Type: | Bug Fix
Last Closed: | 2010-03-30 08:39:04 UTC
Bug Blocks: | 571806
Description
Robert Peterson
2010-03-04 16:23:39 UTC
Created attachment 397846
Proposed patch
Here is my proposed patch that seems to fix the problem.
Sorry, bits of the problem description above were copied over from the other bug #517145 so they do not apply. This bug was opened to address the problem mentioned as (2) above.

On RHEL5.4 + RHN updates the withdrawal process succeeds:

Mar 4 12:35:19 a2 kernel: GFS: fsid=a_cluster:vedder0.1: withdrawing from cluster at user's request
Mar 4 12:35:19 a2 kernel: GFS: fsid=a_cluster:vedder0.1: about to withdraw from the cluster
Mar 4 12:35:19 a2 kernel: GFS: fsid=a_cluster:vedder0.1: telling LM to withdraw
Mar 4 12:35:20 a2 kernel: GFS: fsid=a_cluster:vedder0.1: withdrawn
Mar 4 12:35:20 a2 kernel:
Mar 4 12:35:20 a2 kernel: Call Trace:
Mar 4 12:35:20 a2 kernel: [<a000000100013b40>] show_stack+0x40/0xa0
Mar 4 12:35:20 a2 kernel: sp=e00000010e5a7bd0 bsp=e00000010e5a1298
Mar 4 12:35:20 a2 kernel: [<a000000100013bd0>] dump_stack+0x30/0x60
Mar 4 12:35:20 a2 kernel: sp=e00000010e5a7da0 bsp=e00000010e5a1280
Mar 4 12:35:20 a2 kernel: [<a00000020331df40>] gfs_lm_withdraw+0x1e0/0x220 [gfs]
Mar 4 12:35:20 a2 kernel: sp=e00000010e5a7da0 bsp=e00000010e5a1218
Mar 4 12:35:20 a2 kernel: [<a000000203348600>] gfs_proc_read+0xaa0/0xd60 [gfs]
Mar 4 12:35:20 a2 kernel: sp=e00000010e5a7de0 bsp=e00000010e5a11b8
Mar 4 12:35:20 a2 kernel: [<a000000100177300>] vfs_read+0x200/0x3a0
Mar 4 12:35:20 a2 kernel: sp=e00000010e5a7e20 bsp=e00000010e5a1168
Mar 4 12:35:20 a2 kernel: [<a0000001001779d0>] sys_read+0x70/0xe0
Mar 4 12:35:20 a2 kernel: sp=e00000010e5a7e20 bsp=e00000010e5a10f0
Mar 4 12:35:20 a2 kernel: [<a00000010000bd70>] __ia64_trace_syscall+0xd0/0x110
Mar 4 12:35:20 a2 kernel: sp=e00000010e5a7e30 bsp=e00000010e5a10f0
Mar 4 12:35:20 a2 kernel: [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
Mar 4 12:35:20 a2 kernel: sp=e00000010e5a8000 bsp=e00000010e5a10f0

The patch was pushed to the RHEL55 branch of the cluster git tree for inclusion into 5.5. Changing status to POST until I can do a build.
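For readers without the attachment at hand, the suspend step at issue can be sketched roughly as below. This is an illustrative Python sketch and not the patch itself: gfs_controld is written in C and the attachment is not reproduced in this report. The sketch only builds dmsetup command lines using the --nolockfs and --noflush flags that the following comments discuss. The function name build_withdraw_commands, the device name, and the sector count are hypothetical, and the load/resume steps that install an "error" target are an assumption about how the withdraw swaps in the new dm target.

```python
from typing import List


def build_withdraw_commands(dm_name: str, sectors: int,
                            nolockfs: bool = True,
                            noflush: bool = True) -> List[List[str]]:
    """Build argv lists for a suspend / load / resume cycle (nothing is executed).

    dm_name and sectors are placeholder inputs; a real caller would take
    them from the mounted device and its current table.
    """
    suspend = ["dmsetup", "suspend"]
    if nolockfs:
        # Skip the filesystem freeze (lockfs) step during suspend.
        suspend.append("--nolockfs")
    if noflush:
        # Do not flush outstanding I/O as part of the suspend.
        suspend.append("--noflush")
    suspend.append(dm_name)

    # Assumption: the replacement table maps the whole device to the
    # "error" target so that later I/O fails instead of reaching disk.
    load = ["dmsetup", "load", dm_name, "--table", f"0 {sectors} error"]
    resume = ["dmsetup", "resume", dm_name]
    return [suspend, load, resume]


if __name__ == "__main__":
    for argv in build_withdraw_commands("vg_example-lv_example", 2097152):
        print(" ".join(argv))
```

Passing noflush=False gives the variant without --noflush, which is the alternative weighed later in this report.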
Hm, I added a comment with some questions about this last week, but that comment seems to be completely missing... to recap, the --nolockfs looks correct; I remember when this bug was introduced by gfs2 adding the lockfs hooks, but apparently whoever added those didn't think of this problem or test it. The --noflush I'm not sure about; it depends on what happens to unflushed buffers when you suspend with --noflush. Are they all completed, with errors? Or are they left outstanding until the resume? The former should be fine; the latter would be dangerous and defeat the purpose of the suspend (which is to wait for all writes to be gone so that the node doesn't need to be fenced).

Verified as in description.

Setting needinfo to Bob to clarify comment 6.

Regarding comment #6, what happens to the I/O depends upon the target which is installed rather than the flushing. So far as I can tell from the man page, the flushing is something that was supposed to happen before the new target was installed. It should be easy enough to verify. The intent is that the new dm target remains in place until either the machine is rebooted or a umount succeeds. That must by definition invalidate all the buffers, since they are all in the address spaces of the inodes which will have been deallocated in order for umount to be successful. So either should be safe. The question is whether one or the other would make it more likely for umount to succeed. I suspect it makes no difference, but let's try it and see.

The decision to implement withdraw using dmsetup suspend was based on the premise that no outstanding writes or dirty buffers would exist for the given device once dmsetup returned. Otherwise the fs is open to being corrupted. So, if there are cases where that is not true, then we need to change something so that it is, detect those cases and panic instead of withdrawing, or advise people to use the panic option.
(Panic instead of withdraw is almost always preferable anyway, and should really be made the default behavior.)

I did some testing on this. The --noflush option seems to make no difference. In both cases, the withdraw returns normally, but any subsequent attempt to umount will hang, producing one of the following call traces (gfs and gfs2 respectively):

INFO: task umount.gfs:3717 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
umount.gfs D ffff81000237eaa0 0 3717 3716 (NOTLB)
ffff810066ae1c08 0000000000000086 0000000000000001 ffffffff800e3452
ffff81006afa7ac8 0000000000000007 ffff810066f3c820 ffff8100026e4100
0000024d4dd063e8 0000000000099e19 ffff810066f3ca08 0000000100000010
Call Trace:
[<ffffffff800e3452>] block_read_full_page+0x259/0x276
[<ffffffff8006f1f5>] do_gettimeofday+0x40/0x90
[<ffffffff80028adc>] sync_page+0x0/0x43
[<ffffffff800647ea>] io_schedule+0x3f/0x67
[<ffffffff80028b1a>] sync_page+0x3e/0x43
[<ffffffff8006492e>] __wait_on_bit_lock+0x36/0x66
[<ffffffff8003ff92>] __lock_page+0x5e/0x64
[<ffffffff800a1bd2>] wake_bit_function+0x0/0x23
[<ffffffff8000c2e7>] do_generic_mapping_read+0x1df/0x354
[<ffffffff8000d0fb>] file_read_actor+0x0/0x159
[<ffffffff8000c5a8>] __generic_file_aio_read+0x14c/0x198
[<ffffffff800c78fb>] generic_file_read+0xac/0xc5
[<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
[<ffffffff8012e042>] selinux_file_permission+0x9f/0xb6
[<ffffffff8000b6b0>] vfs_read+0xcb/0x171
[<ffffffff80011c01>] sys_read+0x45/0x6e
[<ffffffff8005e28d>] tracesys+0xd5/0xe0

With dmsetup --nolockfs and --noflush and gfs2:

INFO: task gfs2_logd:3145 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
gfs2_logd D ffff810002376420 0 3145 27 3146 3143 (L-TLB)
ffff810069b67cc0 0000000000000046 ffff81007ef70c33 ffff81007fb92178
ffffffff800154ce 0000000000000009 ffff8100691e8040 ffffffff80309b60
000000499756fb9a 0000000000006545 ffff8100691e8228 0000000000000282
Call Trace:
[<ffffffff800154ce>] sync_buffer+0x0/0x3f
[<ffffffff8006f1f5>] do_gettimeofday+0x40/0x90
[<ffffffff800154ce>] sync_buffer+0x0/0x3f
[<ffffffff800647ea>] io_schedule+0x3f/0x67
[<ffffffff80015509>] sync_buffer+0x3b/0x3f
[<ffffffff80064a16>] __wait_on_bit+0x40/0x6e
[<ffffffff800154ce>] sync_buffer+0x0/0x3f
[<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
[<ffffffff80064ab0>] out_of_line_wait_on_bit+0x6c/0x78
[<ffffffff800a1bd2>] wake_bit_function+0x0/0x23
[<ffffffff8003aca8>] sync_dirty_buffer+0x96/0xcb
[<ffffffff88626dc8>] :gfs2:log_write_header+0x10e/0x336
[<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
[<ffffffff886273ac>] :gfs2:gfs2_log_flush+0x3bc/0x472
[<ffffffff886269b5>] :gfs2:gfs2_ail1_empty+0x1a/0x95
[<ffffffff8862793c>] :gfs2:gfs2_logd+0xa2/0x15c
[<ffffffff8862789a>] :gfs2:gfs2_logd+0x0/0x15c
[<ffffffff80032bdc>] kthread+0xfe/0x132
[<ffffffff8005efb1>] child_rip+0xa/0x11
[<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032ade>] kthread+0x0/0x132
[<ffffffff8005efa7>] child_rip+0x0/0x11

INFO: task umount.gfs2:3195 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
umount.gfs2 D ffff810002376420 0 3195 3194 (NOTLB)
ffff8100681bfdb8 0000000000000082 ffff81000237eb18 ffff8100681bfd28
ffff81007f756080 0000000000000007 ffff8100698c5080 ffffffff80309b60
0000004afa9aff98 0000000000097576 ffff8100698c5268 0000000000000000
Call Trace:
[<ffffffff80065613>] __down_write_nested+0x7a/0x92
[<ffffffff8862700f>] :gfs2:gfs2_log_flush+0x1f/0x472
[<ffffffff8862746d>] :gfs2:gfs2_meta_syncfs+0xb/0x37
[<ffffffff8862e0ac>] :gfs2:gfs2_kill_sb+0x25/0x76
[<ffffffff800e4d41>] deactivate_super+0x6a/0x82
[<ffffffff800ee830>] sys_umount+0x245/0x27b
[<ffffffff800b878c>] audit_syscall_entry+0x180/0x1b3
[<ffffffff8005e28d>] tracesys+0xd5/0xe0

So we have to decide whether this is worth respinning the cman errata again this late in the build cycle to remove --noflush. My personal opinion is no, it's not worth respinning; we can deal with the umount problem and device sync issues in 5.6 or 5.5.z. Opinions?

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0266.html