Hide Forgot
Description of problem: This is basically a continuation of closed bug 1025456. Many deadlocked failure scenarios were posted to that bug however the later ones had to do with multiple asynchronous mirror device failures. Basically the following: Apr 7 02:34:37 host-119 qarshd[24136]: Running cmdline: echo offline > /sys/block/sda/device/state Apr 7 02:34:37 host-119 lvm[4128]: Primary mirror device 253:3 has failed (D). Apr 7 02:34:37 host-119 lvm[4128]: Device failure in helter_skelter-syncd_multiple_legs_4legs_1. Apr 7 02:34:37 host-119 lvm[4128]: /dev/sda1: read failed after 0 of 4096 at 0: Input/output error Apr 7 02:34:37 host-119 lvm[4128]: Couldn't find device with uuid X5QiOP-FU2w-oD1M-G34G-atn8-Ajgd-o7zxom. Apr 7 02:34:37 host-119 lvm[4128]: Couldn't find device for segment belonging to helter_skelter/syncd_multiple_legs_4legs_1_mimage_0 while checking used and assumed devices. Apr 7 02:34:37 host-119 lvm[4128]: /dev/sda1: read failed after 0 of 4096 at 4096: Input/output error Apr 7 02:34:37 host-119 lvm[4128]: Couldn't find device with uuid X5QiOP-FU2w-oD1M-G34G-atn8-Ajgd-o7zxom. Apr 7 02:34:37 host-119 lvm[4128]: Couldn't find device for segment belonging to helter_skelter/syncd_multiple_legs_4legs_1_mimage_0 while checking used and assumed devices. # Next failure comes in and hoses the original repair attempt Apr 7 02:34:37 host-119 qarshd[24138]: Running cmdline: echo offline > /sys/block/sdg/device/state Apr 7 02:34:37 host-119 kernel: sd 9:0:0:1: rejecting I/O to offline device Apr 7 02:34:37 host-119 lvm[4128]: /dev/sdg1: read failed after 0 of 4096 at 4096: Input/output error Apr 7 02:34:37 host-119 lvm[4128]: WARNING: Failed to write an MDA of VG helter_skelter. Apr 7 02:34:37 host-119 lvm[4128]: /dev/sdg1: read failed after 0 of 4096 at 0: Input/output error Apr 7 02:34:37 host-119 kernel: __ratelimit: 549 callbacks suppressed Apr 7 02:34:37 host-119 kernel: device-mapper: raid1: Write error during recovery (error = 0x2) Apr 7 02:34:37 host-119 kernel: device-mapper: raid1: Write error during recovery (error = 0x2) Apr 7 02:34:37 host-119 lvm[4128]: WARNING: Failed to write an MDA of VG helter_skelter. Apr 7 02:34:37 host-119 lvm[4128]: Internal error: Performing unsafe table load while 5 device(s) are known to be suspended: (253:3) Apr 7 02:34:37 host-119 lvm[4128]: Internal error: Performing unsafe table load while 5 device(s) are known to be suspended: (253:6) Apr 7 02:34:37 host-119 lvm[4128]: Mirror status: 1 of 4 images failed. Apr 7 02:34:37 host-119 lvm[4128]: Trying to up-convert to 4 images, 1 logs. FULL TEST OUTPUT: ================================================================================ Iteration 4.16 started at Thu Apr 7 02:33:19 CDT 2016 ================================================================================ Scenario kill_multiple_legs_synced_4_legs: Kill multiple legs of synced 4 leg mirror(s) ********* Mirror hash info for this scenario ********* * names: syncd_multiple_legs_4legs_1 syncd_multiple_legs_4legs_2 * sync: 1 * striped: 0 * leg devices: /dev/sda1 /dev/sde1 /dev/sdb1 /dev/sdg1 * log devices: /dev/sdc1 * no MDA devices: * failpv(s): /dev/sda1 /dev/sdg1 * failnode(s): host-119.virt.lab.msp.redhat.com * lvmetad: 0 * leg fault policy: allocate * log fault policy: allocate ****************************************************** Creating mirror(s) on host-119.virt.lab.msp.redhat.com... host-119.virt.lab.msp.redhat.com: lvcreate --type mirror -m 3 -n syncd_multiple_legs_4legs_1 -L 500M helter_skelter /dev/sda1:0-2400 /dev/sde1:0-2400 /dev/sdb1:0-2400 /dev/sdg1:0-2400 /dev/sdc1:0-150 host-119.virt.lab.msp.redhat.com: lvcreate --type mirror -m 3 -n syncd_multiple_legs_4legs_2 -L 500M helter_skelter /dev/sda1:0-2400 /dev/sde1:0-2400 /dev/sdb1:0-2400 /dev/sdg1:0-2400 /dev/sdc1:0-150 Current mirror/raid device structure(s): LV Attr LSize Cpy%Sync Devices syncd_multiple_legs_4legs_1 mwi-a-m--- 500.00m 45.60 syncd_multiple_legs_4legs_1_mimage_0(0),syncd_multiple_legs_4legs_1_mimage_1(0),syncd_multiple_legs_4legs_1_mimage_2(0),syncd_multiple_legs_4legs_1_mimage_3(0) [syncd_multiple_legs_4legs_1_mimage_0] Iwi-aom--- 500.00m /dev/sda1(0) [syncd_multiple_legs_4legs_1_mimage_1] Iwi-aom--- 500.00m /dev/sde1(0) [syncd_multiple_legs_4legs_1_mimage_2] Iwi-aom--- 500.00m /dev/sdb1(0) [syncd_multiple_legs_4legs_1_mimage_3] Iwi-aom--- 500.00m /dev/sdg1(0) [syncd_multiple_legs_4legs_1_mlog] lwi-aom--- 4.00m /dev/sdc1(0) syncd_multiple_legs_4legs_2 mwi-a-m--- 500.00m 4.80 syncd_multiple_legs_4legs_2_mimage_0(0),syncd_multiple_legs_4legs_2_mimage_1(0),syncd_multiple_legs_4legs_2_mimage_2(0),syncd_multiple_legs_4legs_2_mimage_3(0) [syncd_multiple_legs_4legs_2_mimage_0] Iwi-aom--- 500.00m /dev/sda1(125) [syncd_multiple_legs_4legs_2_mimage_1] Iwi-aom--- 500.00m /dev/sde1(125) [syncd_multiple_legs_4legs_2_mimage_2] Iwi-aom--- 500.00m /dev/sdb1(125) [syncd_multiple_legs_4legs_2_mimage_3] Iwi-aom--- 500.00m /dev/sdg1(125) [syncd_multiple_legs_4legs_2_mlog] lwi-aom--- 4.00m /dev/sdc1(1) Waiting until all mirror|raid volumes become fully syncd... 2/2 mirror(s) are fully synced: ( 100.00% 100.00% ) Creating ext on top of mirror(s) on host-119.virt.lab.msp.redhat.com... mke2fs 1.41.12 (17-May-2010) mke2fs 1.41.12 (17-May-2010) Mounting mirrored ext filesystems on host-119.virt.lab.msp.redhat.com... PV=/dev/sdg1 syncd_multiple_legs_4legs_1_mimage_3: 3.1 syncd_multiple_legs_4legs_2_mimage_3: 3.1 PV=/dev/sda1 syncd_multiple_legs_4legs_1_mimage_0: 4.1 syncd_multiple_legs_4legs_2_mimage_0: 4.1 PV=/dev/sdg1 syncd_multiple_legs_4legs_1_mimage_3: 3.1 syncd_multiple_legs_4legs_2_mimage_3: 3.1 PV=/dev/sda1 syncd_multiple_legs_4legs_1_mimage_0: 4.1 syncd_multiple_legs_4legs_2_mimage_0: 4.1 Writing verification files (checkit) to mirror(s) on... ---- host-119.virt.lab.msp.redhat.com ---- <start name="host-119.virt.lab.msp.redhat.com_syncd_multiple_legs_4legs_1" pid="938" time="Thu Apr 7 02:34:08 2016" type="cmd" /> <start name="host-119.virt.lab.msp.redhat.com_syncd_multiple_legs_4legs_2" pid="940" time="Thu Apr 7 02:34:08 2016" type="cmd" /> Sleeping 15 seconds to get some outsanding I/O locks before the failure Verifying files (checkit) on mirror(s) on... ---- host-119.virt.lab.msp.redhat.com ---- Disabling device sda on host-119.virt.lab.msp.redhat.com Disabling device sdg on host-119.virt.lab.msp.redhat.com Getting recovery check start time from /var/log/messages: Apr 7 02:34 Attempting I/O to cause mirror down conversion(s) on host-119.virt.lab.msp.redhat.com dd if=/dev/zero of=/mnt/syncd_multiple_legs_4legs_1/ddfile count=10 bs=4M oflag=direct [...] Apr 7 02:34:37 host-119 qarshd[24136]: Running cmdline: echo offline > /sys/block/sda/device/state Apr 7 02:34:37 host-119 kernel: sd 2:0:0:1: rejecting I/O to offline device Apr 7 02:34:37 host-119 lvm[4128]: Primary mirror device 253:3 has failed (D). Apr 7 02:34:37 host-119 lvm[4128]: Device failure in helter_skelter-syncd_multiple_legs_4legs_1. Apr 7 02:34:37 host-119 lvm[4128]: /dev/sda1: read failed after 0 of 4096 at 0: Input/output error Apr 7 02:34:37 host-119 lvm[4128]: Couldn't find device with uuid X5QiOP-FU2w-oD1M-G34G-atn8-Ajgd-o7zxom. Apr 7 02:34:37 host-119 lvm[4128]: Couldn't find device for segment belonging to helter_skelter/syncd_multiple_legs_4legs_1_mimage_0 while checking used and assumed devices. Apr 7 02:34:37 host-119 lvm[4128]: /dev/sda1: read failed after 0 of 4096 at 4096: Input/output error Apr 7 02:34:37 host-119 lvm[4128]: Couldn't find device with uuid X5QiOP-FU2w-oD1M-G34G-atn8-Ajgd-o7zxom. Apr 7 02:34:37 host-119 lvm[4128]: Couldn't find device for segment belonging to helter_skelter/syncd_multiple_legs_4legs_1_mimage_0 while checking used and assumed devices. Apr 7 02:34:37 host-119 qarshd[24138]: Running cmdline: echo offline > /sys/block/sdg/device/state Apr 7 02:34:37 host-119 kernel: sd 9:0:0:1: rejecting I/O to offline device Apr 7 02:34:37 host-119 lvm[4128]: /dev/sdg1: read failed after 0 of 4096 at 4096: Input/output error Apr 7 02:34:37 host-119 lvm[4128]: WARNING: Failed to write an MDA of VG helter_skelter. Apr 7 02:34:37 host-119 lvm[4128]: /dev/sdg1: read failed after 0 of 4096 at 0: Input/output error Apr 7 02:34:37 host-119 kernel: __ratelimit: 549 callbacks suppressed Apr 7 02:34:37 host-119 kernel: device-mapper: raid1: Write error during recovery (error = 0x2) Apr 7 02:34:37 host-119 kernel: device-mapper: raid1: Write error during recovery (error = 0x2) Apr 7 02:34:37 host-119 lvm[4128]: WARNING: Failed to write an MDA of VG helter_skelter. Apr 7 02:34:37 host-119 lvm[4128]: Internal error: Performing unsafe table load while 5 device(s) are known to be suspended: (253:3) Apr 7 02:34:37 host-119 lvm[4128]: Internal error: Performing unsafe table load while 5 device(s) are known to be suspended: (253:6) Apr 7 02:34:37 host-119 lvm[4128]: Mirror status: 1 of 4 images failed. Apr 7 02:34:37 host-119 lvm[4128]: Trying to up-convert to 4 images, 1 logs. Apr 7 02:34:37 host-119 kernel: device-mapper: raid1: Write error during recovery (error = 0x2) Apr 7 02:34:37 host-119 kernel: device-mapper: raid1: Write error during recovery (error = 0x2) Apr 7 02:34:37 host-119 lvm[4128]: WARNING: Failed to write an MDA of VG helter_skelter. Apr 7 02:34:37 host-119 lvm[4128]: WARNING: Failed to write an MDA of VG helter_skelter. Apr 7 02:34:37 host-119 lvm[4128]: WARNING: Failed to replace 2 of 4 images in volume syncd_multiple_legs_4legs_1 Apr 7 02:34:37 host-119 lvm[4128]: WARNING: Failed to replace 1 of 1 logs in volume syncd_multiple_legs_4legs_1 Apr 7 02:34:37 host-119 kernel: device-mapper: raid1: Write error during recovery (error = 0x2) Apr 7 02:34:37 host-119 kernel: device-mapper: raid1: Write error during recovery (error = 0x2) Apr 7 02:34:37 host-119 lvm[4128]: /dev/sdg1: read failed after 0 of 4096 at 26838958080: Input/output error Apr 7 02:34:37 host-119 lvm[4128]: /dev/sdg1: read failed after 0 of 4096 at 26839048192: Input/output error Apr 7 02:34:37 host-119 lvm[4128]: Couldn't find device with uuid GgTSFL-vdux-Hz1e-xuif-x40d-cD3N-rmXI1c. Apr 7 02:34:37 host-119 lvm[4128]: helter_skelter/syncd_multiple_legs_4legs_1: Converted: 0.0% Apr 7 02:34:37 host-119 lvm[4128]: Primary mirror device 253:9 has failed (D). Apr 7 02:34:37 host-119 lvm[4128]: Secondary mirror device 253:12 has failed (D). Apr 7 02:34:37 host-119 lvm[4128]: Device failure in helter_skelter-syncd_multiple_legs_4legs_2. Apr 7 02:34:42 host-119 qarshd[24251]: Running cmdline: pvs -a Apr 7 02:34:43 host-119 qarshd[24257]: Running cmdline: dd if=/dev/zero of=/mnt/syncd_multiple_legs_4legs_1/ddfile count=10 bs=4M oflag=direct Apr 7 02:34:52 host-119 lvm[4128]: helter_skelter/syncd_multiple_legs_4legs_1: Converted: 0.0% Apr 7 02:35:07 host-119 lvm[4128]: helter_skelter/syncd_multiple_legs_4legs_1: Converted: 0.0% Apr 7 02:35:22 host-119 lvm[4128]: helter_skelter/syncd_multiple_legs_4legs_1: Converted: 0.0% Apr 7 02:35:37 host-119 lvm[4128]: helter_skelter/syncd_multiple_legs_4legs_1: Converted: 0.0% Apr 7 02:37:08 host-119 lvm[4128]: helter_skelter/syncd_multiple_legs_4legs_1: Converted: 0.0% Apr 7 02:37:10 host-119 kernel: INFO: task kjournald:24089 blocked for more than 120 seconds. Apr 7 02:37:10 host-119 kernel: Not tainted 2.6.32-639.el6.x86_64 #1 Apr 7 02:37:10 host-119 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 7 02:37:10 host-119 kernel: kjournald D 0000000000000000 0 24089 2 0x00000080 Apr 7 02:37:10 host-119 kernel: ffff880039c9bc50 0000000000000046 ffff880000000000 001a1035701c9d42 Apr 7 02:37:10 host-119 kernel: ffff880022d346a0 ffff880036ac8d70 0000000002ef968a ffffffffa8fa6962 Apr 7 02:37:10 host-119 kernel: 00000000126a1500 ffffffff81aa8340 ffff88003d629068 ffff880039c9bfd8 Apr 7 02:37:10 host-119 kernel: Call Trace: Apr 7 02:37:10 host-119 kernel: [<ffffffff811d12a0>] ? sync_buffer+0x0/0x50 Apr 7 02:37:10 host-119 kernel: [<ffffffff811d12a0>] ? sync_buffer+0x0/0x50 Apr 7 02:37:10 host-119 kernel: [<ffffffff81547ce3>] io_schedule+0x73/0xc0 Apr 7 02:37:10 host-119 kernel: [<ffffffff811d12e0>] sync_buffer+0x40/0x50 Apr 7 02:37:10 host-119 kernel: [<ffffffff815487cf>] __wait_on_bit+0x5f/0x90 Apr 7 02:37:10 host-119 kernel: [<ffffffff811d12a0>] ? sync_buffer+0x0/0x50 Apr 7 02:37:10 host-119 kernel: [<ffffffff81548878>] out_of_line_wait_on_bit+0x78/0x90 Apr 7 02:37:10 host-119 kernel: [<ffffffff810a6b20>] ? wake_bit_function+0x0/0x50 Apr 7 02:37:10 host-119 kernel: [<ffffffff811d1296>] __wait_on_buffer+0x26/0x30 Apr 7 02:37:10 host-119 kernel: [<ffffffffa044dfdf>] journal_commit_transaction+0x9df/0x12f0 [jbd] Apr 7 02:37:10 host-119 kernel: [<ffffffff8108f12c>] ? lock_timer_base+0x3c/0x70 Apr 7 02:37:10 host-119 kernel: [<ffffffff8108fd2b>] ? try_to_del_timer_sync+0x7b/0xe0 Apr 7 02:37:10 host-119 kernel: [<ffffffffa0452ea5>] kjournald+0xe5/0x240 [jbd] Apr 7 02:37:10 host-119 kernel: [<ffffffff810a6aa0>] ? autoremove_wake_function+0x0/0x40 Apr 7 02:37:10 host-119 kernel: [<ffffffffa0452dc0>] ? kjournald+0x0/0x240 [jbd] Apr 7 02:37:10 host-119 kernel: [<ffffffff810a660e>] kthread+0x9e/0xc0 Apr 7 02:37:10 host-119 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20 Apr 7 02:37:10 host-119 kernel: [<ffffffff810a6570>] ? kthread+0x0/0xc0 Apr 7 02:37:10 host-119 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20 Apr 7 02:37:10 host-119 kernel: INFO: task kjournald:24094 blocked for more than 120 seconds. Apr 7 02:37:10 host-119 kernel: Not tainted 2.6.32-639.el6.x86_64 #1 Apr 7 02:37:10 host-119 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 7 02:37:10 host-119 kernel: kjournald D 0000000000000000 0 24094 2 0x00000080 Apr 7 02:37:10 host-119 kernel: ffff880039863c50 0000000000000046 ffff880000000000 001a1036191ba87f Apr 7 02:37:10 host-119 kernel: ffff88003b1b0be0 ffff880036ac81a0 0000000002ef99de ffffffffa8fa6962 Apr 7 02:37:10 host-119 kernel: 00000000126a1500 ffffffff81aa8340 ffff88003cfdf068 ffff880039863fd8 Apr 7 02:37:10 host-119 kernel: Call Trace: Apr 7 02:37:10 host-119 kernel: [<ffffffff811d12a0>] ? sync_buffer+0x0/0x50 Apr 7 02:37:10 host-119 kernel: [<ffffffff811d12a0>] ? sync_buffer+0x0/0x50 Apr 7 02:37:10 host-119 kernel: [<ffffffff81547ce3>] io_schedule+0x73/0xc0 Apr 7 02:37:10 host-119 kernel: [<ffffffff811d12e0>] sync_buffer+0x40/0x50 Apr 7 02:37:10 host-119 kernel: [<ffffffff815487cf>] __wait_on_bit+0x5f/0x90 Apr 7 02:37:10 host-119 kernel: [<ffffffff811d12a0>] ? sync_buffer+0x0/0x50 Apr 7 02:37:10 host-119 kernel: [<ffffffff81548878>] out_of_line_wait_on_bit+0x78/0x90 Apr 7 02:37:10 host-119 kernel: [<ffffffff810a6b20>] ? wake_bit_function+0x0/0x50 Apr 7 02:37:10 host-119 kernel: [<ffffffff811d1296>] __wait_on_buffer+0x26/0x30 Apr 7 02:37:10 host-119 kernel: [<ffffffffa044dfdf>] journal_commit_transaction+0x9df/0x12f0 [jbd] Apr 7 02:37:10 host-119 kernel: [<ffffffff8108f12c>] ? lock_timer_base+0x3c/0x70 Apr 7 02:37:10 host-119 kernel: [<ffffffff8108fd2b>] ? try_to_del_timer_sync+0x7b/0xe0 Apr 7 02:37:10 host-119 kernel: [<ffffffffa0452ea5>] kjournald+0xe5/0x240 [jbd] Apr 7 02:37:10 host-119 kernel: [<ffffffff810a6aa0>] ? autoremove_wake_function+0x0/0x40 Apr 7 02:37:10 host-119 kernel: [<ffffffffa0452dc0>] ? kjournald+0x0/0x240 [jbd] Apr 7 02:37:10 host-119 kernel: [<ffffffff810a660e>] kthread+0x9e/0xc0 Apr 7 02:37:10 host-119 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20 Apr 7 02:37:10 host-119 kernel: [<ffffffff810a6570>] ? kthread+0x0/0xc0 Apr 7 02:37:10 host-119 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20 Version-Release number of selected component (if applicable): 2.6.32-639.el6.x86_64 lvm2-2.02.143-7.el6 BUILT: Wed Apr 6 10:08:33 CDT 2016 lvm2-libs-2.02.143-7.el6 BUILT: Wed Apr 6 10:08:33 CDT 2016 lvm2-cluster-2.02.143-7.el6 BUILT: Wed Apr 6 10:08:33 CDT 2016 udev-147-2.72.el6 BUILT: Tue Mar 1 06:14:05 CST 2016 device-mapper-1.02.117-7.el6 BUILT: Wed Apr 6 10:08:33 CDT 2016 device-mapper-libs-1.02.117-7.el6 BUILT: Wed Apr 6 10:08:33 CDT 2016 device-mapper-event-1.02.117-7.el6 BUILT: Wed Apr 6 10:08:33 CDT 2016 device-mapper-event-libs-1.02.117-7.el6 BUILT: Wed Apr 6 10:08:33 CDT 2016 device-mapper-persistent-data-0.6.2-0.1.rc7.el6 BUILT: Tue Mar 22 08:58:09 CDT 2016 cmirror-2.02.143-7.el6 BUILT: Wed Apr 6 10:08:33 CDT 2016 How reproducible: Often
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available. The official life cycle policy can be reviewed here: http://redhat.com/rhel/lifecycle This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL: https://access.redhat.com/