Hide Forgot
Description of problem: This is basically a dup of rhel7 bug 1185441. Unlike that bug however I was able to hit this failing just one device (the secondary log image). ================================================================================ Iteration 1.15 started at Wed Apr 6 20:30:12 CDT 2016 ================================================================================ Scenario kill_secondary_log_2_legs_2_logs: Kill secondary log of synced 2 leg redundant log mirror(s) ********* Mirror hash info for this scenario ********* * names: syncd_secondary_log_2legs_2logs_1 syncd_secondary_log_2legs_2logs_2 * sync: 1 * striped: 0 * leg devices: /dev/sdd1 /dev/sda1 * log devices: /dev/sdc1 /dev/sde1 * no MDA devices: * failpv(s): /dev/sde1 * failnode(s): host-137.virt.lab.msp.redhat.com * lvmetad: 1 * leg fault policy: remove * log fault policy: remove ****************************************************** Creating mirror(s) on host-137.virt.lab.msp.redhat.com... host-137.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog mirrored -m 1 -n syncd_secondary_log_2legs_2logs_1 -L 500M helter_skelter /dev/sdd1:0-2400 /dev/sda1:0-2400 /dev/sdc1:0-150 /dev/sde1:0-150 host-137.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog mirrored -m 1 -n syncd_secondary_log_2legs_2logs_2 -L 500M helter_skelter /dev/sdd1:0-2400 /dev/sda1:0-2400 /dev/sdc1:0-150 /dev/sde1:0-150 Current mirror/raid device structure(s): LV Attr LSize Cpy%Sync Devices syncd_secondary_log_2legs_2logs_1 mwi-a-m--- 500.00m 48.00 syncd_secondary_log_2legs_2logs_1_mimage_0(0),syncd_secondary_log_2legs_2logs_1_mimage_1(0) [syncd_secondary_log_2legs_2logs_1_mimage_0] Iwi-aom--- 500.00m /dev/sdd1(0) [syncd_secondary_log_2legs_2logs_1_mimage_1] Iwi-aom--- 500.00m /dev/sda1(0) [syncd_secondary_log_2legs_2logs_1_mlog] mwi-aom--- 4.00m 100.00 syncd_secondary_log_2legs_2logs_1_mlog_mimage_0(0),syncd_secondary_log_2legs_2logs_1_mlog_mimage_1(0) [syncd_secondary_log_2legs_2logs_1_mlog_mimage_0] iwi-aom--- 4.00m /dev/sdc1(0) [syncd_secondary_log_2legs_2logs_1_mlog_mimage_1] iwi-aom--- 4.00m /dev/sde1(0) syncd_secondary_log_2legs_2logs_2 mwi-a-m--- 500.00m 8.80 syncd_secondary_log_2legs_2logs_2_mimage_0(0),syncd_secondary_log_2legs_2logs_2_mimage_1(0) [syncd_secondary_log_2legs_2logs_2_mimage_0] Iwi-aom--- 500.00m /dev/sdd1(125) [syncd_secondary_log_2legs_2logs_2_mimage_1] Iwi-aom--- 500.00m /dev/sda1(125) [syncd_secondary_log_2legs_2logs_2_mlog] mwi-aom--- 4.00m 100.00 syncd_secondary_log_2legs_2logs_2_mlog_mimage_0(0),syncd_secondary_log_2legs_2logs_2_mlog_mimage_1(0) [syncd_secondary_log_2legs_2logs_2_mlog_mimage_0] iwi-aom--- 4.00m /dev/sdc1(1) [syncd_secondary_log_2legs_2logs_2_mlog_mimage_1] iwi-aom--- 4.00m /dev/sde1(1) Waiting until all mirror|raid volumes become fully syncd... 2/2 mirror(s) are fully synced: ( 100.00% 100.00% ) Creating ext on top of mirror(s) on host-137.virt.lab.msp.redhat.com... mke2fs 1.41.12 (17-May-2010) mke2fs 1.41.12 (17-May-2010) Mounting mirrored ext filesystems on host-137.virt.lab.msp.redhat.com... PV=/dev/sde1 syncd_secondary_log_2legs_2logs_1_mlog_mimage_1: 2 syncd_secondary_log_2legs_2logs_2_mlog_mimage_1: 2 PV=/dev/sde1 syncd_secondary_log_2legs_2logs_1_mlog_mimage_1: 2 syncd_secondary_log_2legs_2logs_2_mlog_mimage_1: 2 Writing verification files (checkit) to mirror(s) on... ---- host-137.virt.lab.msp.redhat.com ---- <start name="host-137.virt.lab.msp.redhat.com_syncd_secondary_log_2legs_2logs_1" pid="20488" time="Wed Apr 6 20:30:54 2016" type="cmd" /> <start name="host-137.virt.lab.msp.redhat.com_syncd_secondary_log_2legs_2logs_2" pid="20490" time="Wed Apr 6 20:30:54 2016" type="cmd" /> Sleeping 15 seconds to get some outsanding I/O locks before the failure Verifying files (checkit) on mirror(s) on... ---- host-137.virt.lab.msp.redhat.com ---- Disabling device sde on host-137.virt.lab.msp.redhat.com Getting recovery check start time from /var/log/messages: Apr 6 20:31 Attempting I/O to cause mirror down conversion(s) on host-137.virt.lab.msp.redhat.com dd if=/dev/zero of=/mnt/syncd_secondary_log_2legs_2logs_1/ddfile count=10 bs=4M oflag=direct 10+0 records in 10+0 records out 41943040 bytes (42 MB) copied, 0.389584 s, 108 MB/s dd if=/dev/zero of=/mnt/syncd_secondary_log_2legs_2logs_2/ddfile count=10 bs=4M oflag=direct Apr 6 20:31:22 host-137 qarshd[27055]: Running cmdline: echo offline > /sys/block/sde/device/state Apr 6 20:31:22 host-137 kernel: sd 6:0:0:1: rejecting I/O to offline device Apr 6 20:31:22 host-137 lvm[4181]: Secondary mirror device 253:3 has failed (D). Apr 6 20:31:22 host-137 lvm[4181]: Device failure in helter_skelter-syncd_secondary_log_2legs_2logs_1_mlog. Apr 6 20:31:22 host-137 kernel: sd 6:0:0:1: rejecting I/O to offline device Apr 6 20:31:22 host-137 lvm[4181]: /dev/sde1: read failed after 0 of 2048 at 0: Input/output error Apr 6 20:31:22 host-137 lvm[4181]: No PV label found on /dev/sde1. Apr 6 20:31:22 host-137 lvm[4181]: WARNING: Device for PV 0FvEsq-csxS-CP39-J1Ct-43pL-J2Pf-Wj2tvu not found or rejected by a filter. Apr 6 20:31:22 host-137 lvm[4181]: /dev/sde1: read failed after 0 of 512 at 4096: Input/output error Apr 6 20:31:22 host-137 lvm[4181]: WARNING: Failed to write an MDA of VG helter_skelter. Apr 6 20:31:22 host-137 kernel: sd 6:0:0:1: rejecting I/O to offline device Apr 6 20:31:22 host-137 kernel: sd 6:0:0:1: rejecting I/O to offline device Apr 6 20:31:22 host-137 lvm[4181]: WARNING: Failed to write an MDA of VG helter_skelter. Apr 6 20:31:22 host-137 kernel: sd 6:0:0:1: rejecting I/O to offline device Apr 6 20:31:22 host-137 kernel: sd 6:0:0:1: rejecting I/O to offline device Apr 6 20:31:22 host-137 lvm[4181]: WARNING: Failed to write an MDA of VG helter_skelter. Apr 6 20:31:22 host-137 kernel: sd 6:0:0:1: rejecting I/O to offline device Apr 6 20:31:22 host-137 kernel: sd 6:0:0:1: rejecting I/O to offline device Apr 6 20:31:22 host-137 lvm[4181]: WARNING: Failed to write an MDA of VG helter_skelter. Apr 6 20:31:22 host-137 kernel: sd 6:0:0:1: rejecting I/O to offline device Apr 6 20:31:22 host-137 kernel: sd 6:0:0:1: rejecting I/O to offline device Apr 6 20:31:22 host-137 lvm[4181]: Mirror log status: 1 of 2 images failed. Apr 6 20:31:23 host-137 kernel: sd 6:0:0:1: rejecting I/O to offline device Apr 6 20:31:23 host-137 lvm[4181]: Secondary mirror device 253:9 has failed (D). Apr 6 20:31:23 host-137 lvm[4181]: Device failure in helter_skelter-syncd_secondary_log_2legs_2logs_2_mlog. Apr 6 20:31:23 host-137 lvmetad[2491]: update_metadata from pv 4vAIVR-jP7F-l4lW-lXwO-iLEe-0oFx-RwEz2a same seqno 640 with unmatching data for helter_skelter 1jzatQ-Av0X-A0se-n6iQ-CE7H-T3Ia-ROb4tc Apr 6 20:31:23 host-137 lvmetad[2491]: Cannot use VG metadata for helter_skelter 1jzatQ-Av0X-A0se-n6iQ-CE7H-T3Ia-ROb4tc from PV 4vAIVR-jP7F-l4lW-lXwO-iLEe-0oFx-RwEz2a on 2081 Apr 6 20:31:23 host-137 lvm[4181]: /dev/sde1: read failed after 0 of 2048 at 0: Input/output error Apr 6 20:31:23 host-137 lvm[4181]: No PV label found on /dev/sde1. Apr 6 20:31:23 host-137 kernel: sd 6:0:0:1: rejecting I/O to offline device Apr 6 20:31:23 host-137 lvmetad[2491]: update_metadata from pv naupvF-h2R5-1wbf-pKdU-cRRq-qLq3-JIHBq3 same seqno 640 with unmatching data for helter_skelter 1jzatQ-Av0X-A0se-n6iQ-CE7H-T3Ia-ROb4tc Apr 6 20:31:23 host-137 lvmetad[2491]: Cannot use VG metadata for helter_skelter 1jzatQ-Av0X-A0se-n6iQ-CE7H-T3Ia-ROb4tc from PV naupvF-h2R5-1wbf-pKdU-cRRq-qLq3-JIHBq3 on 2097 Apr 6 20:31:23 host-137 lvmetad[2491]: update_metadata from pv 5ih1Rm-D8Cf-vapU-bNu8-y8FS-VMu4-WIOH4c same seqno 640 with unmatching data for helter_skelter 1jzatQ-Av0X-A0se-n6iQ-CE7H-T3Ia-ROb4tc Apr 6 20:31:23 host-137 lvmetad[2491]: Cannot use VG metadata for helter_skelter 1jzatQ-Av0X-A0se-n6iQ-CE7H-T3Ia-ROb4tc from PV 5ih1Rm-D8Cf-vapU-bNu8-y8FS-VMu4-WIOH4c on 2049 Apr 6 20:31:23 host-137 lvm[4181]: WARNING: Device for PV 0FvEsq-csxS-CP39-J1Ct-43pL-J2Pf-Wj2tvu not found or rejected by a filter. Apr 6 20:31:23 host-137 lvm[4181]: syncd_secondary_log_2legs_2logs_2 is consistent. Nothing to repair. Apr 6 20:31:27 host-137 qarshd[27100]: Running cmdline: pvs -a Apr 6 20:31:28 host-137 kernel: sd 6:0:0:1: rejecting I/O to offline device Apr 6 20:31:28 host-137 kernel: sd 6:0:0:1: rejecting I/O to offline device Apr 6 20:31:28 host-137 kernel: sd 6:0:0:1: rejecting I/O to offline device Apr 6 20:31:28 host-137 kernel: sd 6:0:0:1: rejecting I/O to offline device Apr 6 20:31:28 host-137 qarshd[27103]: Talking to peer 10.15.80.224:42180 (IPv4) Apr 6 20:31:28 host-137 qarshd[27103]: Running cmdline: tail -n1 /var/log/messages | cut -c 1-12 Apr 6 20:31:28 host-137 qarshd[27107]: Talking to peer 10.15.80.224:42182 (IPv4) Apr 6 20:31:28 host-137 qarshd[27107]: Running cmdline: dd if=/dev/zero of=/mnt/syncd_secondary_log_2legs_2logs_1/ddfile count=10 bs=4M oflag=direct Apr 6 20:31:28 host-137 qarshd[27109]: Talking to peer 10.15.80.224:42186 (IPv4) Apr 6 20:31:29 host-137 qarshd[27109]: Running cmdline: dd if=/dev/zero of=/mnt/syncd_secondary_log_2legs_2logs_2/ddfile count=10 bs=4M oflag=direct Apr 6 20:35:12 host-137 kernel: INFO: task kmirrord:26939 blocked for more than 120 seconds. Apr 6 20:35:12 host-137 kernel: Not tainted 2.6.32-639.el6.x86_64 #1 Apr 6 20:35:12 host-137 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 6 20:35:12 host-137 kernel: kmirrord D 0000000000000000 0 26939 2 0x00000080 Apr 6 20:35:12 host-137 kernel: ffff88003d9afa70 0000000000000046 ffff88003e1daab0 ffff88003e1daab0 Apr 6 20:35:12 host-137 kernel: ffff88003d9afaa0 ffffffff8127dfb0 7fffffffffffffff ffff88003a2953c0 Apr 6 20:35:12 host-137 kernel: 0000000000000003 ffff88000fd0000a ffff88003e1db068 ffff88003d9affd8 Apr 6 20:35:12 host-137 kernel: Call Trace: Apr 6 20:35:12 host-137 kernel: [<ffffffff8127dfb0>] ? generic_make_request+0x240/0x5a0 Apr 6 20:35:12 host-137 kernel: [<ffffffff8127e380>] ? submit_bio+0x70/0x120 Apr 6 20:35:12 host-137 kernel: [<ffffffff81548415>] schedule_timeout+0x215/0x2e0 Apr 6 20:35:12 host-137 kernel: [<ffffffff8127e380>] ? submit_bio+0x70/0x120 Apr 6 20:35:12 host-137 kernel: [<ffffffff81548073>] wait_for_common+0x123/0x180 Apr 6 20:35:12 host-137 kernel: [<ffffffff8106b9a0>] ? default_wake_function+0x0/0x20 Apr 6 20:35:12 host-137 kernel: [<ffffffff815481ad>] wait_for_completion+0x1d/0x20 Apr 6 20:35:12 host-137 kernel: [<ffffffffa000aedc>] sync_io+0x9c/0xf0 [dm_mod] Apr 6 20:35:12 host-137 kernel: [<ffffffff81130d65>] ? mempool_kmalloc+0x15/0x20 Apr 6 20:35:12 host-137 kernel: [<ffffffff81130f43>] ? mempool_alloc+0x63/0x140 Apr 6 20:35:12 host-137 kernel: [<ffffffffa000b157>] dm_io+0x1b7/0x1c0 [dm_mod] Apr 6 20:35:12 host-137 kernel: [<ffffffffa000a9c0>] ? vm_get_page+0x0/0x70 [dm_mod] Apr 6 20:35:12 host-137 kernel: [<ffffffffa000a930>] ? vm_next_page+0x0/0x30 [dm_mod] Apr 6 20:35:12 host-137 kernel: [<ffffffffa0025c61>] disk_flush+0x91/0x170 [dm_log] Apr 6 20:35:12 host-137 kernel: [<ffffffffa002c740>] ? dm_rh_inc+0xc0/0xd0 [dm_region_hash] Apr 6 20:35:12 host-137 kernel: [<ffffffffa002c0d3>] dm_rh_flush+0x13/0x20 [dm_region_hash] Apr 6 20:35:12 host-137 kernel: [<ffffffffa003526f>] do_mirror+0x27f/0x748 [dm_mirror] Apr 6 20:35:12 host-137 kernel: [<ffffffffa0034ff0>] ? do_mirror+0x0/0x748 [dm_mirror] Apr 6 20:35:12 host-137 kernel: [<ffffffff8109fda0>] worker_thread+0x170/0x2a0 Apr 6 20:35:12 host-137 kernel: [<ffffffff810a6aa0>] ? autoremove_wake_function+0x0/0x40 Apr 6 20:35:12 host-137 kernel: [<ffffffff8109fc30>] ? worker_thread+0x0/0x2a0 Apr 6 20:35:12 host-137 kernel: [<ffffffff810a660e>] kthread+0x9e/0xc0 Apr 6 20:35:12 host-137 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20 Apr 6 20:35:12 host-137 kernel: [<ffffffff810a6570>] ? kthread+0x0/0xc0 Apr 6 20:35:12 host-137 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20 Apr 6 20:35:12 host-137 kernel: INFO: task flush-253:13:27007 blocked for more than 120 seconds. Apr 6 20:35:12 host-137 kernel: Not tainted 2.6.32-639.el6.x86_64 #1 Apr 6 20:35:12 host-137 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 6 20:35:12 host-137 kernel: flush-253:13 D 0000000000000000 0 27007 2 0x00000080 Apr 6 20:35:12 host-137 kernel: ffff88003b1db840 0000000000000046 ffff880000000000 001edccf87557470 Apr 6 20:35:12 host-137 kernel: ffff880020a9b5c0 ffff88003df43280 0000000000ceda84 ffffffffa8fa6960 Apr 6 20:35:12 host-137 kernel: 00000000021bf4e9 ffffffff81aa8340 ffff880039e27068 ffff88003b1dbfd8 Apr 6 20:35:12 host-137 kernel: Call Trace: Apr 6 20:35:12 host-137 kernel: [<ffffffff811d12a0>] ? sync_buffer+0x0/0x50 Apr 6 20:35:12 host-137 kernel: [<ffffffff81547ce3>] io_schedule+0x73/0xc0 Apr 6 20:35:12 host-137 kernel: [<ffffffff811d12e0>] sync_buffer+0x40/0x50 Apr 6 20:35:12 host-137 kernel: [<ffffffff8154859a>] __wait_on_bit_lock+0x5a/0xc0 Apr 6 20:35:12 host-137 kernel: [<ffffffff811d12a0>] ? sync_buffer+0x0/0x50 Apr 6 20:35:12 host-137 kernel: [<ffffffff81548678>] out_of_line_wait_on_bit_lock+0x78/0x90 Apr 6 20:35:12 host-137 kernel: [<ffffffff810a6b20>] ? wake_bit_function+0x0/0x50 Apr 6 20:35:12 host-137 kernel: [<ffffffff811d1610>] ? end_buffer_async_write+0x0/0x190 Apr 6 20:35:12 host-137 kernel: [<ffffffff811d1486>] __lock_buffer+0x36/0x40 Apr 6 20:35:12 host-137 kernel: [<ffffffff811d2795>] __block_write_full_page+0x305/0x330 Apr 6 20:35:12 host-137 kernel: [<ffffffff811d1610>] ? end_buffer_async_write+0x0/0x190 Apr 6 20:35:12 host-137 kernel: [<ffffffff811d28a0>] block_write_full_page_endio+0xe0/0x120 Apr 6 20:35:12 host-137 kernel: [<ffffffffa046cd00>] ? buffer_unmapped+0x0/0x20 [ext3] Apr 6 20:35:12 host-137 kernel: [<ffffffff811d28f5>] block_write_full_page+0x15/0x20 Apr 6 20:35:12 host-137 kernel: [<ffffffffa046d8ad>] ext3_ordered_writepage+0x1ed/0x240 [ext3] Apr 6 20:35:12 host-137 kernel: [<ffffffff81142437>] __writepage+0x17/0x40 Apr 6 20:35:12 host-137 kernel: [<ffffffff811436fd>] write_cache_pages+0x1fd/0x4c0 Apr 6 20:35:12 host-137 kernel: [<ffffffff81142420>] ? __writepage+0x0/0x40 Apr 6 20:35:12 host-137 kernel: [<ffffffff811439e4>] generic_writepages+0x24/0x30 Apr 6 20:35:12 host-137 kernel: [<ffffffff81143a25>] do_writepages+0x35/0x40 Apr 6 20:35:12 host-137 kernel: [<ffffffff811c703d>] writeback_single_inode+0xdd/0x290 Apr 6 20:35:12 host-137 kernel: [<ffffffff811c743d>] writeback_sb_inodes+0xbd/0x170 Apr 6 20:35:12 host-137 kernel: [<ffffffff811c759b>] writeback_inodes_wb+0xab/0x1b0 Apr 6 20:35:12 host-137 kernel: [<ffffffff811c7993>] wb_writeback+0x2f3/0x410 Apr 6 20:35:12 host-137 kernel: [<ffffffff8108fdb2>] ? del_timer_sync+0x22/0x30 Apr 6 20:35:12 host-137 kernel: [<ffffffff811c7c55>] wb_do_writeback+0x1a5/0x240 Apr 6 20:35:12 host-137 kernel: [<ffffffff811c7d53>] bdi_writeback_task+0x63/0x1b0 Apr 6 20:35:12 host-137 kernel: [<ffffffff810a6927>] ? bit_waitqueue+0x17/0xd0 Apr 6 20:35:12 host-137 kernel: [<ffffffff81152bf0>] ? bdi_start_fn+0x0/0x100 Apr 6 20:35:12 host-137 kernel: [<ffffffff81152c76>] bdi_start_fn+0x86/0x100 Apr 6 20:35:12 host-137 kernel: [<ffffffff81152bf0>] ? bdi_start_fn+0x0/0x100 Apr 6 20:35:12 host-137 kernel: [<ffffffff810a660e>] kthread+0x9e/0xc0 Apr 6 20:35:12 host-137 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20 Apr 6 20:35:12 host-137 kernel: [<ffffffff810a6570>] ? kthread+0x0/0xc0 Apr 6 20:35:12 host-137 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20 Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
Version-Release number of selected component (if applicable): 2.6.32-639.el6.x86_64 lvm2-2.02.143-7.el6 BUILT: Wed Apr 6 10:08:33 CDT 2016 lvm2-libs-2.02.143-7.el6 BUILT: Wed Apr 6 10:08:33 CDT 2016 lvm2-cluster-2.02.143-7.el6 BUILT: Wed Apr 6 10:08:33 CDT 2016 udev-147-2.72.el6 BUILT: Tue Mar 1 06:14:05 CST 2016 device-mapper-1.02.117-7.el6 BUILT: Wed Apr 6 10:08:33 CDT 2016 device-mapper-libs-1.02.117-7.el6 BUILT: Wed Apr 6 10:08:33 CDT 2016 device-mapper-event-1.02.117-7.el6 BUILT: Wed Apr 6 10:08:33 CDT 2016 device-mapper-event-libs-1.02.117-7.el6 BUILT: Wed Apr 6 10:08:33 CDT 2016 device-mapper-persistent-data-0.6.2-0.1.rc7.el6 BUILT: Tue Mar 22 08:58:09 CDT 2016 cmirror-2.02.143-7.el6 BUILT: Wed Apr 6 10:08:33 CDT 2016
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available. The official life cycle policy can be reviewed here: http://redhat.com/rhel/lifecycle This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL: https://access.redhat.com/