Hide Forgot
Description of problem: ./helter_skelter -f -e kill_secondary_and_log_2_devices [...] ================================================================================ Iteration 0.3 started at Wed Sep 28 18:38:26 CDT 2016 ================================================================================ Scenario kill_secondary_and_log_2_devices: Kill secondary and log of synced 2 device alloc anywhere mirror(s) ********* Mirror hash info for this scenario ********* * names: syncd_secondary_log_2devices_1 * sync: 1 * striped: 0 * leg devices: /dev/mapper/mpathg1 /dev/mapper/mpathe1 * log devices: /dev/mapper/mpathe1 * no MDA devices: * failpv(s): /dev/mapper/mpathe1 * failnode(s): mckinley-01 mckinley-02 mckinley-03 mckinley-04 * lvmetad: 0 * leg fault policy: remove * log fault policy: remove ****************************************************** Creating mirror(s) on mckinley-04... mckinley-04: lvcreate -ay --type mirror --alloc anywhere -m 1 -n syncd_secondary_log_2devices_1 -L 500M helter_skelter /dev/mapper/mpathg1:0-2400 /dev/mapper/mpathe1:0-2400 WARNING: gfs2 signature detected on /dev/helter_skelter/syncd_secondary_log_2devices_1 at offset 65536. Wipe it? [y/n]: [n] Aborted wiping of gfs2. 1 existing signature left on the device. Current mirror/raid device structure(s): LV Attr LSize Cpy%Sync Devices syncd_secondary_log_2devices_1 mwa-a-m--- 500.00m 1.60 syncd_secondary_log_2devices_1_mimage_0(0),syncd_secondary_log_2devices_1_mimage_1(0) [syncd_secondary_log_2devices_1_mimage_0] Iwa-aom--- 500.00m /dev/mapper/mpathg1(0) [syncd_secondary_log_2devices_1_mimage_1] Iwa-aom--- 500.00m /dev/mapper/mpathe1(0) [syncd_secondary_log_2devices_1_mlog] lwa-aom--- 4.00m /dev/mapper/mpathe1(125) Waiting until all mirror|raid volumes become fully syncd... 0/1 mirror(s) are fully synced: ( 26.90% ) 1/1 mirror(s) are fully synced: ( 100.00% ) Creating gfs2 on top of mirror(s) on mckinley-01... mkfs.gfs2 -o align=0 -J 32M -j 4 -p lock_dlm -t MCKINLEY:gfs1 /dev/helter_skelter/syncd_secondary_log_2devices_1 -O Mounting mirrored gfs2 filesystems on mckinley-01... Mounting mirrored gfs2 filesystems on mckinley-02... Mounting mirrored gfs2 filesystems on mckinley-03... Mounting mirrored gfs2 filesystems on mckinley-04... PV=/dev/mapper/mpathe1 syncd_secondary_log_2devices_1_mimage_1: 6 syncd_secondary_log_2devices_1_mlog: 2 PV=/dev/mapper/mpathe1 syncd_secondary_log_2devices_1_mimage_1: 6 syncd_secondary_log_2devices_1_mlog: 2 Writing verification files (checkit) to mirror(s) on... ---- mckinley-01 ---- ---- mckinley-02 ---- ---- mckinley-03 ---- ---- mckinley-04 ---- Sleeping 15 seconds to get some outsanding I/O locks before the failure Verifying files (checkit) on mirror(s) on... ---- mckinley-01 ---- ---- mckinley-02 ---- ---- mckinley-03 ---- ---- mckinley-04 ---- Failing path sdf on mckinley-01 Failing path sdn on mckinley-01 Failing path sdv on mckinley-01 Failing path sdad on mckinley-01 Failing path sdg on mckinley-02 Failing path sdo on mckinley-02 Failing path sdw on mckinley-02 Failing path sdae on mckinley-02 Failing path sdg on mckinley-03 Failing path sdo on mckinley-03 Failing path sdw on mckinley-03 Failing path sdae on mckinley-03 Failing path sdg on mckinley-04 Failing path sdo on mckinley-04 Failing path sdw on mckinley-04 Failing path sdae on mckinley-04 Getting recovery check start time from /var/log/messages: Sep 28 18:41 Attempting I/O to cause mirror down conversion(s) on mckinley-01 dd if=/dev/zero of=/mnt/syncd_secondary_log_2devices_1/ddfile count=10 bs=4M [root@mckinley-01 ~]# lvs -a -o +devices /dev/mapper/mpathe: read failed after 0 of 4096 at 0: Input/output error /dev/mapper/mpathe: read failed after 0 of 4096 at 268435390464: Input/output error /dev/mapper/mpathe: read failed after 0 of 4096 at 268435447808: Input/output error /dev/mapper/mpathe: read failed after 0 of 4096 at 4096: Input/output error /dev/mapper/mpathe1: read failed after 0 of 2048 at 0: Input/output error /dev/mapper/mpathe1: read failed after 0 of 2048 at 268431851520: Input/output error /dev/mapper/mpathe1: read failed after 0 of 2048 at 268431970304: Input/output error /dev/mapper/mpathe1: read failed after 0 of 2048 at 4096: Input/output error Couldn't find device with uuid hkhWA7-05OK-M5DV-xJeo-DIBc-LvLw-HKSDv7. WARNING: Couldn't find all devices for LV helter_skelter/syncd_secondary_log_2devices_1_mimage_1 while checking used and assumed devices. WARNING: Couldn't find all devices for LV helter_skelter/syncd_secondary_log_2devices_1_mlog while checking used and assumed devices. LV VG Attr LSize Log Cpy%Sync Devices syncd_secondary_log_2devices_1 helter_skelter mwa-aom-p- 500.00m [syncd_secondary_log_2devices_1_mlog] 100.00 syncd_secondary_log_2devices_1_mimage_0(0),syncd_secondary_log_2devices_1_mimage_1(0) [syncd_secondary_log_2devices_1_mimage_0] helter_skelter iwa-aom--- 500.00m /dev/mapper/mpathg1(0) [syncd_secondary_log_2devices_1_mimage_1] helter_skelter iwa-aom-p- 500.00m [unknown](0) [syncd_secondary_log_2devices_1_mlog] helter_skelter lwa-aom-p- 4.00m [unknown](125) Sep 28 18:40:58 mckinley-01 qarshd[49969]: Running cmdline: multipathd fail path sdv Sep 28 18:40:58 mckinley-01 multipathd: mpathe: fail path 65:80 (operator) Sep 28 18:40:58 mckinley-01 kernel: device-mapper: multipath: Failing path 65:80. Sep 28 18:40:59 mckinley-01 systemd: Started qarsh Per-Connection Server (10.15.80.224:53666). Sep 28 18:40:59 mckinley-01 systemd: Starting qarsh Per-Connection Server (10.15.80.224:53666)... Sep 28 18:40:59 mckinley-01 qarshd[49974]: Talking to peer ::ffff:10.15.80.224:53666 (IPv6) Sep 28 18:40:59 mckinley-01 qarshd[49974]: Running cmdline: multipathd fail path sdad Sep 28 18:40:59 mckinley-01 multipathd: mpathe: fail path 65:208 (operator) Sep 28 18:40:59 mckinley-01 kernel: device-mapper: multipath: Failing path 65:208. Sep 28 18:40:59 mckinley-01 kernel: blk_update_request: 14 callbacks suppressed Sep 28 18:40:59 mckinley-01 kernel: blk_update_request: I/O error, dev dm-11, sector 1089599 Sep 28 18:40:59 mckinley-01 cmirrord[11251]: [LbMWHQ6e] rw_log: write failure: Input/output error Sep 28 18:40:59 mckinley-01 cmirrord[11251]: [LbMWHQ6e] Error writing to disk log Sep 28 18:40:59 mckinley-01 multipathd: sdn: mark as failed Sep 28 18:40:59 mckinley-01 multipathd: mpathe: remaining active paths: 2 Sep 28 18:40:59 mckinley-01 multipathd: sdad: mark as failed Sep 28 18:40:59 mckinley-01 multipathd: mpathe: remaining active paths: 1 Sep 28 18:40:59 mckinley-01 multipathd: sdv: mark as failed Sep 28 18:40:59 mckinley-01 multipathd: mpathe: remaining active paths: 0 Sep 28 18:41:00 mckinley-01 cmirrord[11251]: [LbMWHQ6e] rw_log: write failure: Input/output error Sep 28 18:41:00 mckinley-01 kernel: blk_update_request: I/O error, dev dm-11, sector 1089599 Sep 28 18:41:00 mckinley-01 cmirrord[11251]: [LbMWHQ6e] Error writing to disk log Sep 28 18:41:01 mckinley-01 cmirrord[11251]: [LbMWHQ6e] rw_log: write failure: Input/output error Sep 28 18:41:01 mckinley-01 cmirrord[11251]: [LbMWHQ6e] Error writing to disk log Sep 28 18:41:01 mckinley-01 kernel: blk_update_request: I/O error, dev dm-11, sector 1089599 Sep 28 18:41:02 mckinley-01 multipathd: sdf: unusable path Sep 28 18:41:02 mckinley-01 cmirrord[11251]: [LbMWHQ6e] rw_log: write failure: Input/output error Sep 28 18:41:02 mckinley-01 cmirrord[11251]: [LbMWHQ6e] Error writing to disk log Sep 28 18:41:02 mckinley-01 kernel: blk_update_request: I/O error, dev dm-11, sector 1089599 Sep 28 18:41:03 mckinley-01 multipathd: sdn: unusable path Sep 28 18:41:03 mckinley-01 multipathd: sdv: unusable path Sep 28 18:41:03 mckinley-01 multipathd: sdad: unusable path Sep 28 18:41:04 mckinley-01 cmirrord[11251]: [LbMWHQ6e] rw_log: write failure: Input/output error Sep 28 18:41:04 mckinley-01 cmirrord[11251]: [LbMWHQ6e] Error writing to disk log Sep 28 18:41:04 mckinley-01 kernel: blk_update_request: I/O error, dev dm-11, sector 1089599 Sep 28 18:41:05 mckinley-01 cmirrord[11251]: [LbMWHQ6e] rw_log: write failure: Input/output error Sep 28 18:41:05 mckinley-01 kernel: blk_update_request: I/O error, dev dm-11, sector 1089599 Sep 28 18:41:05 mckinley-01 cmirrord[11251]: [LbMWHQ6e] Error writing to disk log Sep 28 18:41:06 mckinley-01 cmirrord[11251]: [LbMWHQ6e] rw_log: write failure: Input/output error Sep 28 18:41:06 mckinley-01 cmirrord[11251]: [LbMWHQ6e] Error writing to disk log Sep 28 18:41:06 mckinley-01 kernel: blk_update_request: I/O error, dev dm-11, sector 1089599 Sep 28 18:41:07 mckinley-01 multipathd: sdf: unusable path Sep 28 18:41:07 mckinley-01 kernel: blk_update_request: I/O error, dev dm-11, sector 0 Sep 28 18:41:07 mckinley-01 kernel: blk_update_request: I/O error, dev dm-11, sector 524287872 Sep 28 18:41:07 mckinley-01 kernel: blk_update_request: I/O error, dev dm-11, sector 524287984 Sep 28 18:41:07 mckinley-01 kernel: blk_update_request: I/O error, dev dm-11, sector 0 Sep 28 18:41:07 mckinley-01 kernel: blk_update_request: I/O error, dev dm-11, sector 8 Sep 28 18:41:07 mckinley-01 kernel: blk_update_request: I/O error, dev dm-11, sector 0 Sep 28 18:41:07 mckinley-01 kernel: blk_update_request: I/O error, dev dm-11, sector 63 Sep 28 18:41:07 mckinley-01 kernel: blk_update_request: I/O error, dev dm-11, sector 524281023 Sep 28 18:41:08 mckinley-01 kernel: sd 7:0:0:4: Asymmetric access state changed Sep 28 18:41:08 mckinley-01 kernel: sd 7:0:2:4: Asymmetric access state changed Sep 28 18:41:08 mckinley-01 kernel: sd 7:0:1:4: Asymmetric access state changed Sep 28 18:41:08 mckinley-01 kernel: sd 7:0:3:4: Asymmetric access state changed Sep 28 18:41:08 mckinley-01 systemd: Started qarsh Per-Connection Server (10.15.80.224:53796). Sep 28 18:41:08 mckinley-01 systemd: Starting qarsh Per-Connection Server (10.15.80.224:53796)... Sep 28 18:41:08 mckinley-01 qarshd[50012]: Talking to peer ::ffff:10.15.80.224:53796 (IPv6) Sep 28 18:41:08 mckinley-01 multipathd: sdn: unusable path Sep 28 18:41:08 mckinley-01 multipathd: sdv: unusable path Sep 28 18:41:08 mckinley-01 multipathd: sdad: unusable path Sep 28 18:41:08 mckinley-01 qarshd[50012]: Running cmdline: dd if=/dev/zero of=/mnt/syncd_secondary_log_2devices_1/ddfile count=10 bs=4M Sep 28 18:41:08 mckinley-01 cmirrord[11251]: [LbMWHQ6e] rw_log: write failure: Input/output error Sep 28 18:41:08 mckinley-01 cmirrord[11251]: [LbMWHQ6e] Error writing to disk log [...] Sep 28 18:43:29 mckinley-01 kernel: INFO: task kworker/u194:1:49875 blocked for more than 120 seconds. Sep 28 18:43:29 mckinley-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 28 18:43:29 mckinley-01 kernel: kworker/u194:1 D ffff881ff665b9b8 0 49875 2 0x00000080 Sep 28 18:43:29 mckinley-01 kernel: Workqueue: writeback bdi_writeback_workfn (flush-253:22) Sep 28 18:43:30 mckinley-01 kernel: ffff881ff665b850 0000000000000046 ffff881ffe6fde20 ffff881ff665bfd8 Sep 28 18:43:30 mckinley-01 kernel: ffff881ff665bfd8 ffff881ff665bfd8 ffff881ffe6fde20 ffff881ffead6c40 Sep 28 18:43:30 mckinley-01 kernel: 0000000000000000 7fffffffffffffff ffffffff81688ef0 ffff881ff665b9b8 Sep 28 18:43:30 mckinley-01 kernel: Call Trace: Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff81688ef0>] ? bit_wait+0x50/0x50 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff8168ae79>] schedule+0x29/0x70 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff816888c9>] schedule_timeout+0x239/0x2d0 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff81688ef0>] ? bit_wait+0x50/0x50 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff8168a41e>] io_schedule_timeout+0xae/0x130 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff8168a4b8>] io_schedule+0x18/0x20 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff81688f01>] bit_wait_io+0x11/0x50 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff81688c1f>] __wait_on_bit_lock+0x5f/0xc0 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff8117fd14>] __lock_page+0x74/0x90 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff810b1590>] ? wake_bit_function+0x40/0x40 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff8118be80>] write_cache_pages+0x420/0x4d0 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff8123ea20>] ? clean_buffers+0x60/0x60 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffffa066ac80>] ? gfs2_writepages+0x20/0x20 [gfs2] Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff8123e959>] mpage_writepages+0x69/0xd0 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffffa066ac80>] ? gfs2_writepages+0x20/0x20 [gfs2] Sep 28 18:43:30 mckinley-01 kernel: [<ffffffffa066ac75>] gfs2_writepages+0x15/0x20 [gfs2] Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff8118d02e>] do_writepages+0x1e/0x40 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff812283e0>] __writeback_single_inode+0x40/0x210 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff812290ce>] writeback_sb_inodes+0x25e/0x420 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff8122932f>] __writeback_inodes_wb+0x9f/0xd0 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff81229b73>] wb_writeback+0x263/0x2f0 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff8118c6d0>] ? bdi_dirty_limit+0x40/0xe0 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff8122ba6c>] bdi_writeback_workfn+0x1cc/0x460 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff810a7e0b>] process_one_work+0x17b/0x470 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff810a8c46>] worker_thread+0x126/0x410 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff810a8b20>] ? rescuer_thread+0x460/0x460 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff810b03ff>] kthread+0xcf/0xe0 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff810b0330>] ? kthread_create_on_node+0x140/0x140 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff81695d18>] ret_from_fork+0x58/0x90 Sep 28 18:43:30 mckinley-01 kernel: [<ffffffff810b0330>] ? kthread_create_on_node+0x140/0x140 Version-Release number of selected component (if applicable): 3.10.0-505.el7.x86_64 lvm2-2.02.166-1.el7 BUILT: Wed Sep 28 02:26:52 CDT 2016 lvm2-libs-2.02.166-1.el7 BUILT: Wed Sep 28 02:26:52 CDT 2016 lvm2-cluster-2.02.166-1.el7 BUILT: Wed Sep 28 02:26:52 CDT 2016 device-mapper-1.02.135-1.el7 BUILT: Wed Sep 28 02:26:52 CDT 2016 device-mapper-libs-1.02.135-1.el7 BUILT: Wed Sep 28 02:26:52 CDT 2016 device-mapper-event-1.02.135-1.el7 BUILT: Wed Sep 28 02:26:52 CDT 2016 device-mapper-event-libs-1.02.135-1.el7 BUILT: Wed Sep 28 02:26:52 CDT 2016 device-mapper-persistent-data-0.6.3-1.el7 BUILT: Fri Jul 22 05:29:13 CDT 2016 cmirror-2.02.166-1.el7 BUILT: Wed Sep 28 02:26:52 CDT 2016
This is reproducible. Another similar scenario: kill_secondary_and_log_synced_4_legs
Even the basic log only failure scenario can cause this. ================================================================================ Iteration 0.1 started at Thu Jul 6 16:37:21 CDT 2017 ================================================================================ ACTUAL LEG ORDER: /dev/sda1 /dev/sdb1 /dev/sdc1 Scenario kill_random_devices: Kill random devices (log and legs) ********* Mirror info for this scenario ********* * mirrors: mirror_1 * leg devices: /dev/sda1 /dev/sdb1 /dev/sdc1 * log devices: /dev/sdd1 * failpv(s): /dev/sdd1 * failnode(s): host-117 host-121 host-122 * lvmetad: 0 * leg fault policy: allocate * log fault policy: allocate ************************************************* Current mirror/raid device structure(s): LV Attr LSize Cpy%Sync Devices mirror_1 mwi-aom--- 2.00g 100.00 mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0) [mirror_1_mimage_0] iwi-aom--- 2.00g /dev/sda1(0) [mirror_1_mimage_1] iwi-aom--- 2.00g /dev/sdb1(0) [mirror_1_mimage_2] iwi-aom--- 2.00g /dev/sdc1(0) [mirror_1_mlog] lwi-aom--- 4.00m /dev/sdd1(0) PV=/dev/sdd1 mirror_1_mlog: 1.2 PV=/dev/sdd1 mirror_1_mlog: 1.2 Writing verification files (checkit) to mirror(s) on... ---- host-117 ---- ---- host-121 ---- ---- host-122 ---- <start name="host-117_mirror_1" pid="434" time="Thu Jul 6 16:37:30 2017 -0500" type="cmd" /> <start name="host-121_mirror_1" pid="435" time="Thu Jul 6 16:37:30 2017 -0500" type="cmd" /> <start name="host-122_mirror_1" pid="437" time="Thu Jul 6 16:37:30 2017 -0500" type="cmd" /> Sleeping 15 seconds to get some outsanding I/O locks before the failure Verifying files (checkit) on mirror(s) on... ---- host-117 ---- ---- host-121 ---- ---- host-122 ---- Disabling device sdd on host-117rescan device... Disabling device sdd on host-121rescan device... Disabling device sdd on host-122rescan device... [DEADLOCK] 3.10.0-689.el7.x86_64 lvm2-2.02.171-8.el7 BUILT: Wed Jun 28 13:28:58 CDT 2017 lvm2-libs-2.02.171-8.el7 BUILT: Wed Jun 28 13:28:58 CDT 2017 lvm2-cluster-2.02.171-8.el7 BUILT: Wed Jun 28 13:28:58 CDT 2017 device-mapper-1.02.140-8.el7 BUILT: Wed Jun 28 13:28:58 CDT 2017 device-mapper-libs-1.02.140-8.el7 BUILT: Wed Jun 28 13:28:58 CDT 2017 device-mapper-event-1.02.140-8.el7 BUILT: Wed Jun 28 13:28:58 CDT 2017 device-mapper-event-libs-1.02.140-8.el7 BUILT: Wed Jun 28 13:28:58 CDT 2017 device-mapper-persistent-data-0.7.0-0.1.rc6.el7 BUILT: Mon Mar 27 10:15:46 CDT 2017 cmirror-2.02.171-8.el7 BUILT: Wed Jun 28 13:28:58 CDT 2017