Bug 1304045
Summary: | type mirror no longer able to survive device failure |
---|---|
Product: | Red Hat Enterprise Linux 6 |
Component: | lvm2 |
lvm2 sub component: | Mirroring and RAID (RHEL6) |
Version: | 6.8 |
Hardware: | x86_64 |
OS: | Linux |
Status: | CLOSED WORKSFORME |
Severity: | high |
Priority: | unspecified |
Keywords: | Regression, TestBlocker |
Target Milestone: | rc |
Reporter: | Corey Marthaler <cmarthal> |
Assignee: | Heinz Mauelshagen <heinzm> |
QA Contact: | cluster-qe <cluster-qe> |
CC: | agk, heinzm, jbrassow, msnitzer, prajnoha, prockai, zkabelac |
Doc Type: | Bug Fix |
Type: | Bug |
Last Closed: | 2016-02-29 16:02:04 UTC |

Description
Corey Marthaler
2016-02-02 18:03:46 UTC
This isn't just a cluster mirror issue. It appears many single machine mirror failure cases are no longer passing in 6.8. I ran this same scenario on both 6.7 and 7.3 and both passed 10/10 iterations. However on 6.8 it fails pretty quickly.

2.6.32-615.el6.x86_64

lvm2-2.02.141-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
lvm2-libs-2.02.141-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
lvm2-cluster-2.02.141-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
udev-147-2.66.el6    BUILT: Mon Jan 18 02:42:20 CST 2016
device-mapper-1.02.115-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
device-mapper-libs-1.02.115-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
device-mapper-event-1.02.115-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
device-mapper-event-libs-1.02.115-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
device-mapper-persistent-data-0.6.1-1.el6    BUILT: Wed Feb 10 05:09:45 CST 2016
cmirror-2.02.141-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016

Iteration 4.1 started at Thu Feb 11 15:08:55 CST 2016
================================================================================
Scenario kill_primary_synced_2_legs: Kill primary leg of synced 2 leg mirror(s)
********* Mirror hash info for this scenario *********
* names:              syncd_primary_2legs_1 syncd_primary_2legs_2
* sync:               1
* striped:            0
* leg devices:        /dev/sdg1 /dev/sdb1
* log devices:        /dev/sde1
* no MDA devices:
* failpv(s):          /dev/sdg1
* failnode(s):        host-115.virt.lab.msp.redhat.com
* lvmetad:            1
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Creating mirror(s) on host-115.virt.lab.msp.redhat.com...
host-115.virt.lab.msp.redhat.com: lvcreate --type mirror -m 1 -n syncd_primary_2legs_1 -L 500M helter_skelter /dev/sdg1:0-2400 /dev/sdb1:0-2400 /dev/sde1:0-150
host-115.virt.lab.msp.redhat.com: lvcreate --type mirror -m 1 -n syncd_primary_2legs_2 -L 500M helter_skelter /dev/sdg1:0-2400 /dev/sdb1:0-2400 /dev/sde1:0-150

Current mirror/raid device structure(s):
  LV                               Attr       LSize   Cpy%Sync Devices
  syncd_primary_2legs_1            mwi-a-m--- 500.00m 15.20    syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
  [syncd_primary_2legs_1_mimage_0] Iwi-aom--- 500.00m          /dev/sdg1(0)
  [syncd_primary_2legs_1_mimage_1] Iwi-aom--- 500.00m          /dev/sdb1(0)
  [syncd_primary_2legs_1_mlog]     lwi-aom---   4.00m          /dev/sde1(0)
  syncd_primary_2legs_2            mwi-a-m--- 500.00m 0.80     syncd_primary_2legs_2_mimage_0(0),syncd_primary_2legs_2_mimage_1(0)
  [syncd_primary_2legs_2_mimage_0] Iwi-aom--- 500.00m          /dev/sdg1(125)
  [syncd_primary_2legs_2_mimage_1] Iwi-aom--- 500.00m          /dev/sdb1(125)
  [syncd_primary_2legs_2_mlog]     lwi-aom---   4.00m          /dev/sde1(1)

Waiting until all mirror|raid volumes become fully syncd...
   1/2 mirror(s) are fully synced: ( 100.00% 94.70% )
   2/2 mirror(s) are fully synced: ( 100.00% 100.00% )

Creating ext on top of mirror(s) on host-115.virt.lab.msp.redhat.com...
mke2fs 1.41.12 (17-May-2010)
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on host-115.virt.lab.msp.redhat.com...

PV=/dev/sdg1
        syncd_primary_2legs_1_mimage_0: 5.1
        syncd_primary_2legs_2_mimage_0: 5.1
PV=/dev/sdg1
        syncd_primary_2legs_1_mimage_0: 5.1
        syncd_primary_2legs_2_mimage_0: 5.1

Writing verification files (checkit) to mirror(s) on...
        ---- host-115.virt.lab.msp.redhat.com ----

Sleeping 15 seconds to get some outsanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- host-115.virt.lab.msp.redhat.com ----

Disabling device sdg on host-115.virt.lab.msp.redhat.com
rescan device...
  /dev/sdg1: read failed after 0 of 512 at 16104947712: Input/output error
  /dev/sdg1: read failed after 0 of 2048 at 0: Input/output error

Getting recovery check start time from /var/log/messages: Feb 11 15:10
Attempting I/O to cause mirror down conversion(s) on host-115.virt.lab.msp.redhat.com
dd if=/dev/zero of=/mnt/syncd_primary_2legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.225524 s, 186 MB/s
dd if=/dev/zero of=/mnt/syncd_primary_2legs_2/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.337796 s, 124 MB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  /dev/sdg1: open failed: No such device or address
  Device /dev/sdg1 has size of 0 sectors which is smaller than corresponding PV size of 31455207 sectors. Was device resized?
  One or more devices used as PVs in VG helter_skelter have changed sizes.
  LV                               Attr       LSize   Cpy%Sync Devices
  syncd_primary_2legs_1            mwi-aom--- 500.00m 100.00   syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
  [syncd_primary_2legs_1_mimage_0] iwi-aom--- 500.00m          /dev/sdb1(0)
  [syncd_primary_2legs_1_mimage_1] iwi-aom--- 500.00m          /dev/sdc1(0)
  [syncd_primary_2legs_1_mlog]     lwi-aom---   4.00m          /dev/sdf1(0)
  syncd_primary_2legs_2            mwi-aom--- 500.00m 100.00   syncd_primary_2legs_2_mimage_0(0),syncd_primary_2legs_2_mimage_1(0)
  [syncd_primary_2legs_2_mimage_0] iwi-aom--- 500.00m          /dev/sdb1(125)
  [syncd_primary_2legs_2_mimage_1] iwi-aom--- 500.00m          /dev/sdc1(125)

Verifying FAILED device /dev/sdg1 is *NOT* in the volume(s)
  /dev/sdg1: open failed: No such device or address
  Device /dev/sdg1 has size of 0 sectors which is smaller than corresponding PV size of 31455207 sectors. Was device resized?
  One or more devices used as PVs in VG helter_skelter have changed sizes.
olog: 1
Verifying LOG device(s) /dev/sde1 *ARE* in the mirror(s)
  /dev/sdg1: open failed: No such device or address
  Device /dev/sdg1 has size of 0 sectors which is smaller than corresponding PV size of 31455207 sectors. Was device resized?
  One or more devices used as PVs in VG helter_skelter have changed sizes.
log device /dev/sde1 should still be present on host-115.virt.lab.msp.redhat.com
seaching for *any* log after this failure...
  /dev/sdg1: open failed: No such device or address
  Device /dev/sdg1 has size of 0 sectors which is smaller than corresponding PV size of 31455207 sectors. Was device resized?
  One or more devices used as PVs in VG helter_skelter have changed sizes.
  /dev/sdg1: open failed: No such device or address
  Device /dev/sdg1 has size of 0 sectors which is smaller than corresponding PV size of 31455207 sectors. Was device resized?
  One or more devices used as PVs in VG helter_skelter have changed sizes.
NO LOG WAS FOUND for syncd_primary_2legs_2 on host-115.virt.lab.msp.redhat.com

[root@host-115 ~]# lvs -a -o +devices
  /dev/sdg1: open failed: No such device or address
  Device /dev/sdg1 has size of 0 sectors which is smaller than corresponding PV size of 31455207 sectors. Was device resized?
  One or more devices used as PVs in VG helter_skelter have changed sizes.
  LV                               VG             Attr       LSize   Log                          Cpy%Sync Devices
  syncd_primary_2legs_1            helter_skelter mwi-aom--- 500.00m [syncd_primary_2legs_1_mlog] 100.00   syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
  [syncd_primary_2legs_1_mimage_0] helter_skelter iwi-aom--- 500.00m                                       /dev/sdb1(0)
  [syncd_primary_2legs_1_mimage_1] helter_skelter iwi-aom--- 500.00m                                       /dev/sdc1(0)
  [syncd_primary_2legs_1_mlog]     helter_skelter lwi-aom---   4.00m                                       /dev/sdf1(0)
  syncd_primary_2legs_2            helter_skelter mwi-aom--- 500.00m                              100.00   syncd_primary_2legs_2_mimage_0(0),syncd_primary_2legs_2_mimage_1(0)
  [syncd_primary_2legs_2_mimage_0] helter_skelter iwi-aom--- 500.00m                                       /dev/sdb1(125)
  [syncd_primary_2legs_2_mimage_1] helter_skelter iwi-aom--- 500.00m                                       /dev/sdc1(125)

Created attachment 1123307
log from host-115
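
For anyone wanting to repeat the two post-failure checks by hand (failed PV no longer referenced, every mirror still has a log), here is a rough sketch. It assumes the input comes from something like `lvs -a --noheadings --separator '|' -o lv_name,mirror_log,devices`; the function names are made up for illustration, and the sample rows are hand-built to match the host-115 listing above rather than captured output.

```python
def parse_lvs(text):
    """Parse 'lv_name|mirror_log|devices' rows into a dict keyed by LV name."""
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, log, devices = (field.strip() for field in line.split("|"))
        rows[name.strip("[]")] = {
            "log": log.strip("[]"),
            # Drop the "(extent)" suffix: "/dev/sdb1(125)" -> "/dev/sdb1"
            "devices": [d.split("(")[0] for d in devices.split(",") if d],
        }
    return rows


def check_mirrors(rows, failed_pv):
    """Return problems: LVs still on failed_pv, top-level mirrors with no log."""
    problems = []
    for name, row in rows.items():
        if failed_pv in row["devices"]:
            problems.append(f"{name}: still uses failed device {failed_pv}")
        # A top-level mirror lists its *_mimage_N sub-LVs as its devices.
        is_mirror = any(d.endswith("_mimage_0") for d in row["devices"])
        if is_mirror and not row["log"]:
            problems.append(f"{name}: no log found")
    return problems


# Hand-built sample reflecting the host-115 state after the failure (assumption).
SAMPLE = """\
syncd_primary_2legs_1|[syncd_primary_2legs_1_mlog]|syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
[syncd_primary_2legs_1_mimage_0]||/dev/sdb1(0)
[syncd_primary_2legs_1_mimage_1]||/dev/sdc1(0)
[syncd_primary_2legs_1_mlog]||/dev/sdf1(0)
syncd_primary_2legs_2||syncd_primary_2legs_2_mimage_0(0),syncd_primary_2legs_2_mimage_1(0)
[syncd_primary_2legs_2_mimage_0]||/dev/sdb1(125)
[syncd_primary_2legs_2_mimage_1]||/dev/sdc1(125)
"""

problems = check_mirrors(parse_lvs(SAMPLE), "/dev/sdg1")
print(problems)
```

On this sample the only problem reported is the missing log on syncd_primary_2legs_2, which matches the "NO LOG WAS FOUND" message from the harness.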
Another 6.8 failure, same test case:

================================================================================
Iteration 1.1 started at Thu Feb 11 16:46:54 CST 2016
================================================================================
Scenario kill_primary_synced_2_legs: Kill primary leg of synced 2 leg mirror(s)
********* Mirror hash info for this scenario *********
* names:              syncd_primary_2legs_1 syncd_primary_2legs_2
* sync:               1
* striped:            0
* leg devices:        /dev/sde1 /dev/sdb1
* log devices:        /dev/sdc1
* no MDA devices:
* failpv(s):          /dev/sde1
* failnode(s):        host-116.virt.lab.msp.redhat.com
* lvmetad:            1
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Creating mirror(s) on host-116.virt.lab.msp.redhat.com...
host-116.virt.lab.msp.redhat.com: lvcreate --type mirror -m 1 -n syncd_primary_2legs_1 -L 500M helter_skelter /dev/sde1:0-2400 /dev/sdb1:0-2400 /dev/sdc1:0-150
host-116.virt.lab.msp.redhat.com: lvcreate --type mirror -m 1 -n syncd_primary_2legs_2 -L 500M helter_skelter /dev/sde1:0-2400 /dev/sdb1:0-2400 /dev/sdc1:0-150

Current mirror/raid device structure(s):
  LV                               Attr       LSize   Cpy%Sync Devices
  syncd_primary_2legs_1            mwi-a-m--- 500.00m 52.80    syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
  [syncd_primary_2legs_1_mimage_0] Iwi-aom--- 500.00m          /dev/sde1(0)
  [syncd_primary_2legs_1_mimage_1] Iwi-aom--- 500.00m          /dev/sdb1(0)
  [syncd_primary_2legs_1_mlog]     lwi-aom---   4.00m          /dev/sdc1(0)
  syncd_primary_2legs_2            mwi-a-m--- 500.00m 5.60     syncd_primary_2legs_2_mimage_0(0),syncd_primary_2legs_2_mimage_1(0)
  [syncd_primary_2legs_2_mimage_0] Iwi-aom--- 500.00m          /dev/sde1(125)
  [syncd_primary_2legs_2_mimage_1] Iwi-aom--- 500.00m          /dev/sdb1(125)
  [syncd_primary_2legs_2_mlog]     lwi-aom---   4.00m          /dev/sdc1(1)

Waiting until all mirror|raid volumes become fully syncd...
   2/2 mirror(s) are fully synced: ( 100.00% 100.00% )

Creating ext on top of mirror(s) on host-116.virt.lab.msp.redhat.com...
mke2fs 1.41.12 (17-May-2010)
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on host-116.virt.lab.msp.redhat.com...

PV=/dev/sde1
        syncd_primary_2legs_1_mimage_0: 5.1
        syncd_primary_2legs_2_mimage_0: 5.1
PV=/dev/sde1
        syncd_primary_2legs_1_mimage_0: 5.1
        syncd_primary_2legs_2_mimage_0: 5.1

Writing verification files (checkit) to mirror(s) on...
        ---- host-116.virt.lab.msp.redhat.com ----

Sleeping 15 seconds to get some outsanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- host-116.virt.lab.msp.redhat.com ----

Disabling device sde on host-116.virt.lab.msp.redhat.com
rescan device...
<fail name="host-116.virt.lab.msp.redhat.com_syncd_primary_2legs_2" pid="25029" time="Thu Feb 11 16:48:05 2016" type="cmd" duration="30" ec="1" />
ALL STOP!
  /dev/sde1: read failed after 0 of 512 at 16104947712: Input/output error
  /dev/sde1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sde1: read failed after 0 of 2048 at 0: Input/output error
<stop name="host-116.virt.lab.msp.redhat.com_syncd_primary_2legs_1" pid="25028" time="Thu Feb 11 16:48:08 2016" type="cmd" duration="33" signal="2" />

Getting recovery check start time from /var/log/messages: Feb 11 16:48
Attempting I/O to cause mirror down conversion(s) on host-116.virt.lab.msp.redhat.com
dd if=/dev/zero of=/mnt/syncd_primary_2legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.127945 s, 328 MB/s
dd if=/dev/zero of=/mnt/syncd_primary_2legs_2/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.166799 s, 251 MB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  LV                               Attr       LSize   Cpy%Sync Devices
  syncd_primary_2legs_1            mwi-aom--- 500.00m 0.00     syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
  [syncd_primary_2legs_1_mimage_0] Iwi-aom--- 500.00m          /dev/sdb1(0)
  [syncd_primary_2legs_1_mimage_1] Iwi-aom--- 500.00m          /dev/sde1(0)
  [syncd_primary_2legs_1_mlog]     lwi-aom---   4.00m          /dev/sda1(1)
  syncd_primary_2legs_2            mwi-aom--- 500.00m 100.00   syncd_primary_2legs_2_mimage_0(0),syncd_primary_2legs_2_mimage_1(0)
  [syncd_primary_2legs_2_mimage_0] iwi-aom--- 500.00m          /dev/sdb1(125)
  [syncd_primary_2legs_2_mimage_1] iwi-aom--- 500.00m          /dev/sdd1(0)
  [syncd_primary_2legs_2_mlog]     lwi-aom---   4.00m          /dev/sda1(0)

Verifying FAILED device /dev/sde1 is *NOT* in the volume(s)
failed device /dev/sde1 should no longer be in volume on host-116.virt.lab.msp.redhat.com

Created attachment 1123308
log from host-116
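
In the host-116 listing after the failure, both images of syncd_primary_2legs_1 still carry a capital `I` as the first attr character, which lvs uses for a mirror or raid image that is out of sync, while the images of syncd_primary_2legs_2 show a lowercase `i`. A small sketch for picking those out follows. The (name, attr) pairs are copied from that listing; the helper name is hypothetical and only the four attr characters that appear in this report are covered.

```python
# First lv_attr character, limited to the values seen in this report.
VOLUME_TYPE = {
    "m": "mirrored",
    "i": "mirror or raid image",
    "I": "mirror or raid image out-of-sync",
    "l": "mirror log device",
}


def out_of_sync_images(listing):
    """Return the LV names whose attr string starts with 'I'."""
    return [name for name, attr in listing if attr.startswith("I")]


# (lv_name, lv_attr) pairs from the host-116 post-failure listing.
HOST_116 = [
    ("syncd_primary_2legs_1", "mwi-aom---"),
    ("syncd_primary_2legs_1_mimage_0", "Iwi-aom---"),
    ("syncd_primary_2legs_1_mimage_1", "Iwi-aom---"),
    ("syncd_primary_2legs_1_mlog", "lwi-aom---"),
    ("syncd_primary_2legs_2", "mwi-aom---"),
    ("syncd_primary_2legs_2_mimage_0", "iwi-aom---"),
    ("syncd_primary_2legs_2_mimage_1", "iwi-aom---"),
    ("syncd_primary_2legs_2_mlog", "lwi-aom---"),
]

stale = out_of_sync_images(HOST_116)
for name, attr in HOST_116:
    print(f"{name}: {VOLUME_TYPE[attr[0]]}")
```

This lines up with the 0.00 Cpy%Sync on syncd_primary_2legs_1, whose second image is still placed on the failed /dev/sde1.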
Bug 1307111 is likely related to this one.

Closing this bug as it hasn't been seen with the latest rpms. The issue these mirror failure test scenarios lead to is tracked as bug 1307111.