Bug 1304045
| Field | Value |
|---|---|
| Summary | type mirror no longer able to survive device failure |
| Product | Red Hat Enterprise Linux 6 |
| Reporter | Corey Marthaler <cmarthal> |
| Component | lvm2 |
| lvm2 sub component | Mirroring and RAID (RHEL6) |
| Assignee | Heinz Mauelshagen <heinzm> |
| QA Contact | cluster-qe <cluster-qe> |
| Status | CLOSED WORKSFORME |
| Docs Contact | |
| Severity | high |
| Priority | unspecified |
| CC | agk, heinzm, jbrassow, msnitzer, prajnoha, prockai, zkabelac |
| Version | 6.8 |
| Keywords | Regression, TestBlocker |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | x86_64 |
| OS | Linux |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | Bug Fix |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| Environment | |
| Last Closed | 2016-02-29 16:02:04 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Attachments | |
This isn't just a cluster mirror issue; it appears many single-machine mirror failure cases are no longer passing in 6.8.
I ran this same scenario on both 6.7 and 7.3, and both passed 10/10 iterations. On 6.8, however, it fails fairly quickly.
2.6.32-615.el6.x86_64
lvm2-2.02.141-2.el6 BUILT: Wed Feb 10 07:49:03 CST 2016
lvm2-libs-2.02.141-2.el6 BUILT: Wed Feb 10 07:49:03 CST 2016
lvm2-cluster-2.02.141-2.el6 BUILT: Wed Feb 10 07:49:03 CST 2016
udev-147-2.66.el6 BUILT: Mon Jan 18 02:42:20 CST 2016
device-mapper-1.02.115-2.el6 BUILT: Wed Feb 10 07:49:03 CST 2016
device-mapper-libs-1.02.115-2.el6 BUILT: Wed Feb 10 07:49:03 CST 2016
device-mapper-event-1.02.115-2.el6 BUILT: Wed Feb 10 07:49:03 CST 2016
device-mapper-event-libs-1.02.115-2.el6 BUILT: Wed Feb 10 07:49:03 CST 2016
device-mapper-persistent-data-0.6.1-1.el6 BUILT: Wed Feb 10 05:09:45 CST 2016
cmirror-2.02.141-2.el6 BUILT: Wed Feb 10 07:49:03 CST 2016
Iteration 4.1 started at Thu Feb 11 15:08:55 CST 2016
================================================================================
Scenario kill_primary_synced_2_legs: Kill primary leg of synced 2 leg mirror(s)
********* Mirror hash info for this scenario *********
* names: syncd_primary_2legs_1 syncd_primary_2legs_2
* sync: 1
* striped: 0
* leg devices: /dev/sdg1 /dev/sdb1
* log devices: /dev/sde1
* no MDA devices:
* failpv(s): /dev/sdg1
* failnode(s): host-115.virt.lab.msp.redhat.com
* lvmetad: 1
* leg fault policy: allocate
* log fault policy: allocate
******************************************************
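For context, the "leg fault policy" and "log fault policy" values listed above correspond to dmeventd's mirror repair settings in lvm.conf. A sketch of the relevant fragment, assuming the stock RHEL 6 lvm.conf layout (shown for reference only, not a change made by the test):

```
activation {
    # "allocate" makes dmeventd try to replace a failed mirror leg or
    # log with free extents from another PV in the VG; "remove" would
    # simply drop the failed image instead.
    mirror_image_fault_policy = "allocate"
    mirror_log_fault_policy = "allocate"
}
```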
Creating mirror(s) on host-115.virt.lab.msp.redhat.com...
host-115.virt.lab.msp.redhat.com: lvcreate --type mirror -m 1 -n syncd_primary_2legs_1 -L 500M helter_skelter /dev/sdg1:0-2400 /dev/sdb1:0-2400 /dev/sde1:0-150
host-115.virt.lab.msp.redhat.com: lvcreate --type mirror -m 1 -n syncd_primary_2legs_2 -L 500M helter_skelter /dev/sdg1:0-2400 /dev/sdb1:0-2400 /dev/sde1:0-150
Current mirror/raid device structure(s):
LV Attr LSize Cpy%Sync Devices
syncd_primary_2legs_1 mwi-a-m--- 500.00m 15.20 syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
[syncd_primary_2legs_1_mimage_0] Iwi-aom--- 500.00m /dev/sdg1(0)
[syncd_primary_2legs_1_mimage_1] Iwi-aom--- 500.00m /dev/sdb1(0)
[syncd_primary_2legs_1_mlog] lwi-aom--- 4.00m /dev/sde1(0)
syncd_primary_2legs_2 mwi-a-m--- 500.00m 0.80 syncd_primary_2legs_2_mimage_0(0),syncd_primary_2legs_2_mimage_1(0)
[syncd_primary_2legs_2_mimage_0] Iwi-aom--- 500.00m /dev/sdg1(125)
[syncd_primary_2legs_2_mimage_1] Iwi-aom--- 500.00m /dev/sdb1(125)
[syncd_primary_2legs_2_mlog] lwi-aom--- 4.00m /dev/sde1(1)
Waiting until all mirror|raid volumes become fully syncd...
1/2 mirror(s) are fully synced: ( 100.00% 94.70% )
2/2 mirror(s) are fully synced: ( 100.00% 100.00% )
Creating ext on top of mirror(s) on host-115.virt.lab.msp.redhat.com...
mke2fs 1.41.12 (17-May-2010)
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on host-115.virt.lab.msp.redhat.com...
PV=/dev/sdg1
syncd_primary_2legs_1_mimage_0: 5.1
syncd_primary_2legs_2_mimage_0: 5.1
PV=/dev/sdg1
syncd_primary_2legs_1_mimage_0: 5.1
syncd_primary_2legs_2_mimage_0: 5.1
Writing verification files (checkit) to mirror(s) on...
---- host-115.virt.lab.msp.redhat.com ----
Sleeping 15 seconds to get some outstanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
---- host-115.virt.lab.msp.redhat.com ----
Disabling device sdg on host-115.virt.lab.msp.redhat.com
rescan device...
/dev/sdg1: read failed after 0 of 512 at 16104947712: Input/output error
/dev/sdg1: read failed after 0 of 2048 at 0: Input/output error
Getting recovery check start time from /var/log/messages: Feb 11 15:10
Attempting I/O to cause mirror down conversion(s) on host-115.virt.lab.msp.redhat.com
dd if=/dev/zero of=/mnt/syncd_primary_2legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.225524 s, 186 MB/s
dd if=/dev/zero of=/mnt/syncd_primary_2legs_2/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.337796 s, 124 MB/s
Verifying current sanity of lvm after the failure
Current mirror/raid device structure(s):
/dev/sdg1: open failed: No such device or address
Device /dev/sdg1 has size of 0 sectors which is smaller than corresponding PV size of 31455207 sectors. Was device resized?
One or more devices used as PVs in VG helter_skelter have changed sizes.
LV Attr LSize Cpy%Sync Devices
syncd_primary_2legs_1 mwi-aom--- 500.00m 100.00 syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
[syncd_primary_2legs_1_mimage_0] iwi-aom--- 500.00m /dev/sdb1(0)
[syncd_primary_2legs_1_mimage_1] iwi-aom--- 500.00m /dev/sdc1(0)
[syncd_primary_2legs_1_mlog] lwi-aom--- 4.00m /dev/sdf1(0)
syncd_primary_2legs_2 mwi-aom--- 500.00m 100.00 syncd_primary_2legs_2_mimage_0(0),syncd_primary_2legs_2_mimage_1(0)
[syncd_primary_2legs_2_mimage_0] iwi-aom--- 500.00m /dev/sdb1(125)
[syncd_primary_2legs_2_mimage_1] iwi-aom--- 500.00m /dev/sdc1(125)
Verifying FAILED device /dev/sdg1 is *NOT* in the volume(s)
/dev/sdg1: open failed: No such device or address
Device /dev/sdg1 has size of 0 sectors which is smaller than corresponding PV size of 31455207 sectors. Was device resized?
One or more devices used as PVs in VG helter_skelter have changed sizes.
olog: 1
Verifying LOG device(s) /dev/sde1 *ARE* in the mirror(s)
/dev/sdg1: open failed: No such device or address
Device /dev/sdg1 has size of 0 sectors which is smaller than corresponding PV size of 31455207 sectors. Was device resized?
One or more devices used as PVs in VG helter_skelter have changed sizes.
log device /dev/sde1 should still be present on host-115.virt.lab.msp.redhat.com
searching for *any* log after this failure...
/dev/sdg1: open failed: No such device or address
Device /dev/sdg1 has size of 0 sectors which is smaller than corresponding PV size of 31455207 sectors. Was device resized?
One or more devices used as PVs in VG helter_skelter have changed sizes.
/dev/sdg1: open failed: No such device or address
Device /dev/sdg1 has size of 0 sectors which is smaller than corresponding PV size of 31455207 sectors. Was device resized?
One or more devices used as PVs in VG helter_skelter have changed sizes.
NO LOG WAS FOUND for syncd_primary_2legs_2 on host-115.virt.lab.msp.redhat.com
[root@host-115 ~]# lvs -a -o +devices
/dev/sdg1: open failed: No such device or address
Device /dev/sdg1 has size of 0 sectors which is smaller than corresponding PV size of 31455207 sectors. Was device resized?
One or more devices used as PVs in VG helter_skelter have changed sizes.
LV VG Attr LSize Log Cpy%Sync Devices
syncd_primary_2legs_1 helter_skelter mwi-aom--- 500.00m [syncd_primary_2legs_1_mlog] 100.00 syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
[syncd_primary_2legs_1_mimage_0] helter_skelter iwi-aom--- 500.00m /dev/sdb1(0)
[syncd_primary_2legs_1_mimage_1] helter_skelter iwi-aom--- 500.00m /dev/sdc1(0)
[syncd_primary_2legs_1_mlog] helter_skelter lwi-aom--- 4.00m /dev/sdf1(0)
syncd_primary_2legs_2 helter_skelter mwi-aom--- 500.00m 100.00 syncd_primary_2legs_2_mimage_0(0),syncd_primary_2legs_2_mimage_1(0)
[syncd_primary_2legs_2_mimage_0] helter_skelter iwi-aom--- 500.00m /dev/sdb1(125)
[syncd_primary_2legs_2_mimage_1] helter_skelter iwi-aom--- 500.00m /dev/sdc1(125)
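The harness's "FAILED device is *NOT* in the volume(s)" step above boils down to checking that the failed PV no longer appears in the device column of `lvs -a -o +devices`. A minimal sketch of that pass/fail logic, run here against sample device entries copied from the listing above rather than a live `lvs` call (the variable names are illustrative, not the harness's own):

```shell
# Device column copied from the post-repair `lvs -a -o +devices` output
# above; after a successful repair, no /dev/sdg1 entries should remain.
lvs_devices='/dev/sdb1(0)
/dev/sdc1(0)
/dev/sdf1(0)
/dev/sdb1(125)
/dev/sdc1(125)'
failed_pv=/dev/sdg1

# Same check the harness applies to the real lvs output: the failed PV
# must not be referenced by any LV in the VG anymore.
if printf '%s\n' "$lvs_devices" | grep -qF "$failed_pv"; then
    result="FAIL: $failed_pv still referenced"
else
    result="OK: $failed_pv no longer referenced"
fi
echo "$result"
```

On a live system the sample string would simply be replaced with `lvs --noheadings -a -o devices <vg>` output.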
Created attachment 1123307 [details]
log from host-115
Another 6.8 failure, same test case:
================================================================================
Iteration 1.1 started at Thu Feb 11 16:46:54 CST 2016
================================================================================
Scenario kill_primary_synced_2_legs: Kill primary leg of synced 2 leg mirror(s)
********* Mirror hash info for this scenario *********
* names: syncd_primary_2legs_1 syncd_primary_2legs_2
* sync: 1
* striped: 0
* leg devices: /dev/sde1 /dev/sdb1
* log devices: /dev/sdc1
* no MDA devices:
* failpv(s): /dev/sde1
* failnode(s): host-116.virt.lab.msp.redhat.com
* lvmetad: 1
* leg fault policy: allocate
* log fault policy: allocate
******************************************************
Creating mirror(s) on host-116.virt.lab.msp.redhat.com...
host-116.virt.lab.msp.redhat.com: lvcreate --type mirror -m 1 -n syncd_primary_2legs_1 -L 500M helter_skelter /dev/sde1:0-2400 /dev/sdb1:0-2400 /dev/sdc1:0-150
host-116.virt.lab.msp.redhat.com: lvcreate --type mirror -m 1 -n syncd_primary_2legs_2 -L 500M helter_skelter /dev/sde1:0-2400 /dev/sdb1:0-2400 /dev/sdc1:0-150
Current mirror/raid device structure(s):
LV Attr LSize Cpy%Sync Devices
syncd_primary_2legs_1 mwi-a-m--- 500.00m 52.80 syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
[syncd_primary_2legs_1_mimage_0] Iwi-aom--- 500.00m /dev/sde1(0)
[syncd_primary_2legs_1_mimage_1] Iwi-aom--- 500.00m /dev/sdb1(0)
[syncd_primary_2legs_1_mlog] lwi-aom--- 4.00m /dev/sdc1(0)
syncd_primary_2legs_2 mwi-a-m--- 500.00m 5.60 syncd_primary_2legs_2_mimage_0(0),syncd_primary_2legs_2_mimage_1(0)
[syncd_primary_2legs_2_mimage_0] Iwi-aom--- 500.00m /dev/sde1(125)
[syncd_primary_2legs_2_mimage_1] Iwi-aom--- 500.00m /dev/sdb1(125)
[syncd_primary_2legs_2_mlog] lwi-aom--- 4.00m /dev/sdc1(1)
Waiting until all mirror|raid volumes become fully syncd...
2/2 mirror(s) are fully synced: ( 100.00% 100.00% )
Creating ext on top of mirror(s) on host-116.virt.lab.msp.redhat.com...
mke2fs 1.41.12 (17-May-2010)
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on host-116.virt.lab.msp.redhat.com...
PV=/dev/sde1
syncd_primary_2legs_1_mimage_0: 5.1
syncd_primary_2legs_2_mimage_0: 5.1
PV=/dev/sde1
syncd_primary_2legs_1_mimage_0: 5.1
syncd_primary_2legs_2_mimage_0: 5.1
Writing verification files (checkit) to mirror(s) on...
---- host-116.virt.lab.msp.redhat.com ----
Sleeping 15 seconds to get some outstanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
---- host-116.virt.lab.msp.redhat.com ----
Disabling device sde on host-116.virt.lab.msp.redhat.com
rescan device...
<fail name="host-116.virt.lab.msp.redhat.com_syncd_primary_2legs_2" pid="25029" time="Thu Feb 11 16:48:05 2016" type="cmd" duration="30" ec="1" />
ALL STOP!
/dev/sde1: read failed after 0 of 512 at 16104947712: Input/output error
/dev/sde1: read failed after 0 of 512 at 4096: Input/output error
/dev/sde1: read failed after 0 of 2048 at 0: Input/output error
<stop name="host-116.virt.lab.msp.redhat.com_syncd_primary_2legs_1" pid="25028" time="Thu Feb 11 16:48:08 2016" type="cmd" duration="33" signal="2" />
Getting recovery check start time from /var/log/messages: Feb 11 16:48
Attempting I/O to cause mirror down conversion(s) on host-116.virt.lab.msp.redhat.com
dd if=/dev/zero of=/mnt/syncd_primary_2legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.127945 s, 328 MB/s
dd if=/dev/zero of=/mnt/syncd_primary_2legs_2/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.166799 s, 251 MB/s
Verifying current sanity of lvm after the failure
Current mirror/raid device structure(s):
LV Attr LSize Cpy%Sync Devices
syncd_primary_2legs_1 mwi-aom--- 500.00m 0.00 syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
[syncd_primary_2legs_1_mimage_0] Iwi-aom--- 500.00m /dev/sdb1(0)
[syncd_primary_2legs_1_mimage_1] Iwi-aom--- 500.00m /dev/sde1(0)
[syncd_primary_2legs_1_mlog] lwi-aom--- 4.00m /dev/sda1(1)
syncd_primary_2legs_2 mwi-aom--- 500.00m 100.00 syncd_primary_2legs_2_mimage_0(0),syncd_primary_2legs_2_mimage_1(0)
[syncd_primary_2legs_2_mimage_0] iwi-aom--- 500.00m /dev/sdb1(125)
[syncd_primary_2legs_2_mimage_1] iwi-aom--- 500.00m /dev/sdd1(0)
[syncd_primary_2legs_2_mlog] lwi-aom--- 4.00m /dev/sda1(0)
Verifying FAILED device /dev/sde1 is *NOT* in the volume(s)
failed device /dev/sde1 should no longer be in volume on host-116.virt.lab.msp.redhat.com
Created attachment 1123308 [details]
log from host-116
Bug 1307111 is likely related to this one. Closing this bug since it hasn't been seen with the latest rpms; the issue that these mirror failure test scenarios now lead to is tracked in bug 1307111.
Description of problem:
This is the most basic cmirror device failure case and it causes everything to go south.
./helter_skelter -l /usr/tests/sts-rhel6.8 -r /usr/tests/sts-rhel6.8 -R ../../../resource-STSRHTS3987.xml -i 1 -e kill_primary_synced_2_legs
8 disk(s) to be used:
host-123=/dev/sdc /dev/sdh /dev/sdg /dev/sdf /dev/sda /dev/sde /dev/sdd /dev/sdb
host-124=/dev/sdb /dev/sda /dev/sdh /dev/sdd /dev/sdg /dev/sde /dev/sdf /dev/sdc
host-125=/dev/sda /dev/sde /dev/sdh /dev/sdf /dev/sdc /dev/sdg /dev/sdb /dev/sdd
on host-123...
dicing /dev/sdc into 1...
dicing /dev/sdh into 1...
dicing /dev/sdg into 1...
dicing /dev/sdf into 1...
dicing /dev/sda into 1...
dicing /dev/sde into 1...
dicing /dev/sdd into 1...
dicing /dev/sdb into 1...
re-reading disks on host-125...
re-reading disks on host-123...
re-reading disks on host-124...
Zeroing out the new partitions.../dev/sdc1.../dev/sdh1.../dev/sdg1.../dev/sdf1.../dev/sda1.../dev/sde1.../dev/sdd1.../dev/sdb1...
Reload the device cache on all nodes...
creating lvm devices...
Create 7 PV(s) for helter_skelter on host-123
Create VG helter_skelter on host-123
================================================================================
Iteration 0.1 started at Tue Feb 2 11:48:34 CST 2016
================================================================================
Scenario kill_primary_synced_2_legs: Kill primary leg of synced 2 leg mirror(s)
********* Mirror hash info for this scenario *********
* names: syncd_primary_2legs_1
* sync: 1
* striped: 0
* leg devices: /dev/sdh1 /dev/sdf1
* log devices: /dev/sdb1
* no MDA devices:
* failpv(s): /dev/sdh1
* failnode(s): host-123 host-124 host-125
* lvmetad: 0
* leg fault policy: remove
* log fault policy: allocate
******************************************************
Creating mirror(s) on host-123...
host-123: lvcreate --type mirror -m 1 -n syncd_primary_2legs_1 -L 500M helter_skelter /dev/sdh1:0-2400 /dev/sdf1:0-2400 /dev/sdb1:0-150
Current mirror/raid device structure(s):
LV Attr LSize Cpy%Sync Devices
syncd_primary_2legs_1 mwi-a-m--- 500.00m 5.60 syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
[syncd_primary_2legs_1_mimage_0] Iwi-aom--- 500.00m /dev/sdh1(0)
[syncd_primary_2legs_1_mimage_1] Iwi-aom--- 500.00m /dev/sdf1(0)
[syncd_primary_2legs_1_mlog] lwi-aom--- 4.00m /dev/sdb1(0)
Waiting until all mirror|raid volumes become fully syncd...
0/1 mirror(s) are fully synced: ( 74.80% )
1/1 mirror(s) are fully synced: ( 100.00% )
Creating gfs2 on top of mirror(s) on host-123...
Mounting mirrored gfs2 filesystems on host-123...
Mounting mirrored gfs2 filesystems on host-124...
Mounting mirrored gfs2 filesystems on host-125...
PV=/dev/sdh1
syncd_primary_2legs_1_mimage_0: 6
PV=/dev/sdh1
syncd_primary_2legs_1_mimage_0: 6
Writing verification files (checkit) to mirror(s) on...
---- host-123 ----
---- host-124 ----
---- host-125 ----
Sleeping 15 seconds to get some outstanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
---- host-123 ----
---- host-124 ----
---- host-125 ----
Disabling device sdh on host-123
Disabling device sda on host-124
Disabling device sde on host-125
Getting recovery check start time from /var/log/messages: Feb 2 11:50
Attempting I/O to cause mirror down conversion(s) on host-123
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.272817 s, 154 MB/s
Verifying current sanity of lvm after the failure
Current mirror/raid device structure(s):
Couldn't find device with uuid Vwn1jQ-ndo2-YhuJ-VaqX-bllI-a25v-oR2hbt.
LV Attr LSize Cpy%Sync Devices
syncd_primary_2legs_1 -wi-ao---- 500.00m /dev/sdf1(0)
Verifying FAILED device /dev/sdh1 is *NOT* in the volume(s)
Couldn't find device with uuid Vwn1jQ-ndo2-YhuJ-VaqX-bllI-a25v-oR2hbt.
olog: 4
Verifying LOG device(s) /dev/sdb1 are *NOT* in the mirror/linear(s)
Couldn't find device with uuid Vwn1jQ-ndo2-YhuJ-VaqX-bllI-a25v-oR2hbt.
Verifying LEG device /dev/sdf1 *IS* in the volume(s)
Couldn't find device with uuid Vwn1jQ-ndo2-YhuJ-VaqX-bllI-a25v-oR2hbt.
verify the dm devices associated with /dev/sdh1 have been removed as expected
Checking REMOVAL of syncd_primary_2legs_1_mimage_0 on: host-123 host-124 host-125
Verifying files (checkit) on mirror(s) on...
---- host-123 ----
---- host-124 ----
Can not open ufyybfhhhrcmxqbbiuexuqrwvl: Input/output error
<fail name="host-124_syncd_primary_2legs_1" pid="24757" time="Tue Feb 2 11:50:23 2016" type="cmd" duration="55" ec="1" />
ALL STOP!
checkit write verify failed
Feb 2 11:49:23 host-124 qarshd[3931]: Running cmdline: mount /dev/helter_skelter/syncd_primary_2legs_1 /mnt/syncd_primary_2legs_1
Feb 2 11:49:23 host-124 kernel: Slow work thread pool: Starting up
Feb 2 11:49:23 host-124 kernel: Slow work thread pool: Ready
Feb 2 11:49:23 host-124 kernel: GFS2 (built Jan 28 2016 16:50:00) installed
Feb 2 11:49:23 host-124 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "STSRHTS3987:gfs1"
Feb 2 11:49:23 host-124 kernel: GFS2: fsid=STSRHTS3987:gfs1.1: Joined cluster. Now mounting FS...
Feb 2 11:49:23 host-124 kernel: GFS2: fsid=STSRHTS3987:gfs1.1: jid=1, already locked for use
Feb 2 11:49:23 host-124 kernel: GFS2: fsid=STSRHTS3987:gfs1.1: jid=1: Looking at journal...
Feb 2 11:49:23 host-124 kernel: GFS2: fsid=STSRHTS3987:gfs1.1: jid=1: Done
Feb 2 11:49:26 host-124 qarshd[3952]: Running cmdline: mkdir -p /mnt/syncd_primary_2legs_1/checkit
Feb 2 11:49:27 host-124 qarshd[3954]: Running cmdline: /usr/tests/sts-rhel6.8/bin/checkit -w /mnt/syncd_primary_2legs_1/checkit -f /tmp/checkit_syncd_primary_2legs_1 -n 100
Feb 2 11:49:28 host-124 qarshd[3956]: Running cmdline: /usr/tests/sts-rhel6.8/bin/xiogen -f buffered -m random -s write,writev -t 100 -T 1000000 -F 1000000:/mnt/syncd_primary_2legs_1/host-124_load | /usr/tests/sts-rhel6.8/bin/xdoio -v
Feb 2 11:49:51 host-124 qarshd[3962]: Running cmdline: /usr/tests/sts-rhel6.8/bin/checkit -w /mnt/syncd_primary_2legs_1/checkit -f /tmp/checkit_syncd_primary_2legs_1 -v
Feb 2 11:49:58 host-124 qarshd[3964]: Running cmdline: echo offline > /sys/block/sda/device/state
Feb 2 11:49:59 host-124 kernel: sd 2:0:0:1: rejecting I/O to offline device
Feb 2 11:49:59 host-124 lvm[2846]: Primary mirror device 253:3 has failed (D).
Feb 2 11:49:59 host-124 lvm[2846]: Device failure in helter_skelter-syncd_primary_2legs_1.
Feb 2 11:49:59 host-124 lvm[2846]: /dev/sda1: read failed after 0 of 2048 at 0: Input/output error
Feb 2 11:49:59 host-124 lvm[2846]: /dev/sda1: read failed after 0 of 512 at 16104947712: Input/output error
Feb 2 11:49:59 host-124 lvm[2846]: /dev/sda1: read failed after 0 of 512 at 16105054208: Input/output error
Feb 2 11:49:59 host-124 lvm[2846]: /dev/sda1: read failed after 0 of 512 at 0: Input/output error
Feb 2 11:49:59 host-124 lvm[2846]: /dev/sda1: read failed after 0 of 512 at 4096: Input/output error
Feb 2 11:49:59 host-124 lvm[2846]: Couldn't find device with uuid Vwn1jQ-ndo2-YhuJ-VaqX-bllI-a25v-oR2hbt.
Feb 2 11:50:00 host-124 kernel: Buffer I/O error on device dm-5, logical block 25436
Feb 2 11:50:00 host-124 kernel: lost page write due to I/O error on dm-5
Feb 2 11:50:00 host-124 kernel: Buffer I/O error on device dm-5, logical block 25437
Feb 2 11:50:00 host-124 kernel: lost page write due to I/O error on dm-5
Feb 2 11:50:00 host-124 kernel: Buffer I/O error on device dm-5, logical block 25523
Feb 2 11:50:01 host-124 lvm[2846]: Mirror status: 1 of 2 images failed.
Feb 2 11:50:12 host-124 qarshd[4017]: Running cmdline: dmsetup ls | grep syncd_primary_2legs_1_mimage_0
Feb 2 11:50:21 host-124 qarshd[4021]: Running cmdline: /usr/tests/sts-rhel6.8/bin/checkit -w /mnt/syncd_primary_2legs_1/checkit -f /tmp/checkit_syncd_primary_2legs_1 -v
Feb 2 11:50:23 host-124 kernel: GFS2: fsid=STSRHTS3987:gfs1.1: fatal: I/O error
Feb 2 11:50:23 host-124 kernel: GFS2: fsid=STSRHTS3987:gfs1.1: block = 32251
Feb 2 11:50:23 host-124 kernel: GFS2: fsid=STSRHTS3987:gfs1.1: function = gfs2_ail1_empty_one, file = fs/gfs2/log.c, line = 199
Feb 2 11:50:23 host-124 kernel: GFS2: fsid=STSRHTS3987:gfs1.1: about to withdraw this file system
Feb 2 11:50:24 host-124 kernel: GFS2: fsid=STSRHTS3987:gfs1.1: telling LM to unmount
Feb 2 11:50:24 host-124 kernel: GFS2: fsid=STSRHTS3987:gfs1.1: withdrawn
Feb 2 11:50:24 host-124 kernel: Pid: 3948, comm: gfs2_logd Not tainted 2.6.32-610.el6.x86_64 #1
Feb 2 11:50:24 host-124 kernel: Call Trace:
Feb 2 11:50:24 host-124 kernel: [<ffffffffa04b0fa8>] ? gfs2_lm_withdraw+0x128/0x160 [gfs2]
Feb 2 11:50:24 host-124 kernel: [<ffffffff81065990>] ? __dequeue_entity+0x30/0x50
Feb 2 11:50:24 host-124 kernel: [<ffffffffa04b1020>] ? gfs2_io_error_bh_i+0x40/0x50 [gfs2]
Feb 2 11:50:24 host-124 kernel: [<ffffffff8108f260>] ? process_timeout+0x0/0x10
Feb 2 11:50:24 host-124 kernel: [<ffffffffa0497a2c>] ? gfs2_ail1_empty+0x11c/0x1b0 [gfs2]
Feb 2 11:50:24 host-124 kernel: [<ffffffffa0498f2c>] ? gfs2_logd+0x7c/0x140 [gfs2]
Feb 2 11:50:24 host-124 kernel: [<ffffffffa0498eb0>] ? gfs2_logd+0x0/0x140 [gfs2]
Feb 2 11:50:24 host-124 kernel: [<ffffffff810a662e>] ? kthread+0x9e/0xc0
Feb 2 11:50:24 host-124 kernel: [<ffffffffa045eff0>] ? process_send_sockets+0x0/0x260 [dlm]
Feb 2 11:50:24 host-124 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
Feb 2 11:50:24 host-124 kernel: [<ffffffff810a6590>] ? kthread+0x0/0xc0
Feb 2 11:50:24 host-124 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Feb 2 11:52:35 host-124 kernel: INFO: task gfs2_logd:3948 blocked for more than 120 seconds.
Feb 2 11:52:35 host-124 kernel: Not tainted 2.6.32-610.el6.x86_64 #1
Feb 2 11:52:35 host-124 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 2 11:52:35 host-124 kernel: gfs2_logd D 0000000000000000 0 3948 2 0x00000080
Feb 2 11:52:35 host-124 kernel: ffff88003c777c10 0000000000000046 ffff880000000000 0000abd9b1dfc40f
Feb 2 11:52:35 host-124 kernel: ffff88003c777bd0 ffff88003c78d2f0 000000000003c33e ffffffffa94f14fd
Feb 2 11:52:35 host-124 kernel: 00000000207aa707 ffffffff81aa8340 ffff880037af5ad8 ffff88003c777fd8
Feb 2 11:52:35 host-124 kernel: Call Trace:
Feb 2 11:52:35 host-124 kernel: [<ffffffff8112e2a0>] ? sync_page+0x0/0x50
Feb 2 11:52:35 host-124 kernel: [<ffffffff8112e2a0>] ? sync_page+0x0/0x50
Feb 2 11:52:35 host-124 kernel: [<ffffffff81545853>] io_schedule+0x73/0xc0
Feb 2 11:52:35 host-124 kernel: [<ffffffff8112e2dd>] sync_page+0x3d/0x50
Feb 2 11:52:35 host-124 kernel: [<ffffffff8154633f>] __wait_on_bit+0x5f/0x90
Feb 2 11:52:35 host-124 kernel: [<ffffffff8112e513>] wait_on_page_bit+0x73/0x80
Feb 2 11:52:35 host-124 kernel: [<ffffffff810a6b40>] ? wake_bit_function+0x0/0x50
Feb 2 11:52:35 host-124 kernel: [<ffffffff81144635>] ? pagevec_lookup_tag+0x25/0x40
Feb 2 11:52:35 host-124 kernel: [<ffffffff8112e93b>] wait_on_page_writeback_range+0xfb/0x190
Feb 2 11:52:35 host-124 kernel: [<ffffffff8112e9ff>] filemap_fdatawait+0x2f/0x40
Feb 2 11:52:35 host-124 kernel: [<ffffffffa0498b09>] gfs2_log_flush+0x259/0x600 [gfs2]
Feb 2 11:52:35 host-124 kernel: [<ffffffffa0498f89>] gfs2_logd+0xd9/0x140 [gfs2]
Feb 2 11:52:35 host-124 kernel: [<ffffffffa0498eb0>] ? gfs2_logd+0x0/0x140 [gfs2]
Feb 2 11:52:35 host-124 kernel: [<ffffffff810a662e>] kthread+0x9e/0xc0
Feb 2 11:52:35 host-124 kernel: [<ffffffffa045eff0>] ? process_send_sockets+0x0/0x260 [dlm]
Feb 2 11:52:35 host-124 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Feb 2 11:52:35 host-124 kernel: [<ffffffff810a6590>] ? kthread+0x0/0xc0
Feb 2 11:52:35 host-124 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
[root@host-124 ~]# lvs
[...]
stat("/dev/helter_skelter/syncd_primary_2legs_1", {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 5), ...}) = 0
open("/dev/helter_skelter/syncd_primary_2legs_1", O_RDONLY|O_DIRECT|O_NOATIME) = 5
fstat(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 5), ...}) = 0
ioctl(5, BLKBSZGET, 4096) = 0
ioctl(5, BLKPBSZGET, 512) = 0
lseek(5, 0, SEEK_SET) = 0
read(5,
Version-Release number of selected component (if applicable):
2.6.32-610.el6.x86_64
lvm2-2.02.140-3.el6 BUILT: Thu Jan 21 05:40:10 CST 2016
lvm2-libs-2.02.140-3.el6 BUILT: Thu Jan 21 05:40:10 CST 2016
lvm2-cluster-2.02.140-3.el6 BUILT: Thu Jan 21 05:40:10 CST 2016
udev-147-2.69.el6 BUILT: Thu Jan 28 08:41:45 CST 2016
device-mapper-1.02.114-3.el6 BUILT: Thu Jan 21 05:40:10 CST 2016
device-mapper-libs-1.02.114-3.el6 BUILT: Thu Jan 21 05:40:10 CST 2016
device-mapper-event-1.02.114-3.el6 BUILT: Thu Jan 21 05:40:10 CST 2016
device-mapper-event-libs-1.02.114-3.el6 BUILT: Thu Jan 21 05:40:10 CST 2016
device-mapper-persistent-data-0.6.0-2.el6 BUILT: Thu Jan 21 02:40:25 CST 2016
cmirror-2.02.140-3.el6 BUILT: Thu Jan 21 05:40:10 CST 2016
How reproducible:
Every time