Bug 1281525
| Summary: | external origin raid volumes are not monitored | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Corey Marthaler <cmarthal> |
| Component: | lvm2 | Assignee: | Zdenek Kabelac <zkabelac> |
| lvm2 sub component: | Thin Provisioning | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | unspecified | CC: | agk, heinzm, jbrassow, mcsontos, msnitzer, prajnoha, prockai, rbednar, thornber, zkabelac |
| Version: | 7.2 | | |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | lvm2-2.02.175-3.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-04-10 15:18:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1469559 | | |
Description
Corey Marthaler
2015-11-12 16:28:05 UTC
[root@host-109 ~]# grep allocate /etc/lvm/lvm.conf
raid_fault_policy = "allocate"
A 'pvscan --cache' is required to even detect a failure...
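For context, a minimal sketch of that detection workaround, assuming lvmetad is in use as it was in this test (both commands appear elsewhere in this report; the comments are added annotations):

grep raid_fault_policy /etc/lvm/lvm.conf   # expect: raid_fault_policy = "allocate"
pvscan --cache                             # repopulate lvmetad so the failed PV is actually noticed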
Disabling device sdc on host-109.virt.lab.msp.redhat.com
Getting recovery check start time from /var/log/messages: Nov 12 09:41
Attempting I/O to cause mirror down conversion(s) on host-109.virt.lab.msp.redhat.com
dd if=/dev/zero of=/mnt/synced_primary_raid1_2legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.0613936 s, 683 MB/s
rescan PVs
/dev/sdc1: read failed after 0 of 4096 at 26838958080: Input/output error
/dev/sdc1: read failed after 0 of 4096 at 26839048192: Input/output error
/dev/sdc1: read failed after 0 of 4096 at 0: Input/output error
/dev/sdc1: read failed after 0 of 4096 at 4096: Input/output error
Current mirror/raid device structure(s):
WARNING: Device for PV fj7JLI-3UcH-aFLI-DKKy-90hZ-342f-9zwEsg not found or rejected by a filter.
LV Attr LSize Cpy%Sync Devices
POOL twi-aotz-- 500.00m POOL_tdata(0)
[POOL_tdata] Twi-ao---- 500.00m /dev/sdd1(127)
[POOL_tmeta] ewi-ao---- 4.00m /dev/sdd1(252)
[lvol0_pmspare] ewi------- 4.00m /dev/sdd1(126)
snap1_synced_primary_raid1_2legs_1 Vwi-a-tzp- 500.00m
snap2_synced_primary_raid1_2legs_1 Vwi-a-tzp- 500.00m
snap3_synced_primary_raid1_2legs_1 Vwi-a-tzp- 500.00m
synced_primary_raid1_2legs_1 Vwi-aotzp- 500.00m
synced_primary_raid1_2legs_1_extorig ori---r-p- 500.00m synced_primary_raid1_2legs_1_extorig_rimage_0(0),synced_primary_raid1_2legs_1_extorig_rimage_1(0),synced_primary_raid1_2legs_1_extorig_rimage_2(0)
[synced_primary_raid1_2legs_1_extorig_rimage_0] Iwi-aor-p- 500.00m unknown device(1)
[synced_primary_raid1_2legs_1_extorig_rimage_1] Iwi-aor-r- 500.00m /dev/sdd1(1)
[synced_primary_raid1_2legs_1_extorig_rimage_2] Iwi-aor-r- 500.00m /dev/sdg1(1)
[synced_primary_raid1_2legs_1_extorig_rmeta_0] ewi-aor-p- 4.00m unknown device(0)
[synced_primary_raid1_2legs_1_extorig_rmeta_1] ewi-aor-r- 4.00m /dev/sdd1(0)
[synced_primary_raid1_2legs_1_extorig_rmeta_2] ewi-aor-r- 4.00m /dev/sdg1(0)
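As an aside, the trailing 'p' in the attr strings above (e.g. Vwi-a-tzp-) marks an LV as partial, i.e. backed by a missing device. A hypothetical way to query that state directly, assuming the lv_health_status report field is available in this lvm2 build:

lvs -a -o name,attr,lv_health_status   # health 'partial' corresponds to the 'p' attr bit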
(ALLOCATE POLICY) there should not be an 'unknown' device associated with synced_primary_raid1_2legs_1_extorig_rimage_0 on host-109.virt.lab.msp.redhat.com
synced_primary_raid1_2legs_1_extorig synced_primary_raid1_2legs_1_extorig_rimage_0(0),synced_primary_raid1_2legs_1_extorig_rimage_1(0),synced_primary_raid1_2legs_1_extorig_rimage_2(0)
[synced_primary_raid1_2legs_1_extorig_rimage_0] unknown device(1)
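A hypothetical way to confirm the monitoring gap from the command line, assuming the seg_monitor report field (dmeventd watch state) and manual repair via lvconvert; the LV name is from this report, the VG name is not shown in this excerpt:

lvs -a -o name,seg_monitor                                      # the external origin raid would show 'not monitored' while the bug is present
lvconvert --repair <vg>/synced_primary_raid1_2legs_1_extorig    # manual fallback when dmeventd is not watching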
Similar to bug 1280450, I suspect there is just something wrong with evaluating the type of LV (which is why the sync % isn't printed right and why the LV isn't being monitored).

Fixed by upstream commit: https://www.redhat.com/archives/lvm-devel/2017-October/msg00045.html

Marking verified with latest rpms. External raid origin volumes are now monitored and repaired according to the current raid fault policy. I was also no longer able to observe 'unknown' devices after triggering a repair, as shown in Comment 1.

=============SCENARIO=============
virt-374: pvcreate /dev/sdi1 /dev/sdb1 /dev/sdg1 /dev/sde1 /dev/sdj1 /dev/sdh1 /dev/sdc1
virt-374: vgcreate black_bird /dev/sdi1 /dev/sdb1 /dev/sdg1 /dev/sde1 /dev/sdj1 /dev/sdh1 /dev/sdc1

Enabling raid allocate fault policies on: virt-374

================================================================================
Iteration 0.1 started at Tue Dec 12 11:15:54 CET 2017
================================================================================
Scenario kill_primary_synced_raid1_2legs: Kill primary leg of synced 2 leg raid1 volume(s)

********* RAID hash info for this scenario *********
* names: synced_primary_raid1_2legs_1
* sync: 1
* type: raid1
* -m |-i value: 2
* leg devices: /dev/sdj1 /dev/sdb1 /dev/sdc1
* spanned legs: 0
* manual repair: 0
* no MDA devices:
* failpv(s): /dev/sdj1
* additional snap: /dev/sdb1
* failnode(s): virt-374
* lvmetad: 1
* raid fault policy: allocate
******************************************************

Creating raids(s) on virt-374...
virt-374: lvcreate --type raid1 -m 2 -n synced_primary_raid1_2legs_1 -L 500M black_bird /dev/sdj1:0-2400 /dev/sdb1:0-2400 /dev/sdc1:0-2400

Current mirror/raid device structure(s):
LV Attr LSize Cpy%Sync Devices
synced_primary_raid1_2legs_1 rwi-a-r--- 500.00m 6.26 synced_primary_raid1_2legs_1_rimage_0(0),synced_primary_raid1_2legs_1_rimage_1(0),synced_primary_raid1_2legs_1_rimage_2(0)
[synced_primary_raid1_2legs_1_rimage_0] Iwi-aor--- 500.00m /dev/sdj1(1)
[synced_primary_raid1_2legs_1_rimage_1] Iwi-aor--- 500.00m /dev/sdb1(1)
[synced_primary_raid1_2legs_1_rimage_2] Iwi-aor--- 500.00m /dev/sdc1(1)
[synced_primary_raid1_2legs_1_rmeta_0] ewi-aor--- 4.00m /dev/sdj1(0)
[synced_primary_raid1_2legs_1_rmeta_1] ewi-aor--- 4.00m /dev/sdb1(0)
[synced_primary_raid1_2legs_1_rmeta_2] ewi-aor--- 4.00m /dev/sdc1(0)
root -wi-ao---- <6.20g /dev/vda2(205)
swap -wi-ao---- 820.00m /dev/vda2(0)

Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Creating thin pool for external origin on device not to be failed
lvcreate --type thin-pool -n POOL -L 500M black_bird /dev/sdb1

Convert mirror/raid volume(s) to External Origin volume(s) on virt-374...
lvconvert --thinpool black_bird/POOL --originname synced_primary_raid1_2legs_1_extorig -T synced_primary_raid1_2legs_1 --yes

Activating external origin in order for it to be repaired after failure
lvchange -ay black_bird/synced_primary_raid1_2legs_1_extorig

Creating xfs on top of mirror(s) on virt-374...
Mounting mirrored xfs filesystems on virt-374...
Current mirror/raid device structure(s):
LV Attr LSize Cpy%Sync Devices
POOL twi-aotz-- 500.00m POOL_tdata(0)
[POOL_tdata] Twi-ao---- 500.00m /dev/sdb1(127)
[POOL_tmeta] ewi-ao---- 4.00m /dev/sdb1(252)
[lvol0_pmspare] ewi------- 4.00m /dev/sdb1(126)
synced_primary_raid1_2legs_1 Vwi-aotz-- 500.00m
synced_primary_raid1_2legs_1_extorig ori-a-r--- 500.00m 100.00 synced_primary_raid1_2legs_1_extorig_rimage_0(0),synced_primary_raid1_2legs_1_extorig_rimage_1(0),synced_primary_raid1_2legs_1_extorig_rimage_2(0)
[synced_primary_raid1_2legs_1_extorig_rimage_0] iwi-aor--- 500.00m /dev/sdj1(1)
[synced_primary_raid1_2legs_1_extorig_rimage_1] iwi-aor--- 500.00m /dev/sdb1(1)
[synced_primary_raid1_2legs_1_extorig_rimage_2] iwi-aor--- 500.00m /dev/sdc1(1)
[synced_primary_raid1_2legs_1_extorig_rmeta_0] ewi-aor--- 4.00m /dev/sdj1(0)
[synced_primary_raid1_2legs_1_extorig_rmeta_1] ewi-aor--- 4.00m /dev/sdb1(0)
[synced_primary_raid1_2legs_1_extorig_rmeta_2] ewi-aor--- 4.00m /dev/sdc1(0)
root -wi-ao---- <6.20g /dev/vda2(205)
swap -wi-ao---- 820.00m /dev/vda2(0)

PV=/dev/sdj1
synced_primary_raid1_2legs_1_extorig_rimage_0: 1.0
synced_primary_raid1_2legs_1_extorig_rmeta_0: 1.0

Creating a snapshot volume of each of the raids
Writing verification files (checkit) to mirror(s) on...
---- virt-374 ----
<start name="virt-374_synced_primary_raid1_2legs_1" pid="22928" time="Tue Dec 12 11:16:30 2017 +0100" type="cmd" />

Sleeping 15 seconds to get some outsanding I/O locks before the failure

lvcreate -k n -s /dev/black_bird/synced_primary_raid1_2legs_1 -n snap1_synced_primary_raid1_2legs_1
WARNING: Sum of all thin volume sizes (1000.00 MiB) exceeds the size of thin pool black_bird/POOL (500.00 MiB).
lvcreate -k n -s /dev/black_bird/synced_primary_raid1_2legs_1 -n snap2_synced_primary_raid1_2legs_1
WARNING: Sum of all thin volume sizes (1.46 GiB) exceeds the size of thin pool black_bird/POOL (500.00 MiB).
lvcreate -k n -s /dev/black_bird/synced_primary_raid1_2legs_1 -n snap3_synced_primary_raid1_2legs_1
WARNING: Sum of all thin volume sizes (1.95 GiB) exceeds the size of thin pool black_bird/POOL (500.00 MiB).

Verifying files (checkit) on mirror(s) on...
---- virt-374 ----

Disabling device sdj on virt-374

rescan device...
/dev/sdj1: read failed after 0 of 4096 at 32212123648: Input/output error
/dev/sdj1: read failed after 0 of 4096 at 32212209664: Input/output error
/dev/sdj1: read failed after 0 of 4096 at 0: Input/output error
/dev/sdj1: read failed after 0 of 4096 at 4096: Input/output error

Attempting I/O to cause mirror down conversion(s) on virt-374
dd if=/dev/zero of=/mnt/synced_primary_raid1_2legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.367308 s, 114 MB/s

rescan PVs due issues w/ spanned legs involving raid[4,5] or virt volumes in 7.2

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.
LV Attr LSize Cpy%Sync Devices
POOL twi-aotz-- 500.00m POOL_tdata(0)
[POOL_tdata] Twi-ao---- 500.00m /dev/sdb1(127)
[POOL_tmeta] ewi-ao---- 4.00m /dev/sdb1(252)
bb_snap1 swi-a-s--- 252.00m /dev/sdb1(253)
[lvol0_pmspare] ewi------- 4.00m /dev/sdb1(126)
snap1_synced_primary_raid1_2legs_1 Vwi-a-tz-- 500.00m
snap2_synced_primary_raid1_2legs_1 Vwi-a-tz-- 500.00m
snap3_synced_primary_raid1_2legs_1 Vwi-a-tz-- 500.00m
synced_primary_raid1_2legs_1 owi-aotz-- 500.00m
synced_primary_raid1_2legs_1_extorig ori-a-r--- 500.00m 100.00 synced_primary_raid1_2legs_1_extorig_rimage_0(0),synced_primary_raid1_2legs_1_extorig_rimage_1(0),synced_primary_raid1_2legs_1_extorig_rimage_2(0)
[synced_primary_raid1_2legs_1_extorig_rimage_0] iwi-aor--- 500.00m /dev/sdi1(1)
[synced_primary_raid1_2legs_1_extorig_rimage_1] iwi-aor--- 500.00m /dev/sdb1(1)
[synced_primary_raid1_2legs_1_extorig_rimage_2] iwi-aor--- 500.00m /dev/sdc1(1)
[synced_primary_raid1_2legs_1_extorig_rmeta_0] ewi-aor--- 4.00m /dev/sdi1(0)
[synced_primary_raid1_2legs_1_extorig_rmeta_1] ewi-aor--- 4.00m /dev/sdb1(0)
[synced_primary_raid1_2legs_1_extorig_rmeta_2] ewi-aor--- 4.00m /dev/sdc1(0)
root -wi-ao---- <6.20g /dev/vda2(205)
swap -wi-ao---- 820.00m /dev/vda2(0)

Verifying FAILED device /dev/sdj1 is *NOT* in the volume(s)
WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.
Verifying IMAGE device /dev/sdb1 *IS* in the volume(s)
WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.
Verifying IMAGE device /dev/sdc1 *IS* in the volume(s)
WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.

Verify the rimage/rmeta dm devices remain after the failures
Checking EXISTENCE and STATE of synced_primary_raid1_2legs_1_extorig_rimage_0 on: virt-374
Checking EXISTENCE and STATE of synced_primary_raid1_2legs_1_extorig_rmeta_0 on: virt-374

Verify the raid image order is what's expected based on raid fault policy
EXPECTED LEG ORDER: unknown /dev/sdb1 /dev/sdc1
WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.
WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.
WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.
WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.
WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.
ACTUAL LEG ORDER: /dev/sdi1 /dev/sdb1 /dev/sdc1
unknown ne /dev/sdi1
/dev/sdb1 ne /dev/sdb1
/dev/sdc1 ne /dev/sdc1

Verifying files (checkit) on mirror(s) on...
---- virt-374 ----

Enabling device sdj on virt-374

WARNING: Inconsistent metadata found for VG black_bird
Running vgs to make LVM update metadata version if possible (will restore a-m PVs)
WARNING: Missing device /dev/sdj1 reappeared, updating metadata for VG black_bird to version 19.
WARNING: Inconsistent metadata found for VG black_bird - updating to use version 19

-------------------------------------------------------------------------------
Force a vgreduce to clean up the corrupt additional LV
( vgreduce --removemissing --force black_bird )
-------------------------------------------------------------------------------

Recreating PVs /dev/sdj1 and then extending back into black_bird
virt-374 pvcreate /dev/sdj1
Can't initialize physical volume "/dev/sdj1" of volume group "black_bird" without -ff
/dev/sdj1: physical volume not initialized.
recreation of /dev/sdj1 failed, must still be in VG

virt-374 vgextend black_bird /dev/sdj1
Physical volume '/dev/sdj1' is already in volume group 'black_bird'
Unable to add physical volume '/dev/sdj1' to volume group 'black_bird'
/dev/sdj1: physical volume not initialized.
extension of /dev/sdj1 back into black_bird failed

Verify that each of the raid repairs finished successfully
Checking for leftover '-missing_0_0' or 'unknown devices'
Checking for PVs marked as missing (a-m)...

Verifying files (checkit) on mirror(s) on...
---- virt-374 ----

Stopping the io load (collie/xdoio) on mirror(s)
<halt name="virt-374_synced_primary_raid1_2legs_1" pid="22928" time="Tue Dec 12 11:21:27 2017 +0100" type="cmd" duration="297" signal="2" />
HACK TO KILL XDOIO...
xdoio: no process found

Unmounting xfs and removing mnt point on virt-374...
Deactivating and removing raid(s)
Removing the left over external thin and pool volumes on virt-374...
==========================

3.10.0-811.el7.x86_64
lvm2-2.02.176-5.el7 BUILT: Wed Dec 6 11:13:07 CET 2017
lvm2-libs-2.02.176-5.el7 BUILT: Wed Dec 6 11:13:07 CET 2017
lvm2-cluster-2.02.176-5.el7 BUILT: Wed Dec 6 11:13:07 CET 2017
lvm2-python-boom-0.8.1-5.el7 BUILT: Wed Dec 6 11:15:40 CET 2017
cmirror-2.02.176-5.el7 BUILT: Wed Dec 6 11:13:07 CET 2017
device-mapper-1.02.145-5.el7 BUILT: Wed Dec 6 11:13:07 CET 2017
device-mapper-libs-1.02.145-5.el7 BUILT: Wed Dec 6 11:13:07 CET 2017
device-mapper-event-1.02.145-5.el7 BUILT: Wed Dec 6 11:13:07 CET 2017
device-mapper-event-libs-1.02.145-5.el7 BUILT: Wed Dec 6 11:13:07 CET 2017
device-mapper-persistent-data-0.7.3-3.el7 BUILT: Tue Nov 14 12:07:18 CET 2017

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0853