Red Hat Bugzilla – Bug 1281525
external origin raid volumes are not monitored
Last modified: 2018-04-10 11:19:55 EDT
Description of problem:
External origin RAID volumes are not monitored. As a result, device failures are not handled automatically.

[root@host-109 ~]# lvs -a -o +devices
  LV                                       Attr       LSize   Pool Origin Data%  Meta%  Cpy%Sync Devices
  POOL                                     twi-a-tz-- 500.00m             0.00   0.88            POOL_tdata(0)
  [POOL_tdata]                             Twi-ao---- 500.00m                                    /dev/sde1(127)
  [POOL_tmeta]                             ewi-ao----   4.00m                                    /dev/sde1(252)
  [lvol0_pmspare]                          ewi-------   4.00m                                    /dev/sde1(126)
  synced_primary_raid1_2legs_1             rwi-a-r--- 500.00m                            100.00  synced_primary_raid1_2legs_1_rimage_0(0),synced_primary_raid1_2legs_1_rimage_1(0),synced_primary_raid1_2legs_1_rimage_2(0)
  [synced_primary_raid1_2legs_1_rimage_0]  iwi-aor--- 500.00m                                    /dev/sdg1(1)
  [synced_primary_raid1_2legs_1_rimage_1]  iwi-aor--- 500.00m                                    /dev/sde1(1)
  [synced_primary_raid1_2legs_1_rimage_2]  iwi-aor--- 500.00m                                    /dev/sdd1(1)
  [synced_primary_raid1_2legs_1_rmeta_0]   ewi-aor---   4.00m                                    /dev/sdg1(0)
  [synced_primary_raid1_2legs_1_rmeta_1]   ewi-aor---   4.00m                                    /dev/sde1(0)
  [synced_primary_raid1_2legs_1_rmeta_2]   ewi-aor---   4.00m                                    /dev/sdd1(0)

[root@host-109 ~]# lvconvert --thinpool black_bird/POOL --originname synced_primary_raid1_2legs_1_extorig -T synced_primary_raid1_2legs_1 --yes
  Logical volume "synced_primary_raid1_2legs_1_extorig" created.
  Converted black_bird/synced_primary_raid1_2legs_1 to thin volume with external origin black_bird/synced_primary_raid1_2legs_1_extorig.

[root@host-109 ~]# lvs -a -o +devices
  LV                                               Attr       LSize   Pool Origin                               Data%  Meta%  Cpy%Sync Devices
  POOL                                             twi-aotz-- 500.00m                                           0.00   0.98            POOL_tdata(0)
  [POOL_tdata]                                     Twi-ao---- 500.00m                                                                  /dev/sde1(127)
  [POOL_tmeta]                                     ewi-ao----   4.00m                                                                  /dev/sde1(252)
  [lvol0_pmspare]                                  ewi-------   4.00m                                                                  /dev/sde1(126)
  synced_primary_raid1_2legs_1                     Vwi-a-tz-- 500.00m POOL synced_primary_raid1_2legs_1_extorig 0.0
  synced_primary_raid1_2legs_1_extorig             ori---r--- 500.00m                                                                  synced_primary_raid1_2legs_1_extorig_rimage_0(0),synced_primary_raid1_2legs_1_extorig_rimage_1(0),synced_primary_raid1_2legs_1_extorig_rimage_2(0)
  [synced_primary_raid1_2legs_1_extorig_rimage_0]  Iwi-aor-r- 500.00m                                                                  /dev/sdg1(1)
  [synced_primary_raid1_2legs_1_extorig_rimage_1]  Iwi-aor-r- 500.00m                                                                  /dev/sde1(1)
  [synced_primary_raid1_2legs_1_extorig_rimage_2]  Iwi-aor-r- 500.00m                                                                  /dev/sdd1(1)
  [synced_primary_raid1_2legs_1_extorig_rmeta_0]   ewi-aor-r-   4.00m                                                                  /dev/sdg1(0)
  [synced_primary_raid1_2legs_1_extorig_rmeta_1]   ewi-aor-r-   4.00m                                                                  /dev/sde1(0)
  [synced_primary_raid1_2legs_1_extorig_rmeta_2]   ewi-aor-r-   4.00m                                                                  /dev/sdd1(0)

Nov 12 10:08:18 host-109 lvm[1240]: No longer monitoring RAID device black_bird-synced_primary_raid1_2legs_1 for events.

Version-Release number of selected component (if applicable):
3.10.0-327.el7.x86_64
lvm2-2.02.130-5.el7                          BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-libs-2.02.130-5.el7                     BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-cluster-2.02.130-5.el7                  BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-1.02.107-5.el7                 BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-libs-1.02.107-5.el7            BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-1.02.107-5.el7           BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-libs-1.02.107-5.el7      BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-persistent-data-0.5.5-1.el7    BUILT: Thu Aug 13 09:58:10 CDT 2015
cmirror-2.02.130-5.el7                       BUILT: Wed Oct 14 08:27:29 CDT 2015
sanlock-3.2.4-1.el7                          BUILT: Fri Jun 19 12:48:49 CDT 2015
sanlock-lib-3.2.4-1.el7                      BUILT: Fri Jun 19 12:48:49 CDT 2015
lvm2-lockd-2.02.130-5.el7                    BUILT: Wed Oct 14 08:27:29 CDT 2015
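For reference, one quick way to confirm whether dmeventd is actually watching a given LV is the 'seg_monitor' reporting field, and monitoring can be toggled by hand as a stopgap. A minimal sketch, using the VG/LV names from the listing above:

  # report the dmeventd monitoring status of every LV in the VG
  lvs -a -o name,seg_monitor black_bird

  # manually (re)enable monitoring for the external origin
  lvchange --monitor y black_bird/synced_primary_raid1_2legs_1_extorig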
[root@host-109 ~]# grep allocate /etc/lvm/lvm.conf
        raid_fault_policy = "allocate"

A 'pvscan --cache' is required to even detect a failure...

Disabling device sdc on host-109.virt.lab.msp.redhat.com

Getting recovery check start time from /var/log/messages: Nov 12 09:41
Attempting I/O to cause mirror down conversion(s) on host-109.virt.lab.msp.redhat.com
dd if=/dev/zero of=/mnt/synced_primary_raid1_2legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.0613936 s, 683 MB/s

rescan PVs
  /dev/sdc1: read failed after 0 of 4096 at 26838958080: Input/output error
  /dev/sdc1: read failed after 0 of 4096 at 26839048192: Input/output error
  /dev/sdc1: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdc1: read failed after 0 of 4096 at 4096: Input/output error

Current mirror/raid device structure(s):
  WARNING: Device for PV fj7JLI-3UcH-aFLI-DKKy-90hZ-342f-9zwEsg not found or rejected by a filter.
  LV                                               Attr       LSize   Cpy%Sync Devices
  POOL                                             twi-aotz-- 500.00m          POOL_tdata(0)
  [POOL_tdata]                                     Twi-ao---- 500.00m          /dev/sdd1(127)
  [POOL_tmeta]                                     ewi-ao----   4.00m          /dev/sdd1(252)
  [lvol0_pmspare]                                  ewi-------   4.00m          /dev/sdd1(126)
  snap1_synced_primary_raid1_2legs_1               Vwi-a-tzp- 500.00m
  snap2_synced_primary_raid1_2legs_1               Vwi-a-tzp- 500.00m
  snap3_synced_primary_raid1_2legs_1               Vwi-a-tzp- 500.00m
  synced_primary_raid1_2legs_1                     Vwi-aotzp- 500.00m
  synced_primary_raid1_2legs_1_extorig             ori---r-p- 500.00m          synced_primary_raid1_2legs_1_extorig_rimage_0(0),synced_primary_raid1_2legs_1_extorig_rimage_1(0),synced_primary_raid1_2legs_1_extorig_rimage_2(0)
  [synced_primary_raid1_2legs_1_extorig_rimage_0]  Iwi-aor-p- 500.00m          unknown device(1)
  [synced_primary_raid1_2legs_1_extorig_rimage_1]  Iwi-aor-r- 500.00m          /dev/sdd1(1)
  [synced_primary_raid1_2legs_1_extorig_rimage_2]  Iwi-aor-r- 500.00m          /dev/sdg1(1)
  [synced_primary_raid1_2legs_1_extorig_rmeta_0]   ewi-aor-p-   4.00m          unknown device(0)
  [synced_primary_raid1_2legs_1_extorig_rmeta_1]   ewi-aor-r-   4.00m          /dev/sdd1(0)
  [synced_primary_raid1_2legs_1_extorig_rmeta_2]   ewi-aor-r-   4.00m          /dev/sdg1(0)

(ALLOCATE POLICY) there should not be an 'unknown' device associated with synced_primary_raid1_2legs_1_extorig_rimage_0 on host-109.virt.lab.msp.redhat.com
  synced_primary_raid1_2legs_1_extorig             synced_primary_raid1_2legs_1_extorig_rimage_0(0),synced_primary_raid1_2legs_1_extorig_rimage_1(0),synced_primary_raid1_2legs_1_extorig_rimage_2(0)
  [synced_primary_raid1_2legs_1_extorig_rimage_0]  unknown device(1)
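For context, raid_fault_policy lives in the activation section of lvm.conf and takes two values; the automatic path under "allocate" corresponds to a manual lvconvert --repair. A sketch, assuming the VG/LV names from this report:

  # /etc/lvm/lvm.conf, activation section:
  #   raid_fault_policy = "warn"      # only log the failure, leave repair to the admin
  #   raid_fault_policy = "allocate"  # dmeventd replaces the failed image from spare extents

  # manual equivalent of the "allocate" policy, run after a leg failure:
  lvconvert --repair black_bird/synced_primary_raid1_2legs_1_extorig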
This looks similar to bug 1280450; I suspect there is just something wrong with how the LV's type is evaluated (which is why the sync % isn't printed correctly and why the LV isn't being monitored).
Fixed by upstream commit: https://www.redhat.com/archives/lvm-devel/2017-October/msg00045.html
Marking verified with latest rpms. External raid origin volumes are now monitored and repaired according to the current raid fault policy. Also, I was no longer able to observe 'unknown' devices after triggering a repair, as shown in Comment 1.

=============SCENARIO=============

virt-374: pvcreate /dev/sdi1 /dev/sdb1 /dev/sdg1 /dev/sde1 /dev/sdj1 /dev/sdh1 /dev/sdc1
virt-374: vgcreate black_bird /dev/sdi1 /dev/sdb1 /dev/sdg1 /dev/sde1 /dev/sdj1 /dev/sdh1 /dev/sdc1

Enabling raid allocate fault policies on: virt-374

================================================================================
Iteration 0.1 started at Tue Dec 12 11:15:54 CET 2017
================================================================================
Scenario kill_primary_synced_raid1_2legs: Kill primary leg of synced 2 leg raid1 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_primary_raid1_2legs_1
* sync:               1
* type:               raid1
* -m |-i value:       2
* leg devices:        /dev/sdj1 /dev/sdb1 /dev/sdc1
* spanned legs:       0
* manual repair:      0
* no MDA devices:
* failpv(s):          /dev/sdj1
* additional snap:    /dev/sdb1
* failnode(s):        virt-374
* lvmetad:            1
* raid fault policy:  allocate
******************************************************

Creating raids(s) on virt-374...
virt-374: lvcreate --type raid1 -m 2 -n synced_primary_raid1_2legs_1 -L 500M black_bird /dev/sdj1:0-2400 /dev/sdb1:0-2400 /dev/sdc1:0-2400

Current mirror/raid device structure(s):
  LV                                       Attr       LSize   Cpy%Sync Devices
  synced_primary_raid1_2legs_1             rwi-a-r--- 500.00m 6.26     synced_primary_raid1_2legs_1_rimage_0(0),synced_primary_raid1_2legs_1_rimage_1(0),synced_primary_raid1_2legs_1_rimage_2(0)
  [synced_primary_raid1_2legs_1_rimage_0]  Iwi-aor--- 500.00m          /dev/sdj1(1)
  [synced_primary_raid1_2legs_1_rimage_1]  Iwi-aor--- 500.00m          /dev/sdb1(1)
  [synced_primary_raid1_2legs_1_rimage_2]  Iwi-aor--- 500.00m          /dev/sdc1(1)
  [synced_primary_raid1_2legs_1_rmeta_0]   ewi-aor---   4.00m          /dev/sdj1(0)
  [synced_primary_raid1_2legs_1_rmeta_1]   ewi-aor---   4.00m          /dev/sdb1(0)
  [synced_primary_raid1_2legs_1_rmeta_2]   ewi-aor---   4.00m          /dev/sdc1(0)
  root                                     -wi-ao----  <6.20g          /dev/vda2(205)
  swap                                     -wi-ao---- 820.00m          /dev/vda2(0)

Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Creating thin pool for external origin on device not to be failed
lvcreate --type thin-pool -n POOL -L 500M black_bird /dev/sdb1

Convert mirror/raid volume(s) to External Origin volume(s) on virt-374...
lvconvert --thinpool black_bird/POOL --originname synced_primary_raid1_2legs_1_extorig -T synced_primary_raid1_2legs_1 --yes

Activating external origin in order for it to be repaired after failure
lvchange -ay black_bird/synced_primary_raid1_2legs_1_extorig

Creating xfs on top of mirror(s) on virt-374...
Mounting mirrored xfs filesystems on virt-374...
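Everything up to this point (the setup half of the scenario) condenses to roughly the following; a sketch, with device names and sizes taken from the test log above and only the three leg devices shown:

  pvcreate /dev/sdj1 /dev/sdb1 /dev/sdc1
  vgcreate black_bird /dev/sdj1 /dev/sdb1 /dev/sdc1

  # three-way raid1 that will become the external origin
  lvcreate --type raid1 -m 2 -n synced_primary_raid1_2legs_1 -L 500M black_bird

  # thin pool on a device that will not be failed
  lvcreate --type thin-pool -n POOL -L 500M black_bird /dev/sdb1

  # convert the raid1 into the external origin of a new thin LV
  lvconvert --thinpool black_bird/POOL \
            --originname synced_primary_raid1_2legs_1_extorig \
            -T synced_primary_raid1_2legs_1 --yes

  # the origin must be active for dmeventd to repair it after a failure
  lvchange -ay black_bird/synced_primary_raid1_2legs_1_extorig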
Current mirror/raid device structure(s):
  LV                                               Attr       LSize   Cpy%Sync Devices
  POOL                                             twi-aotz-- 500.00m          POOL_tdata(0)
  [POOL_tdata]                                     Twi-ao---- 500.00m          /dev/sdb1(127)
  [POOL_tmeta]                                     ewi-ao----   4.00m          /dev/sdb1(252)
  [lvol0_pmspare]                                  ewi-------   4.00m          /dev/sdb1(126)
  synced_primary_raid1_2legs_1                     Vwi-aotz-- 500.00m
  synced_primary_raid1_2legs_1_extorig             ori-a-r--- 500.00m 100.00   synced_primary_raid1_2legs_1_extorig_rimage_0(0),synced_primary_raid1_2legs_1_extorig_rimage_1(0),synced_primary_raid1_2legs_1_extorig_rimage_2(0)
  [synced_primary_raid1_2legs_1_extorig_rimage_0]  iwi-aor--- 500.00m          /dev/sdj1(1)
  [synced_primary_raid1_2legs_1_extorig_rimage_1]  iwi-aor--- 500.00m          /dev/sdb1(1)
  [synced_primary_raid1_2legs_1_extorig_rimage_2]  iwi-aor--- 500.00m          /dev/sdc1(1)
  [synced_primary_raid1_2legs_1_extorig_rmeta_0]   ewi-aor---   4.00m          /dev/sdj1(0)
  [synced_primary_raid1_2legs_1_extorig_rmeta_1]   ewi-aor---   4.00m          /dev/sdb1(0)
  [synced_primary_raid1_2legs_1_extorig_rmeta_2]   ewi-aor---   4.00m          /dev/sdc1(0)
  root                                             -wi-ao----  <6.20g          /dev/vda2(205)
  swap                                             -wi-ao---- 820.00m          /dev/vda2(0)

PV=/dev/sdj1
        synced_primary_raid1_2legs_1_extorig_rimage_0: 1.0
        synced_primary_raid1_2legs_1_extorig_rmeta_0: 1.0

Creating a snapshot volume of each of the raids

Writing verification files (checkit) to mirror(s) on...
        ---- virt-374 ----

<start name="virt-374_synced_primary_raid1_2legs_1" pid="22928" time="Tue Dec 12 11:16:30 2017 +0100" type="cmd" />
Sleeping 15 seconds to get some outstanding I/O locks before the failure

lvcreate -k n -s /dev/black_bird/synced_primary_raid1_2legs_1 -n snap1_synced_primary_raid1_2legs_1
  WARNING: Sum of all thin volume sizes (1000.00 MiB) exceeds the size of thin pool black_bird/POOL (500.00 MiB).
lvcreate -k n -s /dev/black_bird/synced_primary_raid1_2legs_1 -n snap2_synced_primary_raid1_2legs_1
  WARNING: Sum of all thin volume sizes (1.46 GiB) exceeds the size of thin pool black_bird/POOL (500.00 MiB).
lvcreate -k n -s /dev/black_bird/synced_primary_raid1_2legs_1 -n snap3_synced_primary_raid1_2legs_1
  WARNING: Sum of all thin volume sizes (1.95 GiB) exceeds the size of thin pool black_bird/POOL (500.00 MiB).

Verifying files (checkit) on mirror(s) on...
        ---- virt-374 ----

Disabling device sdj on virt-374

rescan device...
  /dev/sdj1: read failed after 0 of 4096 at 32212123648: Input/output error
  /dev/sdj1: read failed after 0 of 4096 at 32212209664: Input/output error
  /dev/sdj1: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdj1: read failed after 0 of 4096 at 4096: Input/output error

Attempting I/O to cause mirror down conversion(s) on virt-374
dd if=/dev/zero of=/mnt/synced_primary_raid1_2legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.367308 s, 114 MB/s

rescan PVs due to issues w/ spanned legs involving raid[4,5] or virt volumes in 7.2

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.
  LV                                               Attr       LSize   Cpy%Sync Devices
  POOL                                             twi-aotz-- 500.00m          POOL_tdata(0)
  [POOL_tdata]                                     Twi-ao---- 500.00m          /dev/sdb1(127)
  [POOL_tmeta]                                     ewi-ao----   4.00m          /dev/sdb1(252)
  bb_snap1                                         swi-a-s--- 252.00m          /dev/sdb1(253)
  [lvol0_pmspare]                                  ewi-------   4.00m          /dev/sdb1(126)
  snap1_synced_primary_raid1_2legs_1               Vwi-a-tz-- 500.00m
  snap2_synced_primary_raid1_2legs_1               Vwi-a-tz-- 500.00m
  snap3_synced_primary_raid1_2legs_1               Vwi-a-tz-- 500.00m
  synced_primary_raid1_2legs_1                     owi-aotz-- 500.00m
  synced_primary_raid1_2legs_1_extorig             ori-a-r--- 500.00m 100.00   synced_primary_raid1_2legs_1_extorig_rimage_0(0),synced_primary_raid1_2legs_1_extorig_rimage_1(0),synced_primary_raid1_2legs_1_extorig_rimage_2(0)
  [synced_primary_raid1_2legs_1_extorig_rimage_0]  iwi-aor--- 500.00m          /dev/sdi1(1)
  [synced_primary_raid1_2legs_1_extorig_rimage_1]  iwi-aor--- 500.00m          /dev/sdb1(1)
  [synced_primary_raid1_2legs_1_extorig_rimage_2]  iwi-aor--- 500.00m          /dev/sdc1(1)
  [synced_primary_raid1_2legs_1_extorig_rmeta_0]   ewi-aor---   4.00m          /dev/sdi1(0)
  [synced_primary_raid1_2legs_1_extorig_rmeta_1]   ewi-aor---   4.00m          /dev/sdb1(0)
  [synced_primary_raid1_2legs_1_extorig_rmeta_2]   ewi-aor---   4.00m          /dev/sdc1(0)
  root                                             -wi-ao----  <6.20g          /dev/vda2(205)
  swap                                             -wi-ao---- 820.00m          /dev/vda2(0)

Verifying FAILED device /dev/sdj1 is *NOT* in the volume(s)
  WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.
Verifying IMAGE device /dev/sdb1 *IS* in the volume(s)
  WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.
Verifying IMAGE device /dev/sdc1 *IS* in the volume(s)
  WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.

Verify the rimage/rmeta dm devices remain after the failures
Checking EXISTENCE and STATE of synced_primary_raid1_2legs_1_extorig_rimage_0 on: virt-374
Checking EXISTENCE and STATE of synced_primary_raid1_2legs_1_extorig_rmeta_0 on: virt-374

Verify the raid image order is what's expected based on raid fault policy
EXPECTED LEG ORDER: unknown /dev/sdb1 /dev/sdc1
  WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.
  WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.
  WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.
  WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.
  WARNING: Device for PV UQkwE4-mQmN-iZp8-dT3D-YCbs-38Ay-qqLsI5 not found or rejected by a filter.
ACTUAL LEG ORDER: /dev/sdi1 /dev/sdb1 /dev/sdc1
unknown ne /dev/sdi1
/dev/sdb1 ne /dev/sdb1
/dev/sdc1 ne /dev/sdc1

Verifying files (checkit) on mirror(s) on...
        ---- virt-374 ----

Enabling device sdj on virt-374
  WARNING: Inconsistent metadata found for VG black_bird
Running vgs to make LVM update metadata version if possible (will restore a-m PVs)
  WARNING: Missing device /dev/sdj1 reappeared, updating metadata for VG black_bird to version 19.
  WARNING: Inconsistent metadata found for VG black_bird - updating to use version 19

-------------------------------------------------------------------------------
Force a vgreduce to clean up the corrupt additional LV
( vgreduce --removemissing --force black_bird )
-------------------------------------------------------------------------------

Recreating PVs /dev/sdj1 and then extending back into black_bird
virt-374 pvcreate /dev/sdj1
  Can't initialize physical volume "/dev/sdj1" of volume group "black_bird" without -ff
  /dev/sdj1: physical volume not initialized.
recreation of /dev/sdj1 failed, must still be in VG

virt-374 vgextend black_bird /dev/sdj1
  Physical volume '/dev/sdj1' is already in volume group 'black_bird'
  Unable to add physical volume '/dev/sdj1' to volume group 'black_bird'
  /dev/sdj1: physical volume not initialized.
extension of /dev/sdj1 back into black_bird failed

Verify that each of the raid repairs finished successfully
Checking for leftover '-missing_0_0' or 'unknown devices'
Checking for PVs marked as missing (a-m)...

Verifying files (checkit) on mirror(s) on...
        ---- virt-374 ----

Stopping the io load (collie/xdoio) on mirror(s)
<halt name="virt-374_synced_primary_raid1_2legs_1" pid="22928" time="Tue Dec 12 11:21:27 2017 +0100" type="cmd" duration="297" signal="2" />
HACK TO KILL XDOIO...
xdoio: no process found

Unmounting xfs and removing mnt point on virt-374...
Deactivating and removing raid(s)
Removing the left over external thin and pool volumes on virt-374...

==========================
3.10.0-811.el7.x86_64
lvm2-2.02.176-5.el7                          BUILT: Wed Dec 6 11:13:07 CET 2017
lvm2-libs-2.02.176-5.el7                     BUILT: Wed Dec 6 11:13:07 CET 2017
lvm2-cluster-2.02.176-5.el7                  BUILT: Wed Dec 6 11:13:07 CET 2017
lvm2-python-boom-0.8.1-5.el7                 BUILT: Wed Dec 6 11:15:40 CET 2017
cmirror-2.02.176-5.el7                       BUILT: Wed Dec 6 11:13:07 CET 2017
device-mapper-1.02.145-5.el7                 BUILT: Wed Dec 6 11:13:07 CET 2017
device-mapper-libs-1.02.145-5.el7            BUILT: Wed Dec 6 11:13:07 CET 2017
device-mapper-event-1.02.145-5.el7           BUILT: Wed Dec 6 11:13:07 CET 2017
device-mapper-event-libs-1.02.145-5.el7      BUILT: Wed Dec 6 11:13:07 CET 2017
device-mapper-persistent-data-0.7.3-3.el7    BUILT: Tue Nov 14 12:07:18 CET 2017
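As an aside, the failed pvcreate/vgextend in the log above is expected harness noise: LVM refuses to reinitialize a PV it still considers part of a VG. A sketch of the usual forced recovery, assuming the device really should be wiped:

  # force re-initialization of a PV that still carries old VG metadata
  pvcreate -ff -y /dev/sdj1
  vgextend black_bird /dev/sdj1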
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0853