Bug 1468590
| Summary: | vg is in limbo "Recovery failed" state after raid failure until 'vgreduce --removemissing' is run | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Roman Bednář <rbednar> |
| Component: | lvm2 | Assignee: | David Teigland <teigland> |
| lvm2 sub component: | LVM lock daemon / lvmlockd | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | unspecified | | |
| Priority: | high | CC: | agk, cmarthal, heinzm, jbrassow, mcsontos, prajnoha, rhandlin, teigland, zkabelac |
| Version: | 7.4 | | |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | lvm2-2.02.186-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-03-31 20:04:48 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1560739 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
For a shared VG we have to disable repairs (writing the VG) that vg_read() usually does. I'm in the middle of a big overhaul of vg_read() at the moment which is addressing this problem.

I was looking into this as well yesterday. This seems to be a state that the VG goes through after a failed device reappears, but before/until a 'vgreduce --removemissing' is run. This affects all raid types, not just raid10.
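Condensed, the sequence that leads to the limbo state looks roughly like this (a sketch only, reusing the VG/LV/device names from the log below rather than the exact harness commands):

# create a shared (lvmlockd) VG and a raid LV in it
vgcreate --shared black_bird /dev/sdb1 /dev/sda1 /dev/sdf1 /dev/sdd1 /dev/sdh1 /dev/sdc1 /dev/sde1
vgchange --lock-start black_bird
lvcreate -aye --type raid1 -m 3 -n synced_random_raid1_3legs_1 -L 500M black_bird

# fail one leg; with raid_fault_policy "allocate", dmeventd repairs the LV onto a spare PV
echo offline > /sys/block/sde/device/state

# bring the device back and run any command that reads the VG
echo running > /sys/block/sde/device/state
vgs    # -> Recovery of volume group "black_bird" failed. / Cannot process volume group black_bird

# the VG stays in that state until the missing PV is dropped explicitly
vgreduce --removemissing black_bird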
# Basic raid1 (non primary device) failure:
host-113: pvcreate /dev/sdb1 /dev/sda1 /dev/sdf1 /dev/sdd1 /dev/sdh1 /dev/sdc1 /dev/sde1
host-113: vgcreate --shared black_bird /dev/sdb1 /dev/sda1 /dev/sdf1 /dev/sdd1 /dev/sdh1 /dev/sdc1 /dev/sde1
host-113: vgchange --lock-start black_bird
host-114: vgchange --lock-start black_bird
host-115: vgchange --lock-start black_bird
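For context, the shared VG above relies on lvmlockd with sanlock; a minimal sketch of the host-side prerequisites these commands assume (standard lvm.conf option and the service names checked later in this report with 'systemctl is-active'):

# /etc/lvm/lvm.conf (global section) -- sketch, not a complete file
global {
    use_lvmlockd = 1
}
systemctl start lvm2-lvmlockd sanlock   # lock manager services required for --shared VGs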
Enabling raid allocate fault policies on: host-115
================================================================================
Iteration 0.1 started at Thu Jul 6 14:27:11 CDT 2017
================================================================================
Scenario kill_random_synced_raid1_3legs: Kill random leg of synced 3 leg raid1 volume(s)
********* RAID hash info for this scenario *********
* names: synced_random_raid1_3legs_1
* sync: 1
* type: raid1
* -m |-i value: 3
* leg devices: /dev/sdf1 /dev/sdd1 /dev/sde1 /dev/sdh1
* spanned legs: 0
* manual repair: 0
* no MDA devices:
* failpv(s): /dev/sde1
* additional snap: /dev/sdf1
* failnode(s): host-115
* lvmetad: 0
* raid fault policy: allocate
******************************************************
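For reference, the "allocate" raid fault policy named above is the lvm.conf setting that lets dmeventd replace a failed raid image automatically; a minimal sketch of the relevant option (standard option name, values "warn" or "allocate"):

# /etc/lvm/lvm.conf (activation section) -- sketch, not a complete file
activation {
    raid_fault_policy = "allocate"   # dmeventd replaces failed raid images using spare PVs in the VG
}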
Creating raids(s) on host-115...
host-115: lvcreate -aye --type raid1 -m 3 -n synced_random_raid1_3legs_1 -L 500M black_bird /dev/sdf1:0-2400 /dev/sdd1:0-2400 /dev/sde1:0-2400 /dev/sdh1:0-2400
Current mirror/raid device structure(s):
LV Attr LSize Cpy%Sync Devices
[lvmlock] -wi-ao---- 256.00m /dev/sdb1(0)
synced_random_raid1_3legs_1 rwi-a-r--- 500.00m 6.26 synced_random_raid1_3legs_1_rimage_0(0),synced_random_raid1_3legs_1_rimage_1(0),synced_random_raid1_3legs_1_rimage_2(0),synced_random_raid1_3legs_1_rimage_3(0)
[synced_random_raid1_3legs_1_rimage_0] Iwi-aor--- 500.00m /dev/sdf1(1)
[synced_random_raid1_3legs_1_rimage_1] Iwi-aor--- 500.00m /dev/sdd1(1)
[synced_random_raid1_3legs_1_rimage_2] Iwi-aor--- 500.00m /dev/sde1(1)
[synced_random_raid1_3legs_1_rimage_3] Iwi-aor--- 500.00m /dev/sdh1(1)
[synced_random_raid1_3legs_1_rmeta_0] ewi-aor--- 4.00m /dev/sdf1(0)
[synced_random_raid1_3legs_1_rmeta_1] ewi-aor--- 4.00m /dev/sdd1(0)
[synced_random_raid1_3legs_1_rmeta_2] ewi-aor--- 4.00m /dev/sde1(0)
[synced_random_raid1_3legs_1_rmeta_3] ewi-aor--- 4.00m /dev/sdh1(0)
[lvmlock] -wi-ao---- 256.00m /dev/sdg1(0)
Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
Creating gfs2 on top of mirror(s) on host-115...
mkfs.gfs2 -J 32M -j 1 -p lock_nolock /dev/black_bird/synced_random_raid1_3legs_1 -O
Mounting mirrored gfs2 filesystems on host-115...
PV=/dev/sde1
synced_random_raid1_3legs_1_rimage_2: 1.0
synced_random_raid1_3legs_1_rmeta_2: 1.0
Creating a snapshot volume of each of the raids
Writing verification files (checkit) to mirror(s) on...
---- host-115 ----
Sleeping 15 seconds to get some outstanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
---- host-115 ----
Disabling device sde on host-115
rescan device...
Attempting I/O to cause mirror down conversion(s) on host-115
dd if=/dev/zero of=/mnt/synced_random_raid1_3legs_1/ddfile count=10 bs=4M
Verifying current sanity of lvm after the failure
Current mirror/raid device structure(s):
WARNING: Not using lvmetad because a repair command was run.
Couldn't find device with uuid icmWuc-ACno-MeJy-HVOs-XU12-3Wat-0JZ0Pj.
LV Attr LSize Cpy%Sync Devices
bb_snap1 swi-a-s--- 252.00m /dev/sdf1(126)
[lvmlock] -wi-ao---- 256.00m /dev/sdb1(0)
synced_random_raid1_3legs_1 owi-aor--- 500.00m 100.00 synced_random_raid1_3legs_1_rimage_0(0),synced_random_raid1_3legs_1_rimage_1(0),synced_random_raid1_3legs_1_rimage_2(0),synced_random_raid1_3legs_1_rimage_3(0)
[synced_random_raid1_3legs_1_rimage_0] iwi-aor--- 500.00m /dev/sdf1(1)
[synced_random_raid1_3legs_1_rimage_1] iwi-aor--- 500.00m /dev/sdd1(1)
[synced_random_raid1_3legs_1_rimage_2] iwi-aor--- 500.00m /dev/sdb1(65)
[synced_random_raid1_3legs_1_rimage_3] iwi-aor--- 500.00m /dev/sdh1(1)
[synced_random_raid1_3legs_1_rmeta_0] ewi-aor--- 4.00m /dev/sdf1(0)
[synced_random_raid1_3legs_1_rmeta_1] ewi-aor--- 4.00m /dev/sdd1(0)
[synced_random_raid1_3legs_1_rmeta_2] ewi-aor--- 4.00m /dev/sdb1(64)
[synced_random_raid1_3legs_1_rmeta_3] ewi-aor--- 4.00m /dev/sdh1(0)
[lvmlock] -wi-ao---- 256.00m /dev/sdg1(0)
Verifying FAILED device /dev/sde1 is *NOT* in the volume(s)
Verifying IMAGE device /dev/sdf1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdd1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdh1 *IS* in the volume(s)
Verify the rimage/rmeta dm devices remain after the failures
Checking EXISTENCE and STATE of synced_random_raid1_3legs_1_rimage_2 on: host-115
Checking EXISTENCE and STATE of synced_random_raid1_3legs_1_rmeta_2 on: host-115
Verify the raid image order is what's expected based on raid fault policy
EXPECTED LEG ORDER: /dev/sdf1 /dev/sdd1 unknown /dev/sdh1
ACTUAL LEG ORDER: /dev/sdf1 /dev/sdd1 /dev/sdb1 /dev/sdh1
Verifying files (checkit) on mirror(s) on...
---- host-115 ----
Enabling device sde on host-115
Running vgs to make LVM update metadata version if possible (will restore a-m PVs)
WARNING: Not using lvmetad because a repair command was run.
WARNING: Missing device /dev/sde1 reappeared, updating metadata for VG black_bird to version 9.
Recovery of volume group "black_bird" failed.
Cannot process volume group black_bird
Simple vgs cmd failed after bringing sde back online
# If you comment out this vgs failure in the test, it should continue on and eventually pass...
# Here's where lvm/lvmlockd is in a limbo type state until a "vgreduce --removemissing" is run
# VG is "gone"
[root@host-115 ~]# lvs -a -o +devices
WARNING: Not using lvmetad because a repair command was run.
WARNING: Missing device /dev/sde1 reappeared, updating metadata for VG black_bird to version 9.
Recovery of volume group "black_bird" failed.
Cannot process volume group black_bird
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
[lvmlock] global -wi-ao---- 256.00m /dev/sdg1(0)
[root@host-115 ~]# pvscan
WARNING: Inconsistent metadata found for VG black_bird
WARNING: Missing device /dev/sde1 reappeared, updating metadata for VG black_bird to version 9.
Recovery of volume group "black_bird" failed.
Cannot process volume group black_bird
PV /dev/sdg1 VG global lvm2 [<21.00 GiB / <20.75 GiB free]
WARNING: Missing device /dev/sde1 reappeared, updating metadata for VG black_bird to version 9.
Recovery of volume group "black_bird" failed.
Cannot process volume group black_bird
Jul 6 14:28:13 host-115 qarshd[6468]: Running cmdline: echo offline > /sys/block/sde/device/state
Jul 6 14:28:13 host-115 qarshd[6472]: Running cmdline: pvscan --cache /dev/sde1
Jul 6 14:28:13 host-115 kernel: sd 7:0:0:1: rejecting I/O to offline device
Jul 6 14:28:13 host-115 kernel: sd 7:0:0:1: rejecting I/O to offline device
Jul 6 14:28:13 host-115 kernel: md: super_written gets error=-5, uptodate=0
Jul 6 14:28:13 host-115 kernel: md/raid1:mdX: Disk failure on dm-9, disabling device.#012md/raid1:mdX: Operation continuing on 3 devices.
Jul 6 14:28:13 host-115 lvm[6137]: WARNING: Device #2 of raid1 array, black_bird-synced_random_raid1_3legs_1-real, has failed.
Jul 6 14:28:13 host-115 lvm[6137]: WARNING: Disabling lvmetad cache for repair command.
Jul 6 14:28:13 host-115 lvm[6137]: WARNING: Not using lvmetad because of repair.
[...]
Jul 6 14:28:13 host-115 lvm[6137]: Couldn't find device with uuid icmWuc-ACno-MeJy-HVOs-XU12-3Wat-0JZ0Pj.
Jul 6 14:28:13 host-115 lvm[6137]: WARNING: Couldn't find all devices for LV black_bird/synced_random_raid1_3legs_1_rimage_2 while checking used and assumed devices.
Jul 6 14:28:13 host-115 lvm[6137]: WARNING: Couldn't find all devices for LV black_bird/synced_random_raid1_3legs_1_rmeta_2 while checking used and assumed devices.
Jul 6 14:28:13 host-115 kernel: device-mapper: raid: Device 2 specified for rebuild; clearing superblock
Jul 6 14:28:13 host-115 kernel: md/raid1:mdX: active with 3 out of 4 mirrors
Jul 6 14:28:13 host-115 kernel: md: recovery of RAID array mdX
### Repair does eventually finish successfully, just like any normal raid failure.
Jul 6 14:28:14 host-115 kernel: md/raid1:mdX: active with 3 out of 4 mirrors
Jul 6 14:28:14 host-115 kernel: md: mdX: recovery interrupted.
Jul 6 14:28:14 host-115 lvm[6137]: Faulty devices in black_bird/synced_random_raid1_3legs_1 successfully replaced.
Jul 6 14:28:14 host-115 kernel: md: recovery of RAID array mdX
Jul 6 14:28:14 host-115 lvm[6137]: raid1 array, black_bird-synced_random_raid1_3legs_1-real, is not in-sync.
Jul 6 14:28:15 host-115 qarshd[6634]: Running cmdline: dd if=/dev/zero of=/mnt/synced_random_raid1_3legs_1/ddfile count=10 bs=4M
Jul 6 14:28:16 host-115 qarshd[6639]: Running cmdline: sync
Jul 6 14:28:20 host-115 kernel: md: mdX: recovery done.
Jul 6 14:28:20 host-115 lvm[6137]: raid1 array, black_bird-synced_random_raid1_3legs_1-real, is now in-sync.
Jul 6 14:31:49 host-115 qarshd[6855]: Running cmdline: echo running > /sys/block/sde/device/state
Jul 6 14:31:49 host-115 qarshd[6859]: Running cmdline: vgs
Jul 6 14:32:04 host-115 crmd[2082]: notice: High CPU load detected: 1.420000
Jul 6 14:32:34 host-115 crmd[2082]: notice: High CPU load detected: 1.380000
[...]
Jul 6 14:41:34 host-115 crmd[2082]: notice: High CPU load detected: 1.180000
Jul 6 14:42:04 host-115 crmd[2082]: notice: High CPU load detected: 1.110000
Jul 6 14:42:13 host-115 lvmetad[472]: update_metadata ignoring outdated metadata on PV icmWuc-ACno-MeJy-HVOs-XU12-3Wat-0JZ0Pj seqno 7 for 42Gol8-sLb1-xFXL-dZrr-fdbj-8Ypn-LpqjNQ black_bird seqno 9
Jul 6 14:42:13 host-115 lvmetad[472]: PV icmWuc-ACno-MeJy-HVOs-XU12-3Wat-0JZ0Pj has outdated metadata for VG 42Gol8-sLb1-xFXL-dZrr-fdbj-8Ypn-LpqjNQ
Jul 6 14:42:13 host-115 lvmetad[472]: Cannot use VG metadata for black_bird 42Gol8-sLb1-xFXL-dZrr-fdbj-8Ypn-LpqjNQ from PV icmWuc-ACno-MeJy-HVOs-XU12-3Wat-0JZ0Pj on 2113
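For reference, the harness injects the failure and later restores the device through sysfs, as the syslog above shows; a condensed sketch of that mechanism:

echo offline > /sys/block/sde/device/state   # simulate the leg failure
pvscan --cache /dev/sde1                     # have lvmetad rescan the (now offline) device
# ... dmeventd repair runs and completes ...
echo running > /sys/block/sde/device/state   # bring the device back
vgs                                          # the next command to read the VG then fails with "Recovery of volume group ... failed."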
[root@host-115 ~]# vgs
WARNING: Missing device /dev/sde1 reappeared, updating metadata for VG black_bird to version 9.
Recovery of volume group "black_bird" failed.
Cannot process volume group black_bird
VG #PV #LV #SN Attr VSize VFree
global 1 0 0 wz--ns <21.00g <20.75g
rhel_host-115 1 2 0 wz--n- <7.00g 0
[root@host-115 ~]# vgreduce --removemissing --force black_bird
Wrote out consistent volume group black_bird.
### Seems to be back to a relatively "normal" state now. /dev/sde needs to be added back to the VG (one possible way is sketched after the lvs output below)
[root@host-115 ~]# pvscan
PV /dev/sdb1 VG black_bird lvm2 [<21.00 GiB / 20.25 GiB free]
PV /dev/sda1 VG black_bird lvm2 [<21.00 GiB / <21.00 GiB free]
PV /dev/sdd1 VG black_bird lvm2 [<21.00 GiB / 20.50 GiB free]
PV /dev/sdh1 VG black_bird lvm2 [<21.00 GiB / 20.50 GiB free]
PV /dev/sdc1 VG black_bird lvm2 [<21.00 GiB / <21.00 GiB free]
PV /dev/sdf1 VG black_bird lvm2 [<21.00 GiB / <20.26 GiB free]
PV /dev/vda2 VG rhel_host-115 lvm2 [<7.00 GiB / 0 free]
PV /dev/sdg1 VG global lvm2 [<21.00 GiB / <20.75 GiB free]
WARNING: PV /dev/sde1 is marked in use but no VG was found using it.
WARNING: PV /dev/sde1 might need repairing.
PV /dev/sde1 lvm2 [<21.00 GiB]
Total: 9 [<174.97 GiB] / in use: 8 [<153.97 GiB] / in no VG: 1 [<21.00 GiB]
[root@host-115 ~]# lvs
LV VG Attr LSize Pool Origin Data% Cpy%Sync
bb_snap1 black_bird swi-a-s--- 252.00m synced_random_raid1_3legs_1 29.50
synced_random_raid1_3legs_1 black_bird owi-aor--- 500.00m 100.00
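One possible way to return the failed PV to the VG at this point (a sketch, not taken from the test run; the force flags are only suggested because pvscan above reports the PV as "marked in use but no VG was found using it"):

pvcreate -ff -y /dev/sde1        # wipe the stale metadata left on the previously failed PV
vgextend black_bird /dev/sde1    # add the PV back into the shared VG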
Another effect of this, or possibly an entirely different bug, is that once the test case passes and you remove the raid LV, you're left with an "unknown" PV that you're unable to get rid of. All the devices sd[abcdefg] are back in the VG.

[root@host-115 ~]# pvscan
PV /dev/vda2 VG rhel_host-115 lvm2 [<7.00 GiB / 0 free]
PV /dev/sdg1 VG global lvm2 [<21.00 GiB / <20.75 GiB free]
WARNING: Device for PV po9UOb-IiEU-evit-nEgQ-dpmf-ZECl-zKxQFI not found or rejected by a filter.
Reading VG black_bird without a lock.
PV /dev/sdb1 VG black_bird lvm2 [<21.00 GiB / <20.75 GiB free]
PV /dev/sda1 VG black_bird lvm2 [<21.00 GiB / <21.00 GiB free]
PV /dev/sdh1 VG black_bird lvm2 [<21.00 GiB / <21.00 GiB free]
PV /dev/sdc1 VG black_bird lvm2 [<21.00 GiB / <21.00 GiB free]
PV /dev/sdd1 VG black_bird lvm2 [<21.00 GiB / <21.00 GiB free]
PV /dev/sde1 VG black_bird lvm2 [<21.00 GiB / <21.00 GiB free]
PV [unknown] VG black_bird lvm2 [<21.00 GiB / <21.00 GiB free]
PV /dev/sdf1 VG black_bird lvm2 [<21.00 GiB / <21.00 GiB free]
Total: 10 [195.96 GiB] / in use: 10 [195.96 GiB] / in no VG: 0 [0 ]

Assumed fixed by dependency 1560739

Adding QA ack for 7.8. Covered by automated tests, see qa whiteboard.

Created attachment 1623843 [details]
test results
Verified with latest RPMs. Attaching test result log.
lvm2-2.02.186-2.el7.x86_64
kernel-3.10.0-1100.el7.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1129
Volume group recovery fails after single leg non-synced raid10 failure when using lvmlockd and lvmetad (lvmetad auto-disabled after repair).

./black_bird -L virt-388 -w EXT -F -e kill_primary_non_synced_raid10_3legs
Enabling raid allocate fault policies on: virt-388
================================================================================
Iteration 0.1 started at Fri Jul 7 13:26:31 CEST 2017
================================================================================
Scenario kill_primary_non_synced_raid10_3legs: Kill primary leg of NON synced 3 leg raid10 volume(s)
********* RAID hash info for this scenario *********
* names: non_synced_primary_raid10_3legs_1
* sync: 0
* type: raid10
* -m |-i value: 3
* leg devices: /dev/sde1 /dev/sdj1 /dev/sdc1 /dev/sdb1 /dev/sdg1 /dev/sdi1
* spanned legs: 0
* manual repair: 0
* no MDA devices:
* failpv(s): /dev/sde1
* additional snap: /dev/sdj1
* failnode(s): virt-388
* lvmetad: 0
* raid fault policy: allocate
******************************************************
Creating raids(s) on virt-388...
virt-388: lvcreate -aye --type raid10 -i 3 -n non_synced_primary_raid10_3legs_1 -L 10G black_bird /dev/sde1:0-3600 /dev/sdj1:0-3600 /dev/sdc1:0-3600 /dev/sdb1:0-3600 /dev/sdg1:0-3600 /dev/sdi1:0-3600
Current mirror/raid device structure(s):
LV Attr LSize Cpy%Sync Devices
[lvmlock] -wi-ao---- 256.00m /dev/sdd1(0)
non_synced_primary_raid10_3legs_1 rwi-a-r--- <10.01g 0.00 non_synced_primary_raid10_3legs_1_rimage_0(0),non_synced_primary_raid10_3legs_1_rimage_1(0),non_synced_primary_raid10_3legs_1_rimage_2(0),non_synced_primary_raid10_3legs_1_rimage_3(0),non_synced_primary_raid10_3legs_1_rimage_4(0),non_synced_primary_raid10_3legs_1_rimage_5(0)
[non_synced_primary_raid10_3legs_1_rimage_0] Iwi-aor--- <3.34g /dev/sde1(1)
[non_synced_primary_raid10_3legs_1_rimage_1] Iwi-aor--- <3.34g /dev/sdj1(1)
[non_synced_primary_raid10_3legs_1_rimage_2] Iwi-aor--- <3.34g /dev/sdc1(1)
[non_synced_primary_raid10_3legs_1_rimage_3] Iwi-aor--- <3.34g /dev/sdb1(1)
[non_synced_primary_raid10_3legs_1_rimage_4] Iwi-aor--- <3.34g /dev/sdg1(1)
[non_synced_primary_raid10_3legs_1_rimage_5] Iwi-aor--- <3.34g /dev/sdi1(1)
[non_synced_primary_raid10_3legs_1_rmeta_0] ewi-aor--- 4.00m /dev/sde1(0)
[non_synced_primary_raid10_3legs_1_rmeta_1] ewi-aor--- 4.00m /dev/sdj1(0)
[non_synced_primary_raid10_3legs_1_rmeta_2] ewi-aor--- 4.00m /dev/sdc1(0)
[non_synced_primary_raid10_3legs_1_rmeta_3] ewi-aor--- 4.00m /dev/sdb1(0)
[non_synced_primary_raid10_3legs_1_rmeta_4] ewi-aor--- 4.00m /dev/sdg1(0)
[non_synced_primary_raid10_3legs_1_rmeta_5] ewi-aor--- 4.00m /dev/sdi1(0)
[lvmlock] -wi-ao---- 256.00m /dev/sda1(0)
root -wi-ao---- <6.20g /dev/vda2(205)
swap -wi-ao---- 820.00m /dev/vda2(0)
* NOTE: not enough available devices for allocation fault policies to fully work *
Creating ext on top of mirror(s) on virt-388...
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on virt-388...
PV=/dev/sde1
non_synced_primary_raid10_3legs_1_rimage_0: 1.0
non_synced_primary_raid10_3legs_1_rmeta_0: 1.0
Creating a snapshot volume of each of the raids
Writing verification files (checkit) to mirror(s) on...
---- virt-388 ----
Verifying files (checkit) on mirror(s) on...
---- virt-388 ----
Name GrpID RgID ObjType ArID ArStart ArSize RMrg/s WMrg/s R/s W/s RSz/s WSz/s AvgRqSz QSize Util% AWait RdAWait WrAWait
virt-388_load 0 0 group 0 133.00m 980.00k 0.00 0.00 0.00 245.00 0 980.00k 4.00k 1.01 0.50 4.13 0.00 4.13
Name GrpID RgID ObjType RgStart RgSize #Areas ArSize ProgID
virt-388_load 0 0 group 133.00m 980.00k 1 980.00k dmstats
Current sync percent just before failure ( 18.50% )
Disabling device sde on virt-388
rescan device...
/dev/sde1: read failed after 0 of 1024 at 42944036864: Input/output error
/dev/sde1: read failed after 0 of 1024 at 42944143360: Input/output error
/dev/sde1: read failed after 0 of 1024 at 0: Input/output error
/dev/sde1: read failed after 0 of 1024 at 4096: Input/output error
/dev/sde1: read failed after 0 of 2048 at 0: Input/output error
Attempting I/O to cause mirror down conversion(s) on virt-388
dd if=/dev/zero of=/mnt/non_synced_primary_raid10_3legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.205385 s, 204 MB/s
Verifying current sanity of lvm after the failure
Current mirror/raid device structure(s):
WARNING: Not using lvmetad because a repair command was run.
Couldn't find device with uuid enqEiN-09mX-Zx1d-nBrB-sCcj-IWLw-yK3IxP.
LV Attr LSize Cpy%Sync Devices
bb_snap1 swi-a-s--- 252.00m /dev/sdj1(855)
[lvmlock] -wi-ao---- 256.00m /dev/sdd1(0)
non_synced_primary_raid10_3legs_1 owi-aor--- <10.01g 100.00 non_synced_primary_raid10_3legs_1_rimage_0(0),non_synced_primary_raid10_3legs_1_rimage_1(0),non_synced_primary_raid10_3legs_1_rimage_2(0),non_synced_primary_raid10_3legs_1_rimage_3(0),non_synced_primary_raid10_3legs_1_rimage_4(0),non_synced_primary_raid10_3legs_1_rimage_5(0)
[non_synced_primary_raid10_3legs_1_rimage_0] iwi-aor--- <3.34g /dev/sdd1(65)
[non_synced_primary_raid10_3legs_1_rimage_1] iwi-aor--- <3.34g /dev/sdj1(1)
[non_synced_primary_raid10_3legs_1_rimage_2] iwi-aor--- <3.34g /dev/sdc1(1)
[non_synced_primary_raid10_3legs_1_rimage_3] iwi-aor--- <3.34g /dev/sdb1(1)
[non_synced_primary_raid10_3legs_1_rimage_4] iwi-aor--- <3.34g /dev/sdg1(1)
[non_synced_primary_raid10_3legs_1_rimage_5] iwi-aor--- <3.34g /dev/sdi1(1)
[non_synced_primary_raid10_3legs_1_rmeta_0] ewi-aor--- 4.00m /dev/sdd1(64)
[non_synced_primary_raid10_3legs_1_rmeta_1] ewi-aor--- 4.00m /dev/sdj1(0)
[non_synced_primary_raid10_3legs_1_rmeta_2] ewi-aor--- 4.00m /dev/sdc1(0)
[non_synced_primary_raid10_3legs_1_rmeta_3] ewi-aor--- 4.00m /dev/sdb1(0)
[non_synced_primary_raid10_3legs_1_rmeta_4] ewi-aor--- 4.00m /dev/sdg1(0)
[non_synced_primary_raid10_3legs_1_rmeta_5] ewi-aor--- 4.00m /dev/sdi1(0)
[lvmlock] -wi-ao---- 256.00m /dev/sda1(0)
root -wi-ao---- <6.20g /dev/vda2(205)
swap -wi-ao---- 820.00m /dev/vda2(0)
Verifying FAILED device /dev/sde1 is *NOT* in the volume(s)
WARNING: Not using lvmetad because a repair command was run.
Couldn't find device with uuid enqEiN-09mX-Zx1d-nBrB-sCcj-IWLw-yK3IxP.
Verifying IMAGE device /dev/sdj1 *IS* in the volume(s)
WARNING: Not using lvmetad because a repair command was run.
Couldn't find device with uuid enqEiN-09mX-Zx1d-nBrB-sCcj-IWLw-yK3IxP.
Verifying IMAGE device /dev/sdc1 *IS* in the volume(s)
WARNING: Not using lvmetad because a repair command was run.
Couldn't find device with uuid enqEiN-09mX-Zx1d-nBrB-sCcj-IWLw-yK3IxP.
Verifying IMAGE device /dev/sdb1 *IS* in the volume(s)
WARNING: Not using lvmetad because a repair command was run.
Couldn't find device with uuid enqEiN-09mX-Zx1d-nBrB-sCcj-IWLw-yK3IxP.
Verifying IMAGE device /dev/sdg1 *IS* in the volume(s)
WARNING: Not using lvmetad because a repair command was run.
Couldn't find device with uuid enqEiN-09mX-Zx1d-nBrB-sCcj-IWLw-yK3IxP.
Verifying IMAGE device /dev/sdi1 *IS* in the volume(s)
WARNING: Not using lvmetad because a repair command was run.
Couldn't find device with uuid enqEiN-09mX-Zx1d-nBrB-sCcj-IWLw-yK3IxP.
Verify the rimage/rmeta dm devices remain after the failures
Checking EXISTENCE and STATE of non_synced_primary_raid10_3legs_1_rimage_0 on: virt-388
Checking EXISTENCE and STATE of non_synced_primary_raid10_3legs_1_rmeta_0 on: virt-388
Verify the raid image order is what's expected based on raid fault policy
EXPECTED LEG ORDER: unknown /dev/sdj1 /dev/sdc1 /dev/sdb1 /dev/sdg1 /dev/sdi1
WARNING: Not using lvmetad because a repair command was run.
Couldn't find device with uuid enqEiN-09mX-Zx1d-nBrB-sCcj-IWLw-yK3IxP.
WARNING: Not using lvmetad because a repair command was run.
Couldn't find device with uuid enqEiN-09mX-Zx1d-nBrB-sCcj-IWLw-yK3IxP.
WARNING: Not using lvmetad because a repair command was run.
Couldn't find device with uuid enqEiN-09mX-Zx1d-nBrB-sCcj-IWLw-yK3IxP.
WARNING: Not using lvmetad because a repair command was run.
Couldn't find device with uuid enqEiN-09mX-Zx1d-nBrB-sCcj-IWLw-yK3IxP.
WARNING: Not using lvmetad because a repair command was run.
Couldn't find device with uuid enqEiN-09mX-Zx1d-nBrB-sCcj-IWLw-yK3IxP.
WARNING: Not using lvmetad because a repair command was run.
Couldn't find device with uuid enqEiN-09mX-Zx1d-nBrB-sCcj-IWLw-yK3IxP.
WARNING: Not using lvmetad because a repair command was run.
Couldn't find device with uuid enqEiN-09mX-Zx1d-nBrB-sCcj-IWLw-yK3IxP.
WARNING: Not using lvmetad because a repair command was run.
Couldn't find device with uuid enqEiN-09mX-Zx1d-nBrB-sCcj-IWLw-yK3IxP.
ACTUAL LEG ORDER: /dev/sdd1 /dev/sdj1 /dev/sdc1 /dev/sdb1 /dev/sdg1 /dev/sdi1
unknown ne /dev/sdd1
/dev/sdj1 ne /dev/sdj1
/dev/sdc1 ne /dev/sdc1
/dev/sdb1 ne /dev/sdb1
/dev/sdg1 ne /dev/sdg1
/dev/sdi1 ne /dev/sdi1
Verifying files (checkit) on mirror(s) on...
---- virt-388 ----
Enabling device sde on virt-388
Running vgs to make LVM update metadata version if possible (will restore a-m PVs)
WARNING: Not using lvmetad because a repair command was run.
WARNING: Missing device /dev/sde1 reappeared, updating metadata for VG black_bird to version 9.
Recovery of volume group "black_bird" failed.
Cannot process volume group black_bird
Simple vgs cmd failed after bringing sde back online

Possible regression of bug 1412843/1434054
==================================================================
Check device and services are ok:
[root@virt-388 ~]# cat /sys/block/sde/device/state
running
[root@virt-388 ~]# systemctl is-active lvm2-lvmetad lvm2-lvmlockd sanlock
active
active
active
[root@virt-388 ~]# vgs
WARNING: Not using lvmetad because a repair command was run.
WARNING: Missing device /dev/sde1 reappeared, updating metadata for VG black_bird to version 9.
Recovery of volume group "black_bird" failed.
Cannot process volume group black_bird
VG #PV #LV #SN Attr VSize VFree
global 1 0 0 wz--ns 39.98g 39.73g
rhel_virt-388 1 2 0 wz--n- <7.00g 0
===================================================================
3.10.0-689.el7.x86_64
lvm2-2.02.171-8.el7 BUILT: Wed Jun 28 20:28:58 CEST 2017
lvm2-libs-2.02.171-8.el7 BUILT: Wed Jun 28 20:28:58 CEST 2017
lvm2-cluster-2.02.171-8.el7 BUILT: Wed Jun 28 20:28:58 CEST 2017
device-mapper-1.02.140-8.el7 BUILT: Wed Jun 28 20:28:58 CEST 2017
device-mapper-libs-1.02.140-8.el7 BUILT: Wed Jun 28 20:28:58 CEST 2017
device-mapper-event-1.02.140-8.el7 BUILT: Wed Jun 28 20:28:58 CEST 2017
device-mapper-event-libs-1.02.140-8.el7 BUILT: Wed Jun 28 20:28:58 CEST 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7 BUILT: Mon Mar 27 17:15:46 CEST 2017
cmirror-2.02.171-8.el7 BUILT: Wed Jun 28 20:28:58 CEST 2017
sanlock-3.5.0-1.el7 BUILT: Wed Apr 26 16:37:30 CEST 2017
sanlock-lib-3.5.0-1.el7 BUILT: Wed Apr 26 16:37:30 CEST 2017
lvm2-lockd-2.02.171-8.el7 BUILT: Wed Jun 28 20:28:58 CEST 2017