Bug 1372101

Summary: raid6 partial activation fails when missing PV
Product: Red Hat Enterprise Linux 7
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
Assignee: Heinz Mauelshagen <heinzm>
lvm2 sub component: Mirroring and RAID
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED WONTFIX
Docs Contact:
Severity: medium    
Priority: low
CC: agk, heinzm, jbrassow, lmiksik, mcsontos, msnitzer, prajnoha, rbednar, zkabelac
Version: 7.3
Flags: heinzm: needinfo+
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1809254 (view as bug list)
Environment:
Last Closed: 2020-03-25 14:30:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1546181, 1560739, 1729303, 1809254    
Attachments:
  verbose vgchange attempt (flags: none)

Description Corey Marthaler 2016-08-31 22:15:52 UTC
Description of problem:
This test case passes for all other raid types (raid0, raid0_meta, raid1, raid4, raid5, raid10) but fails for raid6.
The test case had been turned off and was only recently re-enabled due to bug 1348327.


host-117: pvcreate /dev/sdd2 /dev/sdd1 /dev/sde2 /dev/sde1 /dev/sdb2 /dev/sdb1 /dev/sdc2 /dev/sdc1 /dev/sda2 /dev/sda1
host-117: vgcreate  raid_sanity /dev/sdd2 /dev/sdd1 /dev/sde2 /dev/sde1 /dev/sdb2 /dev/sdb1 /dev/sdc2 /dev/sdc1 /dev/sda2 /dev/sda1

============================================================
Iteration 1 of 1 started at Wed Aug 31 15:25:15 CDT 2016
============================================================
SCENARIO (raid6) - [vgcfgrestore_raid_with_missing_pv]
Create a raid, force remove a leg, and then restore its VG
host-117: lvcreate  --type raid6 -i 3 -n missing_pv_raid -L 100M raid_sanity
Deactivating missing_pv_raid raid
Backup the VG config
host-117 vgcfgbackup -f /tmp/raid_sanity.bkup.16614 raid_sanity
Force removing PV /dev/sdd2 (used in this raid)
host-117: 'pvremove -ff --yes /dev/sdd2'
  WARNING: PV /dev/sdd2 is used by VG raid_sanity
  WARNING: Wiping physical volume label from /dev/sdd2 of volume group "raid_sanity"
Verifying that this VG is now corrupt
  WARNING: Device for PV QTtTBH-FF1V-l4IA-FtGl-t5F9-Ahyu-dI8EIe not found or rejected by a filter.
  Failed to find physical volume "/dev/sdd2".
Attempt to restore the VG back to its original state (should not segfault BZ 1348327)
host-117 vgcfgrestore -f /tmp/raid_sanity.bkup.16614 raid_sanity
  Couldn't find device with uuid QTtTBH-FF1V-l4IA-FtGl-t5F9-Ahyu-dI8EIe.
  Cannot restore Volume Group raid_sanity with 1 PVs marked as missing.
  Restore failed.
Checking syslog to see if vgcfgrestore segfaulted

Activating VG in partial readonly mode

[root@host-117 ~]# lvs -a -o +devices
  WARNING: Device for PV QTtTBH-FF1V-l4IA-FtGl-t5F9-Ahyu-dI8EIe not found or rejected by a filter.
  LV                         VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                                                                                                                                    
  missing_pv_raid            raid_sanity   rwi---r-p- 108.00m                                                     missing_pv_raid_rimage_0(0),missing_pv_raid_rimage_1(0),missing_pv_raid_rimage_2(0),missing_pv_raid_rimage_3(0),missing_pv_raid_rimage_4(0)
  [missing_pv_raid_rimage_0] raid_sanity   Iwi---r-p-  36.00m                                                     [unknown](1)                                                                                                                               
  [missing_pv_raid_rimage_1] raid_sanity   Iwi---r---  36.00m                                                     /dev/sdd1(1)                                                                                                                               
  [missing_pv_raid_rimage_2] raid_sanity   Iwi---r---  36.00m                                                     /dev/sde2(1)                                                                                                                               
  [missing_pv_raid_rimage_3] raid_sanity   Iwi---r---  36.00m                                                     /dev/sde1(1)                                                                                                                               
  [missing_pv_raid_rimage_4] raid_sanity   Iwi---r---  36.00m                                                     /dev/sdb2(1)                                                                                                                               
  [missing_pv_raid_rmeta_0]  raid_sanity   ewi---r-p-   4.00m                                                     [unknown](0)                                                                                                                               
  [missing_pv_raid_rmeta_1]  raid_sanity   ewi---r---   4.00m                                                     /dev/sdd1(0)                                                                                                                               
  [missing_pv_raid_rmeta_2]  raid_sanity   ewi---r---   4.00m                                                     /dev/sde2(0)                                                                                                                               
  [missing_pv_raid_rmeta_3]  raid_sanity   ewi---r---   4.00m                                                     /dev/sde1(0)                                                                                                                               
  [missing_pv_raid_rmeta_4]  raid_sanity   ewi---r---   4.00m                                                     /dev/sdb2(0)
  0 logical volume(s) in volume group "raid_sanity" now active

Aug 31 15:27:35 host-117 kernel: device-mapper: raid: Failed to read superblock of device at position 0
Aug 31 15:27:35 host-117 kernel: md/raid:mdX: not clean -- starting background reconstruction
Aug 31 15:27:35 host-117 kernel: md/raid:mdX: device dm-7 operational as raid disk 1
Aug 31 15:27:35 host-117 kernel: md/raid:mdX: device dm-9 operational as raid disk 2
Aug 31 15:27:35 host-117 kernel: md/raid:mdX: device dm-11 operational as raid disk 3
Aug 31 15:27:35 host-117 kernel: md/raid:mdX: device dm-13 operational as raid disk 4
Aug 31 15:27:35 host-117 kernel: md/raid:mdX: allocated 5432kB
Aug 31 15:27:35 host-117 kernel: md/raid:mdX: cannot start dirty degraded array.
Aug 31 15:27:35 host-117 kernel: md/raid:mdX: failed to run raid set.
Aug 31 15:27:35 host-117 kernel: md: pers->run() failed ...
Aug 31 15:27:35 host-117 kernel: device-mapper: table: 253:14: raid: Failed to run raid array
Aug 31 15:27:35 host-117 kernel: device-mapper: ioctl: error adding target to table
Aug 31 15:27:35 host-117 multipathd: dm-14: remove map (uevent)
Aug 31 15:27:35 host-117 multipathd: dm-14: remove map (uevent)



Version-Release number of selected component (if applicable):
3.10.0-497.el7.x86_64

lvm2-2.02.164-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
lvm2-libs-2.02.164-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
lvm2-cluster-2.02.164-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
device-mapper-1.02.133-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
device-mapper-libs-1.02.133-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
device-mapper-event-1.02.133-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
device-mapper-event-libs-1.02.133-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 05:29:13 CDT 2016
cmirror-2.02.164-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
sanlock-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
sanlock-lib-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
lvm2-lockd-2.02.164-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016


How reproducible:
Every time

Comment 2 Heinz Mauelshagen 2016-11-14 13:51:13 UTC
Corey,

do you consider this to be the bare-bones reproducer?

# lvcreate --ty raid6 -i3 -nr -L64 ssd
# lvchange -an ssd/r
# pvremove -ff /dev/sda # 1st PV
# lvchange -ay ssd/r # -> Success!

Kernel:
3.10.0-513.el7.x86_64

Userspace:
LVM version:     2.02.166(2)-RHEL7 (2016-09-28)
Library version: 1.02.135-RHEL7 (2016-09-28)
Driver version:  4.34.0

Comment 3 Heinz Mauelshagen 2017-03-08 13:50:22 UTC
Tested with upstream (base for rhel7.4) -> ok.

[root@vm46 ~]# lvchange -an tb
[root@vm46 ~]# pvremove  -yff /dev/sda
  WARNING: PV /dev/sda is used by VG tb
  WARNING: Wiping physical volume label from /dev/sda of volume group "tb"
  Labels on physical volume "/dev/sda" successfully wiped.
[root@vm46 ~]# vgs
  WARNING: Device for PV VgZTIa-hTqB-p7G1-38V9-rY85-ZAzx-CAeCM1 not found or rejected by a filter.
  VG     #PV #LV #SN Attr   VSize   VFree  
  fedora   1   2   0 wz--n-  49.00g  30.01g
  t        2   1   0 wz--n- 512.00t 312.00t
  tb     257   1   0 wz-pn-  32.12t  32.12t
[root@vm46 ~]# vgrestore -f /etc/lvm/backup/tb tb
-bash: vgrestore: command not found
[root@vm46 ~]# vgcfgrestore -f /etc/lvm/backup/tb tb
  Couldn't find device with uuid VgZTIa-hTqB-p7G1-38V9-rY85-ZAzx-CAeCM1.
  Cannot restore Volume Group tb with 1 PVs marked as missing.
  Restore failed.
[root@vm46 ~]# lvchange -ay tb
  WARNING: Device for PV VgZTIa-hTqB-p7G1-38V9-rY85-ZAzx-CAeCM1 not found or rejected by a filter.
[root@vm46 ~]# lvs -aoname,attr,size,segtype,syncpercent,datastripes,stripesize,reshapelenle,datacopies,regionsize,devices tb|sed 's/  *$//'
  WARNING: Device for PV VgZTIa-hTqB-p7G1-38V9-rY85-ZAzx-CAeCM1 not found or rejected by a filter.
  LV           Attr       LSize   Type   Cpy%Sync #DStr Stripe RSize #Cpy Region Devices
  r            Rwi-a-r-p-   1.01g raid5  100.00       3 64.00k     0    2  8.00m r_rimage_0(0),r_rimage_1(0),r_rimage_2(0),r_rimage_3(0)
  [r_rimage_0] Iwi-aor-p- 344.00m linear              1     0      0    1     0  [unknown](1)
  [r_rimage_1] iwi-aor--- 344.00m linear              1     0      0    1     0  /dev/sdd(1)
  [r_rimage_2] iwi-aor--- 344.00m linear              1     0      0    1     0  /dev/sde(1)
  [r_rimage_3] iwi-aor--- 344.00m linear              1     0      0    1     0  /dev/sdf(1)
  [r_rmeta_0]  ewi-aor-p-   4.00m linear              1     0           1     0  [unknown](0)
  [r_rmeta_1]  ewi-aor---   4.00m linear              1     0           1     0  /dev/sdd(0)
  [r_rmeta_2]  ewi-aor---   4.00m linear              1     0           1     0  /dev/sde(0)
  [r_rmeta_3]  ewi-aor---   4.00m linear              1     0           1     0  /dev/sdf(0)
[root@vm46 ~]# fsck -fn /dev/tb/r
fsck from util-linux 2.28.2
e2fsck 1.43.3 (04-Sep-2016)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/tb-r: 11/66096 files (0.0% non-contiguous), 12997/264192 blocks

Comment 5 Corey Marthaler 2017-04-26 20:50:58 UTC
This is still present. The key to reproducing it appears to be creation followed by *immediate* deactivation; the subsequent partial activation then fails.
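
A condensed form of the sequence (the full transcript follows below; the VG name and device names such as /dev/sdb1 are specific to these test hosts):

# lvcreate --type raid6 -i 3 -n missing_pv_raid -L 100M raid_sanity && lvchange -an raid_sanity/missing_pv_raid   # deactivate immediately after creation
# pvremove -ff --yes /dev/sdb1           # force-wipe one PV backing the raid
# vgchange -ay --partial raid_sanity     # fails: "cannot start dirty degraded array"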


[root@host-076 ~]# lvcreate  --type raid6 -i 3 -n missing_pv_raid -L 100M raid_sanity && lvchange -an raid_sanity/missing_pv_raid                                                                                   
  Using default stripesize 64.00 KiB.                                                                                                                                                                               
  Rounding size 100.00 MiB (25 extents) up to stripe boundary size 108.00 MiB(27 extents).                                                                                                                          
  Logical volume "missing_pv_raid" created.                                                                                                                                                                         

[root@host-076 ~]# lvs -a -o +devices
  LV                         VG            Attr       LSize   Devices
  missing_pv_raid            raid_sanity   rwi---r--- 108.00m missing_pv_raid_rimage_0(0),missing_pv_raid_rimage_1(0),missing_pv_raid_rimage_2(0),missing_pv_raid_rimage_3(0),missing_pv_raid_rimage_4(0)
  [missing_pv_raid_rimage_0] raid_sanity   Iwi---r---  36.00m /dev/sdb1(1)
  [missing_pv_raid_rimage_1] raid_sanity   Iwi---r---  36.00m /dev/sdc1(1)
  [missing_pv_raid_rimage_2] raid_sanity   Iwi---r---  36.00m /dev/sdd1(1)
  [missing_pv_raid_rimage_3] raid_sanity   Iwi---r---  36.00m /dev/sde1(1)
  [missing_pv_raid_rimage_4] raid_sanity   Iwi---r---  36.00m /dev/sdf1(1)
  [missing_pv_raid_rmeta_0]  raid_sanity   ewi---r---   4.00m /dev/sdb1(0)
  [missing_pv_raid_rmeta_1]  raid_sanity   ewi---r---   4.00m /dev/sdc1(0)
  [missing_pv_raid_rmeta_2]  raid_sanity   ewi---r---   4.00m /dev/sdd1(0)
  [missing_pv_raid_rmeta_3]  raid_sanity   ewi---r---   4.00m /dev/sde1(0)
  [missing_pv_raid_rmeta_4]  raid_sanity   ewi---r---   4.00m /dev/sdf1(0)

[root@host-076 ~]#  pvremove -ff --yes /dev/sdb1
  WARNING: PV /dev/sdb1 is used by VG raid_sanity
  WARNING: Wiping physical volume label from /dev/sdb1 of volume group "raid_sanity"
  Labels on physical volume "/dev/sdb1" successfully wiped.

[root@host-076 ~]# vgchange -ay --partial raid_sanity
  PARTIAL MODE. Incomplete logical volumes will be processed.
  WARNING: Device for PV wr50kt-tOJh-xOgu-qj6v-JnYA-Q8ke-lUoDwh not found or rejected by a filter.
  device-mapper: reload ioctl on  (253:14) failed: Input/output error
  0 logical volume(s) in volume group "raid_sanity" now active


Apr 26 15:43:24 host-076 kernel: device-mapper: raid: Failed to read superblock of device at position 0
Apr 26 15:43:24 host-076 kernel: md/raid:mdX: not clean -- starting background reconstruction
Apr 26 15:43:24 host-076 kernel: md/raid:mdX: device dm-7 operational as raid disk 1
Apr 26 15:43:24 host-076 kernel: md/raid:mdX: device dm-9 operational as raid disk 2
Apr 26 15:43:24 host-076 kernel: md/raid:mdX: device dm-11 operational as raid disk 3
Apr 26 15:43:24 host-076 kernel: md/raid:mdX: device dm-13 operational as raid disk 4
Apr 26 15:43:24 host-076 kernel: md/raid:mdX: cannot start dirty degraded array.
Apr 26 15:43:24 host-076 kernel: md/raid:mdX: failed to run raid set.
Apr 26 15:43:24 host-076 kernel: md: pers->run() failed ...
Apr 26 15:43:24 host-076 kernel: device-mapper: table: 253:14: raid: Failed to run raid array
Apr 26 15:43:24 host-076 kernel: device-mapper: ioctl: error adding target to table
Apr 26 15:43:24 host-076 multipathd: dm-14: remove map (uevent)
Apr 26 15:43:24 host-076 multipathd: dm-14: remove map (uevent)



3.10.0-651.el7.x86_64

lvm2-2.02.170-2.el7    BUILT: Thu Apr 13 14:37:43 CDT 2017
lvm2-libs-2.02.170-2.el7    BUILT: Thu Apr 13 14:37:43 CDT 2017
lvm2-cluster-2.02.170-2.el7    BUILT: Thu Apr 13 14:37:43 CDT 2017
device-mapper-1.02.139-2.el7    BUILT: Thu Apr 13 14:37:43 CDT 2017
device-mapper-libs-1.02.139-2.el7    BUILT: Thu Apr 13 14:37:43 CDT 2017
device-mapper-event-1.02.139-2.el7    BUILT: Thu Apr 13 14:37:43 CDT 2017
device-mapper-event-libs-1.02.139-2.el7    BUILT: Thu Apr 13 14:37:43 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017

Comment 6 Heinz Mauelshagen 2017-05-02 12:53:13 UTC
Corey,

can I have the "vgchange -vvvv ..." output and the respective part of the kernel log, please?

This very sequence succeeds locally:

# lvcreate -y --ty raid6 -i3 -nr -L100 p
  Using default stripesize 64.00 KiB.
  Rounding size 100.00 MiB (25 extents) up to stripe boundary size 108.00 MiB(27 extents).
  Logical volume "r" created.

# lvs -aoname,size,segtype,stripes,stripesize,regionsize,datastripes,datacopies,copypercent,devices p
  LV           LSize   Type   #Str Stripe Region #DStr #Cpy Cpy%Sync Devices                                                              
  r            108.00m raid6     5 64.00k  2.00m     3    3 0.00     r_rimage_0(0),r_rimage_1(0),r_rimage_2(0),r_rimage_3(0),r_rimage_4(0)
  [r_rimage_0]  36.00m linear    1     0      0      1    1          /dev/sdb(1)                                                          
  [r_rimage_1]  36.00m linear    1     0      0      1    1          /dev/sdc(1)                                                          
  [r_rimage_2]  36.00m linear    1     0      0      1    1          /dev/sdd(1)                                                          
  [r_rimage_3]  36.00m linear    1     0      0      1    1          /dev/sde(1)                                                          
  [r_rimage_4]  36.00m linear    1     0      0      1    1          /dev/sdf(1)                                                          
  [r_rmeta_0]    4.00m linear    1     0      0      1    1          /dev/sdb(0)                                                          
  [r_rmeta_1]    4.00m linear    1     0      0      1    1          /dev/sdc(0)                                                          
  [r_rmeta_2]    4.00m linear    1     0      0      1    1          /dev/sdd(0)                                                          
  [r_rmeta_3]    4.00m linear    1     0      0      1    1          /dev/sde(0)                                                          
  [r_rmeta_4]    4.00m linear    1     0      0      1    1          /dev/sdf(0)

# pvremove -yff /dev/sdb
  Can't open /dev/sdb exclusively.  Mounted filesystem?

# lvchange -an p/r

# pvremove -yff /dev/sdb
  WARNING: PV /dev/sdb is used by VG p
  WARNING: Wiping physical volume label from /dev/sdb of volume group "p".
  Labels on physical volume "/dev/sdb" successfully wiped.

# vgchange -ay p
  WARNING: Device for PV ozeuGs-q4xK-aSFW-HZ2j-8x8D-xcZk-v1PUd8 not found or rejected by a filter.
  1 logical volume(s) in volume group "p" now active

# lvs -aoname,size,segtype,stripes,stripesize,regionsize,datastripes,datacopies,copypercent,devices p
  WARNING: Device for PV ozeuGs-q4xK-aSFW-HZ2j-8x8D-xcZk-v1PUd8 not found or rejected by a filter.
  LV           LSize   Type   #Str Stripe Region #DStr #Cpy Cpy%Sync Devices                                                              
  r            108.00m raid6     5 64.00k  2.00m     3    3 100.00   r_rimage_0(0),r_rimage_1(0),r_rimage_2(0),r_rimage_3(0),r_rimage_4(0)
  [r_rimage_0]  36.00m linear    1     0      0      1    1          [unknown](1)                                                         
  [r_rimage_1]  36.00m linear    1     0      0      1    1          /dev/sdc(1)                                                          
  [r_rimage_2]  36.00m linear    1     0      0      1    1          /dev/sdd(1)                                                          
  [r_rimage_3]  36.00m linear    1     0      0      1    1          /dev/sde(1)                                                          
  [r_rimage_4]  36.00m linear    1     0      0      1    1          /dev/sdf(1)                                                          
  [r_rmeta_0]    4.00m linear    1     0      0      1    1          [unknown](0)                                                         
  [r_rmeta_1]    4.00m linear    1     0      0      1    1          /dev/sdc(0)                                                          
  [r_rmeta_2]    4.00m linear    1     0      0      1    1          /dev/sdd(0)                                                          
  [r_rmeta_3]    4.00m linear    1     0      0      1    1          /dev/sde(0)                                                          
  [r_rmeta_4]    4.00m linear    1     0      0      1    1          /dev/sdf(0)

Comment 7 Corey Marthaler 2017-05-02 14:40:43 UTC
Created attachment 1275709 [details]
verbose vgchange attempt

Comment 8 Corey Marthaler 2017-05-02 14:42:47 UTC
Kernel output corresponding to the vgchange attempt attached in comment #7:

May  2 09:37:53 host-073 qarshd[24343]: Running cmdline: vgchange -vvvv -ay --partial raid_sanity
May  2 09:37:53 host-073 kernel: device-mapper: raid: Failed to read superblock of device at position 0
May  2 09:37:53 host-073 kernel: md/raid:mdX: not clean -- starting background reconstruction
May  2 09:37:53 host-073 kernel: md/raid:mdX: device dm-7 operational as raid disk 1
May  2 09:37:53 host-073 kernel: md/raid:mdX: device dm-9 operational as raid disk 2
May  2 09:37:53 host-073 kernel: md/raid:mdX: device dm-11 operational as raid disk 3
May  2 09:37:53 host-073 kernel: md/raid:mdX: device dm-13 operational as raid disk 4
May  2 09:37:53 host-073 kernel: md/raid:mdX: cannot start dirty degraded array.
May  2 09:37:53 host-073 kernel: md/raid:mdX: failed to run raid set.
May  2 09:37:53 host-073 kernel: md: pers->run() failed ...
May  2 09:37:53 host-073 kernel: device-mapper: table: 253:14: raid: Failed to run raid array
May  2 09:37:53 host-073 kernel: device-mapper: ioctl: error adding target to table


3.10.0-660.el7.x86_64

lvm2-2.02.170-2.el7    BUILT: Thu Apr 13 14:37:43 CDT 2017
lvm2-libs-2.02.170-2.el7    BUILT: Thu Apr 13 14:37:43 CDT 2017
lvm2-cluster-2.02.170-2.el7    BUILT: Thu Apr 13 14:37:43 CDT 2017
device-mapper-1.02.139-2.el7    BUILT: Thu Apr 13 14:37:43 CDT 2017
device-mapper-libs-1.02.139-2.el7    BUILT: Thu Apr 13 14:37:43 CDT 2017
device-mapper-event-1.02.139-2.el7    BUILT: Thu Apr 13 14:37:43 CDT 2017
device-mapper-event-libs-1.02.139-2.el7    BUILT: Thu Apr 13 14:37:43 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017

Comment 9 Heinz Mauelshagen 2017-05-08 15:17:54 UTC
Immediate deactivation after creation using this sequence succeeded:

lvcreate -y --ty raid6_ra_6 -i3 -L100 -nr p; lvchange -an p; pvremove -yff /dev/sda; lvchange -ay p/r ->

# lvs -aoname,size,attr,segtype,stripes,stripesize,regionsize,datastripes,datacopies,copypercent,devices p
  WARNING: Device for PV 9d2zjV-uNfI-oCni-Is39-LmH8-oEjq-0R5uoo not found or rejected by a filter.
  LV           LSize   Attr       Type       #Str Stripe Region #DStr #Cpy Cpy%Sync Devices                                                              
  r            108.00m rwi-a-r-p- raid6_ra_6    5 64.00k  2.00m     3    3 100.00   r_rimage_0(0),r_rimage_1(0),r_rimage_2(0),r_rimage_3(0),r_rimage_4(0)
  [r_rimage_0]  36.00m Iwi-aor-p- linear        1     0      0      1    1          [unknown](1)                                                         
  [r_rimage_1]  36.00m iwi-aor--- linear        1     0      0      1    1          /dev/sdb(1)                                                          
  [r_rimage_2]  36.00m iwi-aor--- linear        1     0      0      1    1          /dev/sdc(1)                                                          
  [r_rimage_3]  36.00m iwi-aor--- linear        1     0      0      1    1          /dev/sdd(1)                                                          
  [r_rimage_4]  36.00m iwi-aor--- linear        1     0      0      1    1          /dev/sde(1)                                                          
  [r_rmeta_0]    4.00m ewi-aor-p- linear        1     0      0      1    1          [unknown](0)                                                         
  [r_rmeta_1]    4.00m ewi-aor--- linear        1     0      0      1    1          /dev/sdb(0)                                                          
  [r_rmeta_2]    4.00m ewi-aor--- linear        1     0      0      1    1          /dev/sdc(0)                                                          
  [r_rmeta_3]    4.00m ewi-aor--- linear        1     0      0      1    1          /dev/sdd(0)                                                          
  [r_rmeta_4]    4.00m ewi-aor--- linear        1     0      0      1    1          /dev/sde(0)

Comment 10 Heinz Mauelshagen 2017-05-09 15:44:59 UTC
RHEL7.3 runtime + upstream kernel -> works.

The flaw is that the superblocks are not yet written when the raid device is deactivated immediately after creation. That is, waiting a second or a few before deactivating the newly created RAID LV allows the later activation to succeed in my testing.

I don't see any relevant difference in the kernel DM code that would explain the failure, so for now I assume a missing MD backport in the RHEL kernel is the reason for this flaw.

FWIW:
creating a RAID LV without intending to use it immediately is a niche case.
Typically data is written to it right after creation, so deactivation happens much later.
Nonetheless, being unable to activate and use such a RAID(6) LV can be worked around by deleting and recreating it.
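
In practical terms, the workaround described above is to insert a short delay between creating the RAID6 LV and deactivating it, so the rimage/rmeta superblocks get written first. A minimal sketch using the names from this report (the delay length here is a rough guess, not a guaranteed bound):

# lvcreate --type raid6 -i 3 -n missing_pv_raid -L 100M raid_sanity
# sleep 5                                        # allow the kernel to write the raid superblocks first
# lvchange -an raid_sanity/missing_pv_raid       # a later "vgchange -ay --partial" then succeeds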

Comment 25 Jonathan Earl Brassow 2017-10-30 13:59:47 UTC
Moving back to ASSIGNED since Corey reproduced this bug with a newer kernel version than the one referenced by the commit ID that fixes the issue.

Comment 29 Heinz Mauelshagen 2017-12-01 00:07:35 UTC
Patches submitted upstream https://www.redhat.com/archives/dm-devel/2017-December/msg00000.html

Comment 42 Corey Marthaler 2020-01-08 17:11:06 UTC
FWIW: we continued to hit this in the final 7.8 regression testing.

3.10.0-1121.el7.x86_64

lvm2-2.02.186-4.el7    BUILT: Wed Nov 27 04:05:17 CST 2019
lvm2-libs-2.02.186-4.el7    BUILT: Wed Nov 27 04:05:17 CST 2019
device-mapper-1.02.164-4.el7    BUILT: Wed Nov 27 04:05:17 CST 2019
device-mapper-libs-1.02.164-4.el7    BUILT: Wed Nov 27 04:05:17 CST 2019
device-mapper-event-1.02.164-4.el7    BUILT: Wed Nov 27 04:05:17 CST 2019
device-mapper-event-libs-1.02.164-4.el7    BUILT: Wed Nov 27 04:05:17 CST 2019


[vgcfgrestore_raid_with_missing_pv] host-083: vgcreate   raid_sanity /dev/mapper/cPV16 /dev/mapper/cPV15 /dev/mapper/cPV14 /dev/mapper/cPV13 /dev/mapper/cPV12 /dev/mapper/cPV11 /dev/mapper/cPV10 /dev/mapper/cPV9 /dev/mapper/cPV8 /dev/mapper/cPV7
[vgcfgrestore_raid_with_missing_pv] 
[vgcfgrestore_raid_with_missing_pv] ============================================================
[vgcfgrestore_raid_with_missing_pv] Iteration 1 of 1 started at Tue Jan  7 19:18:32 CST 2020
[vgcfgrestore_raid_with_missing_pv] ============================================================
[vgcfgrestore_raid_with_missing_pv] SCENARIO (raid6) - [vgcfgrestore_raid_with_missing_pv]
[vgcfgrestore_raid_with_missing_pv] Create a raid, force remove a leg, and then restore its VG
[vgcfgrestore_raid_with_missing_pv] host-083: lvcreate  --type raid6_zr -i 3 -n missing_pv_raid -L 100M raid_sanity
[vgcfgrestore_raid_with_missing_pv] HACK: adding a sleep here with raid6 to avoid bug 1372101
[vgcfgrestore_raid_with_missing_pv] Deactivating missing_pv_raid raid
[vgcfgrestore_raid_with_missing_pv] Backup the VG config
[vgcfgrestore_raid_with_missing_pv] host-083 vgcfgbackup -f /tmp/raid_sanity.bkup.25676 raid_sanity
[vgcfgrestore_raid_with_missing_pv] Force removing PV /dev/mapper/cPV16 (used in this raid)
[vgcfgrestore_raid_with_missing_pv] host-083: 'pvremove -ff --yes /dev/mapper/cPV16'
[vgcfgrestore_raid_with_missing_pv]   WARNING: PV /dev/mapper/cPV16 is used by VG raid_sanity.
[vgcfgrestore_raid_with_missing_pv]   WARNING: Wiping physical volume label from /dev/mapper/cPV16 of volume group "raid_sanity".
[vgcfgrestore_raid_with_missing_pv] Verifying that this VG is now corrupt
[vgcfgrestore_raid_with_missing_pv]   WARNING: Device for PV XRtB9k-99q3-kVZh-TYRr-jT46-3es2-oRp4kU not found or rejected by a filter.
[vgcfgrestore_raid_with_missing_pv]   Couldn't find device with uuid XRtB9k-99q3-kVZh-TYRr-jT46-3es2-oRp4kU.
[vgcfgrestore_raid_with_missing_pv]   Failed to find physical volume "/dev/mapper/cPV16".
[vgcfgrestore_raid_with_missing_pv] Attempt to restore the VG back to its original state (should not segfault BZ 1348327)
[vgcfgrestore_raid_with_missing_pv] host-083 vgcfgrestore --yes -f /tmp/raid_sanity.bkup.25676 raid_sanity
[vgcfgrestore_raid_with_missing_pv]   Couldn't find device with uuid XRtB9k-99q3-kVZh-TYRr-jT46-3es2-oRp4kU.
[vgcfgrestore_raid_with_missing_pv]   Cannot restore Volume Group raid_sanity with 1 PVs marked as missing.
[vgcfgrestore_raid_with_missing_pv]   Restore failed.
[vgcfgrestore_raid_with_missing_pv] Checking syslog to see if vgcfgrestore segfaulted
[vgcfgrestore_raid_with_missing_pv] Activating VG in partial readonly mode
[vgcfgrestore_raid_with_missing_pv] host-083 vgchange -ay --partial raid_sanity
[vgcfgrestore_raid_with_missing_pv]   PARTIAL MODE. Incomplete logical volumes will be processed.
[vgcfgrestore_raid_with_missing_pv]   WARNING: Device for PV XRtB9k-99q3-kVZh-TYRr-jT46-3es2-oRp4kU not found or rejected by a filter.
[vgcfgrestore_raid_with_missing_pv]   Couldn't find device with uuid XRtB9k-99q3-kVZh-TYRr-jT46-3es2-oRp4kU.
[vgcfgrestore_raid_with_missing_pv]   device-mapper: reload ioctl on  (253:34) failed: Input/output error



Jan  7 19:18:41 host-083 qarshd[13107]: Running cmdline: vgchange -ay --partial raid_sanity
Jan  7 19:18:41 host-083 kernel: device-mapper: raid: Failed to read superblock of device at position 0
Jan  7 19:18:41 host-083 kernel: md/raid:mdX: not clean -- starting background reconstruction
Jan  7 19:18:41 host-083 kernel: md/raid:mdX: device dm-27 operational as raid disk 1
Jan  7 19:18:41 host-083 kernel: md/raid:mdX: device dm-29 operational as raid disk 2
Jan  7 19:18:41 host-083 kernel: md/raid:mdX: device dm-31 operational as raid disk 3
Jan  7 19:18:41 host-083 kernel: md/raid:mdX: device dm-33 operational as raid disk 4
Jan  7 19:18:41 host-083 kernel: md/raid:mdX: cannot start dirty degraded array.
Jan  7 19:18:41 host-083 kernel: md/raid:mdX: failed to run raid set.
Jan  7 19:18:41 host-083 kernel: md: pers->run() failed ...
Jan  7 19:18:41 host-083 kernel: device-mapper: table: 253:34: raid: Failed to run raid array
Jan  7 19:18:41 host-083 kernel: device-mapper: ioctl: error adding target to table
Jan  7 19:18:41 host-083 multipathd: dm-34: remove map (uevent)
Jan  7 19:18:41 host-083 multipathd: dm-34: remove map (uevent)