Bug 1656424

Summary: LVM2 after last update (EL7.6) used wrong device when activating VG on boot
Product: Red Hat Enterprise Linux 7
Reporter: Milan Kerslager <milan.kerslager>
Component: lvm2
Assignee: LVM and device-mapper development team <lvm-team>
lvm2 sub component: Default / Unclassified
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA
Docs Contact:
Severity: unspecified
Priority: high
CC: agk, cmarthal, gordon.messmer, heinzm, jbrassow, mcsontos, milan.kerslager, msnitzer, pasik, prajnoha, rbednar, rhandlin, teigland, zkabelac
Version: 7.6
Keywords: ZStream
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: lvm2-2.02.184-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1657640
Environment:
Last Closed: 2019-08-06 13:10:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1657640    
Attachments:
  Wrong VG Opteron (flags: none)
  mdadm --detail --scan -vvv (flags: none)

Description Milan Kerslager 2018-12-05 13:49:04 UTC
Description of problem:
After the latest update to 7.6, my LVM setup did not work after reboot. I saw that LVM picked the wrong device for one VG when activating it. I had to deactivate the VG, modify lvm.conf (added a filter) and reactivate the VG. Then I rebuilt the initramfs (not sure whether that is needed).

Version-Release number of selected component (if applicable):
lvm2-2.02.180-10.el7_6.2.x86_64

How reproducible:
I created /dev/md2 (RAID6) by hand, and the VG as well. After the update to 7.6 (CentOS 7.6), one VG did not activate and the machine ended up in the emergency shell. LVM picked up /dev/sdb1 instead of /dev/md2, so the VG did not work.

How to fix:

vgdisplay -v            # wrong device in "Physical volumes" section
vgchange -a n Opteron   # deactivate VG

add the following line to the devices { } section of /etc/lvm/lvm.conf (there was no filter set, only commented-out examples) so that only /dev/md* devices are used and all other devices are ignored (adjust to your setup):

    filter =  [ "a/md/", "r/.*/" ]

vgchange -a y Opteron   # activate VG
vgdisplay -v            # check VG setup
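
If the VG is needed early during boot, the initramfs probably has to be rebuilt as well so that it picks up the new lvm.conf. A minimal sketch of the usual dracut steps on EL7 (an assumption, not something the report confirms as required):

    cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak   # keep a backup of the current image
    dracut -f /boot/initramfs-$(uname -r).img $(uname -r)                    # rebuild the initramfs so it contains the updated /etc/lvm/lvm.conf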

Expected results:
LVM2 should not change its behaviour for assembling a VG, i.e. how it picks devices. LVM should notice when it has picked up a non-working device.

I'm not sure if the filter should be in /etc/lvm/lvmlocal.conf instead.
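
For reference, a minimal sketch of what that would look like (assuming the stock /etc/lvm/lvmlocal.conf shipped on EL7, whose settings are applied on top of lvm.conf; in either file the filter belongs inside the devices section):

    # /etc/lvm/lvmlocal.conf -- host-specific overrides, read after lvm.conf
    devices {
        filter = [ "a/md/", "r/.*/" ]
    }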

Comment 2 Milan Kerslager 2018-12-05 16:06:03 UTC
Created attachment 1511777 [details]
Wrong VG Opteron

VG Opteron picked up /dev/sda1 instead of /dev/md2 after reboot.

Comment 5 Marian Csontos 2018-12-05 18:05:31 UTC
Could you please confirm the md is version 0.9?

mdadm --detail --scan -vvv

Comment 6 Milan Kerslager 2018-12-05 19:55:11 UTC
Created attachment 1511893 [details]
mdadm --detail --scan -vvv

The md2 array has metadata v1.0.
It is an old array that was transformed RAID1->RAID5->RAID6 in the past.

Comment 7 Roman Bednář 2018-12-06 15:16:50 UTC
It seems that simply mixing v0.9 and v1.0 md metadata does not reproduce the bug.

Neither did creating two md RAID6 devices with v0.9 and v1.0 metadata and upgrading from 7.5 to 7.6 with a reboot.

Any chance of providing a reliable reproducer here?


===================

# mdadm -v  --detail --scan
ARRAY /dev/md0 level=raid6 num-devices=4 metadata=0.90 UUID=35d3e125:eff7bc90:65c288b3:2c710f73
   devices=/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde
ARRAY /dev/md1 level=raid6 num-devices=4 metadata=1.0 name=1 UUID=43111dee:a292d73d:cb03a2dd:3ca6a979
   devices=/dev/sdf,/dev/sdg,/dev/sdh,/dev/sdi

# pvs
  PV         VG            Fmt  Attr PSize  PFree
  /dev/md0   vg            lvm2 a--  <9.99g <9.99g
  /dev/md1   vg            lvm2 a--  <9.99g <9.99g
  /dev/sdj   vg            lvm2 a--   4.99g  4.99g
  /dev/sdk   vg            lvm2 a--   4.99g  4.99g
  /dev/sdl   vg            lvm2 a--   4.99g  4.99g
  /dev/sdm   vg            lvm2 a--   4.99g  4.99g
  /dev/sdn   vg            lvm2 a--   4.99g  4.99g
  /dev/sdo   vg            lvm2 a--   4.99g  4.99g
  /dev/vda2  rhel_host-085 lvm2 a--  <7.00g  1.40g

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)

# yum update -y

###Update from lvm2-2.02.177-4.el7 to lvm2-2.02.180-10.el7

# reboot 

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)

# vgchange -an vg
  0 logical volume(s) in volume group "vg" now active

# vgchange -ay vg
  1 logical volume(s) in volume group "vg" now active

# vgs vg
  VG #PV #LV #SN Attr   VSize   VFree
  vg   8   1   0 wz--n- <49.93g <48.93g

Comment 8 Gordon Messmer 2018-12-06 22:45:49 UTC
I was hit by this bug as well.  I believe the bug is that lvm2 no longer excludes devices with md metadata 0.90 when scanning for PVs.  In order to reproduce the problem, a block device must have both md metadata 0.90 and LVM PV metadata.

This is easiest to reproduce with a RAID1 volume, where both component devices carry the metadata for both the md 0.90 superblock and the LVM PV.
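
A minimal sketch of a setup that meets this condition (an assumption for illustration, not a verified reproducer; /dev/sdX and /dev/sdY are placeholder disks):

    mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=0.90 /dev/sdX /dev/sdY   # 0.90 superblock sits at the end of each component
    pvcreate /dev/md0                # the PV label is mirrored to the start of /dev/sdX and /dev/sdY
    vgcreate testvg /dev/md0
    pvs -o pv_name,vg_name           # should report /dev/md0 as the PV, not /dev/sdX or /dev/sdY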

On a system with a RAID1 device and md metadata 1.2, running "pvs" with verbose=7 set in lvm.conf will include this output:

#device/dev-io.c:609           Opened /dev/sda3 RO O_DIRECT
#device/dev-io.c:359         /dev/sda3: size is 1951133696 sectors
#device/dev-io.c:658           Closed /dev/sda3
#filters/filter-mpath.c:196           /dev/sda3: Device is a partition, using primary device sda for mpath component detection
#device/dev-io.c:336         /dev/sda3: using cached size 1951133696 sectors
#device/dev-md.c:163           Found md magic number at offset 4096 of /dev/sda3.
#filters/filter-md.c:108           /dev/sda3: Skipping md component device 

Here, we can see that lvm2 finds the md magic number and skips examining the device for PV metadata.

On a system with a RAID1 device and md metadata 0.90, running "pvs" with verbose=7 includes this output instead:

#device/dev-io.c:609           Opened /dev/sda3 RO O_DIRECT
#device/dev-io.c:359         /dev/sda3: size is 5858142208 sectors
#device/dev-io.c:658           Closed /dev/sda3
#filters/filter-mpath.c:196           /dev/sda3: Device is a partition, using primary device sda for mpath component detection
#filters/filter-partitioned.c:30            filter partitioned deferred /dev/sda3
#filters/filter-md.c:99            filter md deferred /dev/sda3
#filters/filter-persistent.c:346           filter caching good /dev/sda3
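
(As a side note, an assumption on my part rather than anything shown above: the same per-device filter trace can usually be captured without editing lvm.conf by raising the command-line verbosity, since the debug output goes to stderr:)

    pvs -vvvv 2>&1 | grep -E 'filter|dev-md'    # show filter decisions and md superblock detection per device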

Comment 12 Marian Csontos 2018-12-14 08:45:25 UTC
(In reply to David Teigland from comment #4)
> Looking at the 2018-06-01-stable branch, these three commits are all related
> to improving identification of md components:
> 
> scan: md metadata version 0.90 is at the end of disk
> https://sourceware.org/git/?p=lvm2.git;a=commit;
> h=0e42ebd6d4012d210084a9ccf8d76f853726de3c
> 
> pvscan lvmetad: use full md filter when md 1.0 devices are present
> https://sourceware.org/git/?p=lvm2.git;a=commit;
> h=a01e1fec0fe7c2fa61577c0e636e907cde7279ea
> 
> pvscan lvmetad: use udev info to improve md component detection
> https://sourceware.org/git/?p=lvm2.git;a=commit;
> h=a188b1e513ed5ca0f5f3702c823490f5610d4495

David, this last patch requires c527a0cb, which is broader in scope. IIUC it is just an optimization, and is not needed to fix the issue, right?

Comment 15 Roman Bednář 2019-07-03 08:12:40 UTC
A reliable reproducer was not discovered for this bug. Marking verified (SanityOnly).

Comment 17 errata-xmlrpc 2019-08-06 13:10:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2253