Bug 1676612

Summary: lvm tools expect access to udev, even when disabled in the configuration
Product: Red Hat Enterprise Linux 7 Reporter: Niels de Vos <ndevos>
Component: lvm2 Assignee: Peter Rajnoha <prajnoha>
lvm2 sub component: Configuration files QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: agk, bkunal, cmarthal, dyocum, hchiramm, heinzm, jbrassow, mcsontos, mrobson, msnitzer, ndevos, pasik, pdwyer, prajnoha, puebele, rbednar, rcyriac, rhandlin, zkabelac
Version: 7.6 Keywords: Regression, ZStream
Target Milestone: pre-dev-freeze   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: lvm2-2.02.184-1.el7 Doc Type: Bug Fix
Doc Text:
A conditional was missing when querying the udev database. When the database was not present, e.g. in a cluster, this resulted in a timeout and lvm commands took a long time. The fix ensures that udev access respects the obtain_device_list_from_udev setting.
Story Points: ---
Clone Of:
: 1684133 1688316 Environment:
Last Closed: 2019-08-06 13:10:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1674485, 1676466, 1684133, 1688316    

Description Niels de Vos 2019-02-12 16:48:01 UTC
Description of problem:
When running lvm commands (pvscan and the like) in a container, progress is really slow and can take many hours (depending on the number of devices and possibly other factors).

While running 'pvs' messages like these are printed:

  WARNING: Device /dev/xvda2 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdb1 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdc not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdd not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdf not initialized in udev database even after waiting 10000000 microseconds.


Because this is running in a container, there is no udev available (it will be running on the host instead). In /etc/lvm/lvm.conf many options related to udev have been disabled.
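One way to confirm from inside a container that no udev record is visible for a device is to look for its entry under /run/udev/data. This is only a rough sketch and is not part of lvm2: the helper names and the `run_dir` parameter are made up for illustration, and it assumes the usual systemd-udev layout, where block device records are stored in files named b<major>:<minor>.

```python
import os


def udev_record_path(major, minor, run_dir="/run/udev"):
    """Path where udev would keep the record for a block device.

    Assumption: systemd-udev layout, <run_dir>/data/b<major>:<minor>.
    """
    return os.path.join(run_dir, "data", "b%d:%d" % (major, minor))


def has_udev_record(major, minor, run_dir="/run/udev"):
    """True if a udev database record file exists for the block device."""
    return os.path.isfile(udev_record_path(major, minor, run_dir))


def device_numbers(dev_path):
    """Return (major, minor) of a device node such as /dev/xvdb1."""
    st = os.stat(dev_path)
    return os.major(st.st_rdev), os.minor(st.st_rdev)
```

For example, `has_udev_record(*device_numbers("/dev/xvdb1"))` would be expected to return False in a container where /run/udev is not shared with the host.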

More details with ltrace output can be found at https://bugzilla.redhat.com/show_bug.cgi?id=1674485#c8

Version-Release number of selected component (if applicable):
lvm2-2.02.180-10.el7_6.3.x86_64

How reproducible:
100%, but it depends on the (virtual) hardware used; it fails on AWS and Hyper-V

Steps to Reproduce:
1. deploy OpenShift Container Storage (OCS) 3.11.1
2. rsh/exec into a glusterfs container, run some lvm commands

Actual results:
Started glusterfs-server containers never become Ready because initialization of the devices (pvscan is called during start) never finishes.

Expected results:
The glusterfs-server container should become Ready in a few minutes after starting.

Additional info:
Downgrading to lvm2-2.02.180-10.el7_6.2.x86_64 resolves the issue.

Future versions of the rhgs-server-container will disable obtain_device_list_from_udev in lvm.conf as well. This is currently not sufficient, as it does not completely disable udev interactions. That requires this upstream change:
https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=3ebce8dbd2d9afc031e0737f8feed796ec7a8df9;hp=d19e3727951853093828b072e254e447f7d61c60

Comment 3 Peter Rajnoha 2019-02-13 11:58:23 UTC
(In reply to Niels de Vos from comment #0)
> Future versions of the rhgs-server-container will disable
> obtain_device_list_from_udev in lvm.conf as well. This is currently not
> sufficient as it does not completely disables udev interactions. That needs 
> https://sourceware.org/git/?p=lvm2.git;a=commitdiff;
> h=3ebce8dbd2d9afc031e0737f8feed796ec7a8df9;
> hp=d19e3727951853093828b072e254e447f7d61c60

Yes, this recent patch is currently missing from the build. However, it's also worth noting that there was a reason why we started reading the udev database for information about MD and mpath devices: under some circumstances, and due to some recent changes in how LVM does scanning and device filtering, LVM was unable to correctly detect some MD components. The problematic scenario is when:

  - MD device on top of MD components is not activated yet

  - MD metadata are placed at the end of the disk (current LVM disk-reading code caches/considers only start of the disk)

The situation is similar for mpath, which doesn't use any signatures on disk at all: if the mpath device on top of the mpath components is not set up yet, LVM doesn't filter these components.

Using udev info here means we're using info which is exported directly by the MD and mpath tools based on their configuration and their knowledge. So without using udev in LVM, we're opening up the possibility for these kinds of problems to appear.

Comment 4 Yaniv Kaul 2019-02-14 06:59:31 UTC
Peter, thanks for the analysis. What's the next step here? This has hit OCS 3.11.1 hard on AWS deployments. We need a fix for it. For the time being, we reverted to older lvm2 version.

Comment 5 Humble Chirammal 2019-02-14 09:02:29 UTC
(In reply to Peter Rajnoha from comment #3)
> (In reply to Niels de Vos from comment #0)
> > Future versions of the rhgs-server-container will disable
> > obtain_device_list_from_udev in lvm.conf as well. This is currently not
> > sufficient as it does not completely disables udev interactions. That needs 
> > https://sourceware.org/git/?p=lvm2.git;a=commitdiff;
> > h=3ebce8dbd2d9afc031e0737f8feed796ec7a8df9;
> > hp=d19e3727951853093828b072e254e447f7d61c60
> 
> Yes, this recent patch is currently missing in a build. However, it's also
> worth noting that there was a reason why we started reading udev database
> for information about MD and mpath devices - this was because, under some
> circumstances and due to some recent changes in how LVM does scanning and
> device filtering, LVM was unable to correctly detect some MD components. The
> problematic scenario is when:
> 
>   - MD device on top of MD components is not activated yet
> 
>   - MD metadata are placed at the end of the disk (current LVM disk-reading
> code caches/considers only start of the disk)
> 
> Similar for mpath which doesn't use any signatures on disk at all and if the
> mpath device on top of mpath components is not set up yet, LVM doesn't
> filter these components.
> 
> Usign udev info here means we're using info which is exported directly by MD
> and mpath tools based on their configuration and their knowledge. So without
> using udev in LVM, we're opening a possibility for these kinds of problems
> to appear.

Thanks for the analysis. One question I have here:

How do we confirm that we are hitting this issue because of one of the conditions mentioned above?

That said, from the problem description we can see that the warnings are reported for guest kernel devices, which are typical Xen-based block devices:

--snip--

WARNING: Device /dev/xvda2 not initialized in udev database even after waiting 10000000 microseconds.
WARNING: Device /dev/xvdb1 not initialized in udev database even after waiting 10000000 microseconds.

--/snip--

In another setup, we can see that the warnings are present on "scsi" devices as well:

[root@dhcp46-115 /]# time pvs
  WARNING: Device /dev/rhel_dhcp47-42/root not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/sda1 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/sda2 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/sda3 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/vg_0f0b820a8ce7358d933692c82db220fe/brick_539df0f01872cc2d2250f8344de1d1e3 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/rhel_dhcp47-42/home not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/mapper/docker-8:17-11610-a15d2327feccd20bfbdec749b9ae6a8f6e3e8c84ec6317057fb0d6e470c8f743 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/mapper/docker-8:17-11610-0db4c3fb7c2a280f4e2e7384a7108630c4c71f13b49a00a8d8323cbf47d654fb not initialized in udev database even after waiting 10000000 microseconds.

--/snip--

IIUC, by "MD" you are referring to RAID devices. Please correct me if I am wrong.

Are you saying we will run into this issue regardless of the device type (SCSI or Xen disk) detected in the guest kernel, and that it ONLY depends on what is hosting this chunk of space in the backend array?

How can we verify if we are indeed hitting any of the scenarios mentioned in comment#2?

Additionally, if I understood your comment correctly, we should **not** disable 'obtain_device_list_from_udev' in lvm.conf, because if we do, device detection is more likely to fail. Is that right?

Comment 6 Peter Rajnoha 2019-02-14 11:17:35 UTC
(In reply to Yaniv Kaul from comment #4)
> Peter, thanks for the analysis. What's the next step here? This has hit OCS
> 3.11.1 hard on AWS deployments. We need a fix for it. For the time being, we
> reverted to older lvm2 version.

For now, the fix here is for LVM to not try to get any records from udev if obtain_device_list_from_udev=0 set in lvm.conf (we need to add that new patch to a new build: https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=3ebce8dbd2d9afc031e0737f8feed796ec7a8df9;hp=d19e3727951853093828b072e254e447f7d61c60).
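The intent of that change can be pictured roughly as below. This is a simplified Python model, not the actual lvm2 C code: `is_md_component`, `udev_lookup` and `native_check` are hypothetical names, and the config is modelled as a plain dict. It assumes that udev marks MD components with ID_FS_TYPE=linux_raid_member, as blkid reports them.

```python
def is_md_component(dev, config, udev_lookup, native_check):
    """Decide whether dev is an MD component (hypothetical model of the filter).

    config       -- dict of lvm.conf settings (only one key is used here)
    udev_lookup  -- callable returning a dict of udev properties, or None
    native_check -- callable doing LVM's own on-disk detection
    """
    # The missing conditional: only touch udev when the setting allows it.
    # The setting is treated as enabled (1) when absent from the config.
    if config.get("obtain_device_list_from_udev", 1):
        props = udev_lookup(dev)
        if props is not None:
            return props.get("ID_FS_TYPE") == "linux_raid_member"
    # udev disabled or no record available: fall back to native detection,
    # without waiting for a record that may never appear.
    return native_check(dev)
```

With the setting at 0, `udev_lookup` is never called, so there is no wait and no warning.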

Comment 7 Peter Rajnoha 2019-02-14 12:05:40 UTC
(In reply to Humble Chirammal from comment #5)
> (In reply to Peter Rajnoha from comment #3)
> > (In reply to Niels de Vos from comment #0)
> > > Future versions of the rhgs-server-container will disable
> > > obtain_device_list_from_udev in lvm.conf as well. This is currently not
> > > sufficient as it does not completely disables udev interactions. That needs 
> > > https://sourceware.org/git/?p=lvm2.git;a=commitdiff;
> > > h=3ebce8dbd2d9afc031e0737f8feed796ec7a8df9;
> > > hp=d19e3727951853093828b072e254e447f7d61c60
> > 
> > Yes, this recent patch is currently missing in a build. However, it's also
> > worth noting that there was a reason why we started reading udev database
> > for information about MD and mpath devices - this was because, under some
> > circumstances and due to some recent changes in how LVM does scanning and
> > device filtering, LVM was unable to correctly detect some MD components. The
> > problematic scenario is when:
> > 
> >   - MD device on top of MD components is not activated yet
> > 
> >   - MD metadata are placed at the end of the disk (current LVM disk-reading
> > code caches/considers only start of the disk)
> > 
> > Similar for mpath which doesn't use any signatures on disk at all and if the
> > mpath device on top of mpath components is not set up yet, LVM doesn't
> > filter these components.
> > 
> > Usign udev info here means we're using info which is exported directly by MD
> > and mpath tools based on their configuration and their knowledge. So without
> > using udev in LVM, we're opening a possibility for these kinds of problems
> > to appear.
> 
> Thanks for the analysis, One question I have here is:
> 
> How do we make sure we are hitting this issue being in one of the conditions
> mentioned above ?
> 
> that said, from problem description we can see that , the error/warning
> comes for the guest kernel devices which are typical Xen based block devices:
> 
> --snip--
> 
> WARNING: Device /dev/xvda2 not initialized in udev database even after
> waiting 10000000 microseconds.
> WARNING: Device /dev/xvdb1 not initialized in udev database even after
> waiting 10000000 microseconds.
> 

OK, I'll try to explain where this comes from, but I also need to explain the context a little bit...

The facts:

  1) there's a new internal cache for reading disks in LVM (added about half a year ago); this cache holds data from the start of disks (this was done to speed up LVM device scanning)

  2) this works fine mostly, but there are some cases where you need to also read the end of the disk (older versions of MD metadata write their headers at the end of the disk instead of its start - MD metadata version 0.9 and 1.0)

  3) when you execute an LVM command, LVM internally filters out improper devices ("improper" here means "they can't be PVs" for various reasons, e.g. MD component, mpath component, too small disk to hold a PV...)

  4) these internal LVM filters use the new internal LVM caching mechanism now

  5) since the new caching mechanism caches only the starts of disks, we can't reliably filter out an MD component by looking at the start of the disk (the cached start); we also need to check its end to see whether it is an MD component with metadata version 0.9 or 1.0, in which case it must not be used as a PV and must be filtered out

  6) since this was just a few cases where we need to check the end of disk, instead of adding complexity to the new LVM caching mechanism, we decided to read the already existing information about MD components directly from udev database (there, it's identified already by blkid and exported to udev database). We read the udev db unconditionally here.

  7) unfortunately, the udev database is not available everywhere, for example in chroot environments and containers. /run/udev could be remounted there, but there are probably other problems associated with that, like selinux (which I heard about lately). This may make the udev db unreliable under chroot or in a container.

  8) since LVM tries to read the udev db and can't access it, it retries several times with a timeout (the 10s, i.e. 10 000 000 microseconds), because it might just be that udev hasn't processed the device yet and needs to complete. But if the udev db is not accessible, the record about the device will never become readable, so that's where the "WARNING: Device ... not initialized in udev database even after waiting 10000000 microseconds." comes from.
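The retry behaviour in point 8 can be sketched as follows. This is an illustrative Python model only, not the lvm2 implementation: `wait_for_udev_record` and `lookup` are hypothetical names, and the 100 ms polling interval is an assumption. Only the 10 000 000 microsecond total and the warning text are taken from this report.

```python
import time


def wait_for_udev_record(lookup, dev, timeout_us=10_000_000,
                         interval_us=100_000,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll lookup(dev) until it returns a record or the timeout expires.

    lookup returns a record, or None while udev has not (yet) processed dev.
    """
    deadline = clock() + timeout_us / 1e6
    while True:
        record = lookup(dev)
        if record is not None:
            return record
        if clock() >= deadline:
            # Without a udev db the record never appears, so every device
            # ends up here after the full timeout.
            print("WARNING: Device %s not initialized in udev database "
                  "even after waiting %d microseconds." % (dev, timeout_us))
            return None
        sleep(interval_us / 1e6)
```

If every device waits out the full timeout once, the five devices listed in the description would cost at least 5 x 10 s = 50 s per scan, before any real work is done.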

> 
> iiuc, by "MD" you are referring RAID devices . Please correct me if I am
> wrong. 

Yes, the "MD raid" devices - handled by "md" kernel driver and the related "mdadm" tool in userspace.

> 
> Are you saying we will run into this issue regardless of the device type (
> SCSI or Xen Disk) detected in guest kernel and it ONLY depends on whats
> hosting this chunk of space in backend array? 
> 

Yes, regardless of device type, because LVM needs to check each disk it tries to access and it needs to make sure it's not a device it should not access - simply, LVM needs to run filters on each device.

> How can we verify it we are indeed hitting any of the scenario mentioned in
> comment#2 ?
> 
> Additionally, if I got your comment, we should **not** disable
> 'obtain_device_list_from_udev' in lvm.conf , if we do there are more chances
> for failure of device detection. Isnt it ?

Yes, if we don't have access to udev db records, there could be an LVM filter failure (a failure to detect a device which shouldn't be considered for a PV and it shouldn't be processed further by LVM at all). 

But this fails only in certain narrow cases:

  - you have MD with (old) metadata version 0.9 or 1.0 on your system, which stores signatures at the end of the disk. LVM can't see these because the new caching mechanism looks at the start of the disk only, so here we need to rely on external info from the udev db, where the info is exported directly by mdadm tooling

  - there's also a case with multipath (device-mapper-multipath) components where the top-level mpath device is not yet set up (this should be a rare case though), again we need to rely on udev db here to get the info exported by "multipath" tool that in turn exports this info from its configuration.
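To make the first case concrete, the sketch below computes where the MD superblock sits for the two end-of-disk formats. It follows my understanding of the Linux md layout (0.90: in the last 64 KiB-aligned 64 KiB block; 1.0: 8 KiB before the end, rounded down to 4 KiB) and is not taken from lvm2; the function names are made up here, so treat the numbers as an assumption to verify against the md documentation.

```python
SECTOR = 512  # bytes per sector, as used by the md layout rules


def md_sb_offset_v0_90(dev_size_bytes):
    """Byte offset of a 0.90 superblock: last 64 KiB-aligned 64 KiB block."""
    reserved = 64 * 1024
    return (dev_size_bytes & ~(reserved - 1)) - reserved


def md_sb_offset_v1_0(dev_size_bytes):
    """Byte offset of a 1.0 superblock: 8 KiB from the end, 4 KiB aligned."""
    sectors = dev_size_bytes // SECTOR
    sb = sectors - 8 * 2      # step back 8 KiB (16 sectors)
    sb &= ~(4 * 2 - 1)        # round down to a 4 KiB boundary (8 sectors)
    return sb * SECTOR
```

For a 1 GiB device both offsets fall within the last 64 KiB, which is why a cache that holds only the start of the disk cannot see them.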


If we disable udev, there could be other (non-LVM) components on the system these days that fully rely on udev db records, and if they don't have them, they can fail to see the device completely. LVM itself can still deal with the non-udev environment though (it's just the "old MD metadata" and "mpath device not yet set up" scenarios where it can't deal with that; that's what you lose by setting obtain_device_list_from_udev=0).

So, disabling udev, yes, it can be done, but it could have consequences for other system components. Thing is, I'm not able to tell exactly which components read the udev db these days and rely on this information completely, without any further fallback if the udev db is not accessible. But since udev is a standard way of dealing with /dev these days, such components definitely exist...

Comment 8 Peter Rajnoha 2019-02-14 12:16:20 UTC
(In reply to Peter Rajnoha from comment #7)
> (In reply to Humble Chirammal from comment #5)
> > Additionally, if I got your comment, we should **not** disable
> > 'obtain_device_list_from_udev' in lvm.conf , if we do there are more chances
> > for failure of device detection. Isnt it ?
> 
> Yes, if we don't have access to udev db records, there could be an LVM
> filter failure (a failure to detect a device which shouldn't be considered
> for a PV and it shouldn't be processed further by LVM at all). 
> 
> But this fails only in certain narrow cases:
> 
>   - you have MD with (old) metadata version 0.9 or 1.0 on your system which
> store signatures at the end of the disk and which can't be seen by LVM due
> to the new caching mechanism which looks at the start of the disk only and
> hence we need to rely on external info from udev db here only where the info
> is exported directly by mdadm tooling
> 
>   - there's also a case with multipath (device-mapper-multipath) components
> where the top-level mpath device is not yet set up (this should be a rare
> case though), again we need to rely on udev db here to get the info exported
> by "multipath" tool that in turn exports this info from its configuration.
> 

Well, there's also one more associated problem, but that's a wider problem with handling devices in a container while the host system accesses the device as well:

  - if you handle a device inside a container/guest (add or change), a uevent is going to be generated on the host. The associated udev rules then run there and may open the device. If the device is open on the host, then the device handling inside the container may fail because there's "someone else" handling the device outside (e.g. you can't exclusively open a device inside the container to initialize it while the device is still open on the host due to scanning based on udev rules there). The devices are not containerized and we don't yet have any mediator between host and containers to avoid this parallel access (e.g. if we're handling a device inside a container, then the handling on the host should be ignored).

Comment 9 Peter Rajnoha 2019-02-14 12:18:41 UTC
(...when you have udev available, then of course LVM can synchronize itself with udev rule processing - wait for it to settle down. But if we don't have this in the container, we just execute actions even though there might be parallel processing on the host.)

Comment 10 Humble Chirammal 2019-02-15 11:05:16 UTC
(In reply to Peter Rajnoha from comment #7)

> 
> So, disabling udev, yes, it can be done, but it could have consequences for
> other system components. Thing is, I'm not able to tell exactly who all read
> udev db these days and relies on this information completely, without any
> further fallback if udev db is not accessible. But since udev is a standard
> way of dealing with /dev these days, such components definitely exist...

Perfect!! Thanks a lot, Peter, for the detailed update; it clears things up!

I will get back to you if any questions or clarifications come up.

Comment 11 Marian Csontos 2019-02-28 14:35:19 UTC
This looks like ZStream material, to me. Jon, Niels?

Comment 12 Niels de Vos 2019-02-28 15:12:22 UTC
(In reply to Marian Csontos from comment #11)
> This looks like ZStream material, to me. Jon, Niels?

Yes, we'd like to include the latest lvm2 package in our OCS product. Currently we're downgrading to the previous version as we can not disable all udev integration (which does not work in containers).

Thanks!

Comment 17 Corey Marthaler 2019-07-01 21:44:34 UTC
Marking verified (SanityOnly). 

No related regressions found in single node environment regression testing.

3.10.0-1057.el7.x86_64

lvm2-2.02.185-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
lvm2-libs-2.02.185-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
lvm2-cluster-2.02.185-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
lvm2-lockd-2.02.185-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
lvm2-python-boom-0.9-18.el7    BUILT: Fri Jun 21 04:18:58 CDT 2019
cmirror-2.02.185-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
device-mapper-1.02.158-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
device-mapper-libs-1.02.158-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
device-mapper-event-1.02.158-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
device-mapper-event-libs-1.02.158-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
device-mapper-persistent-data-0.8.5-1.el7    BUILT: Mon Jun 10 03:58:20 CDT 2019

Comment 19 errata-xmlrpc 2019-08-06 13:10:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2253