Bug 1676612 - lvm tools expect access to udev, even when disabled in the configuration
Summary: lvm tools expect access to udev, even when disabled in the configuration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.6
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: pre-dev-freeze
Target Release: ---
Assignee: Peter Rajnoha
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1674485 1676466 1684133 1688316
 
Reported: 2019-02-12 16:48 UTC by Niels de Vos
Modified: 2023-12-15 16:20 UTC (History)
20 users

Fixed In Version: lvm2-2.02.184-1.el7
Doc Type: Bug Fix
Doc Text:
A conditional check was missing when querying the udev database. When the database was not present, e.g. in a cluster, this resulted in a timeout and LVM commands took a long time to complete. The fix ensures that udev access respects the obtain_device_list_from_udev setting.
Clone Of:
: 1684133 1688316 (view as bug list)
Environment:
Last Closed: 2019-08-06 13:10:44 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1674475 1 None None None 2022-03-13 16:58:12 UTC
Red Hat Bugzilla 1675134 0 urgent CLOSED [GSS] Gluster pod loses udev access with 3.11.1 upgrade 2022-03-13 16:58:24 UTC
Red Hat Bugzilla 1676466 0 urgent CLOSED LVM in the glusterfs container should not try to use udev 2022-03-13 16:58:39 UTC
Red Hat Knowledge Base (Solution) 4281181 0 None None None 2020-09-11 00:27:51 UTC
Red Hat Product Errata RHBA-2019:2253 0 None None None 2019-08-06 13:11:05 UTC

Internal Links: 1674485 1675134

Description Niels de Vos 2019-02-12 16:48:01 UTC
Description of problem:
When running lvm commands (pvscan and the like) in a container, progress is really slow and can take many hours (depending on the number of devices and possibly other factors).

While running 'pvs' messages like these are printed:

  WARNING: Device /dev/xvda2 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdb1 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdc not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdd not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdf not initialized in udev database even after waiting 10000000 microseconds.


Because this is running in a container, there is no udev available (it will be running on the host instead). In /etc/lvm/lvm.conf many options related to udev have been disabled.
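
As an illustration only (the exact settings used in the container image are not shown in this report), the udev-related lvm.conf knobs involved look roughly like this:

  # Hypothetical /etc/lvm/lvm.conf excerpt - illustrative values, not the shipped config
  devices {
      # do not ask the udev database for the list of block devices
      obtain_device_list_from_udev = 0
  }
  activation {
      # do not wait for udev to process events, and do not let udev rules manage /dev nodes
      udev_sync = 0
      udev_rules = 0
  }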

More details with ltrace output can be found at https://bugzilla.redhat.com/show_bug.cgi?id=1674485#c8

Version-Release number of selected component (if applicable):
lvm2-2.02.180-10.el7_6.3.x86_64

How reproducible:
100%, but depends on the (virtual) hardware used; fails on AWS and Hyper-V.

Steps to Reproduce:
1. deploy OpenShift Container Storage (OCS) 3.11.1
2. rsh/exec into a glusterfs container, run some lvm commands

Actual results:
Started glusterfs-server containers never become Ready because initialization of the devices (pvscan is called during start) never finishes.

Expected results:
The glusterfs-server container should become Ready in a few minutes after starting.

Additional info:
Downgrading to lvm2-2.02.180-10.el7_6.2.x86_64 resolves the issue.
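
For reference, a rough sketch of the downgrade workaround on RHEL 7 (assuming the older build is still available in the enabled repositories; related lvm2/device-mapper subpackages may need to be downgraded to matching versions as well):

  # roll back to the last known-good build mentioned above
  yum downgrade lvm2-2.02.180-10.el7_6.2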

Future versions of the rhgs-server-container will disable obtain_device_list_from_udev in lvm.conf as well. This is currently not sufficient, as it does not completely disable udev interactions. That needs
https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=3ebce8dbd2d9afc031e0737f8feed796ec7a8df9;hp=d19e3727951853093828b072e254e447f7d61c60

Comment 3 Peter Rajnoha 2019-02-13 11:58:23 UTC
(In reply to Niels de Vos from comment #0)
> Future versions of the rhgs-server-container will disable
> obtain_device_list_from_udev in lvm.conf as well. This is currently not
> sufficient as it does not completely disables udev interactions. That needs 
> https://sourceware.org/git/?p=lvm2.git;a=commitdiff;
> h=3ebce8dbd2d9afc031e0737f8feed796ec7a8df9;
> hp=d19e3727951853093828b072e254e447f7d61c60

Yes, this recent patch is currently missing in a build. However, it's also worth noting that there was a reason why we started reading udev database for information about MD and mpath devices - this was because, under some circumstances and due to some recent changes in how LVM does scanning and device filtering, LVM was unable to correctly detect some MD components. The problematic scenario is when:

  - MD device on top of MD components is not activated yet

  - MD metadata are placed at the end of the disk (current LVM disk-reading code caches/considers only start of the disk)

Similarly for mpath, which doesn't use any signatures on disk at all: if the mpath device on top of the mpath components is not set up yet, LVM doesn't filter these components.

Using udev info here means we're using info which is exported directly by the MD and mpath tools based on their configuration and their knowledge. So without using udev in LVM, we're opening up the possibility for these kinds of problems to appear.
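
As a hedged illustration of the kind of information LVM reads from udev here: on a host where udev and blkid have processed the disks, an MD component is flagged in the device's udev record (the device name below is only an example):

  # show what blkid exported to the udev database for a suspected MD component
  udevadm info --query=property --name=/dev/sdb | grep ID_FS_TYPE
  # an MD member typically reports ID_FS_TYPE=linux_raid_member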

Comment 4 Yaniv Kaul 2019-02-14 06:59:31 UTC
Peter, thanks for the analysis. What's the next step here? This has hit OCS 3.11.1 hard on AWS deployments. We need a fix for it. For the time being, we have reverted to an older lvm2 version.

Comment 5 Humble Chirammal 2019-02-14 09:02:29 UTC
(In reply to Peter Rajnoha from comment #3)
> (In reply to Niels de Vos from comment #0)
> > Future versions of the rhgs-server-container will disable
> > obtain_device_list_from_udev in lvm.conf as well. This is currently not
> > sufficient as it does not completely disables udev interactions. That needs 
> > https://sourceware.org/git/?p=lvm2.git;a=commitdiff;
> > h=3ebce8dbd2d9afc031e0737f8feed796ec7a8df9;
> > hp=d19e3727951853093828b072e254e447f7d61c60
> 
> Yes, this recent patch is currently missing in a build. However, it's also
> worth noting that there was a reason why we started reading udev database
> for information about MD and mpath devices - this was because, under some
> circumstances and due to some recent changes in how LVM does scanning and
> device filtering, LVM was unable to correctly detect some MD components. The
> problematic scenario is when:
> 
>   - MD device on top of MD components is not activated yet
> 
>   - MD metadata are placed at the end of the disk (current LVM disk-reading
> code caches/considers only start of the disk)
> 
> Similar for mpath which doesn't use any signatures on disk at all and if the
> mpath device on top of mpath components is not set up yet, LVM doesn't
> filter these components.
> 
> Usign udev info here means we're using info which is exported directly by MD
> and mpath tools based on their configuration and their knowledge. So without
> using udev in LVM, we're opening a possibility for these kinds of problems
> to appear.

Thanks for the analysis. One question I have here:

How do we confirm that we are hitting this issue because we are in one of the conditions mentioned above?

That said, from the problem description we can see that the warnings come from guest kernel devices, which are typical Xen-based block devices:

--snip--

WARNING: Device /dev/xvda2 not initialized in udev database even after waiting 10000000 microseconds.
WARNING: Device /dev/xvdb1 not initialized in udev database even after waiting 10000000 microseconds.

--/snip--

In another setup, we can see the warnings are present on SCSI devices as well:

[root@dhcp46-115 /]# time pvs
  WARNING: Device /dev/rhel_dhcp47-42/root not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/sda1 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/sda2 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/sda3 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/vg_0f0b820a8ce7358d933692c82db220fe/brick_539df0f01872cc2d2250f8344de1d1e3 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/rhel_dhcp47-42/home not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/mapper/docker-8:17-11610-a15d2327feccd20bfbdec749b9ae6a8f6e3e8c84ec6317057fb0d6e470c8f743 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/mapper/docker-8:17-11610-0db4c3fb7c2a280f4e2e7384a7108630c4c71f13b49a00a8d8323cbf47d654fb not initialized in udev database even after waiting 10000000 microseconds.

--/snip--

IIUC, by "MD" you are referring to RAID devices. Please correct me if I am wrong.

Are you saying we will run into this issue regardless of the device type (SCSI or Xen disk) detected in the guest kernel, and that it ONLY depends on what is hosting this chunk of space in the backend array?

How can we verify that we are indeed hitting any of the scenarios mentioned in comment #2?

Additionally, if I understood your comment correctly, we should **not** disable 'obtain_device_list_from_udev' in lvm.conf; if we do, there are more chances of device detection failing. Isn't that right?

Comment 6 Peter Rajnoha 2019-02-14 11:17:35 UTC
(In reply to Yaniv Kaul from comment #4)
> Peter, thanks for the analysis. What's the next step here? This has hit OCS
> 3.11.1 hard on AWS deployments. We need a fix for it. For the time being, we
> reverted to older lvm2 version.

For now, the fix here is for LVM to not try to get any records from udev if obtain_device_list_from_udev=0 is set in lvm.conf (we need to add that new patch to a new build: https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=3ebce8dbd2d9afc031e0737f8feed796ec7a8df9;hp=d19e3727951853093828b072e254e447f7d61c60).
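
Once a build with that patch lands, a quick way to confirm the effective value of the setting the patch honors (a hedged sketch; lvmconfig ships with lvm2 and the value printed depends on the local lvm.conf):

  # print the effective value of the setting
  lvmconfig devices/obtain_device_list_from_udev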

Comment 7 Peter Rajnoha 2019-02-14 12:05:40 UTC
(In reply to Humble Chirammal from comment #5)
> (In reply to Peter Rajnoha from comment #3)
> > (In reply to Niels de Vos from comment #0)
> > > Future versions of the rhgs-server-container will disable
> > > obtain_device_list_from_udev in lvm.conf as well. This is currently not
> > > sufficient as it does not completely disables udev interactions. That needs 
> > > https://sourceware.org/git/?p=lvm2.git;a=commitdiff;
> > > h=3ebce8dbd2d9afc031e0737f8feed796ec7a8df9;
> > > hp=d19e3727951853093828b072e254e447f7d61c60
> > 
> > Yes, this recent patch is currently missing in a build. However, it's also
> > worth noting that there was a reason why we started reading udev database
> > for information about MD and mpath devices - this was because, under some
> > circumstances and due to some recent changes in how LVM does scanning and
> > device filtering, LVM was unable to correctly detect some MD components. The
> > problematic scenario is when:
> > 
> >   - MD device on top of MD components is not activated yet
> > 
> >   - MD metadata are placed at the end of the disk (current LVM disk-reading
> > code caches/considers only start of the disk)
> > 
> > Similar for mpath which doesn't use any signatures on disk at all and if the
> > mpath device on top of mpath components is not set up yet, LVM doesn't
> > filter these components.
> > 
> > Usign udev info here means we're using info which is exported directly by MD
> > and mpath tools based on their configuration and their knowledge. So without
> > using udev in LVM, we're opening a possibility for these kinds of problems
> > to appear.
> 
> Thanks for the analysis, One question I have here is:
> 
> How do we make sure we are hitting this issue being in one of the conditions
> mentioned above ?
> 
> that said, from problem description we can see that , the error/warning
> comes for the guest kernel devices which are typical Xen based block devices:
> 
> --snip--
> 
> WARNING: Device /dev/xvda2 not initialized in udev database even after
> waiting 10000000 microseconds.
> WARNING: Device /dev/xvdb1 not initialized in udev database even after
> waiting 10000000 microseconds.
> 

OK, I'll try to explain where this comes from, but I also need to explain the context a little bit...

The facts:

  1) there's a new internal cache for reading disks in LVM (added about half a year ago); this cache holds data from the start of each disk (this was done to speed up LVM device scanning)

  2) this works fine mostly, but there are some cases where you need to also read the end of the disk (older versions of MD metadata write their headers at the end of the disk instead of its start - MD metadata version 0.9 and 1.0)

  3) when you execute an LVM command, LVM internally filters out improper devices ("improper" here means "they can't be PVs" for various reasons, e.g. MD component, mpath component, too small disk to hold a PV...)

  4) these internal LVM filters use the new internal LVM caching mechanism now

  5) since the new caching mechanism caches only the start of each disk, we can't reliably filter out an MD component by looking at the start of the disk (the cached part) alone; we also need to check its end for an MD component with metadata version 0.9 or 1.0, in which case the device must not be used as a PV and must be filtered out

  6) since there are just a few cases where we need to check the end of the disk, instead of adding complexity to the new LVM caching mechanism, we decided to read the already existing information about MD components directly from the udev database (there, it's already identified by blkid and exported to the udev db). We read the udev db unconditionally here.

  7) unfortunately, the udev database is not available everywhere, for example in chroot environments and containers. /run/udev could be remounted there, but there are probably other problems associated with that, such as SELinux (which I heard about lately). This can make the udev db unreliable under a chroot or in a container.

  8) when LVM tries to read the udev db and can't access it, it retries several times with a timeout (10 s, i.e. 10,000,000 microseconds), because it might simply be that udev hasn't finished processing the device yet. But if the udev db is not accessible at all, the record for the device will never become readable - that's where the "WARNING: Device ... not initialized in udev database even after waiting 10000000 microseconds." message comes from (a quick way to check udev database visibility in such an environment is sketched right after this list).
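
A hedged sketch of how to check udev database visibility from inside such an environment (the paths and device name are only examples; /run/udev/data is where udev keeps its per-device records):

  # is the udev database visible in this environment at all?
  [ -d /run/udev/data ] && ls /run/udev/data | head
  # ask udev for its record of an example device
  udevadm info --query=all --name=/dev/sda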

> 
> iiuc, by "MD" you are referring RAID devices . Please correct me if I am
> wrong. 

Yes, the "MD raid" devices - handled by "md" kernel driver and the related "mdadm" tool in userspace.

> 
> Are you saying we will run into this issue regardless of the device type (
> SCSI or Xen Disk) detected in guest kernel and it ONLY depends on whats
> hosting this chunk of space in backend array? 
> 

Yes, regardless of device type, because LVM needs to check each disk it tries to access and it needs to make sure it's not a device it should not access - simply, LVM needs to run filters on each device.

> How can we verify it we are indeed hitting any of the scenario mentioned in
> comment#2 ?
> 
> Additionally, if I got your comment, we should **not** disable
> 'obtain_device_list_from_udev' in lvm.conf , if we do there are more chances
> for failure of device detection. Isnt it ?

Yes, if we don't have access to udev db records, there could be an LVM filter failure (a failure to detect a device which shouldn't be considered for a PV and it shouldn't be processed further by LVM at all). 

But this fails only in certain narrow cases:

  - you have MD with (old) metadata version 0.9 or 1.0 on your system; these store their signatures at the end of the disk and can't be seen by LVM due to the new caching mechanism, which looks at the start of the disk only, so here we need to rely on the external info from the udev db, where it is exported directly by the mdadm tooling (a quick mdadm check is sketched after this list)

  - there's also a case with multipath (device-mapper-multipath) components where the top-level mpath device is not yet set up (this should be a rare case, though); again, we need to rely on the udev db here to get the info exported by the "multipath" tool, which in turn exports this info from its configuration.
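
For the first case, a hedged way to check whether a disk carries the old end-of-disk MD metadata (the device name is only an example; requires mdadm):

  # inspect the MD superblock; "Version : 0.90" and "Version : 1.0" store metadata at the end of the disk
  mdadm --examine /dev/sdb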


If we disable udev, there could be other (non-LVM) components on the system these days that fully rely on udev db records, and if they don't have them, they can fail to see the device completely. LVM itself can still deal with a non-udev environment, though (it's just the "old MD metadata" and "mpath device not yet set up" scenarios that it can't handle - that's what you lose by setting obtain_device_list_from_udev=0).

So yes, disabling udev can be done, but it could have consequences for other system components. The thing is, I'm not able to tell exactly which components read the udev db these days and rely on this information completely, without any further fallback if the udev db is not accessible. But since udev is the standard way of dealing with /dev these days, such components definitely exist...

Comment 8 Peter Rajnoha 2019-02-14 12:16:20 UTC
(In reply to Peter Rajnoha from comment #7)
> (In reply to Humble Chirammal from comment #5)
> > Additionally, if I got your comment, we should **not** disable
> > 'obtain_device_list_from_udev' in lvm.conf , if we do there are more chances
> > for failure of device detection. Isnt it ?
> 
> Yes, if we don't have access to udev db records, there could be an LVM
> filter failure (a failure to detect a device which shouldn't be considered
> for a PV and it shouldn't be processed further by LVM at all). 
> 
> But this fails only in certain narrow cases:
> 
>   - you have MD with (old) metadata version 0.9 or 1.0 on your system which
> store signatures at the end of the disk and which can't be seen by LVM due
> to the new caching mechanism which looks at the start of the disk only and
> hence we need to rely on external info from udev db here only where the info
> is exported directly by mdadm tooling
> 
>   - there's also a case with multipath (device-mapper-multipath) components
> where the top-level mpath device is not yet set up (this should be a rare
> case though), again we need to rely on udev db here to get the info exported
> by "multipath" tool that in turn exports this info from its configuration.
> 

Well, there's also one more associated problem, but it's a wider problem with handling devices in a container while the host system accesses the device as well:

  - if you handle a device inside a container/guest (add or change), a uevent is going to be generated on the host, which then runs the associated udev rules there, and those rules may open the device. If the device is open on the host, device handling inside the container may fail because there's "someone else" handling the device outside (e.g. you can't exclusively open a device inside the container to initialize it while the device is still open on the host due to the udev-rule-based scanning there). Devices are not containerized, and we don't yet have any mediator between the host and containers to avoid this parallel access (e.g. if we're handling a device inside the container, the handling on the host should be ignored).

Comment 9 Peter Rajnoha 2019-02-14 12:18:41 UTC
(...when udev is available, LVM can of course synchronize itself with udev rule processing - wait for it to settle down. But if we don't have this in the container, we just execute actions even though there might be parallel processing on the host.)
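
For context, on a normal host the synchronization being described is roughly what udevadm offers (a hedged illustration; in a container without udev there is nothing to wait for):

  # wait, up to 10 seconds, for the udev event queue to be fully processed
  udevadm settle --timeout=10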

Comment 10 Humble Chirammal 2019-02-15 11:05:16 UTC
(In reply to Peter Rajnoha from comment #7)

> 
> So, disabling udev, yes, it can be done, but it could have consequences for
> other system components. Thing is, I'm not able to tell exactly who all read
> udev db these days and relies on this information completely, without any
> further fallback if udev db is not accessible. But since udev is a standard
> way of dealing with /dev these days, such components definitely exist...

Perfect! Thanks a lot, Peter, for the detailed update - it clears things up!

I will follow up if any questions or clarifications are needed.

Comment 11 Marian Csontos 2019-02-28 14:35:19 UTC
This looks like ZStream material, to me. Jon, Niels?

Comment 12 Niels de Vos 2019-02-28 15:12:22 UTC
(In reply to Marian Csontos from comment #11)
> This looks like ZStream material, to me. Jon, Niels?

Yes, we'd like to include the latest lvm2 package in our OCS product. Currently we're downgrading to the previous version, as we cannot disable all udev integration (which does not work in containers).

Thanks!

Comment 17 Corey Marthaler 2019-07-01 21:44:34 UTC
Marking verified (SanityOnly). 

No related regressions found in single node environment regression testing.

3.10.0-1057.el7.x86_64

lvm2-2.02.185-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
lvm2-libs-2.02.185-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
lvm2-cluster-2.02.185-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
lvm2-lockd-2.02.185-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
lvm2-python-boom-0.9-18.el7    BUILT: Fri Jun 21 04:18:58 CDT 2019
cmirror-2.02.185-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
device-mapper-1.02.158-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
device-mapper-libs-1.02.158-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
device-mapper-event-1.02.158-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
device-mapper-event-libs-1.02.158-2.el7    BUILT: Fri Jun 21 04:18:48 CDT 2019
device-mapper-persistent-data-0.8.5-1.el7    BUILT: Mon Jun 10 03:58:20 CDT 2019

Comment 19 errata-xmlrpc 2019-08-06 13:10:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2253

