1669266 – LVM2 PV scan hangs 50-70% of boots

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1669266 - LVM2 PV scan hangs 50-70% of boots

Summary: LVM2 PV scan hangs 50-70% of boots

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	lvm2
Sub Component:
Version:	7.6
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Peter Rajnoha
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2019-01-24 19:29 UTC by Orion Poplawski
Modified:	2021-09-03 12:41 UTC (History)
CC List:	9 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-11-03 12:56:34 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
/etc/lvm/lvm.conf (91.90 KB, text/plain) 2019-02-04 21:52 UTC, Orion Poplawski	no flags	Details
View All

Description Orion Poplawski 2019-01-24 19:29:41 UTC

Description of problem:

Since the 7.5 -> 7.6 update one of our machines is having trouble booting.  About 50-70% of the boots fail and hang during PV scanning:

systemd[1]: Found device /dev/mapper/vg_root-swap.
systemd[1]: Activating swap /dev/mapper/vg_root-swap...
kernel: Adding 51199996k swap on /dev/mapper/vg_root-swap.  Priority:-2 extents:1 across:51199996k FS
systemd[1]: Activated swap /dev/mapper/vg_root-swap.
systemd[1]: Started Flush Journal to Persistent Storage.
systemd[1]: Reached target Swap.
systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
systemd[1]: Starting LVM2 PV scan on device 9:1...
systemd[1]: Found device /dev/disk/by-uuid/a3aa2173-3539-46dd-8edd-4e2ec49d7551
systemd[1]: Found device /dev/mapper/vg_root-scratch.

and there it hangs forever.

Version-Release number of selected component (if applicable):
lvm2-2.02.180-10.el7_6.2.x86_64

Example successful boot:

systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
systemd[1]: Starting LVM2 PV scan on device 9:1...
systemd[1]: Found device /dev/mapper/vg_root-swap.
systemd[1]: Activating swap /dev/mapper/vg_root-swap...
kernel: Adding 51199996k swap on /dev/mapper/vg_root-swap.  Priority:-2 extents:1 across:51199996k FS
systemd[1]: Activated swap /dev/mapper/vg_root-swap.
systemd[1]: Reached target Swap.
systemd[1]: Found device /dev/disk/by-uuid/a3aa2173-3539-46dd-8edd-4e2ec49d7551.
systemd[1]: Started Flush Journal to Persistent Storage.
systemd[1]: Started Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
systemd[1]: Reached target Local File Systems (Pre).
systemd[1]: tmp.mount: Directory /tmp to mount over is not empty, mounting anyway.
systemd[1]: Mounting /tmp...
systemd[1]: Mounting /boot...
lvm[5685]: WARNING: Device /dev/sda1 not initialized in udev database even after waiting 10000000 microseconds.
lvm[5685]: WARNING: lvmetad is being updated by another command (pid 11045).
lvm[5685]: WARNING: Not using lvmetad because cache update failed.
lvm[5685]: 2 logical volume(s) in volume group "vg_root" monitored

Comment 2 David Teigland 2019-01-24 19:55:11 UTC

systemd makes it very difficult to get any lvm debugging information, but Peter knows some tricks to get some info.

Comment 3 Peter Rajnoha 2019-01-28 12:43:23 UTC

The warning:

 "WARNING: Device /dev/sda1 not initialized in udev database even after waiting 10000000 microseconds"

...is issued by LVM if it tries to retrieve record about a device through libudev from udev database but at the same time, the database record is not yet initialized - either it hasn't been processed at all yet or it's being processed right at that moment and the processing hasn't finished even after waiting some time.

(I've already noticed that people see the same on Debian right now: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=918590).

It looks that this comes from the LVM filter code where we try to read udev db to check whether certain device is not an MD or mpath component (we've probably added a line of code where we look at udev db for certain info about MD to do our filtering more correct, but I had to check the changes in code to be sure).

I can't say right now why the record is not initialized even after waiting 10 seconds (the 10 million microseconds). It's possible the boot is taking a long time (too many devices or devices with slow access) and when we're trying to read the udev db records, the udev processing in the background hasn't finished yet, even after waiting...

I'll try to simulate this problem and see if that's the issue here.

Comment 4 Peter Rajnoha 2019-01-28 12:51:48 UTC

Could you please add "debug" to kernel command line at boot? This will add more debug logging both to systemd and udev. Then you can collect the log with "journalctl -b".

But if it hangs forever and you can't collect the logs, then please try rebooting until it's successful and then call "journalctl -b-1" to get logs for the previous failed boot.

Comment 5 Peter Rajnoha 2019-01-31 13:46:46 UTC

(In reply to Peter Rajnoha from comment #3)
> The warning:
> 
>  "WARNING: Device /dev/sda1 not initialized in udev database even after
> waiting 10000000 microseconds"
> 
> ...is issued by LVM if it tries to retrieve record about a device through
> libudev from udev database but at the same time, the database record is not
> yet initialized - either it hasn't been processed at all yet or it's being
> processed right at that moment and the processing hasn't finished even after
> waiting some time.
> 

I've tried to simulate this problem and the only way I could get to this situation is when we don't have any database record about the device yet. This may happen in two situations:
  - this is the first time we're processing the uevent for this device after we started udevd daemon (this happens at boot)
  - the udev database has been made unavailable (e.g. making /run/udev unavailable or using a different /run), this might be the case of chroot-ed or container environments where the /run with existing info from host hasn't been remounted to "guest"

Anyway, please, if you could collect the logs after adding "debug" to kernel command line, that should help us here (comment #4).

Comment 6 Orion Poplawski 2019-01-31 16:15:51 UTC

Thanks for all the help and suggestions.  I'll give them a try as soon as the machine becomes free again, which may be a few days.  Unfortunately I've not been able to get to a shell on the hung boots and journalctl -b -1 has not yielded proper logs either.  I've cleaned out old journal files though so perhaps that will help.

Comment 7 Orion Poplawski 2019-02-04 16:53:15 UTC

I'm afraid that the keyboard is completely unresponsive when it locks up and journalctl -b -1 is worthless, just a few random sshd entries.

Journal file /var/log/journal/a1b20bef010c40939e8e5d84dd88da0f/system~ is truncated, ignoring file.
-- Logs begin at Thu 2019-01-24 16:20:03 MST, end at Mon 2019-02-04 09:49:39 MST. --
Jan 24 16:20:03 smmic1.cora.nwra.com sshd[22942]: Received disconnect from 10.10.30.10 port 40356:11: disconnected by user
Jan 24 16:20:03 smmic1.cora.nwra.com sshd[22942]: Disconnected from 10.10.30.10 port 40356
...
Feb 02 22:22:57 smmic1.cora.nwra.com sshd[10225]: Received disconnect from 10.11.1.6 port 34218:11: disconnected by user
Feb 02 22:22:57 smmic1.cora.nwra.com sshd[10225]: Disconnected from 10.11.1.6 port 34218

Comment 8 Orion Poplawski 2019-02-04 18:18:50 UTC

Well, I setup a serial console to try to capture the output, but since enabling it I've been unable to get the machine to hang at boot.  I suppose that's an acceptable workaround since it's only the one machine.  Not sure if there is anything else I can provide at this point.

Comment 9 David Teigland 2019-02-04 21:11:46 UTC

I just heard about the same udev timeout error, but it was happening in a container (which does not properly handle udev).  Does this bug have any relation to containers? (Sorry if the answer is obvious from previous comments, I've not used containers, so I'm not sure what it would look like.)

Comment 10 Orion Poplawski 2019-02-04 21:13:13 UTC

This is a physical machine.

Comment 11 Zdenek Kabelac 2019-02-04 21:48:45 UTC

Can you please  attach    lvm.conf  from problematic machine
and basic   'lvdisplay'  ?

Comment 12 Orion Poplawski 2019-02-04 21:52:14 UTC

Created attachment 1526888 [details]
/etc/lvm/lvm.conf

# lvdisplay
  --- Logical volume ---
  LV Path                /dev/vg_root/scratch
  LV Name                scratch
  VG Name                vg_root
  LV UUID                AftvWn-up8G-VcjP-9562-iX2g-5vQX-REsgXP
  LV Write Access        read/write
  LV Creation host, time smmic1.cora.nwra.com, 2015-05-20 16:01:29 -0600
  LV Status              available
  # open                 1
  LV Size                1.29 TiB
  Current LE             338617
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     4096
  Block device           253:2

  --- Logical volume ---
  LV Path                /dev/vg_root/swap
  LV Name                swap
  VG Name                vg_root
  LV UUID                RcDeRE-VjUZ-vFv1-4Byf-pHOo-31s8-9onVeD
  LV Write Access        read/write
  LV Creation host, time smmic1.cora.nwra.com, 2015-05-20 16:01:36 -0600
  LV Status              available
  # open                 2
  LV Size                <48.83 GiB
  Current LE             12500
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     4096
  Block device           253:0

  --- Logical volume ---
  LV Path                /dev/vg_root/root
  LV Name                root
  VG Name                vg_root
  LV UUID                wj3BPF-xEaR-tP0x-rAZL-IE0A-U6j2-8baW3R
  LV Write Access        read/write
  LV Creation host, time smmic1.cora.nwra.com, 2015-05-20 16:01:37 -0600
  LV Status              available
  # open                 1
  LV Size                <24.77 GiB
  Current LE             6340
  Segments               3
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     4096
  Block device           253:1

Comment 13 Nir Soffer 2019-02-05 00:08:39 UTC

(In reply to Peter Rajnoha from comment #5)
>   - the udev database has been made unavailable (e.g. making /run/udev
> unavailable or using a different /run), this might be the case of chroot-ed
> or container environments where the /run with existing info from host hasn't
> been remounted to "guest"

Peter, we seems to hit the same issue in oVirt CI, reproducing easily (80%).

Our environment is:

CI Slave VM running CentOS 7.6 (I don't know the exact version)

Inside the slave, we run mock:

    ../jenkins/mock_configs/mock_runner.sh \
        --execute-script automation/check-patch.tests-py27.sh \
        --mock-confs-dir ../jenkins/mock_configs \
        --secrets-file /home/jenkins/workspace/vdsm_standard-check-patch/std_ci_secrets.yaml \
        --try-proxy --timeout-duration 3h \
        --try-mirrors http://mirrors.phx.ovirt.org/repos/yum/all_latest.json \
        'el7.*x86_64'

Mock is using a cache, so we may run up to 2 days old version.

Inside mock, we run our lvm tests. In the tests we do stuff like:

    losetup -f backing-file --show   # /dev/loop2
    pvcreate /dev/loop2
    pvs ...

We use this lvmlocal.conf on in this environment:

    activation {                                                                                                                                                                 
        udev_sync = 0 
        udev_rules = 0 
    }

This solved issue when activating lvs inside mock or when running in a container in travis.

These tests used to work fine 2-3 weeks ago, but when I run them today they broke because
pvs was hang. Looking in the stack trace of the hang test, we see that is is hang waiting
for pvs to exit.

This is the pvs command we run (this is the args list passed to subprocess.Popen()):

    ['/usr/bin/taskset',
     '--cpu-list', '0-15',
     '/usr/sbin/lvm',
     'pvs',
     '--config', 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/loop2$|^/dev/loop3$|^/dev/loop4$|", "r|.*|"] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }', '--noheadings', '--units', 'b', '--nosuffix', '--separator', '|', '--ignoreskippedcluster', '-o', 'uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size,mda_used_count',
    u'/dev/loop3']

We see this command hang, logging these warnings to stderr:

    WARNING: Device /dev/loop3 not initialized in udev database even after waiting 10000000
    microseconds.\n', '  WARNING: Device /dev/loop2 not initialized in udev da...
    
The command was hang for about 10 minutes until out test runner killed the hang test.

You can see the hang build log here:
https://jenkins.ovirt.org/job/vdsm_standard-check-patch/2554//artifact/check-patch.tests-py27.el7.x86_64/mock_logs/script/stdout_stderr.log

This is the hang test:
https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=tests/storage/lvm_test.py;h=8b19c8f07bb06c83a9948d081ad0c574dcb21bfc;hb=2903e279c70a4edbb061dd9a9c0a0dac3bb17f49#l218

I tried your suggestion and mounted /run/udev inside the chroot
https://gerrit.ovirt.org/c/97555

This seems to fix the issue, all tests succeed again. About 20 runs
were failing before and pass now.

I hope it helps to debug this issue.

Comment 14 Nir Soffer 2019-02-05 00:25:46 UTC

Another interesting detail - we run the same tests in Fedora 28 and 29
containers in travis-ci.

- Fedora 28 all lvms tests pass
- Fedora 29 hangs in the lvm tests on every build.

Here is a Fedora 29 hang build from travis:
https://travis-ci.org/nirs/vdsm/jobs/480107313

In this build lvchange command was hang, in this test:
https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=tests/storage/lvm_test.py;h=3289ae8ca8228fea24bb30efcaef5edda1e75315;hb=81093b351e3cefcde4058012ee8234e69bedc14b#l490

If you want to reproduce this in the same container:
docker pull ovirtorg/vdsm-test-fedora-29

See how we run docker here:
https://travis-ci.org/nirs/vdsm/jobs/480107313/config

Comment 15 Peter Rajnoha 2020-11-03 12:56:34 UTC

(In reply to Orion Poplawski from comment #8)
> Well, I setup a serial console to try to capture the output, but since
> enabling it I've been unable to get the machine to hang at boot.  I suppose
> that's an acceptable workaround since it's only the one machine.  Not sure
> if there is anything else I can provide at this point.

I have to close this report then due to insufficient data/logs.

Note: we had other reports where we saw the message "not initialized in udev database even after waiting N microseconds". Usually, this was caused by delayed udev processing which in turn could be caused by high number of devices or some device which is not accessible/responsive correctly and delays further udev processing (we'd probably see this in the debug log, but unfortunately, we don't have it). If we're not able to identify what actually causes these delays, the only thing I can recommend to keep the system booting and not timing out, is to increase the default timeouts (in systemd configuration).

(The other issue that Nir mentioned is about container/chroot environments where udev was not correctly set up - attaching the /run/udev and other needed parts for udev, so this is a bit different than what was originally reported in this bug.)

Note You need to log in before you can comment on or make changes to this bug.