Bug 2026370 - oVirt node fails to boot if lvm filter uses /dev/disk/by-id/lvm-pv-uuid-*
Summary: oVirt node fails to boot if lvm filter uses /dev/disk/by-id/lvm-pv-uuid-*
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Tools
Version: 4.50.0.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.5.0
Target Release: 4.50.0.3
Assignee: Nir Soffer
QA Contact: Shir Fishbain
URL:
Whiteboard:
Depends On: 2026640
Blocks:
 
Reported: 2021-11-24 13:37 UTC by Yedidyah Bar David
Modified: 2022-04-20 06:32 UTC
CC: 7 users

Fixed In Version: vdsm-4.50.0.3
Clone Of:
Environment:
Last Closed: 2022-03-13 09:32:14 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.5?


Links:
Red Hat Issue Tracker RHV-44076 (last updated 2021-11-24 13:40:09 UTC)
oVirt gerrit 117748 (master, MERGED): lvmfilter: Don't use /dev/disk/by-id/lvm-pv-uuid-* (last updated 2021-12-02 18:02:42 UTC)
oVirt gerrit 118047 (NEW): stream: lvm2: exclude latest lvm2 due to regression (last updated 2021-12-15 13:37:56 UTC)

Description Yedidyah Bar David 2021-11-24 13:37:23 UTC
Description of problem:

A machine installed with ovirt-node built with lvm2-2.03.14-1.el8.x86_64 fails during boot while trying to mount /var, due to the filter added to /etc/lvm/lvm.conf by vdsm-tool config-lvm-filter (which is run on ovirt-node by imgbase, from the imgbased project, during installation).
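
For reference, the generated filter allows only the PV behind the node's local volume group, via its stable udev link; it looks like this (the UUID here is illustrative, the actual line from the affected host appears in comment 3 below):

    filter = ["a|^/dev/disk/by-id/lvm-pv-uuid-XXXXXX-XXXX-XXXX-XXXX-XXXX-XXXX-XXXXXX$|", "r|.*|"]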

Version-Release number of selected component (if applicable):
Current master vdsm, lvm2-2.03.14-1.el8.x86_64

How reproducible:
Always

Steps to Reproduce:
1. See above

Actual results:
Machine boots into an emergency shell, failing to mount /var (and /var/log, /var/log/audit; all are separate filesystems in node).

Expected results:
Machine successfully boots

Additional info:
Nir already pushed a patch for this bug [1]. It reverts the fix for bug 1635614, so we need to decide whether it's enough as-is, while making sure we do not regress.

[1] https://gerrit.ovirt.org/c/vdsm/+/117748

Comment 1 David Teigland 2021-11-24 14:32:17 UTC
That build of lvm somehow contains RHEL9-only features and changes, and should never have appeared in RHEL8.  That build needs to be removed as soon as possible.

Comment 2 Nir Soffer 2021-11-24 15:17:20 UTC
(In reply to David Teigland from comment #1)
> That build of lvm somehow contains RHEL9-only features and changes, and
> should never have appeared in RHEL8.  That build needs to be removed as soon
> as possible.

Do we need an LVM bug for this?

Comment 3 Nir Soffer 2021-11-24 15:31:14 UTC
Adding more info from internal mail thread:

On the broken system:
=====================

journalctl -o json-pretty has:

        "MESSAGE" : "/dev/vda3 excluded by filters: device is rejected
by filter config.",
...
        "_CMDLINE" : "/usr/sbin/lvm pvscan --cache --listvg
--checkcomplete --vgonline --udevoutput --journal=output /dev/vda3",

[root@localhost ~]# grep ^filter /etc/lvm/lvm.conf
filter = ["a|^/dev/disk/by-id/lvm-pv-uuid-5M4lBs-4jIZ-2NBF-DRSl-dHwR-6Vfs-PQIOEB$|", "r|.*|"]

/dev/disk/by-id/lvm-pv-uuid-5M4lBs-4jIZ-2NBF-DRSl-dHwR-6Vfs-PQIOEB is a symlink to /dev/vda3.
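
This can be confirmed with readlink (output matching what this host shows):

    # readlink -f /dev/disk/by-id/lvm-pv-uuid-5M4lBs-4jIZ-2NBF-DRSl-dHwR-6Vfs-PQIOEB
    /dev/vda3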

This makes it fail to mount /var (and /var/log, /var/log/audit).

If I run 'vgchange -ay', they do mount successfully.
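
(For reference, a minimal recovery sequence from the emergency shell; the individual mount commands are assumptions based on the node layout described above:)

    vgchange -ay            # activate all volume groups, bypassing autoactivation
    mount /var              # now succeeds because the LVs are active
    mount /var/log
    mount /var/log/audit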

lvm version: lvm2-2.03.14-1.el8.x86_64

On the working system:
======================
        "MESSAGE" : "  pvscan[1238] PV /dev/vda3 online, VG
onn_ibm-p8-kvm-03-guest-02 is complete.",
...
        "_CMDLINE" : "/usr/sbin/lvm pvscan --cache --activate ay 252:3",

(The filter line and the symlink are similar, with a different ID.)

lvm version: lvm2-2.03.12-10.el8.x86_64

...

>         "_CMDLINE" : "/usr/sbin/lvm pvscan --cache --listvg --checkcomplete --vgonline --udevoutput --journal=output /dev/vda3",

This is in the udev rule for the new RHEL9 autoactivation method.

Related change:

https://github.com/lvmteam/lvm2/commit/67722b312390cdab29c076c912e14bd739c5c0f6#diff-6a1e9a3e15f9d614cbda0b5b26084c30ffa609a9c951ed1f89fd8a25a12edbb3R82
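
Reconstructed from the _CMDLINE above, the new rule runs something like the following per device (a sketch; the exact text in lvm2's udev rules file may differ):

    IMPORT{program}="/usr/sbin/lvm pvscan --cache --listvg --checkcomplete --vgonline --udevoutput --journal=output $env{DEVNAME}"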

But this happens on CentOS Stream 8:
lvm2-2.03.14-1.el8.x86_64

Comment 4 Nir Soffer 2021-11-24 15:58:52 UTC
From vdsm's point of view, this regression in lvm shows that we cannot depend
on the udev links (/dev/disk/by-id/lvm-pv-uuid-*).

These links do not work for multipath devices (bug 2016173), and they are very
fragile: they can break whenever lvm changes the udev rule (this bug).

The reason we use these links is that device names are not stable (bug 1635614).
That issue will be resolved by the switch to lvm devices (bug 2012830).
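
For reference, the lvm devices mechanism replaces the filter with an allow-list in /etc/lvm/devices/system.devices, managed with the lvmdevices tool (a minimal sketch; the device name is this host's PV):

    lvmdevices --adddev /dev/vda3    # add the PV to the devices file
    lvmdevices                       # list the entries in the devices file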

Comment 5 Nir Soffer 2021-12-02 18:30:52 UTC
Vdsm no longer uses the udev links /dev/disk/by-id/lvm-pv-uuid-xxx.

The LVM filter created by vdsm now uses the device name /dev/sd{x} for
SCSI devices, and /dev/mapper/{wwid} for multipath devices. This should
fix the issue when booting from SAN.

When adding a host to engine, the new filter will use the new format.

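For example, a new-format filter for a host booting from a multipath device would look like this (the WWID is illustrative):

    filter = [ "a|^/dev/mapper/3600140512345678901234567890123ab$|", "r|.*|" ]
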
To upgrade a host with an older lvm filter to the new format, run:

    vdsm-tool config-lvm-filter

This change will be available in the next ovirt-4.5 build.

Comment 6 Sandro Bonazzola 2021-12-13 17:15:50 UTC
This seems to affect oVirt 4.4.9 new installs as well.
Can you please confirm? If that's the case we need an urgent backport to 4.4 as well.

Comment 7 Sandro Bonazzola 2021-12-13 17:37:29 UTC
Diego Ercolani is reporting failure on oVirt Italia Telegram channel with:

[root@ovirt-node2 ~]# rpm -qa | grep systemd
systemd-container-239-51.el8.x86_64
systemd-libs-239-51.el8.x86_64
systemd-pam-239-51.el8.x86_64
systemd-udev-239-51.el8.x86_64
python3-systemd-234-8.el8.x86_64
clevis-systemd-15-4.el8.x86_64
systemd-239-51.el8.x86_64

[root@ovirt-node2 ~]# rpm -qa | grep vdsm   
vdsm-jsonrpc-4.40.90.4-1.el8.noarch
vdsm-python-4.40.90.4-1.el8.noarch
vdsm-gluster-4.40.90.4-1.el8.x86_64
vdsm-common-4.40.90.4-1.el8.noarch
vdsm-client-4.40.90.4-1.el8.noarch
vdsm-4.40.90.4-1.el8.x86_64
vdsm-yajsonrpc-4.40.90.4-1.el8.noarch
vdsm-http-4.40.90.4-1.el8.noarch
vdsm-network-4.40.90.4-1.el8.x86_64
vdsm-api-4.40.90.4-1.el8.noarch

Comment 8 Nir Soffer 2021-12-13 17:44:01 UTC
(In reply to Sandro Bonazzola from comment #7)
> Diego Ercolani is reporting failure on oVirt Italia Telegram channel with:

This issue exists only on CentOS Stream 8 or RHEL 8.6 nightly; both have a
broken lvm (bug 2026640).

I don't know about any issue with RHEL 8.5.

To debug, please get output of:

    rpm -q lvm2
    grep ^filter /etc/lvm/lvm.conf
    lsinitrd -f /etc/lvm/lvm.conf | grep ^filter

Comment 9 Nir Soffer 2021-12-13 18:55:53 UTC
I reproduced the same issue with a CentOS Stream 8 host.

1. Installed new host from CentOS-Stream-8-x86_64-20211206-dvd1.iso
2. dnf update
3. Add ovirt-release44.rpm
4. Add host to engine 4.5 master
5. reboot host

Host failed to boot.

Restarting in rescue mode, I found that the host was using
the expected /dev/disk/by-id/lvm-pv-uuid-xxx link in the lvm
filter.

Replacing the filter with /dev/vda2 and running "dracut -f" fixed the issue.
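
(The concrete steps; the filter line is the one shown in the working configuration below:)

    # in /etc/lvm/lvm.conf, replace the lvm-pv-uuid filter with:
    filter = ["a|^/dev/vda2$|", "r|.*|"]
    # then rebuild the initramfs so the embedded lvm.conf is updated:
    dracut -f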

Packages:
lvm2-2.03.14-1.el8.x86_64
vdsm-4.40.90.4-1.el8.x86_64

Working configuration:

# grep ^filter /etc/lvm/lvm.conf
filter = ["a|^/dev/vda2$|", "r|.*|"]

# lsinitrd -f /etc/lvm/lvm.conf | grep ^filter
filter = ["a|^/dev/vda2$|", "r|.*|"]

Running vdsm-tool config-lvm-filter suggests replacing the working filter:

# vdsm-tool config-lvm-filter
Analyzing host...
Found these mounted logical volumes on this host:

  logical volume:  /dev/mapper/cs-root
  mountpoint:      /
  devices:         /dev/disk/by-id/lvm-pv-uuid-klrMLR-8GHy-L3nS-qLrS-32Wp-IjeD-DPoMux

  logical volume:  /dev/mapper/cs-swap
  mountpoint:      [SWAP]
  devices:         /dev/disk/by-id/lvm-pv-uuid-klrMLR-8GHy-L3nS-qLrS-32Wp-IjeD-DPoMux

This is the recommended LVM filter for this host:

  filter = [ "a|^/dev/disk/by-id/lvm-pv-uuid-klrMLR-8GHy-L3nS-qLrS-32Wp-IjeD-DPoMux$|", "r|.*|" ]

This filter allows LVM to access the local devices used by the
hypervisor, but not shared storage owned by Vdsm. If you add a new
device to the volume group, you will need to edit the filter manually.

This is the current LVM filter:

  filter = [ "a|^/dev/vda2$|", "r|.*|" ]

Comment 10 Sandro Bonazzola 2021-12-15 11:26:46 UTC
Ok, I'm going to blacklist the broken lvm2 build in the node build kickstart.
Can you backport the vdsm fix to 4.4.10?

From Diego Ercolani:

lvm2-libs-2.03.14-1.el8.x86_64
libblockdev-lvm-2.24-7.el8.x86_64
llvm-compat-libs-12.0.1-3.module_el8.6.0+1029+6594c364.x86_64
lvm2-2.03.14-1.el8.x86_64

[root@ovirt-node2 ~]# uname -a
Linux ovirt-node2.ovirt 4.18.0-348.2.1.el8_5.x86_64 #1 SMP Tue Nov 16 14:42:35 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Comment 11 Nir Soffer 2021-12-19 19:47:05 UTC
(In reply to Sandro Bonazzola from comment #10)
> Ok, I'm going to blacklist the broken lvm2 build in node build kickstart.
> Can you backport the vdsm fix to 4.4.10?

I don't want to change this in 4.4.10. This is an issue only when using the
broken lvm version, which should be fixed before RHEL 8.6 is released,
so RHV users should never see this issue.

This change also disables the fix for bug 1635614, so delivering it in
4.4.10 may break users who needed that fix, in order to solve an issue they
don't have.

Porting this to 4.4.10 makes sense only upstream, if we want to deliver
it on CentOS Stream 8, but I understand that we don't plan such a release.

This fix will not be needed once we switch to lvm devices, replacing the
lvm filter; see bug 2012830.

Blacklisting the broken lvm2 build sounds like the right way for now.
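
One way to do that in the node build kickstart or dnf configuration (illustrative; where exactly the exclusion goes is an assumption):

    # in dnf.conf or the repo file: never install the broken build
    excludepkgs=lvm2-2.03.14-1.el8 lvm2-libs-2.03.14-1.el8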

Comment 12 Arik 2022-03-13 09:32:14 UTC
This bug resulted from an issue with the lvm filter, which has been replaced by lvm devices in oVirt 4.5.

Comment 13 Sandro Bonazzola 2022-04-20 06:32:24 UTC
This bugzilla is included in the oVirt 4.5.0 release, published on April 20th 2022.

Since the problem described in this bug report should be resolved in oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

