Bug 2090169 - Invalid entry in /etc/multipath/wwids causes unbootable ovirt-node
Summary: Invalid entry in /etc/multipath/wwids causes unbootable ovirt-node
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.50.0.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.5.1
Target Release: 4.50.1.4
Assignee: Nir Soffer
QA Contact: Yaning Wang
URL:
Whiteboard:
Depends On:
Blocks: 2095588
 
Reported: 2022-05-25 09:35 UTC by Jean-Louis Dupond
Modified: 2022-06-23 08:12 UTC
CC List: 9 users

Fixed In Version: vdsm-4.50.1.4
Clone Of:
Cloned To: 2095588
Environment:
Last Closed: 2022-06-23 07:55:04 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.5?
michal.skrivanek: exception+


Attachments: None


Links
Github oVirt vdsm pull 245 (Merged): lvmlocal.conf: Disable multipath_wwids_file usage - 2022-06-27 20:36:09 UTC
Github oVirt vdsm pull 260 (Merged): lvmlocal.conf: Disable multipath_wwids_file usage - 2022-06-27 20:36:03 UTC
Red Hat Issue Tracker RHV-46117 - 2022-05-25 09:54:00 UTC

Description Jean-Louis Dupond 2022-05-25 09:35:40 UTC
Description of problem:
After upgrading an oVirt node from 4.4.10 to 4.5.0, the node didn't boot anymore and ended up in the dracut rescue shell.

In the rescue shell it was clear that the node didn't boot because LVM had not activated the root devices.

When trying to manually activate the LVs, LVM refused because it considered the device a multipath component.

But that was not the case, and the device's WWID had been correctly added to the multipath blacklist by vdsm.

Finally I found out that although the WWID was blacklisted, it was still listed in /etc/multipath/wwids.
This caused LVM to ignore the device, rendering the node unbootable.

Removing the device from the wwids file fixed the issue.
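
For reference, a rough sketch of the check and cleanup that worked from the
rescue shell. The device path is an example and the scsi_id location may vary;
"multipath -W" resets the wwids file to contain only the WWIDs of current
multipath devices:

   # Check whether the boot device's WWID is still listed (example device):
   grep "$(/usr/lib/udev/scsi_id -g -u /dev/sda)" /etc/multipath/wwids

   # Drop stale entries; -W resets the wwids file to only the WWIDs of
   # devices that are currently multipath:
   multipath -W

   # Activate the volume group and continue booting:
   vgchange -ay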

I guess this is caused either by a newer LVM version or by the fact that the filter entry was removed in 4.5.0?

Comment 1 Benny Zlotnik 2022-06-06 13:00:24 UTC
Albert, any chance it's related to https://github.com/oVirt/vdsm/pull/228 ?

Comment 2 Albert Esteve 2022-06-06 13:50:30 UTC
> Albert, any chance it's related to https://github.com/oVirt/vdsm/pull/228 ?

It sounds similar, but I don't think so.

What triggered that change was an update in LVM that affected nodes running on RHEL 9 systems.
On RHEL 9, LVM now uses only event activation, so the flag 'event_activation=0' caused a misbehavior that left LVs inactive and broke the boot.
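
For context, a sketch of the setting involved, as an illustrative fragment in
lvm.conf syntax (not the exact vdsm-shipped file):

   # In lvm.conf/lvmlocal.conf; on RHEL 9 this left LVs inactive at boot,
   # because LVM there only supports event-based activation:
   global {
       event_activation = 0
   }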

In this case, it seems to be a problem with LVM and multipath.

Comment 3 Arik 2022-06-06 14:36:31 UTC
We didn't test this with node/RHVH yet, so this might be a node issue that we'll face once we test upgrades with RHVH.

Comment 4 Nir Soffer 2022-06-06 15:49:31 UTC
I think this is a duplicate of bug 2076262

David already fixed this in LVM, but I don't know when the fix will be available.

Comment 6 Nir Soffer 2022-06-09 20:31:12 UTC
Jean-Louis, do you want to test the fix? You can use the RPMs built
by GitHub here:
https://github.com/oVirt/vdsm/actions/runs/2470998957

Comment 7 Nir Soffer 2022-06-09 20:59:40 UTC
Updating severity; this causes node upgrade to fail and requires fixing the host
in emergency mode.

Comment 8 Jean-Louis Dupond 2022-06-10 13:06:04 UTC
Nir: What would be a proper way to test on ovirt-node? Can't I just change lvmlocal.conf and rebuild the initramfs (and how do I do that correctly)?

Comment 9 Nir Soffer 2022-06-10 19:39:00 UTC
(In reply to Jean-Louis Dupond from comment #8)
> Nir: What would be a proper way to test on ovirt-node? Can't I just change
> lvmlocal.conf and rebuild the initramfs (and how do I do that correctly)?

Testing on an existing host that has this issue (a device in /etc/multipath/wwids),
where running "vdsm-tool config-lvm-filter" does not import the VGs' devices
into the devices file, is a good case to test.

Steps:
1. Update the lvmlocal.conf with the changes from the patch (new option, new revision; a sketch follows the steps)
2. Configure lvm:

   vdsm-tool config-lvm-filter
3. reboot
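
Going by the title of the linked pull requests, the lvmlocal.conf change for
step 1 should look roughly like this (a sketch in lvm.conf syntax; the exact
revision comment is not reproduced here):

   devices {
       # An empty string disables the multipath wwids file, so LVM stops
       # treating devices listed in /etc/multipath/wwids as multipath
       # components:
       multipath_wwids_file = ""
   }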

Expected results:
- use_devicesfile should be enabled in lvm.conf
- the lvm filter should be removed from lvm.conf
- the lvmdevices command should report all the relevant devices used by all host VGs
- the host should reboot successfully
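
A minimal verification sketch, assuming the default /etc/lvm/lvm.conf path
(the expected outputs in the comments are assumptions about typical formatting):

   # use_devicesfile should be enabled:
   lvmconfig devices/use_devicesfile                 # expect: use_devicesfile=1

   # no filter should remain in lvm.conf:
   grep -E '^[[:space:]]*filter' /etc/lvm/lvm.conf   # expect: no output

   # the devices file should list the devices backing the host VGs:
   lvmdevices

   # after reboot, all host VGs should be listed with their LVs active:
   vgs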

I'm not sure that rebuilding the initramfs is needed, since LVM does not use
the devices file during early boot. If you want to be sure, you can run

   dracut -f

This may not be enough for ovirt-node.

The other use case we need to test is upgrade: "vdsm-tool configure" installs
a new lvmlocal.conf, and we need to make sure the new file is used in the new
layer after rebooting.
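
A rough sketch of that upgrade check ("--force" and the grep target are
assumptions about the exact invocation and option name):

   # Reinstall the vdsm-managed configuration, including lvmlocal.conf:
   vdsm-tool configure --force

   # The new file should carry the option from the fix:
   grep multipath_wwids_file /etc/lvm/lvmlocal.conf

   # Reboot into the new ovirt-node layer and re-run the checks above:
   reboot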

Comment 14 Michal Skrivanek 2022-06-23 07:55:04 UTC
https://cbs.centos.org/koji/buildinfo?buildID=39701

