Description of problem:
After upgrading an oVirt node from 4.4.10 to 4.5.0, the node no longer booted and ended up in the dracut rescue shell. In the rescue shell it was clear that the boot failed because LVM did not activate the root devices. When trying to manually activate the LVs, LVM refused because it considered the device a multipath component. That was not actually the case: the device's wwid had been correctly added to the multipath blacklist by vdsm.

It turned out that although the wwid was blacklisted, it was still listed in /etc/multipath/wwids. This caused LVM to ignore the device and rendered the node unbootable. Removing the device from the wwids file fixed the issue.

I guess this is caused either by a newer LVM version or by the fact that the filter entry was removed in 4.5.0?
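For anyone hitting the same thing, a minimal sketch of the workaround from the rescue shell (the device name /dev/sdX and <wwid> are placeholders, adjust for the affected PV; editing /etc/multipath/wwids by hand works as well):

  # confirm multipath does not actually claim the device (it is blacklisted,
  # so this should report it is not a valid multipath device path)
  multipath -c /dev/sdX

  # check whether the blacklisted wwid is still recorded in the wwids file
  grep <wwid> /etc/multipath/wwids

  # drop the stale wwid from /etc/multipath/wwids
  multipath -w /dev/sdX

  # activate the volume groups and continue booting
  lvm vgchange -ay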
Albert, any chance it's related to https://github.com/oVirt/vdsm/pull/228 ?
> Albert, any chance it's related to https://github.com/oVirt/vdsm/pull/228 ?

It sounds similar, but I don't think so. What triggered that change was an update in LVM that affected nodes running on rhel9 systems. LVM now only uses event activation in rhel9, so the flag 'event_activation=0' was causing a misbehavior that left LVs inactive and broke the boot. In this case, it seems to be a problem with LVM and multipath.
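For context, this is the kind of setting that change removed (the layout here is illustrative, not the exact lvmlocal.conf content; see the PR for the real change):

  # global section of lvm's config (e.g. lvmlocal.conf)
  global {
      # disable event based autoactivation - on rhel9 this left LVs inactive and broke boot
      event_activation = 0
  }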
We didn't test this with node/RHVH yet, so this might be a node issue that we'll face once we test upgrades with RHVH.
I think this is a duplicate of bug 2076262. David already fixed this in LVM, but I don't know when the fix will be available.
Jean-Louis, do you want to test the fix? You can use the rpms built by GitHub Actions here: https://github.com/oVirt/vdsm/actions/runs/2470998957
Updating severity: this causes node upgrade to fail and requires fixing the host in emergency mode.
Nir: What would be a proper way to test on ovirt-node? Can't I just change the lvmlocal.conf and rebuild the initramfs (how to do that correctly?)
(In reply to Jean-Louis Dupond from comment #8)
> Nir: What would be a proper way to test on ovirt-node? Can't I just change
> the lvmlocal.conf and rebuild the initramfs (how to do that correctly?)

A good case to test is an existing host that has this issue (device in /etc/multipath/wwids), where running "vdsm-tool config-lvm-filter" does not import the vg devices into the devices file.

Steps:
1. Update lvmlocal.conf with the changes from the patch (new option, new revision)
2. Configure lvm: vdsm-tool config-lvm-filter
3. Reboot

Expected results (a command sketch for checking these follows below):
- use_devicesfile should be enabled in lvm.conf
- the lvm filter should be removed from lvm.conf
- the lvmdevices command should report all the relevant devices used by all host vgs
- the host should reboot successfully

I'm not sure that rebuilding the initramfs is needed, since lvm does not use the devices file during early boot. If you want to be sure you can run dracut -f.

This may not be enough for ovirt-node. The other use case we need to test is upgrade - "vdsm-tool configure" installs a new lvmlocal.conf, and we need to make sure the new file is used in the new layer after rebooting.
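Something like this can be used to check the expected results above (a sketch only, assuming default config paths):

  # configure lvm (step 2)
  vdsm-tool config-lvm-filter

  # use_devicesfile should be enabled
  lvmconfig --typeconfig full devices/use_devicesfile

  # no uncommented lvm filter should remain in lvm.conf
  grep -E '^[[:space:]]*filter' /etc/lvm/lvm.conf

  # the devices file should list all devices used by the host vgs
  lvmdevices

  # optional, only to be extra safe about early boot
  dracut -f

  reboot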
https://cbs.centos.org/koji/buildinfo?buildID=39701