Description of problem:

After an upgrade from rhvh-4.3.5.3-0.20190805 to rhvh-4.3.5.4-0.20190920, the system can't boot because it cannot find some partitions. The reason is that something has added this filter to lvm.conf:

  filter = ["a|^/dev/sda2$|", "r|.*|"]

Version-Release number of selected component (if applicable):
redhat-release-virtualization-host-4.3.5-4.el7ev.x86_64
imgbased-1.1.9-0.1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install RHVH 4.3.5.3 from DVD. Select /dev/sda (local disk) as installation destination and automatic partitioning.
2. It boots and works fine.
3. Run the normal upgrade procedure from the manager to RHVH 4.3.5.4.
4. Reboot.

Actual results:
Some filesystems listed in fstab are not found and the boot fails. /dev/sda2 is in the LVM filter.

Expected results:
A bootable system.

Additional info:
Boot device:

  3600508b1001ceb54f3e889ef0ab0e76b dm-0 HP      ,LOGICAL VOLUME
  size=279G features='1 queue_if_no_path' hwhandler='0' wp=rw
  `-+- policy='service-time 0' prio=1 status=active
    `- 0:1:0:0 sda 8:0 active ready running
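For reference, the broken state can be inspected from the emergency shell with commands like these (generic examples, not output from this host):

  # grep "filter = " /etc/lvm/lvm.conf    # shows the restrictive filter that was added
  # lsblk                                 # shows whether sda is now claimed by a multipath map
  # multipath -ll                         # lists the multipath device that grabbed sda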
The upgrade log looks ok. Nir, any idea what may be causing this?
(In reply to Yuval Turgeman from comment #2)
> The upgrade log looks ok. Nir, any idea what may be causing this?

We need the output of lsblk while the system is running to understand which device is the boot device; it is probably missing from the lvm filter.
The easiest way to get a working lvm filter is:
- remove the current filter
- reboot
- run "vdsm-tool config-lvm-filter"
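A minimal shell sketch of those steps, assuming the filter line lives in /etc/lvm/lvm.conf (the sed command is only an illustration; adjust it to however the filter was added on your host):

  # sed -i '/^[[:space:]]*filter = /d' /etc/lvm/lvm.conf   # drop the current filter line
  # reboot
  # vdsm-tool config-lvm-filter                            # after reboot, generate a correct filter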
Not sure, I'll try to reproduce
The issue here is not the lvm filter (which should be used on every RHV/RHV-H host) but the fact that multipath grabs a local device after the upgrade. The way to prevent this is to blacklist the local device in the multipath configuration. Unfortunately there is no automatic way to do this.

Please see this for instructions on blacklisting local devices:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/dm_multipath/ignore_localdisk_procedure

Important notes for RHV/RHV-H hosts:

1. Do not edit /etc/multipath.conf. This file is managed by vdsm and it may change when upgrading vdsm. To change the multipath configuration, add a drop-in file like this:

  # cat /etc/multipath.conf.d/local.conf
  blacklist {
      wwid SIBM-ESXSST336732LC____F3ET0EP0Q000072428BX1
  }

2. Rebuild the initramfs after the changes:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/dm_multipath/mp_initramfs
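A hedged sketch of what that looks like in practice, assuming the local disk is /dev/sda (the scsi_id lookup and dracut call are generic examples, not taken from this system):

  # /usr/lib/udev/scsi_id --whitelisted --device=/dev/sda   # print the wwid of the local disk
  # vi /etc/multipath.conf.d/local.conf                     # add that wwid to a blacklist section as above
  # dracut -f                                               # rebuild the initramfs so the blacklist applies at boot
  # reboot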
Ben, the issue in this bug is:

1. System running using /dev/sda (multipath never grabbed this device)
2. Upgrade the kernel (lvm, multipath and systemd versions are the same)
3. After boot, multipath grabs /dev/sda (the device appears in /etc/multipath/wwids)

Since the system uses an lvm filter allowing access only to /dev/sda*, and multipath grabbed it, lvm cannot access the device and the machine enters the emergency shell on boot.

How do you suggest we debug this issue?
What does the multipath configuration look like? Do you know why multipath wasn't grabbing /dev/sda before?

One possibility is that multipath didn't grab it simply because LVM always grabbed it first. Multipath won't claim a device it has never claimed before until it successfully creates a multipath device using it. If multipath somehow managed to win a race with LVM to make use of the device, it would then add the device to the wwids file and claim it in the future.

So, the first step in debugging this is looking at the configuration, log messages, and udev database entry for sda, to figure out why multipath is trying to use it at all, if that's unknown. If we know that it was always trying to use the device but failing the race with LVM, then I'm not sure why it won the race this time, but relying on it always losing isn't a good idea.
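A few commands that could gather that information (generic examples, not taken from the original report):

  # multipathd show config                # the effective multipath configuration
  # udevadm info /dev/sda                 # the udev database entry for sda
  # cat /etc/multipath/wwids              # the wwids multipath has claimed so far
  # journalctl -b | grep -i multipath     # multipath messages from the current boot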
I don't have the logs at hand as it's only reproduced on a QE machine. We keep 2 installations available on the same machine, and when booting into the previous installation (different kernel) we had some multipath warnings during boot, so it could be that with the older kernel multipath could not claim the device, leaving it to lvm. Can we get some more logs from both RHVH layers (journalctl should be enough)?
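One way to collect that, sketched under the assumption that each layer has been booted at least once since the upgrade (the output filename is just an example):

  # journalctl --list-boots                       # identify the boots belonging to the old and new layer
  # journalctl -b <boot-id> > layer-journal.log   # export the journal for each boot of interest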
Qin and I debugged this a little further a few days ago. What happens here is that after a fresh installation multipath is not configured, so LVM claims /dev/sda. However, during an upgrade imgbased calls `vdsm-tool configure`, which configures multipath and then regenerates the initrd. This means that on the next boot the new initrd will allow multipath to claim the device if possible. If the user configured the LVM filter to only allow /dev/sda, the system will not boot.

Bottom line: if the user configures an LVM filter, they should also configure multipath properly. I think this is covered well by the KCS. Can we close this?
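For what it's worth, a hedged way to check that state on a host (the module name and initrd path are assumptions and may differ between versions):

  # vdsm-tool is-configured --module multipath                   # does vdsm consider multipath configured?
  # lsinitrd /boot/initramfs-$(uname -r).img | grep multipath    # is the multipath config baked into the current initrd?
  # grep "filter = " /etc/lvm/lvm.conf                           # the filter that will block boot once multipath grabs sda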
Which KCS are you referring to? I'm looking at: https://access.redhat.com/solutions/3450192 but there's no reference to the multipath configuration. Should it be expanded with the steps of comment 23?
(In reply to Juan Orti Alcaine from comment #34)
> Which KCS are you referring to? I'm looking at:
> https://access.redhat.com/solutions/3450192 but there's no reference to the
> multipath configuration. Should it be expanded with the steps of comment 23?

I was talking about https://access.redhat.com/solutions/4000961 actually, but I guess we could expand the one you mentioned also.