Bug 1760223 - Upgraded RHVH host can't boot due to LVM filters
Summary: Upgraded RHVH host can't boot due to LVM filters
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: imgbased
Version: 4.3.5
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ovirt-4.3.8
Assignee: Yuval Turgeman
QA Contact: peyu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-10-10 08:13 UTC by Juan Orti
Modified: 2020-08-31 07:06 UTC
CC: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-03 21:01:36 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3450192 0 None None None 2020-01-03 21:01:36 UTC
Red Hat Knowledge Base (Solution) 4000961 0 None None None 2019-10-10 08:13:17 UTC

Description Juan Orti 2019-10-10 08:13:18 UTC
Description of problem:
After an upgrade from rhvh-4.3.5.3-0.20190805 to rhvh-4.3.5.4-0.20190920, the system can't boot because it cannot find some partitions.
The reason is that something has added this filter to lvm.conf:

filter = ["a|^/dev/sda2$|", "r|.*|"]

Version-Release number of selected component (if applicable):
redhat-release-virtualization-host-4.3.5-4.el7ev.x86_64
imgbased-1.1.9-0.1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install RHVH 4.3.5.3 from DVD. Select /dev/sda (local disk) as installation destination and automatic partitioning.
2. It boots and works fine.
3. Normal upgrade procedure from the manager to RHVH 4.3.5.4
4. Reboot

Actual results:
Some filesystems listed in fstab are not found and the boot fails. /dev/sda2 is in the LVM filter.

Expected results:
A bootable system

Additional info:
Boot device:
3600508b1001ceb54f3e889ef0ab0e76b dm-0 HP      ,LOGICAL VOLUME  
size=279G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 0:1:0:0  sda  8:0    active ready running

Comment 2 Yuval Turgeman 2019-10-10 08:46:35 UTC
The upgrade log looks ok, Nir, any idea what may be causing this?

Comment 4 Nir Soffer 2019-10-10 09:07:35 UTC
(In reply to Yuval Turgeman from comment #2)
> The upgrade log looks ok, Nir, any idea what may be causing this?

We need output of lsblk when the system is running to understand which
device is the boot device, which is probably missing from the lvm filter.

Comment 5 Nir Soffer 2019-10-10 09:08:42 UTC
The easiest way to get a working lvm filter, is:
- remove the current filter
- reboot
- run "vdsm-tool config-lvm-filter"
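The steps above can be sketched as shell commands. This is a hedged sketch: it operates on a scratch copy of the config so it is safe to run anywhere, while on a real host the file is /etc/lvm/lvm.conf and the reboot plus vdsm-tool run happen on the host itself.

```shell
# Sketch of the recovery steps, using a scratch copy of lvm.conf
# (on a real host: /etc/lvm/lvm.conf).
lvm_conf=$(mktemp)
printf 'devices {\n    filter = [ "a|^/dev/sda2$|", "r|.*|" ]\n}\n' > "$lvm_conf"

# 1. Remove the current filter line
sed -i '/filter = /d' "$lvm_conf"

# 2. Reboot (on a real host; not simulated in this sketch)

# 3. Regenerate a correct filter for the running system
#    (on a real host; not executed in this sketch):
#    vdsm-tool config-lvm-filter -y
```

After step 1 the scratch config no longer contains a filter line, which is the state vdsm-tool config-lvm-filter expects before it writes a fresh one.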

Comment 13 Yuval Turgeman 2019-10-22 07:42:01 UTC
Not sure, I'll try to reproduce

Comment 23 Nir Soffer 2019-10-24 11:58:24 UTC
The issue here is not the lvm filter (which should be used on every RHV* host)
but the fact that multipath grabs a local device after the upgrade.

The way to prevent this is to blacklist the local device in the multipath
configuration. Unfortunately, there is no automatic way to do this.

Please see this for instructions on blacklisting local devices:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/dm_multipath/ignore_localdisk_procedure

Important notes for RHV/RHV-H hosts:

1. Do not edit /etc/multipath.conf

This file is managed by vdsm and it may change when upgrading vdsm.

To change multipath configuration, add a drop-in file like this:

# cat /etc/multipath.conf.d/local.conf
blacklist {
      wwid SIBM-ESXSST336732LC____F3ET0EP0Q000072428BX1
}


2. rebuild initramfs after the changes

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/dm_multipath/mp_initramfs
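The two notes above can be sketched together as shell steps. The drop-in path and blacklist syntax come from the comment; the dracut invocation is the usual way to rebuild the initramfs and is shown as a comment only. The sketch writes into a scratch directory so it runs anywhere; on a real host conf_dir would be /etc/multipath.conf.d.

```shell
# Sketch of notes 1 and 2 above. Scratch directory stands in for
# /etc/multipath.conf.d so the sketch is safe to run anywhere.
conf_dir=$(mktemp -d)
wwid=SIBM-ESXSST336732LC____F3ET0EP0Q000072428BX1   # local disk WWID (example from above)

# 1. Blacklist the local device in a drop-in file,
#    never by editing /etc/multipath.conf itself
cat > "$conf_dir/local.conf" <<EOF
blacklist {
    wwid $wwid
}
EOF

# 2. Rebuild the initramfs so the blacklist is honored at boot
#    (on a real host; not executed in this sketch):
#    dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
```

The drop-in survives vdsm upgrades precisely because vdsm only manages /etc/multipath.conf, not the multipath.conf.d directory.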

Comment 24 Nir Soffer 2019-10-24 12:07:12 UTC
Ben, the issue in this bug is:

1. System running using /dev/sda (multipath never grabbed this device)
2. Upgrade kernel (lvm, multipath and systemd versions are the same)
3. After boot, multipath grabs /dev/sda (the device appears in /etc/multipath/wwids)

Since the system uses an lvm filter allowing access only to /dev/sda*, and
multipath grabbed it, lvm cannot access the device and the machine enters the
emergency shell on boot.

How do you suggest we debug this issue?

Comment 25 Ben Marzinski 2019-10-29 22:26:19 UTC
What does the multipath configuration look like? Do you know why multipath wasn't grabbing /dev/sda before? One possibility is that multipath didn't grab it simply because LVM always grabbed it first. Multipath won't claim a device it has never claimed before until it successfully creates a multipath device using it. If multipath somehow managed to win a race with LVM to make use of the device, it would then add the device to the wwids file and claim it in the future.

So, the first step in debugging this is looking at the configuration, log messages, and udev database entry for sda, to figure out why multipath is trying to use it at all, if that's unknown. If we know that it was always trying to use the device but failing the race with LVM, then I'm not sure why it won the race this time, but relying on it always losing isn't a good idea.
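The "has multipath ever claimed this device" question can be sketched as a check against the wwids file. This sketch uses a scratch file with a stand-in entry so it runs anywhere; on a real host one would read /etc/multipath/wwids and obtain the WWID with /usr/lib/udev/scsi_id -g -u /dev/sda.

```shell
# Sketch: has multipath ever successfully claimed this device?
# multipathd records claimed devices in /etc/multipath/wwids,
# one /<wwid>/ entry per line. Scratch file used here so the
# sketch is safe to run anywhere.
wwids_file=$(mktemp)
printf '# Multipath wwids, Version : 1.0\n/3600508b1001ceb54f3e889ef0ab0e76b/\n' > "$wwids_file"

wwid=3600508b1001ceb54f3e889ef0ab0e76b   # from: /usr/lib/udev/scsi_id -g -u /dev/sda
if grep -q "/$wwid/" "$wwids_file"; then
    echo "multipath has claimed $wwid before and will keep claiming it"
fi
```

An entry in this file is sticky: once the WWID is recorded, multipath will claim the device on every subsequent boot, which matches the "won the race once" failure mode Ben describes.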

Comment 26 Yuval Turgeman 2019-11-04 08:48:07 UTC
I don't have the logs at hand, as it's only reproduced on a QE machine.  We keep 2 installations available on the same machine, and when booting into the previous one (different kernel), we saw some multipath warnings during boot, so it could be that with an older kernel multipath could not claim the device, leaving it to lvm.  Can we get some more logs from both RHVH layers (journalctl should be enough)?

Comment 33 Yuval Turgeman 2019-12-01 12:10:54 UTC
Qin and I debugged this a little further a few days ago.  What happens here is that after a fresh installation, multipath is not configured, so LVM claims /dev/sda.
However, during an upgrade, imgbased calls `vdsm-tool configure`, which configures multipath and then regenerates the initrd.  This means that on the next boot the new initrd will allow multipath to claim the device if possible.
If the user configured an LVM filter to only allow /dev/sda, then the system will not boot.  Bottom line: if the user configures an LVM filter, they should also configure multipath properly.  I think this is covered well by the KCS, can we close this?
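The failure condition described above can be sketched as a consistency check between the two configurations. Both files below are scratch stand-ins (for /etc/lvm/lvm.conf and a multipath blacklist drop-in), and the WWID is the example from earlier in the bug; the check itself is an illustration, not a tool that ships with RHVH.

```shell
# Sketch: an LVM filter pinned to /dev/sda is only safe if multipath is
# also told to ignore that disk. Scratch files stand in for the real
# configs so the sketch runs anywhere.
lvm_conf=$(mktemp); mp_blacklist=$(mktemp)
printf 'filter = [ "a|^/dev/sda2$|", "r|.*|" ]\n' > "$lvm_conf"
: > "$mp_blacklist"   # filter configured, but no multipath blacklist drop-in

wwid=3600508b1001ceb54f3e889ef0ab0e76b   # local disk WWID (example from above)
if grep -q '/dev/sda' "$lvm_conf" && ! grep -q "$wwid" "$mp_blacklist"; then
    echo "WARNING: LVM filter pins the system to sda, but multipath may still claim it"
fi
```

With the empty blacklist, the sketch prints the warning: exactly the state a host is in after a fresh install plus an upgrade that regenerates the initrd with multipath enabled.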

Comment 34 Juan Orti 2019-12-02 07:47:46 UTC
Which KCS are you referring to? I'm looking at: https://access.redhat.com/solutions/3450192 but there's no reference to the multipath configuration. Should it be expanded with the steps of comment 23?

Comment 35 Yuval Turgeman 2019-12-03 14:15:05 UTC
(In reply to Juan Orti Alcaine from comment #34)
> Which KCS are you referring to? I'm looking at:
> https://access.redhat.com/solutions/3450192 but there's no reference to the
> multipath configuration. Should it be expanded with the steps of comment 23?

I was actually talking about https://access.redhat.com/solutions/4000961, but I guess we could expand the one you mentioned as well.

