Bug 2169770
| Field | Value |
|---|---|
| Summary | System boot hangs after mounting /sysroot when multipath is enabled in the initramfs |
| Product | Red Hat Enterprise Linux 8 |
| Component | device-mapper-multipath |
| Version | 8.7 |
| Status | CLOSED NOTABUG |
| Severity | high |
| Priority | high |
| Reporter | Renaud Métrich <rmetrich> |
| Assignee | Ben Marzinski <bmarzins> |
| QA Contact | Lin Li <lilin> |
| CC | agk, bmarzins, cbesson, heinzm, jbrassow, lilin, msnitzer, prajnoha, pstodulk, zkabelac |
| Target Milestone | rc |
| Keywords | Triaged |
| Flags | pm-rhel: mirror+ |
| Hardware | All |
| OS | Linux |
| Type | Bug |
| Last Closed | 2023-02-28 17:40:25 UTC |
Created attachment 1944134
strace of the boot showing the initial mount failing (/dev/sda2)
Created attachment 1944135
strace of the boot showing multipathd execution
Leapp should not be using "find_multipaths no". "find_multipaths yes", which is what mpathconf defaults to, means that multipath will find the devices with multiple paths, and only multipath those devices. "find_multipaths no" means that multipath will rely on the blacklist and blacklist_exceptions sections of multipath.conf to control what devices are multipathed. Any device that is not blacklisted will get multipathed. This option should not be used unless you then manually edit /etc/multipath.conf to only include the devices you want multipathed.

I don't understand why leapp doesn't just use the default configuration:

# mpathconf --enable --with_multipathd y

"--with_multipathd y" just starts up multipath after enabling it. If you create your multipath.conf with that command, does everything work correctly?

The other options that leapp configured are:

--user_friendly_names n: This uses the WWID for the multipath device name instead of a name in the form mpath<X>. This is a safe parameter to set. It's just not the default.

--enable_foreign y: This allows the multipath command to display native NVMe multipath devices. Again, this is perfectly safe to set. It's just not the default.

Hi Ben, leapp is not using "find_multipaths no" anywhere unless I overlooked something. So if this is set, then it seems to be "default". Also, if I understand correctly that this is happening when booting into the upgrade initramfs, then it's good to know that multipath.conf is currently not present in the environment where the initramfs is created.

I'm pretty sure I know what's going on here. There's a difference between the default configuration generated by mpathconf and what you get with an empty /etc/multipath.conf file. Multipath can't be enabled with no /etc/multipath.conf file, but it will work with an empty one. If a configuration option isn't set in the config file, multipath uses the compiled-in default, which we share with upstream. If the multipath.conf file is empty, multipath will just use all the compiled-in defaults, which for the multipath version in RHEL-8 include:

find_multipaths no
enable_foreign ".*"
user_friendly_names no

These aren't the defaults we want, since they make life more complicated for our users. So when you run

# mpathconf --enable

the config file it creates isn't blank; it is populated with our desired defaults. I assume Renaud's mpathconf command was attempting to create something like an empty multipath.conf file. Does the upgrade initramfs include an empty /etc/multipath.conf file if you don't have multipath set up on the machines? If multipath isn't needed, then you probably shouldn't have any multipath.conf file. That will disable multipath. Not including the module would also work.

Stepping back, I guess the first question is: does the upgrade initramfs get made specifically for each leapp upgrade, or do you use a single stock initramfs for all upgrades?

The upgrade initramfs is created on each machine inside the target userspace container (e.g. /var/lib/leapp/el9userspace), but the container does not see all host config files, only those that are explicitly told to be copied inside (which is not done for multipath). Here is the script generating the upgrade initramfs inside the container:
https://github.com/oamg/leapp-repository/blob/master/repos/system_upgrade/common/actors/initramfs/upgradeinitramfsgenerator/files/generate-initram.sh
So it's possible to influence quite a lot here, e.g. via the following messages:
TargetUserSpaceUpgradeTasks - affecting the content inside the container
* https://github.com/oamg/leapp-repository/blob/master/repos/system_upgrade/common/models/targetuserspace.py#L99
UpgradeInitramfsTasks - affecting the creation of the upgrade initramfs
* https://github.com/oamg/leapp-repository/blob/master/repos/system_upgrade/common/models/initramfs.py#L43
We are currently close to deadlines, so if needed, we could sync during March.
It looks like the configuration here was always causing a race, since "find_multipaths no" was set, but no blacklists were set up, meaning that multipathd will try to multipath all block devices. If multipath should not be running on all devices, then either find_multipaths needs to be set to something like "yes", or blacklisting needs to be set up to make sure that only the correct devices are multipathed. Without this, it was always possible that a system change, like the one caused by leapp, would switch the winner of the race, and cause these sorts of errors.
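To make the distinction between a missing, an empty, and an mpathconf-generated multipath.conf easier to check, here is a minimal shell sketch. It is an illustration only: the function name effective_find_multipaths is hypothetical and not part of device-mapper-multipath, the fallback to "no" simply encodes the RHEL-8 compiled-in default stated in the comments above, and the parsing is deliberately naive (first uncommented find_multipaths line wins; drop-in directories and section nesting are ignored).

```shell
#!/bin/bash
# Hypothetical helper (not shipped with device-mapper-multipath): report the
# find_multipaths value a given multipath.conf would yield, assuming the
# behaviour described in the comments above for the RHEL 8 build.
# Simplification: only the first uncommented find_multipaths line is read.
effective_find_multipaths() {
    local conf=$1 value
    if [ ! -e "$conf" ]; then
        # No config file at all: multipath is not enabled.
        echo "disabled"
        return 0
    fi
    value=$(awk '
        /^[ \t]*#/ { next }              # skip comment lines
        $1 == "find_multipaths" {
            gsub(/"/, "", $2)            # tolerate quoted values
            print $2
            exit
        }' "$conf")
    # Option not set: fall back to the compiled-in default, which the
    # comments above give as "no" for the multipath version in RHEL 8.
    echo "${value:-no}"
}

# Demo on throwaway files, so nothing under /etc is touched.
demo_dir=$(mktemp -d)
: > "$demo_dir/empty.conf"
printf 'defaults {\n\tfind_multipaths yes\n}\n' > "$demo_dir/default.conf"

effective_find_multipaths "$demo_dir/missing.conf"
effective_find_multipaths "$demo_dir/empty.conf"
effective_find_multipaths "$demo_dir/default.conf"
```

Pointing the function at the multipath.conf that ends up inside the upgrade initramfs (if any) would show which of the three cases applies there.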
Description of problem:

The initial issue is seen with Leapp (from RHEL7 to RHEL8) due to having the "multipath" dracut module added to the initramfs even when there is no real multipath device. But the issue is reproducible outside of a Leapp upgrade.

When embedding the "multipath" dracut module (hence the multipath components) with a multipath configuration set to not find any multipath devices, we can see the boot hang after failing to mount the root file system (when it's specified by LABEL or UUID):

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[ 158.400446] /dev/sda2: Can't open blockdev
[FAILED] Failed to mount /sysroot.
See 'systemctl status sysroot.mount' for details.
[DEPEND] Dependency failed for Initrd Root File System.
[DEPEND] Dependency failed for Reload Configuration from the Real Root.
:
Mounting /sysroot...
[ 3.036073] XFS (dm-2): Mounting V5 Filesystem
[ 3.171671] XFS (dm-2): Ending clean mount
[ OK ] Mounted /sysroot.
--> hang
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

The issue is easily reproducible while stracing the initrd. We can see the initial "mount /dev/sda2 /sysroot" failing with EBUSY:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
12025 15:16:54.841128 mount("/dev/sda2", "/sysroot", "xfs", MS_MGC_VAL|MS_RDONLY, NULL <unfinished ...>
12025 15:16:54.910320 <... mount resumed>) = -1 EBUSY (Device or resource busy) <0.069182>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

In the meantime, we see multipathd reclaiming the device and creating the device map:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
5540 11626 15:16:54.869545 write(2<UNIX:[26475]>, "0QEMU_QEMU_HARDDISK_DISK1: load table [0 41943040 multipath 0 0 1 1 service-time 0 1 1 8:0 1]\n", 94 <unfinished ...>
5541 11626 15:16:54.870278 <... write resumed>) = 94 <0.000032>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

I believe all this is due to a "race" between discovery and reclaiming. Somehow the multipath kernel module must "lock" /dev/sda2, causing the initial mount to fail with "Device or resource busy".

Version-Release number of selected component (if applicable):

RHEL8.4 but also RHEL8.7 (so probably every release)
kernel-4.18.0-425.10.1.el8_7.x86_64
systemd-239-68.el8_7.2.x86_64
device-mapper-multipath-0.8.4-28.el8_7.1.x86_64

How reproducible:

Always

Steps to Reproduce:

1. Set up a VM with 1 disk configured in SCSI mode

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none' discard='unmap'/>
  <source file='/var/lib/libvirt/images/scsi8-disk1.img' index='1'/>
  <backingStore/>
  <target dev='sdb' bus='scsi'/>
  <shareable/>
  <serial>DISK1</serial>
  <alias name='scsi0-0-0-1'/>
  <address type='drive' controller='0' bus='0' target='0' unit='1'/>
</disk>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

2. Install the VM with RHEL8.7

I forced GPT partitioning (inst.gpt), but it's probably not needed.
Make sure to have / configured as XFS without LVM (so /dev/sda2 hosts /).
Don't configure any /boot.

3. Once installed, install multipath and configure it similarly to what Leapp does

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# yum -y install device-mapper-multipath
# mpathconf --enable --user_friendly_names n --find_multipaths no --with_multipathd y --enable_foreign y
# systemctl restart multipathd
# dracut -f --add multipath
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

4. Reboot

Actual results:

Hang occurs just before switchroot.
We can see /sysroot being mounted, unmounted and remounted (the first one is /dev/sda2):

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Mounting /sysroot...
[ OK ] Unmounted /sysroot.
Mounting /sysroot...
[ 3.036073] XFS (dm-2): Mounting V5 Filesystem
[ 3.171671] XFS (dm-2): Ending clean mount
[ OK ] Mounted /sysroot.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Additional info:

When stracing the initrd, we can see the initial mount fail with "Device busy":

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
mount: /sysroot: /dev/sda2 already mounted or mount point busy.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

On the customer system, we can see the error message even without strace; this is probably due to a race that shows up more readily with strace.

To strace the initrd, proceed as shown below:

1. Install strace and embed it in the initramfs

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# yum -y install strace
# dracut -f --add multipath --install strace
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

2. Reboot, stop at the GRUB menu, edit the entry, and add the following

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
rdinit=/bin/sh
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

3. Continue the boot; at the "sh" prompt, execute "init" under strace

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
sh-4.4# mkdir /strace
sh-4.4# mount -t tmpfs tmpfs /strace
sh-4.4# exec strace -fttTvyy -s 128 -o /strace/init.strace -D -- /init
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

4. At the prompt (due to the mounting of /sysroot failing), store the strace on the root file system

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
:/# ps -eaf | grep strace
root 284 1 13 16:21 ? 00:00:06 strace -fttTvyy -s 128 -o /strace/init.strace -D -- /init
root 12998 12644 0 16:21 ttyS0 00:00:00 grep strace
:/# kill -9 284
:/# mount -o rw,remount /sysroot
:/# mv /strace/init.strace /sysroot/
:/# exit
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

5. Once exiting (last line above), the system hangs and never switches root, even though /sysroot got remounted (as a multipath device)
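
When repeating this on another system, it may help to confirm which device /sysroot ended up mounted from, since the logs above show the first attempt on /dev/sda2 and the final mount on dm-2. The following is a rough sketch under assumptions: classify_mount_source is a hypothetical helper that looks only at the device path, and findmnt (from util-linux) may not be present in a minimal initramfs, so the call is guarded.

```shell
#!/bin/bash
# Hypothetical helper: classify a mount source purely by its device path.
# It does not query device-mapper, so any dm device (multipath, LVM, ...)
# is reported the same way.
classify_mount_source() {
    case "$1" in
        /dev/mapper/*|/dev/dm-*) echo "device-mapper" ;;
        /dev/*)                  echo "plain block device" ;;
        *)                       echo "unknown" ;;
    esac
}

# Usage sketch: only runs if findmnt is available and /sysroot is mounted.
if command -v findmnt >/dev/null 2>&1; then
    src=$(findmnt -n -o SOURCE /sysroot 2>/dev/null || true)
    if [ -n "$src" ]; then
        echo "/sysroot is mounted from $src ($(classify_mount_source "$src"))"
    fi
fi
```

A "device-mapper" result at the emergency prompt would be consistent with multipathd having won the race for the disk.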