Bug 2169770

Summary: System boot hangs after mounting /sysroot when multipath is enabled in the initramfs
Product: Red Hat Enterprise Linux 8
Reporter: Renaud Métrich <rmetrich>
Component: device-mapper-multipath
Assignee: Ben Marzinski <bmarzins>
Status: CLOSED NOTABUG
QA Contact: Lin Li <lilin>
Severity: high
Docs Contact:
Priority: high
Version: 8.7
CC: agk, bmarzins, cbesson, heinzm, jbrassow, lilin, msnitzer, prajnoha, pstodulk, zkabelac
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-02-28 17:40:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description | Flags
strace of the boot showing the initial mount failing (/dev/sda2) | none
strace of the boot showing multipathd execution | none

Description Renaud Métrich 2023-02-14 16:26:16 UTC
Description of problem:

The issue was first seen with Leapp (upgrading from RHEL 7 to RHEL 8), because the "multipath" dracut module gets added to the initramfs even when there is no real multipath device. The issue is also reproducible outside of a Leapp upgrade.

When the "multipath" dracut module (and hence the multipath components) is embedded in the initramfs and the multipath configuration is set to not find multipath devices ("find_multipaths no"), the boot hangs after failing to mount the root file system (when it is specified by LABEL or UUID):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[  158.400446] /dev/sda2: Can't open blockdev
[FAILED] Failed to mount /sysroot.
See 'systemctl status sysroot.mount' for details.
[DEPEND] Dependency failed for Initrd Root File System.
[DEPEND] Dependency failed for Reload Configuration from the Real Root.
 :
         Mounting /sysroot...
[    3.036073] XFS (dm-2): Mounting V5 Filesystem
[    3.171671] XFS (dm-2): Ending clean mount
[  OK  ] Mounted /sysroot.
--> hang
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

The issue is easily reproducible while stracing the initrd.
We can see the initial "mount /dev/sda2 /sysroot" failing with EBUSY:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
12025 15:16:54.841128 mount("/dev/sda2", "/sysroot", "xfs", MS_MGC_VAL|MS_RDONLY, NULL <unfinished ...>
12025 15:16:54.910320 <... mount resumed>) = -1 EBUSY (Device or resource busy) <0.069182>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
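
To locate such failures in a large trace, a small filter can help. This is only a sketch: the helper name find_busy_mounts is made up, and the file location is the one used in the strace procedure described under "Additional info".

```shell
#!/bin/sh
# find_busy_mounts FILE
# Print the lines of a strace output file where a mount() call (complete or
# "resumed") returned EBUSY. Hypothetical helper, for illustration only.
find_busy_mounts() {
    grep -E '(^|[[:space:]])(mount\(|<\.\.\. mount resumed>).*= -1 EBUSY' "$1"
}

# Assumed location of the trace once it has been copied to the root file system.
trace=/sysroot/init.strace
if [ -f "$trace" ]; then
    find_busy_mounts "$trace"
fi
```

The first column of each matching line is the PID, which can then be used to find the corresponding "unfinished" mount() line and the device it refers to.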

In the meantime, we see multipathd reclaiming the device and creating the device map:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
5540 11626 15:16:54.869545 write(2<UNIX:[26475]>, "0QEMU_QEMU_HARDDISK_DISK1: load table [0 41943040 multipath 0 0 1 1 service-time 0 1 1 8:0 1]\n", 94 <unfinished ...>
5541 11626 15:16:54.870278 <... write resumed>) = 94 <0.000032>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

I believe all this is due to a "race" between discovery and reclaiming. Somehow the multipath kernel module must "lock" /dev/sda2, causing the initial mount to fail with "Device or resource busy".
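
One way to check this theory from the emergency shell is to look at the "holders" directory that sysfs exposes for each block device: if a device-mapper map has been created on top of the disk, its dm-N name should be listed there. A minimal sketch, assuming the device names from this report; the helper name list_holders and its optional sysfs-root argument are made up for illustration.

```shell
#!/bin/sh
# list_holders DEV [SYSFS_ROOT]
# Print the names of the block devices that currently hold DEV (for example
# the dm-N map created by multipathd). SYSFS_ROOT defaults to /sys and is only
# a parameter so that the function can be tried against a fake tree.
list_holders() {
    dev=$1
    root=${2:-/sys}
    dir="$root/class/block/$dev/holders"
    [ -d "$dir" ] || return 1
    for h in "$dir"/*; do
        # An empty directory leaves the glob unexpanded; skip that case.
        [ -e "$h" ] || continue
        basename "$h"
    done
}

# The multipath table in the trace references 8:0 (the whole disk, sda), so
# check the disk as well as the partition that failed to mount.
for d in sda sda2; do
    echo "$d: $(list_holders "$d" | tr '\n' ' ')"
done
```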

Version-Release number of selected component (if applicable):

RHEL8.4 but also RHEL8.7 (so probably every release)

kernel-4.18.0-425.10.1.el8_7.x86_64
systemd-239-68.el8_7.2.x86_64
device-mapper-multipath-0.8.4-28.el8_7.1.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Set up a VM with 1 disk configured in SCSI mode

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' discard='unmap'/>
      <source file='/var/lib/libvirt/images/scsi8-disk1.img' index='1'/>
      <backingStore/>
      <target dev='sdb' bus='scsi'/>
      <shareable/>
      <serial>DISK1</serial>
      <alias name='scsi0-0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

2. Install the VM with RHEL8.7

  I forced the use of a GPT partition table (inst.gpt), but it's probably not needed.
  Make sure to have / configured as XFS without LVM (so that /dev/sda2 hosts /).
  Don't configure any /boot.

3. Once installed, install multipath and configure it similarly to what Leapp does

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  # yum -y install device-mapper-multipath
  # mpathconf --enable --user_friendly_names n --find_multipaths no --with_multipathd y --enable_foreign y
  # systemctl restart multipathd
  # dracut -f --add multipath
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

4. Reboot

Actual results:

  Hang occurs just before switchroot. We can see /sysroot being mounted, unmounted and remounted (first one is /dev/sda2):
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
         Mounting /sysroot...
  [  OK  ] Unmounted /sysroot.
         Mounting /sysroot...
  [    3.036073] XFS (dm-2): Mounting V5 Filesystem
  [    3.171671] XFS (dm-2): Ending clean mount
  [  OK  ] Mounted /sysroot.
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Additional info:

  When stracing the initrd, we can see the initial mount fails with Device busy:
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  mount: /sysroot: /dev/sda2 already mounted or mount point busy.
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

  On the customer system, the error message shows up even without strace. This is probably due to a race that simply shows up more reliably with strace.

  To strace the initrd, proceed as shown below:

  1. Install strace and embed it in the initramfs

    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
    # yum -y install strace
    # dracut -f --add multipath --install strace
    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

  2. Reboot, stop at the GRUB menu, edit the entry, and add the following

    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
    rdinit=/bin/sh
    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

  3. Continue the boot and, at the "sh" prompt, execute "init" under strace

    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
    sh-4.4# mkdir /strace
    sh-4.4# mount -t tmpfs tmpfs /strace
    sh-4.4# exec strace -fttTvyy -s 128 -o /strace/init.strace -D -- /init
    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

  4. At the prompt (due to the mounting of /sysroot failing), store the strace on the root file system

    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
    :/# ps -eaf | grep strace
    root         284       1 13 16:21 ?        00:00:06 strace -fttTvyy -s 128 -o /strace/init.strace -D -- /init
    root       12998   12644  0 16:21 ttyS0    00:00:00 grep strace
    :/# kill -9 284
    :/# mount -o rw,remount /sysroot
    :/# mv /strace/init.strace /sysroot/
    :/# exit
    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

  5. Once exiting (last line above), the system hangs and never switches root, even though /sysroot got remounted (as a multipath device)

Comment 1 Renaud Métrich 2023-02-14 16:29:41 UTC
Created attachment 1944134 [details]
strace of the boot showing the initial mount failing (/dev/sda2)

Comment 2 Renaud Métrich 2023-02-14 16:30:16 UTC
Created attachment 1944135 [details]
strace of the boot showing multipathd execution

Comment 3 Ben Marzinski 2023-02-14 23:43:11 UTC
Leapp should not be using "find_multipaths no".

"find_multipaths yes", which is what mpathconf defaults to,  means that multipath will find the devices with multiple paths, and only multipath those devices. "find_multipaths no" means that multipath will rely on the blacklist and blacklist_exceptions sections of multipath.conf to control what devices are multipathed.  Any device that is not blacklisted will get multipathed. This option should not be used unless you then manually edit /etc/multipath.conf to only include the devices you want multipathed.

I don't understand why leapp doesn't just use the default configuration:

# mpathconf --enable --with_multipathd y

"--with_multipathd y" just starts up multipath after enabling it. If you create your multipath.conf with that command, does everything work correctly?

Comment 4 Ben Marzinski 2023-02-14 23:52:40 UTC
The other options that leapp configured are:

--user_friendly_names n: This uses the WWID for the multipath device name instead of a name in the form mpath<X>. This is a safe parameter to set. It's just not the default.
--enable_foreign y: This allows the multipath command to display Native NVMe multipath devices.  Again, this is perfectly safe to set. It's just not the default.

Comment 5 Petr Stodulka 2023-02-15 09:03:33 UTC
Hi Ben, leapp is not using "find_multipaths no" anywhere unless I overlooked something. So if this is set, then it seems to be "default". Also if I understand right that this is happening when booting to the upgrade initramfs, then it's good to know that multipath.conf is not present currently in the environment where the initramfs is created.

Comment 6 Ben Marzinski 2023-02-15 16:01:54 UTC
I'm pretty sure I know what's going on here. There's a difference between the default configuration generated by mpathconf and what you get with an empty /etc/multipath.conf file. Multipath can't be enabled with no /etc/multipath.conf file, but it will work with an empty one. If a configuration option isn't set in the config file, multipath uses the compiled-in default, which we share with upstream. If the multipath.conf file is empty, multipath will just use all the compiled-in defaults, which for the multipath version in RHEL-8 include:

find_multipaths no
enable_foreign ".*"
user_friendly_names no

These aren't the defaults we want, since they make life more complicated for our users. So when you run

# mpathconf --enable

the config file it creates isn't blank, it is populated with our desired defaults. I assume Renaud's mpathconf command was attempting to create something like an empty multipath.conf file.
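
The fallback can be pictured with the rough sketch below. It is not how multipath parses its configuration: the helper name effective_find_multipaths is made up, the parser is deliberately naive, and the "no" fallback is simply the RHEL-8 compiled-in default listed above. On a real system, multipath itself (for example "multipath -t") is the authoritative source for the configuration in use.

```shell
#!/bin/sh
# effective_find_multipaths CONF
# Naive illustration only: print the find_multipaths value set in CONF, or
# "no" (the RHEL-8 compiled-in default quoted above) when CONF does not set
# it. It ignores sections and any additional configuration snippets.
effective_find_multipaths() {
    conf=$1
    # No config file at all: multipath is not enabled.
    [ -f "$conf" ] || { echo "disabled (no config file)"; return 0; }
    # Keep the last uncommented "find_multipaths <value>" line, unquoted.
    val=$(sed -n 's/^[[:space:]]*find_multipaths[[:space:]]\{1,\}"\{0,1\}\([^"[:space:]]*\)"\{0,1\}.*$/\1/p' "$conf" | tail -n 1)
    echo "${val:-no}"
}

effective_find_multipaths /etc/multipath.conf
```

With an empty file the function prints "no", while a file that sets find_multipaths to "yes" prints "yes".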

Does the upgrade initramfs include an empty /etc/multipath.conf file if you don't have multipath set up on the machines? If multipath isn't needed, then you probably shouldn't have any multipath.conf file. That will disable multipath. Not including the module would also work. Stepping back, I guess the first question is, does the upgrade initramfs get made specifically for each Leapp upgrade, or do you use a single stock initramfs for all upgrades?
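
For a given image, the first question can probably be answered with lsinitrd from dracut. The sketch below is only an illustration: the helper name has_multipath_conf is made up, and the image path is an assumption (it points at the regular initramfs of the running kernel, not at the Leapp upgrade initramfs).

```shell
#!/bin/sh
# has_multipath_conf: read an lsinitrd file listing on stdin and succeed if it
# contains etc/multipath.conf. (Hypothetical helper, for illustration only.)
has_multipath_conf() {
    grep -q 'etc/multipath\.conf$'
}

# Assumed image path: the regular initramfs of the running kernel.
img="/boot/initramfs-$(uname -r).img"

if command -v lsinitrd >/dev/null 2>&1 && [ -r "$img" ]; then
    if lsinitrd "$img" | has_multipath_conf; then
        echo "$img ships an /etc/multipath.conf:"
        lsinitrd -f etc/multipath.conf "$img"   # print its content
    else
        echo "$img has no /etc/multipath.conf"
    fi
fi

# If multipath is not needed at all, the module can be left out when the
# image is rebuilt:
#   dracut -f --omit multipath
```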

Comment 7 Petr Stodulka 2023-02-15 16:39:42 UTC
The upgrade initramfs is created on each machine inside the target userspace container (e.g. /var/lib/leapp/el9userspace), but the container does not see all host config files - only those that are explicitly told to be copied inside (which is not done for multipath). Here is the script generating the upgrade initramfs inside the container:
    https://github.com/oamg/leapp-repository/blob/master/repos/system_upgrade/common/actors/initramfs/upgradeinitramfsgenerator/files/generate-initram.sh

So it's possible to affect a lot of things there, e.g. via the following messages:
    TargetUserSpaceUpgradeTasks - affecting the content inside the container
       * https://github.com/oamg/leapp-repository/blob/master/repos/system_upgrade/common/models/targetuserspace.py#L99
    UpgradeInitramfsTasks       - affecting the creation of the upgrade initramfs
       * https://github.com/oamg/leapp-repository/blob/master/repos/system_upgrade/common/models/initramfs.py#L43

We are currently close to deadlines, so in case of need, we could sync during March.

Comment 11 Ben Marzinski 2023-02-28 17:40:25 UTC
It looks like the configuration here was always causing a race, since "find_multipaths no" was set, but no blacklists were set up, meaning that multipathd will try to multipath all block devices. If multipath should not be running on all devices, then either find_multipaths needs to be set to something like "yes", or blacklisting needs to be set up to make sure that only the correct devices are multipathed. Without this, it was always possible that a system change, like the one caused by leapp, would switch the winner of the race, and cause these sorts of errors.
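
Either option could look roughly like the sketch below. The helper name make_blacklist is made up, the WWID is the one visible in the strace excerpt in the description, and the generated section is only meant to be reviewed and merged by hand into /etc/multipath.conf.

```shell
#!/bin/sh
# Option 1: let multipath only claim devices that really have several paths:
#   mpathconf --find_multipaths yes
#
# Option 2: keep "find_multipaths no" but blacklist the devices that must not
# be multipathed. The function below only prints a blacklist section.

# make_blacklist WWID...
# Print a multipath.conf "blacklist" section for the given WWIDs.
# Hypothetical helper, for illustration only.
make_blacklist() {
    echo "blacklist {"
    for w in "$@"; do
        printf '\twwid "%s"\n' "$w"
    done
    echo "}"
}

# WWID of the single local disk from this report.
make_blacklist 0QEMU_QEMU_HARDDISK_DISK1
```

After changing the configuration, the initramfs has to be rebuilt (for example with "dracut -f") so that the copy embedded in it matches.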