Bug 2007089 - [4.9] Intermittent failure mounting /run/media/iso when booting live ISO from USB stick
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Benjamin Gilbert
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 2007085 2007086
Blocks: 2007090
Reported: 2021-09-23 06:01 UTC by RHCOS Bug Bot
Modified: 2021-10-18 17:52 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2007085
Clones: 2007090
Environment:
Last Closed: 2021-10-18 17:51:56 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github coreos coreos-assembler pull 2460 0 None open [rhcos-4.9] buildextend-live: fix `/run/media/iso` mount flake booting from disk 2021-09-23 16:37:37 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:52:20 UTC

Description RHCOS Bug Bot 2021-09-23 06:01:38 UTC
Backport the fix for bug 2007085 to 4.9.

Comment 1 Micah Abbott 2021-09-23 19:58:15 UTC
Original problem statement:

```
When booting the RHCOS live ISO image from a USB stick, boot sometimes fails with the following error:

[    9.410095] systemd[1]: Reached target Initrd Root Device.
[    9.415226] systemd[1]: Mounting /run/media/iso...
[    9.498191] ISOFS: Unable to identify CD-ROM format.
[    9.499552] mount[722]: mount: /run/media/iso: wrong fs type, bad option, bad superblock on /dev/sda2, missing codepage or helper program, or other error.
Failed to mount /run/media/iso.
See 'systemctl status run-media-iso.mount' for details.

35coreos-live/live-generator creates run-media-iso.mount, which mounts the ISO from /dev/disk/by-label/rhcos-<version>.  The latter should point to /dev/sda1, but in this case it's pointing to /dev/sda2, which is efiboot.img.  /dev/sda2 does not have a filesystem label, but because of https://github.com/systemd/systemd/issues/14408, 60-persistent-storage.rules falls back to using the filesystem label from /dev/sda (the whole-disk device) when evaluating symlinks for /dev/sda2.  As a result, /dev/sda1 and /dev/sda2 race to own the /dev/disk/by-label symlink, and if sda2 wins, the mount fails.

The race occurs on BIOS, UEFI El Torito, and UEFI GPT boots; kola testiso iso-as-disk has reproduced this on both BIOS and UEFI.  It does not affect CD or virtual CD boot because CD-ROM devices are not partitionable.  It does not affect Fedora CoreOS because the udev rules on FCOS include https://github.com/systemd/systemd/pull/14485.

The problem affects RHCOS 4.7 and up, but not the existing 4.6 ISO because the hybrid partition table in that release does not include a partition for efiboot.img.  However, the fix for bug 2004679 (https://github.com/coreos/coreos-assembler/pull/2439) backports the hybrid ESP to 4.6, so currently the next bootimage bump will cause 4.6 to regress.

To verify a fix for this bug, boot with a `udev.log_priority=debug` karg and ensure `journalctl -u systemd-udevd.service` has no mention of a `/dev/disk/by-label/rhcos-*` symlink being applied to `/dev/vda2` or `/dev/sda2`.
```
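As a rough illustration of the symptom described above, the sketch below resolves each `rhcos-*` symlink under `/dev/disk/by-label` and flags any that do not land on a first partition. The function name `check_iso_label_links` and the "device name ends in partition 1" heuristic are assumptions made for this example, not part of the actual fix. Because the failure is a race, a passing result on a single boot does not prove the bug is absent; the udev debug log check quoted above remains the stronger test.

```shell
# Hypothetical helper (not part of RHCOS): report where each rhcos-* label
# symlink points. Returns non-zero if any link resolves to something other
# than a device whose name ends in partition number 1.
# Assumption: the ISO filesystem is always on the first partition.
check_iso_label_links() {
    dir=${1:-/dev/disk/by-label}
    status=0
    for link in "$dir"/rhcos-*; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -L "$link" ] || continue
        target=$(readlink -f "$link")
        case "$target" in
            *[!0-9]1) echo "ok: $link -> $target" ;;
            *) echo "unexpected: $link -> $target"; status=1 ;;
        esac
    done
    return "$status"
}

# Example use on a booted live system:
#   check_iso_label_links || echo "label symlink points at the wrong partition"
```

The directory argument exists only so the check can be exercised against a scratch directory instead of the real `/dev`.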

Comment 2 Micah Abbott 2021-09-23 19:59:41 UTC
Since there is a chance that a customer would not be able to boot into the RHCOS live ISO and therefore unable to install RHCOS/OCP, I am erring on the side of caution and marking this `blocker+`

Comment 3 Benjamin Gilbert 2021-09-24 17:17:51 UTC
Fixed in 49.84.202109241334-0 on x86_64, and the bug does not affect other arches.

Comment 4 RHCOS Bug Bot 2021-09-24 17:18:14 UTC
This bug has been reported fixed in a new RHCOS build.  Do not move this bug to MODIFIED until the fix has landed in a new bootimage.

Comment 5 RHCOS Bug Bot 2021-09-24 23:29:55 UTC
The fix for this bug has landed in a bootimage bump, as tracked in bug 2007086 (now in status MODIFIED).  Moving this bug to MODIFIED.

Comment 7 Michael Nguyen 2021-09-27 22:28:44 UTC
Verified on 49.84.202109241334-0

Downloaded the live ISO and wrote it to a USB drive with
`dd if=rhcos-49.84.202109241334-0-live.x86_64.iso of=/dev/sdb oflag=sync status=progress`

Then booted USB drive and edited the kernel arguments to include
`udev.log_priority=debug`

`journalctl -u systemd-udevd.service --no-pager | grep rhcos` returned no symlinks pointing to /dev/sda2 or /dev/vda2
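The manual grep above could be tightened into a small filter, sketched here. The function name `find_bad_label_links` is made up for this example, and it assumes only that an offending udev debug line mentions both a `disk/by-label/rhcos-` path and an `sda2` or `vda2` device name on the same line. The exact udev log format varies between systemd versions, so treat the patterns as a starting point.

```shell
# Hypothetical filter (an assumption, not an existing tool): read journal
# text on stdin and print lines that mention an rhcos-* by-label symlink
# together with sda2 or vda2. Exit status is 0 if any such line is found.
find_bad_label_links() {
    grep -F 'disk/by-label/rhcos-' | grep -E '(sd|vd)a2([^0-9]|$)'
}

# Example use after booting with udev.log_priority=debug:
#   journalctl -u systemd-udevd.service --no-pager | find_bad_label_links \
#       && echo "symlink applied to second partition" \
#       || echo "no bad symlinks logged"
```

An empty result matches the verification outcome reported in this comment.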

Comment 10 errata-xmlrpc 2021-10-18 17:51:56 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
