Bug 2256843 - Booting from local ISO file no longer works in Fedora 39
Summary: Booting from local ISO file no longer works in Fedora 39
Keywords:
Status: ASSIGNED
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 39
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 2184978
Reported: 2024-01-04 18:34 UTC by Jonathan Billings
Modified: 2024-02-07 14:03 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:



Description Jonathan Billings 2024-01-04 18:34:53 UTC
Hello,

If I extract the vmlinuz and initrd from a Fedora minimal boot ISO, I can then set up a boot entry that has kernel parameters that look like this:

inst.repo=hd:UUID=whatever:/fedora.iso



Reproducible: Always

Steps to Reproduce:
1. On an existing Linux system with an XFS or ext4 /boot, download a Fedora minimal ISO, extract vmlinuz and initrd.img, and place them in /boot along with the ISO (call it fedora.iso).
2. Identify the UUID of the /boot volume (let's call it $BOOTUUID).
3. Create a boot entry (potentially in /boot/loader/entries/) that boots that vmlinuz and initrd, and add the kernel parameter: inst.repo=hd:UUID=$BOOTUUID:/fedora.iso
4. Boot the new boot entry.
Actual Results:  
In Fedora 38, this would pull down the stage2 installer and start the boot from there.

In Fedora 39, I get this error:
[    5.213316] dracut-initqueue[1311]: mount: /: not mount point or bad option.
[    5.213347] dracut-initqueue[1311]:        dmesg(1) may have more information after failed mount system call.
[    5.214527] dracut-initqueue[1312]: mount: /run/install/isodir: bad option: moving from a mount residing under a shared mount is unsupported.


Expected Results:  
Booting into the Fedora installer with no issues.

I have a script to automate migrating people's laptops from RHEL to Fedora (reloading in place) and it's been working fine for over 6 months using Fedora 37 and 38.  Fedora 39 seems to be when it stopped working.

Comment 1 Lukáš Nykrýn 2024-01-23 11:53:44 UTC
Can you please add rd.debug to the kernel cmdline, reproduce the issue, and post the logs here? Ideally from both the working and the broken setup.

Comment 2 Jonathan Billings 2024-01-23 16:16:43 UTC
I've created a serial console on my VM and dumped the output to a file.  Some additional information which I didn't realize was pertinent: I have a kickstart on the same device and filesystem as the ISO image that is read by the inst.stage2.


I followed these steps for both the Fedora 38 and Fedora 39 netinst ISO on a CentOS 9 VM

1.) Downloaded Fedora-Everything-netinst-x86_64-38-1.6.iso and Fedora-Everything-netinst-x86_64-39-1.5.iso
2.) Install the 'libcdio' package (which includes /usr/bin/iso-read)
3.) Copy the ISO I'm testing to /boot/fedora.iso
4.) Run: iso-read -i /boot/fedora.iso --extract /images/pxeboot/vmlinuz --output-file /boot/vmlinuz
5.) Run: iso-read -i /boot/fedora.iso --extract /images/pxeboot/initrd.img --output-file /boot/initrd.img
6.) Copy a kickstart file to /boot/kickstart.cfg.  I intentionally put one with a syntax error so the installer errors out before loading. This is fine for the test because the error in Fedora 39 happens during dracut-initqueue, well before we start parsing the kickstart.
7.) Create a BLS entry so grub2 can load the new install:
# Get the machine-id
MACHINE_ID=$(cat /etc/machine-id)
# Get UUID of /boot
BOOT_UUID=$( findmnt -no UUID /boot )
# Write boot entry
cat > /boot/loader/entries/${MACHINE_ID}-99-fedora.conf <<EOF
title Install Fedora
version 1.0
linux /vmlinuz
initrd /initrd.img
options inst.stage2=hd:UUID=${BOOT_UUID}:/fedora.iso inst.ks=hd:UUID=${BOOT_UUID}:/kickstart.cfg rd.debug console=ttyS1
id fedora-test
grub_users \$grub_users
grub_arg --unrestricted
grub_class kernel
EOF
8.) Add a serial device (in this example, ttyS1) that writes to a file.
9.) Reboot into the "Install Fedora" boot entry in GRUB2.
10.) Capture the serial output.

I will attach the two serial log outputs.
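Before rebooting, it may help to confirm that every file named in the boot entry actually exists on /boot. The following is a minimal sketch, not part of the original procedure; check_bls_paths is a hypothetical helper name, and it assumes the options line uses the hd:UUID=<uuid>:<path> form shown above.

```shell
# Hypothetical helper (not from the bug report): list every
# hd:UUID=<uuid>:<path> target on a BLS entry's "options" line and report
# whether that path exists under the given boot directory.
check_bls_paths() {
    entry=$1
    boot_dir=$2
    status=0
    # Split the options line into tokens, then keep only the path part of
    # tokens shaped like key=hd:UUID=<uuid>:/path.
    for path in $(grep '^options ' "$entry" | tr ' ' '\n' \
            | sed -n 's|^[^=]*=hd:UUID=[^:]*:\(/.*\)$|\1|p'); do
        if [ -e "$boot_dir$path" ]; then
            echo "ok: $path"
        else
            echo "missing: $path"
            status=1
        fi
    done
    return $status
}
```

For example, `check_bls_paths /boot/loader/entries/${MACHINE_ID}-99-fedora.conf /boot` prints one line per referenced file and returns non-zero if any is missing.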

Comment 3 Jonathan Billings 2024-01-23 16:18:02 UTC
Created attachment 2009941 [details]
Fedora 38 netinst boot with kickstart

Comment 4 Jonathan Billings 2024-01-23 16:18:33 UTC
Created attachment 2009942 [details]
Fedora 39 netinst with kickstart

Comment 5 Jonathan Billings 2024-01-23 16:22:50 UTC
You can see that on line 4176 of the Fedora 38 boot log, it runs 'mount --make-rprivate /' with no error, but on line 4205 of the Fedora 39 boot log, it runs 'mount --make-rprivate /' and mount errors out with: mount: /run/install/isodir: bad option; moving a mount residing under a shared mount is unsupported.

Comment 6 Lukáš Nykrýn 2024-01-24 15:29:18 UTC
I've pinged the util-linux maintainer to look at that. But honestly, I have a feeling that this is a red herring: rprivate is the default. Also, I know nothing about that part of the code, since it is probably called from the anaconda dracut module.

Comment 7 Lukáš Nykrýn 2024-01-24 15:51:56 UTC
Ok, I was wrong; it is where things go south.

[    7.505027] dracut-initqueue[1137]: + mount --make-rprivate /
[    7.611779] loop: module loaded
[    7.505093] dracut-initqueue[1178]: mount: /: not mount point or bad option.
[    7.505104] dracut-initqueue[1178]:        dmesg(1) may have more information after failed mount system call.
[    7.505118] dracut-initqueue[1137]: + mount --move /run/install/repo /run/install/isodir
[    7.506342] dracut-initqueue[1179]: mount: /run/install/isodir: bad option; moving a mount residing under a shared mount is unsupported.
[    7.506360] dracut-initqueue[1179]:        dmesg(1) may have more information after failed mount system call.
[    7.506375] dracut-initqueue[1137]: + iso=/run/install/isodir//fedora.iso
[    7.506387] dracut-initqueue[1137]: + mount -o loop,ro /run/install/isodir//fedora.iso /run/install/repo
[    7.518671] dracut-initqueue[1180]: mount: /run/install/repo: failed to setup loop device for /run/install/isodir//fedora.iso.

I will need some help from Karel; let's move it to util-linux
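To pull the same kind of excerpt out of a long rd.debug serial log, a filter along these lines may be enough. It is only a sketch: mount_trace is a hypothetical name, and the pattern assumes the dracut-initqueue[PID] prefix seen in the lines above.

```shell
# Hypothetical helper: print only the traced mount commands ("+ mount ...")
# and the mount error messages ("mount: ...") emitted by dracut-initqueue.
mount_trace() {
    grep -E 'dracut-initqueue\[[0-9]+\]: (\+ mount |mount: )' "$1"
}
```

Running `mount_trace serial.log` (with whatever file name the serial console was written to) should show each mount invocation next to its error, if any.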

Comment 8 Lukáš Nykrýn 2024-01-24 16:28:29 UTC
Btw, I was partly wrong about private being the default. Systemd remounts it to be shared:

https://github.com/systemd/systemd/blob/main/src/shared/mount-setup.c#L553

Comment 9 Karel Zak 2024-01-24 21:26:30 UTC
It would be nice to have strace output from the mount call (--make-rprivate), or define LIBMOUNT_DEBUG=all for the script ;-)

Comment 10 Karel Zak 2024-01-26 10:14:46 UTC
OK, I'm able to reproduce the problem. The problem is the mount_setattr() syscall, which fails with EINVAL. In the same situation, mount(2) succeeds; I'm not sure why.

A simple workaround is to call mount(8) with "LIBMOUNT_FORCE_MOUNT2=always mount --make-rprivate /". The variable disables the new mount kernel API.
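As a rough sketch of how a script could apply this workaround only when needed, the wrapper below tries the default code path first and retries with LIBMOUNT_FORCE_MOUNT2=always if that fails. The function name make_rprivate and its output labels are made up for illustration; only the mount invocation and the environment variable come from this comment.

```shell
# Hypothetical wrapper: try mount(8) as-is, then retry with libmount forced
# onto the legacy mount(2) syscall.
make_rprivate() {
    target=${1:-/}
    if mount --make-rprivate "$target" 2>/dev/null; then
        echo "default"
    elif LIBMOUNT_FORCE_MOUNT2=always mount --make-rprivate "$target"; then
        echo "forced-mount2"
    else
        echo "failed" >&2
        return 1
    fi
}
```

The printed label only records which attempt succeeded; it says nothing about which kernel API libmount chose on the first attempt.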

Comment 11 Karel Zak 2024-01-30 14:48:38 UTC
Just for the record.

The simplest way to reproduce the problem is to reboot an arbitrary Fedora 39 system and add "rd.break" to the kernel command line. Booting will stop before the real system root is mounted; you can then run "mount --make-rprivate /" to see the problem.

Example (with strace):

# mount --make-rprivate /
   
open_tree(AT_FDCWD, "/", OPEN_TREE_CLOEXEC) = 3     
mount_setattr(-1, NULL, 0, NULL, 0)     = -1 EINVAL (Invalid argument)
mount_setattr(3, "", AT_EMPTY_PATH|AT_RECURSIVE, {attr_set=0, attr_clr=0, propagation=MS_PRIVATE, userns_
   
mount: /: not mount point or bad option.       
       dmesg(1) may have more information after failed mount system call.
+++ exited with 32 +++
   

The same situation but with mount(2) syscall:
   
# LIBMOUNT_FORCE_MOUNT2=always mount --make-rprivate /
   
mount("none", "/", NULL, MS_REC|MS_PRIVATE, NULL) = 0
+++ exited with 0 +++ 
   
# findmnt -o+PROPAGATION
TARGET                   SOURCE           FSTYPE     OPTIONS                                                            PROPAGATION
/                        rootfs           rootfs     rw                                                                 private
|-/proc                  proc             proc       rw,nosuid,nodev,noexec,relatime                                    private
|-/sys                   sysfs            sysfs      rw,nosuid,nodev,noexec,relatime                                    private
| |-/sys/kernel/security securityfs       securityfs rw,nosuid,nodev,noexec,relatime                                    private
| |-/sys/fs/cgroup       cgroup2          cgroup2    rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot    private
| |-/sys/fs/pstore       pstore           pstore     rw,nosuid,nodev,noexec,relatime                                    private
| |-/sys/fs/bpf          bpf              bpf        rw,nosuid,nodev,noexec,relatime,mode=700                           private
| `-/sys/kernel/config   configfs         configfs   rw,nosuid,nodev,noexec,relatime                                    private
|-/dev                   devtmpfs         devtmpfs   rw,nosuid,size=4096k,nr_inodes=246475,mode=755,inode64             private
| |-/dev/shm             tmpfs            tmpfs      rw,nosuid,nodev,inode64                                            private
| `-/dev/pts             devpts           devpts     rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000              private
|-/run                   tmpfs            tmpfs      rw,nosuid,nodev,size=400108k,nr_inodes=819200,mode=755,inode64     private
`-/sysroot               /dev/vda3[/root] btrfs      ro,relatime,discard=async,space_cache=v2,subvolid=257,subvol=/root private


I've tested it with ext4 and btrfs, and the result is the same, as expected.
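To check the outcome without reading the whole table, the PROPAGATION column can be filtered. This is a small sketch; non_private_mounts is a hypothetical helper that expects "TARGET PROPAGATION" pairs on stdin, as produced by findmnt in raw mode.

```shell
# Hypothetical helper: read "TARGET PROPAGATION" pairs, one per line, and
# print every target whose propagation is anything other than "private".
non_private_mounts() {
    awk '$2 != "private" { print $1 }'
}

# Intended use on a live system:
#   findmnt -rn -o TARGET,PROPAGATION | non_private_mounts
```

Empty output means every mount in the tree is private, matching the table above.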

Comment 12 Karel Zak 2024-01-30 14:58:35 UTC
Ah... the strace output without truncation:

open_tree(AT_FDCWD, "/", OPEN_TREE_CLOEXEC) = 3
mount_setattr(-1, NULL, 0, NULL, 0)     = -1 EINVAL (Invalid argument)
mount_setattr(3, "", AT_EMPTY_PATH|AT_RECURSIVE, {attr_set=0, attr_clr=0, propagation=MS_PRIVATE, userns_fd=0}, 32) = -1 EINVAL (Invalid argument)

Note that the first mount_setattr(-1, ...) call is just a libmount test to verify that the kernel supports the new mount API.

Comment 13 Christian Brauner 2024-02-05 13:52:23 UTC
So the only reason I can currently see for this is that check_mnt() fails. And for that to be the case the caller must be in a different mount namespace than the mount.
So when that script runs does it somehow unshare or create a mount namespace?
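One way to answer that question from inside the script would be to compare mount namespace links under /proc. The sketch below is illustrative only: same_mnt_ns is a hypothetical name, and reading another process's ns link (for example PID 1) generally needs sufficient privileges.

```shell
# Hypothetical helper: succeed if the two given /proc entries (a PID or
# "self") are in the same mount namespace. Returns 2 if either link cannot
# be read.
same_mnt_ns() {
    a=$(readlink "/proc/$1/ns/mnt") || return 2
    b=$(readlink "/proc/$2/ns/mnt") || return 2
    [ "$a" = "$b" ]
}
```

For instance, `same_mnt_ns self 1 && echo same || echo different` run as root from the initqueue script would show whether it shares a mount namespace with PID 1.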

Comment 14 Christian Brauner 2024-02-06 10:34:01 UTC
OK, figured it out, afaict: https://lore.kernel.org/all/20240206-vfs-mount-rootfs-v1-1-19b335eee133@kernel.org

Comment 15 Karel Zak 2024-02-07 14:03:07 UTC
VFS issue, moving to the kernel.