2236345 – NFS mounts during installer dracut phase (for repos, kickstarts...) fail with SELinux enabled due to an 'incorrect mount option' with kernel 6.6

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	39
Hardware:	Unspecified
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Assignee:	Ondrej Mosnacek
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:	openqa
Depends On:
Blocks:	F39FinalBlocker F40BetaBlocker

Reported:	2023-08-31 01:54 UTC by Adam Williamson
Modified:	2023-09-21 00:16 UTC
CC List:	21 users
Fixed In Version:	kernel-6.5.4-300.fc39
Clone Of:
Environment:
Last Closed:	2023-09-21 00:16:14 UTC
Type:	---
Embargoed:
Dependent Products:

Description Adam Williamson 2023-08-31 01:54:40 UTC

In today's Rawhide compose, several openQA tests failed that passed with the last compose. They're all tests that use NFS for an install repo or a kickstart, and so hit the path where the NFS repo has to get mounted during the dracut phase before the installer starts.

In each case, the error "mount.nfs: an incorrect mount option was specified for <mount point>" appears.

Neither dracut nor nfs-utils changed between the working compose (Fedora-Rawhide-20230828.n.0) and the broken one (20230830.n.0). anaconda did change, but I'm pretty sure nothing related to this was touched. I think the kernel is the most likely suspect.

I haven't yet managed to reproduce this manually, however. The dracut codepath here is quite convoluted, and it doesn't print exactly what mount options it's actually using. I ran out of time today, so I will try to dig into this further tomorrow and find out exactly what's going on, if nobody beats me to it. Annoyingly, when this happens the failures seem to loop forever: they never give up and dump you at a dracut console, so you can't look at logs and such to try and figure out what the mount options were...

Reproducible: Always

Comment 1 Adam Williamson 2023-08-31 19:21:34 UTC

So, hmm. This only fails in the initramfs environment for some reason. This is the mount command it's using:

mount -t nfs -o ro,nfsvers=4 (serverip):(path) (target)

If I run that within a current booted Rawhide system, it works fine. If I run it in the dracut environment, it shows the error.

It seems like every time I try it in the dracut environment, three identical messages appear in the journal:

SELinux: Unable to set superblock options before the security server is initialized

This seems to be related (I think), because if I do the same on a recent F39 image which doesn't have this problem, the mount works and I do not see those errors in the journal.
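
For anyone retracing this check, here is a minimal sketch of how the journal could be searched for that message. The function name `count_selinux_sb_errors` is hypothetical and not part of any tool mentioned in this report; it just counts matching lines on standard input.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): count how many lines of the
# log text supplied on stdin contain the SELinux superblock message.
count_selinux_sb_errors() {
    # -c prints the number of matching lines, -F matches the string literally.
    # grep exits non-zero when nothing matches, so '|| true' keeps the
    # function's exit status at 0 while still printing "0".
    grep -cF 'SELinux: Unable to set superblock options before the security server is initialized' || true
}
```

Assuming `journalctl` is usable in the environment being tested, it could be fed with `journalctl -k --no-pager | count_selinux_sb_errors` right after a failed mount attempt.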

Comment 2 Adam Williamson 2023-08-31 19:31:10 UTC

If I boot with selinux=0, the mount works in the installer initramfs environment. CCing Zdenek (SELinux maintainer). It doesn't look like anything SELinux-related outside the kernel changed in the affected compose, though, so this is still most likely the kernel or possibly glibc (which *also* did a big version bump in the same compose).

Comment 3 Adam Williamson 2023-08-31 19:32:42 UTC

Proposing as an F40 Beta blocker, this violates Beta criterion "The installer must be able to use all available kickstart delivery methods" (can't use NFS kickstart delivery).

Comment 4 Zdenek Pytela 2023-08-31 19:41:50 UTC

Note that SELinux is not initialized in the initramfs. Initializing it is one of systemd's first actions, but that happens *after* switchroot.
Anyway, adding Ondrej.

Comment 5 Adam Williamson 2023-08-31 19:50:05 UTC

Here's my best explanation of how to reproduce, using a VM.

You'll need to set your host up with an NFS share. I put this in /etc/exports:

/var/tmp 192.168.124.0/24(ro)

This assumes the setup where your host has a virbr0 interface at 192.168.124.1 and VMs get an IP in the range 192.168.124.x; if not, adjust to taste. Then I started nfs-server.service and, in firewall-config, ensured the 'nfs' service is enabled in the 'libvirt' zone.

Now, grab https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20230831.n.0/compose/Everything/x86_64/iso/Fedora-Everything-netinst-x86_64-Rawhide-20230831.n.0.iso , and boot it in the VM. Add 'ip=dhcp rd.break' to the boot options. You should boot into the installer initramfs environment.

Run:

mkdir -p /mnt/tmp
mount -t nfs -o ro,nfsvers=4 192.168.124.1:/var/tmp /mnt/tmp

...and that should reproduce the problem.

Now you can try adding 'selinux=0' to the boot params, do everything else the same, and it should work. You can also try an older Rawhide ISO - https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20230828.n.0/compose/Everything/x86_64/iso/Fedora-Everything-netinst-x86_64-Rawhide-20230828.n.0.iso - or a current F39 ISO, and check that this works without needing selinux=0 .
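
The steps above can be sketched as a small script. This is only an illustration under assumptions: the function names are hypothetical, and `firewall-cmd` is used here as a command-line stand-in for the firewall-config step (it makes a runtime-only change). `setup_host` is defined but never called automatically, since it needs root and modifies /etc/exports.

```shell
#!/bin/sh
# Print an /etc/exports entry for a read-only share.
# $1 = exported directory, $2 = client network in CIDR form
exports_line() {
    printf '%s %s(ro)\n' "$1" "$2"
}

# Print the mount command to run inside the installer initramfs.
# $1 = server IP, $2 = exported path, $3 = local mount point
nfs_mount_cmd() {
    printf 'mount -t nfs -o ro,nfsvers=4 %s:%s %s\n' "$1" "$2" "$3"
}

# Host-side setup, using the addresses from the comment above.
# Must be run as root; nothing in this file calls it.
setup_host() {
    exports_line /var/tmp 192.168.124.0/24 >> /etc/exports
    systemctl start nfs-server.service
    # Re-read /etc/exports in case the server was already running.
    exportfs -ra
    # Assumption: CLI equivalent of enabling 'nfs' in the 'libvirt' zone.
    firewall-cmd --zone=libvirt --add-service=nfs
}
```

On the host, `setup_host` would be run once as root. In the guest, after `mkdir -p /mnt/tmp`, the line printed by `nfs_mount_cmd 192.168.124.1 /var/tmp /mnt/tmp` is the command to run.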

Comment 6 Adam Williamson 2023-08-31 19:51:39 UTC

jforbes points out that https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d80a8f1b58c2bc8d7c6bfb65401ea4f7ec8cddc2 looks relevant here, CCing David and Jeff.

Comment 7 Adam Williamson 2023-09-08 22:25:12 UTC

Ping? This is still broken in Rawhide...

Comment 8 Ondrej Mosnacek 2023-09-09 16:07:31 UTC

I'm having trouble reproducing it... I'm using the cloud image to start a VM and then add a dracut module that attempts to mount a NFS filesystem served by another VM (I'm trying to avoid manual steps). I also add ip=dhcp on the kernel command-line as you suggest in comment 5, but it doesn't seem to work (I get "ping: connect: Network is unreachable" when I try to ping the other machine from the dracut hook). According to [1] it should only work with CONFIG_IP_PNP and CONFIG_IP_PNP_DHCP enabled in the kernel, but the Fedora kernel doesn't have them enabled, so I'm confused as to why it appears to work in your case.

Any ideas?

[1] https://serverfault.com/questions/110218/kernel-level-ip-configuration-not-working-in-linux-kernel

Comment 9 Adam Williamson 2023-09-09 16:33:02 UTC

The installer initramfs is customized to some extent, so it may not be reproducible with the setup you're using. It's not the kernel that's handling the `ip=dhcp` arg, it's dracut with an assist from NetworkManager - see https://man7.org/linux/man-pages/man7/dracut.cmdline.7.html and https://man.archlinux.org/man/nm-initrd-generator.8.en (I'm not sure off the top of my head if we use the network-legacy thing or nm-initrd-generator in Fedora initramfses, but `ip=dhcp` is valid for either).

Comment 10 Adam Williamson 2023-09-09 16:34:23 UTC

You, uh, are ensuring your virtual network config is set up so that the two VMs can talk to each other, I assume? Default libvirt VMs cannot; they can only talk to the host and the wider internet via the host. You need to do custom config for VMs to be able to communicate with each other. See if you can ping the host...

Comment 11 Ondrej Mosnacek 2023-09-11 08:04:39 UTC

Ok, I figured out the dracut networking problem. I had to install the dracut-network package and add rd.neednet=1 instead of ip=dhcp to the kernel cmdline. But the real problem in my reproducers was which directory I exported on the server, and how. I had copy-pasted the export from another reproducer, and when I changed it to export and mount /var/tmp as in your example, it failed as you describe. (This likely has something to do with the no_root_squash export option I was using.)
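
Since several boot arguments come up in this thread (ip=dhcp, rd.neednet=1, rd.break, selinux=0), a quick way to confirm which ones a given boot actually received may help when comparing reproducers. A minimal sketch, with the hypothetical helper name `has_karg`:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line contains the
# given argument as a whole space-separated word.
# $1 = argument to look for, $2 = command line string
has_karg() {
    # Padding both sides with spaces makes the first and last words
    # match the same pattern as words in the middle.
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `has_karg rd.neednet=1 "$(cat /proc/cmdline)" && echo present` checks the running system.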

I think I actually see the problem now. It looks like d80a8f1b58c2bc8d7c6bfb65401ea4f7ec8cddc2 is indeed the culprit. Let me draft and test a patch...

Comment 12 Ondrej Mosnacek 2023-09-12 07:43:15 UTC

Patch posted upstream for review:
https://lore.kernel.org/selinux/20230911142358.883728-1-omosnace@redhat.com/

Comment 13 Ondrej Mosnacek 2023-09-15 20:47:33 UTC

The fix is now in mainline:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ccf1dab96be4caed7c5235b1cfdb606ac161b996

Comment 14 Adam Williamson 2023-09-18 15:34:24 UTC

Thanks. Meanwhile, this seems to have made its way into F39 now, even though F39 is on kernel 6.5:

https://openqa.fedoraproject.org/tests/2148986#step/_boot_to_anaconda/23

so perhaps this needs a backport to 6.5 upstream as well? Justin, can we have the fix backported for F39 and Rawhide? Thanks!

Comment 15 Adam Williamson 2023-09-18 15:35:18 UTC

Marking as an F39 Final blocker, same criterion as cited for F40, but now this is affecting F39 too (presumably since kernel 6.5.3 went stable).

Comment 16 Fedora Update System 2023-09-19 16:06:54 UTC

FEDORA-2023-1b206dc9f4 has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-1b206dc9f4

Comment 17 Fedora Update System 2023-09-20 02:07:17 UTC

FEDORA-2023-1b206dc9f4 has been pushed to the Fedora 39 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-1b206dc9f4`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-1b206dc9f4

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 18 Fedora Update System 2023-09-21 00:16:14 UTC

FEDORA-2023-1b206dc9f4 has been pushed to the Fedora 39 stable repository.
If problem still persists, please make note of it in this bug report.
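
To check whether a Fedora 39 system already runs a kernel at or above the "Fixed In Version" build (kernel-6.5.4-300.fc39), a version comparison along these lines could be used. This is a sketch that assumes GNU `sort` with `-V` support; the name `kernel_at_least` is hypothetical. It is only meaningful within the Fedora 39 6.5.x series, because a higher version number from another branch does not by itself prove the fix is included.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is equal to or newer than
# the minimum version $2, using GNU sort's version ordering.
kernel_at_least() {
    # After a version sort, the first line is the older of the two;
    # if that is the minimum, then $1 is at least the minimum.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

One possible use is `ver=$(uname -r); kernel_at_least "${ver%.*}" 6.5.4-300.fc39 && echo "fix should be present"`, where `${ver%.*}` strips the trailing architecture component before comparing.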
