Created attachment 1810130 (rdsosreport)

Description of problem:
The machine is using an NFS system as root. After upgrading the machine from Fedora 32 to 34, it no longer boots, but hangs while running the code in the initramfs. It finally writes a message "Could not boot" and drops to an emergency shell.

Version-Release number of selected component (if applicable):
dracut-055-3.fc34.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. Boot latest kernel

Actual results:
Ends up in emergency shell.

Expected results:
Successful boot.

Additional info:
In the emergency shell, trying the command to mount the root file system (mount 172.17.0.1:/remote/pluto /sysroot) works fine without any problems.

Going back to the last kernel from Fedora 32, 5.11.22-100.fc32.x86_64, the boot works fine and the machine comes up, using the newly installed userspace for Fedora 34. My understanding is that the problem is not because of differences in the kernel proper, but some change in how dracut tries to boot the system in the initramfs, thus the assignment of the bugzilla.

As a side note: I previously used the short name for the NFS server (mimmi) on the kernel command line. The system was able to add the search domain and look up the name. This still works for the Fedora 32 kernel, but in the emergency shell name resolution does not seem to work. This might very well be a separate problem or simply a consequence of the other error, but I wanted to mention it just in case it is relevant. I have changed to the IP address on the kernel command line to avoid this particular issue for the time being.

I've tried to debug it. I'm not sure if the following helps, but in case it does, this is what I have figured out. It seems to me dracut-initqueue fails since it is waiting for the file "$hookdir"/initqueue/finished/nfsroot.sh to succeed. This file contains code looking for the /proc directory in /sysroot. I fail to understand how that could show up during the initqueue phase.
It depends on the (NFS) root directory being mounted, but that doesn't happen until sysroot.mount is done. sysroot.mount in turn is waiting for dracut-pre-mount.service, which in turn waits for dracut-initqueue.service. It looks like a circular dependency to me. But when I look at the Fedora 32 initramfs, it looks very much the same, so there should have been a loop there too. There is clearly something important I don't understand. Comparing the dracut-initqueue scripts between the two versions (ignoring whitespace adjustments etc.), there is a difference in the loop inside the "if [ $main_loop -gt $((2 * RDRETRY / 3)) ]" conditional on line 61. The new version has an inner conditional that wasn't there before, and it looks a little strange that the code checks initqueue/finished but then actually runs initqueue/timeout. But again, I don't see how this makes any difference; the conditional should succeed since there is always something in initqueue/finished, nfsroot.sh to be precise.
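To make the wait condition described above concrete, here is a minimal sketch of how a "finished" check of this kind could behave. It is a simplification written for illustration, not dracut's actual dracut-initqueue code: the function name check_finished and the temporary directory layout are assumptions, and only the hook body '[ -e $NEWROOT/proc ]' is taken from the report.

```shell
#!/bin/sh
# Simplified sketch of an initqueue "finished" check; NOT dracut's actual code.
# check_finished DIR: returns 0 only if every *.sh hook in DIR succeeds.
check_finished() {
    hook_dir=$1
    for hook in "$hook_dir"/*.sh; do
        # An empty directory leaves the glob unexpanded; skip that case.
        [ -e "$hook" ] || continue
        # Source each hook in a subshell; a non-zero status means "not ready yet".
        ( . "$hook" ) || return 1
    done
    return 0
}

# Hypothetical demo with a hook equivalent to the one from parse-nfsroot.sh.
demo=$(mktemp -d)
NEWROOT=$demo/sysroot
mkdir -p "$demo/finished" "$NEWROOT"
echo '[ -e $NEWROOT/proc ]' > "$demo/finished/nfsroot.sh"

# Nothing is "mounted" yet, so the hook fails and the queue keeps waiting.
check_finished "$demo/finished" && echo "finished" || echo "still waiting"

# Creating proc stands in for the NFS root being mounted on /sysroot.
mkdir "$NEWROOT/proc"
check_finished "$demo/finished" && echo "finished" || echo "still waiting"

rm -rf "$demo"
```

In this sketch the check can only pass once something else has populated $NEWROOT, which is the situation the report describes: if the only thing that mounts the root is ordered after the queue finishes, the loop never exits on its own.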
I've tried to debug this and think I have a bit more information. A relevant difference seems to be that while the F32 dracut used ifup-style scripting to bring the network up, the F34 version doesn't. I thought I had switched to networkd long ago, but although there were definitions in place in /etc/systemd/network when I made the last initramfs under 32, it still uses ifup-style scripting in the initramfs. To my surprise, somewhere in those scripts, the NFS root gets mounted on /sysroot. That's why it doesn't hang the same way with the old dracut; the file system is mounted and thus /proc is there. Is this really how things are intended to work? I would have thought the system should be mounted later in sysroot.mount (from man dracut.bootup). In the new system, I can't seem to use the old ifup-style activation of the network, even if I try. Instead, the networkd-managed network does come up. As a workaround, I simply commented out the line "echo '[ -e $NEWROOT/proc ]' …" in /usr/lib/dracut/modules.d/95nfs/parse-nfsroot.sh and generated a new initramfs. That got me past the initqueue stage, but the next problem was that the (generated) sysroot.mount entry had "What=nfs4:172.17.0.1:/remote/pluto", which will run the command "/usr/bin/mount nfs4:172.17.0.1:/remote/pluto /sysroot …". The mount command doesn't understand that syntax. I had to remove the "nfs4:" part from the kernel command line. Then it successfully mounted the root filesystem. But when it tries to do the switch root step, it hangs for a while and starts to complain that NFS server 172.17.0.1 is not responding. My impression is that the switch root somehow breaks networkd and the machine loses its configuration. I tried to add rd.break=cleanup and look around. Everything looked fine as far as I could tell. When I tried to do the switch manually (systemctl --no-block switch-root /sysroot) I could reproduce the problem; it loses contact with the NFS server.
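For illustration, the mismatch between the root= value and what mount expects as a source can be shown with a small helper. This is only a sketch under assumptions: nfsroot_to_mount_source is a hypothetical name, not a dracut function, and it handles nothing beyond the plain "nfs:" and "nfs4:" prefixes (any options appended after the path are left untouched).

```shell
#!/bin/sh
# Hypothetical helper, not part of dracut: convert a root= value such as
# "nfs4:server:/path" into the "server:/path" form usable as a mount source.
nfsroot_to_mount_source() {
    case $1 in
        nfs4:*) printf '%s\n' "${1#nfs4:}" ;;   # strip the leading "nfs4:"
        nfs:*)  printf '%s\n' "${1#nfs:}" ;;    # strip the leading "nfs:"
        *)      printf '%s\n' "$1" ;;           # anything else is passed through
    esac
}

# The two values that appear in this report.
nfsroot_to_mount_source "nfs4:172.17.0.1:/remote/pluto"
nfsroot_to_mount_source "nfs4:mimmi:/remote/pluto"
```

The first call prints the same source string that worked when mounting by hand from the emergency shell.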
Here are some further updates. In order to switch from using systemd-networkd to NetworkManager, I removed my configuration files from /etc/systemd/network and generated a new initramfs. Booting with that one works! It works even with the original versions of the module files from dracut, and with the original "root=nfs4:mimmi:/remote/pluto" parameter on the kernel command line. So this problem seems to be specific to trying to use systemd-networkd in combination with NFS root. I've updated the summary line accordingly.
I'm seeing exactly the same problem, except with Fedora 33 (works) and Fedora 35 (doesn't work). If I stop at rd.break=initqueue I can manually mount the NFS root volume, but the systemd sysroot.mount unit never succeeds. According to systemctl list-jobs, sysroot.mount is waiting for dracut-initqueue.service, and dracut-initqueue.service is running but not finished because it is waiting for /sysroot/proc to appear, which never happens until sysroot.mount succeeds. There is also a problem with systemd-resolved not being started early enough, and with systemd-resolve and systemd-timesync missing from /etc/{passwd,group}; these are really different bugs. However:
* if those users/groups are added by modifying the 00systemd dracut module and regenerating the initramfs; then
* stop before dracut-initqueue using the command line option rd.break=initqueue; then
* start systemd-resolved manually; then
* mount the NFS volume on /sysroot; and then
* continue the boot (^D);
the system proceeds to boot as expected. The weird thing is that the Fedora 33 version seems to contain the same circular dependency, but it works. In my case the working Fedora 33 version uses NetworkManager for network configuration (as for comment #2). It seems to me that maybe dracut-initqueue.service should be constrained to run before initrd-root-fs.target rather than before dracut-pre-mount.service.
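A quick way to confirm the missing users/groups from the emergency shell might look like the sketch below. The helper names has_entry and missing_entries are hypothetical, the sample passwd content is made up for the demo, and the code uses only shell builtins on the assumption that few tools are available in the initramfs.

```shell
#!/bin/sh
# has_entry NAME FILE: succeed if FILE (passwd/group style, colon-separated)
# contains a line whose first field is exactly NAME.
has_entry() {
    while IFS=: read -r entry rest; do
        [ "$entry" = "$1" ] && return 0
    done < "$2"
    return 1
}

# missing_entries FILE NAME...: print every NAME that has no entry in FILE.
missing_entries() {
    file=$1
    shift
    for name in "$@"; do
        has_entry "$name" "$file" || printf '%s\n' "$name"
    done
}

# Demo on made-up sample data (the IDs below are placeholders, not real values).
demo=$(mktemp)
printf '%s\n' 'root:x:0:0:root:/root:/bin/sh' \
              'systemd-resolve:x:1:1::/:/sbin/nologin' > "$demo"
missing_entries "$demo" systemd-resolve systemd-timesync
rm -f "$demo"
```

Inside the initramfs the same function could be pointed at /etc/passwd and /etc/group with the two account names mentioned above; anything it prints would be an account the image lacks.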
This message is a reminder that Fedora Linux 34 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '34'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 34 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
I can't easily check this on Fedora Linux 36 currently, but according to comment 3 the problem still remains at least in Fedora Linux 35.
See bug 2036214
This message is a reminder that Fedora Linux 35 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 35 on 2022-12-13. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '35'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 35 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
This bug appears to have been reported against 'rawhide' during the Fedora Linux 38 development cycle. Changing version to 38.
This message is a reminder that Fedora Linux 38 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 38 on 2024-05-21. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '38'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see it. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 38 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
You switched this to "rawhide", Tomasz Torcz, and then it was switched to 38 by a mass change. Is there any interest in fixing this? In my own case, the system where this was discovered nowadays boots using NetworkManager. It is in use and I don't want to experiment with it to check whether the problem is gone. (I doubt it, but I don't actually know.) Therefore I won't change the version.
I will check shortly whether this is still the case, but I suppose it still doesn't work. I have seen no development around nfsroot, and dracut seems to be abandoned.
Fedora Linux 38 entered end-of-life (EOL) status on 2024-05-21. Fedora Linux 38 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field. If you are unable to reopen this bug, please file a new report against an active release. Thank you for reporting this bug and we are sorry it could not be fixed.