Description of problem:
We have some customers getting the emergency prompt below because the unpack of the initramfs isn't complete at the time the "shutdown" script in the initramfs executes "reboot":
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
/shutdown: line 115: reboot: command not found
[ 145.951127] dracut Warning: reboot failed!
dracut Warning: reboot failed!
[...]
Dropping to debug shell.
shutdown:/#
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
This happens rarely (roughly once in 100 times), but it is critical because it leaves the system unusable and requires an administrator's intervention.
Digging into this, I found out that there was an ordering cycle happening during shutdown, when using a generic initramfs (or any initramfs having the "nfs" module installed):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[ 117.274806] systemd[1]: local-fs.target: Found ordering cycle on var-lib-nfs-rpc_pipefs.mount/stop
[ 117.276183] systemd[1]: local-fs.target: Found dependency on systemd-tmpfiles-setup.service/stop
[ 117.277666] systemd[1]: local-fs.target: Found dependency on local-fs.target/stop
[ 117.278777] systemd[1]: local-fs.target: Job var-lib-nfs-rpc_pipefs.mount/stop deleted to break ordering cycle starting with local-fs.target/stop
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
When systemd resolves the ordering cycle by deleting a job other than "local-fs.target/stop" (as in the log above), the condition is not triggered; in the cases where the emergency shell was entered, we observed that "local-fs.target/stop" was the job deleted.
On my system, even having systemd delete "local-fs.target/stop" isn't enough to reproduce the issue, apparently because it also depends on timing, in particular how long /usr/lib/dracut/dracut-initramfs-restore takes to restore the initramfs.
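Since which job gets deleted seems to matter, it may help to extract that job from a saved console log. The following is a minimal sketch, assuming the log lines have the same format as the excerpt above; the function name deleted_cycle_jobs and the file name console.log are made up for illustration:

```shell
#!/bin/sh
# Sketch: print the job systemd deleted to break each ordering cycle.
# Reads log text on stdin; assumes the message format shown in the
# excerpt above ("Job <unit>/<type> deleted to break ordering cycle").
deleted_cycle_jobs() {
    sed -n 's/.*: Job \([^ ]*\) deleted to break ordering cycle.*/\1/p'
}

# Hypothetical usage (console.log is an assumed file name):
#   deleted_cycle_jobs < console.log
#   deleted_cycle_jobs < console.log | grep -qxF 'local-fs.target/stop' \
#       && echo "local-fs.target/stop was deleted"
```

For the excerpt above this prints the rpc_pipefs mount stop job, which is the case that did not lead to the emergency shell.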
The ordering cycle appears because the /var/lib/nfs/rpc_pipefs mount isn't in sync with the mount unit defined in /usr/lib/systemd/system/var-lib-nfs-rpc_pipefs.mount:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# systemctl status var-lib-nfs-rpc_pipefs.mount
● var-lib-nfs-rpc_pipefs.mount - RPC Pipe File System
Loaded: loaded (/proc/self/mountinfo; static; vendor preset: disabled)
Active: active (mounted)
Where: /var/lib/nfs/rpc_pipefs
What: rpc_pipefs
Tasks: 0 (limit: 10840)
Memory: 0B
CGroup: /system.slice/var-lib-nfs-rpc_pipefs.mount
# systemctl show var-lib-nfs-rpc_pipefs.mount | egrep "Before|After"
Before=local-fs.target rpc_pipefs.target
After=systemd-journald.socket system.slice systemd-tmpfiles-setup.service -.mount
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
From above, we see systemd didn't use /usr/lib/systemd/system/var-lib-nfs-rpc_pipefs.mount but generated a mount unit based on mountinfo.
This is due to having /var/lib/nfs/rpc_pipefs mounted IN the initramfs already.
Due to this, systemd automatically generates the mount unit and adds Before=local-fs.target which creates the ordering cycle on shutdown.
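To check a system for the problematic dependency without reading the output by eye, something like the following could be used. It is only a sketch: the function name has_before_local_fs is made up here, and it relies solely on the "Before=" line format shown in the output above.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the "Before=" line of `systemctl show`
# output read from stdin lists local-fs.target.
has_before_local_fs() {
    # Split "Before=a b c" into one word per line, then match exactly.
    grep '^Before=' | tr ' =' '\n\n' | grep -qxF 'local-fs.target'
}

# Intended usage on an affected system:
#   systemctl show var-lib-nfs-rpc_pipefs.mount | has_before_local_fs \
#       && echo "mount unit is ordered before local-fs.target"
```

The exact-match grep avoids false positives from unit names that merely contain "local-fs.target" as a substring.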
Version-Release number of selected component (if applicable):
systemd-239
How reproducible:
Sometimes on the customer's system; I wasn't able to reproduce it locally
Steps to Reproduce:
1. Install "dracut-config-generic" and rebuild the initramfs
# yum -y install dracut-config-generic
# dracut -f
2. Check the mount point after rebooting
# systemctl show var-lib-nfs-rpc_pipefs.mount | egrep "Before|After"
Actual results:
Before=local-fs.target rpc_pipefs.target
After=systemd-journald.socket system.slice systemd-tmpfiles-setup.service -.mount
Expected results:
No "Before=local-fs.target" dependency
Additional info:
This is closely related to BZ#1924587 (RFE) but happens in another condition (there is no timeout here).
I wrote a small tool called unlinksnoop that should help us confirm or rule out the possibility that something is removing the reboot binary from the unpacked initramfs.
https://github.com/msekletar/unlinksnoop
It can be compiled on RHEL-8, and the Makefile also contains a target to create an updates.img that can then be used to update the installer environment. When run as a systemd service, the tool is designed to keep running even after the service is stopped; it also survives the process killing during shutdown and is finally terminated only by the kernel on halt. Using the unlinksnoop.syslog=<IP> kernel command line option, you can easily set up log forwarding to a different host where a syslog server is running (tcp/514).
To compile on RHEL-8, you should install:
- glibc-static (from CRB repo)
- bpftool
- clang
- Go 1.18 - I was developing on Fedora 36, where this is the default Go version. It is not shipped on RHEL-8 but can be easily obtained from upstream (https://go.dev/dl/).
The tool will produce log output like this:
Feb 22 19:58:16 localhost.localdomain unlinksnoop[1793]: comm,pid,filename
Feb 22 19:58:16 localhost.localdomain unlinksnoop[1793]: rm,1885,/run/chrony-helper/nm-dhcp.ens160
Feb 22 19:58:16 localhost.localdomain unlinksnoop[1793]: rm,1886,/var/lib/dhclient/chrony.servers.ens160
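To narrow such a log down to unlinks under one directory (for example /run/initramfs, where dracut-initramfs-restore unpacks the shutdown initramfs), a filter along these lines might be enough. This is a sketch only: the function name unlinks_under is made up, and it assumes the "comm,pid,filename" line format shown above with no commas in the command name.

```shell
#!/bin/sh
# Sketch: from unlinksnoop syslog lines on stdin, print "comm pid filename"
# for every unlink whose path starts with the prefix given as $1.
unlinks_under() {
    # Strip the syslog header, keeping only the CSV payload.
    sed -n 's/^.*unlinksnoop\[[0-9]*]: //p' |
        awk -F, -v p="$1" '
            $0 == "comm,pid,filename" { next }      # skip the header row
            {
                f = $0
                sub(/^[^,]*,[^,]*,/, "", f)         # filename may contain commas
                if (index(f, p) == 1) print $1, $2, f
            }'
}

# Hypothetical usage (unlinksnoop.log is an assumed file name):
#   unlinks_under /run/initramfs/ < unlinksnoop.log
```

An empty result for /run/initramfs/ would suggest that nothing unlinked the reboot binary and would point back at an incomplete unpack.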
I'm pretty much convinced the root cause of this is that unpacking of the "exitramfs" hasn't been completed. Whether the unpacking process just failed or was killed doesn't make much difference; in either case the fix for bug 1924587 should help. Hence, let's close this as a duplicate.
*** This bug has been marked as a duplicate of bug 1924587 ***