
Bug 1961659

Summary: shutdown enters emergency prompt after printing "reboot: command not found"
Product: Red Hat Enterprise Linux 8
Component: systemd
Version: 8.3
Status: CLOSED DUPLICATE
Severity: high
Priority: medium
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Reporter: Renaud Métrich <rmetrich>
Assignee: Michal Sekletar <msekleta>
QA Contact: Frantisek Sumsal <fsumsal>
Docs Contact:
CC: dtardon, ldoktor, msekleta, myamazak, systemd-maint-list
Flags: pm-rhel: mirror+
Last Closed: 2022-07-14 08:52:23 UTC
Type: Bug

Description Renaud Métrich 2021-05-18 12:31:35 UTC
Description of problem:

We have some customers getting the emergency prompt below because unpacking of the initramfs has not completed by the time the "shutdown" script in the initramfs executes "reboot":
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
/shutdown: line 115: reboot: command not found
[  145.951127] dracut Warning: reboot failed!
dracut Warning: reboot failed!


[...]

Dropping to debug shell.

shutdown:/# 
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

This happens rarely (roughly once in 100 shutdowns), but it is quite critical since it leaves the system unusable and requires an administrator's intervention.

Digging into this, I found out that there was an ordering cycle happening during shutdown, when using a generic initramfs (or any initramfs having the "nfs" module installed):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[  117.274806] systemd[1]: local-fs.target: Found ordering cycle on var-lib-nfs-rpc_pipefs.mount/stop
[  117.276183] systemd[1]: local-fs.target: Found dependency on systemd-tmpfiles-setup.service/stop
[  117.277666] systemd[1]: local-fs.target: Found dependency on local-fs.target/stop
[  117.278777] systemd[1]: local-fs.target: Job var-lib-nfs-rpc_pipefs.mount/stop deleted to break ordering cycle starting with local-fs.target/stop
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
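
To see which job systemd sacrificed on a given shutdown, the log can be filtered for the "deleted to break ordering cycle" message. The following is only a sketch: the function name deleted_cycle_jobs is made up for illustration, and the message wording is taken from the systemd-239 excerpt above, so it may need adjusting for other versions.

```shell
# Print every job that systemd deleted to break an ordering cycle.
# Reads log text on stdin. The pattern is an assumption based on the
# message wording in the excerpt above.
deleted_cycle_jobs() {
    sed -n 's/.*: Job \([^ ]*\) deleted to break ordering cycle.*/\1/p'
}

# Possible usage against the previous boot's journal:
#   journalctl -b -1 -o cat | deleted_cycle_jobs
```

If the function prints "local-fs.target/stop", the shutdown matches the condition observed on the affected customer systems.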

When the ordering cycle is resolved by deleting a job other than "local-fs.target/stop" (as in the log above), the problem does not occur (at least, in the cases where the emergency prompt was entered, we observed that "local-fs.target/stop" was the job that had been deleted).

On my system, even having systemd delete "local-fs.target/stop" isn't enough to reproduce the issue; it appears to depend on timing as well, in particular on how long /usr/lib/dracut/dracut-initramfs-restore takes to restore the initramfs.

The ordering cycle appears because the active /var/lib/nfs/rpc_pipefs mount isn't in sync with the mount unit defined in /usr/lib/systemd/system/var-lib-nfs-rpc_pipefs.mount:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# systemctl status var-lib-nfs-rpc_pipefs.mount 
● var-lib-nfs-rpc_pipefs.mount - RPC Pipe File System
   Loaded: loaded (/proc/self/mountinfo; static; vendor preset: disabled)
   Active: active (mounted)
    Where: /var/lib/nfs/rpc_pipefs
     What: rpc_pipefs
    Tasks: 0 (limit: 10840)
   Memory: 0B
   CGroup: /system.slice/var-lib-nfs-rpc_pipefs.mount

# systemctl show var-lib-nfs-rpc_pipefs.mount | egrep "Before|After"
Before=local-fs.target rpc_pipefs.target
After=systemd-journald.socket system.slice systemd-tmpfiles-setup.service -.mount
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

From the output above, we see that systemd didn't use /usr/lib/systemd/system/var-lib-nfs-rpc_pipefs.mount but generated a mount unit based on mountinfo.
This is due to having /var/lib/nfs/rpc_pipefs mounted IN the initramfs already.
Due to this, systemd automatically generates the mount unit and adds Before=local-fs.target which creates the ordering cycle on shutdown.
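
A quick way to check whether a system is in this state is to test the "Before=" property for local-fs.target. This is a minimal sketch; the helper name orders_before_local_fs is hypothetical, and it assumes the KEY=VALUE output format of "systemctl show" as shown above.

```shell
# Succeed (exit 0) if the "Before=" line on stdin lists local-fs.target.
# Expects "systemctl show" style KEY=VALUE lines on stdin.
orders_before_local_fs() {
    # Keep only the Before= line, put each unit name on its own line,
    # then look for an exact, literal match.
    grep '^Before=' | tr '= ' '\n\n' | grep -Fqx 'local-fs.target'
}

# Possible usage:
#   systemctl show var-lib-nfs-rpc_pipefs.mount | orders_before_local_fs \
#       && echo "unit is ordered before local-fs.target"
```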


Version-Release number of selected component (if applicable):

systemd-239


How reproducible:

Sometimes on customer systems; I wasn't able to reproduce it locally.

Steps to Reproduce:
1. Install "dracut-config-generic" and rebuild the initramfs

  # yum -y install dracut-config-generic
  # dracut -f

2. Check the mount point after rebooting

  # systemctl show var-lib-nfs-rpc_pipefs.mount | egrep "Before|After"

Actual results:

  Before=local-fs.target rpc_pipefs.target
  After=systemd-journald.socket system.slice systemd-tmpfiles-setup.service -.mount

Expected results:

  No "Before=local-fs.target" dependency


Additional info:

This is closely related to BZ#1924587 (RFE) but happens in another condition (there is no timeout here).

Comment 5 David Tardon 2022-01-25 09:54:43 UTC
The problem here is that mount units created from /proc/self/mountinfo always have Before={local,remote}-fs.target dependency even if there's a mount unit for the mount point that contains DefaultDependencies=no. It should be fixed by https://github.com/systemd/systemd/pull/10980/commits/d54bab90e64f70c1ecf9b0683a98adb8485ed09e and https://github.com/systemd/systemd/pull/10980/commits/26e35b164b8d0603629b3d394554cfa728e8c3e4, which look easy to backport.

Comment 7 David Tardon 2022-01-25 10:00:11 UTC
Reproducer:
cat > /etc/systemd/system/run-test.mount <<EOF
[Unit]
DefaultDependencies=no
Conflicts=umount.target

[Mount]
What=tmpfs
Where=/run/test
Type=tmpfs
EOF
mkdir /run/test
systemctl daemon-reload
mount -t tmpfs tmpfs /run/test
sleep 5 # just to be sure
systemctl show --value -p Before run-test.mount

Comment 10 Michal Sekletar 2022-02-22 19:08:50 UTC
I wrote a small tool called unlinksnoop that should help us rule out the possibility that something is actually removing the reboot binary from the unpacked initramfs.

https://github.com/msekletar/unlinksnoop

It can be compiled on RHEL-8, and the Makefile also contains a target to create an updates.img that can then be used to update the installer environment. When run as a systemd service, the tool is designed to keep running even after the service is stopped; it also survives process killing during shutdown and is finally terminated only by the kernel on halt. Using the unlinksnoop.syslog=<IP> kernel command line option, you can easily set up log forwarding to a different host where a syslog server is running (tcp/514).

To compile on RHEL-8 you should install,

- glibc-static (from CRB repo)
- bpftool
- clang
- Go 1.18 - I was developing on Fedora 36, where this is the default Go version. It is not shipped on RHEL-8 but can be easily obtained from upstream (https://go.dev/dl/).

Tool will produce log output like this,

Feb 22 19:58:16 localhost.localdomain unlinksnoop[1793]: comm,pid,filename
Feb 22 19:58:16 localhost.localdomain unlinksnoop[1793]: rm,1885,/run/chrony-helper/nm-dhcp.ens160
Feb 22 19:58:16 localhost.localdomain unlinksnoop[1793]: rm,1886,/var/lib/dhclient/chrony.servers.ens160
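
To search such a log for removals of one particular file (for example "reboot"), something like the following could be used. It is a sketch only: the function name unlinks_of is made up, and it assumes the "comm,pid,filename" record layout and the "unlinksnoop[PID]: " syslog prefix seen in the sample above.

```shell
# Print "comm pid filename" for each logged unlink whose path has the
# given basename. Reads forwarded syslog lines on stdin.
unlinks_of() {
    awk -v name="$1" '
        {
            sub(/^.*unlinksnoop\[[0-9]+\]: /, "")  # drop the syslog prefix
            n = split($0, f, ",")
            if (n < 3 || f[1] == "comm") next       # skip header and short lines
            # A filename may itself contain commas, so rejoin fields 3..n.
            path = f[3]
            for (i = 4; i <= n; i++) path = path "," f[i]
            base = path
            sub(/^.*\//, "", base)                  # strip the directory part
            if (base == name) print f[1], f[2], path
        }'
}

# Possible usage on the syslog host (log path is an assumption):
#   unlinks_of reboot < /var/log/messages
```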

Comment 18 David Tardon 2022-07-14 08:52:23 UTC
I'm pretty much convinced the root cause of this is that unpacking of the "exitramfs" hasn't been completed. Whether the unpacking process just failed or was killed doesn't make much difference; in either case the fix for bug 1924587 should help. Hence, let's close this as a duplicate.

*** This bug has been marked as a duplicate of bug 1924587 ***