Description of problem:
We are seeing a situation in which, at system boot, systemd becomes confused about the state of var-lib-nfs-rpc_pipefs.mount. Specifically, systemd believes that /var/lib/nfs/rpc_pipefs is not mounted, when in fact it is mounted, with the exact same options with which var-lib-nfs-rpc_pipefs.mount would mount it:
```
$ systemctl status var-lib-nfs-rpc_pipefs.mount
● var-lib-nfs-rpc_pipefs.mount - RPC Pipe File System
Loaded: loaded (/usr/lib/systemd/system/var-lib-nfs-rpc_pipefs.mount; static; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2023-03-21 17:17:51 EDT; 1min 56s ago
Where: /var/lib/nfs/rpc_pipefs
What: sunrpc
Mar 21 17:17:51 host.example.org systemd[1]: Mounting RPC Pipe File System...
Mar 21 17:17:51 host.example.org mount[905509]: mount: /var/lib/nfs/rpc_pipefs: sunrpc already mounted on /var/lib/nfs/rpc_pipefs.
Mar 21 17:17:51 host.example.org systemd[1]: var-lib-nfs-rpc_pipefs.mount: Mount process exited, code=exited status=32
Mar 21 17:17:51 host.example.org systemd[1]: var-lib-nfs-rpc_pipefs.mount: Failed with result 'exit-code'.
Mar 21 17:17:51 host.example.org systemd[1]: Failed to mount RPC Pipe File System.
$ awk '$2 == "/var/lib/nfs/rpc_pipefs" {print $0}' /proc/mounts
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
```
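For anyone triaging this, the disagreement can be seen side by side. A minimal check, using the same unit and path as above:
```
# Ask systemd for its view of the mount unit's state...
systemctl show -p ActiveState,SubState,Result var-lib-nfs-rpc_pipefs.mount
# ...and compare it with the kernel's mount table:
awk '$2 == "/var/lib/nfs/rpc_pipefs" {print $0}' /proc/mounts
```
When the bug hits, the first command reports the unit as failed while the second still shows the rpc_pipefs filesystem mounted.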
When this issue occurs, it breaks our NFS home directories, because we mount our home directories with sec=krb5p, which requires rpc-gssd.service, which requires rpc_pipefs.target, which requires var-lib-nfs-rpc_pipefs.mount. So systemd prevents rpc-gssd.service from running because it thinks a prerequisite hasn’t been met, when in fact all prerequisites are satisfied as per the on-disk state of the system.
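The dependency chain described above can be inspected on any affected host; these are standard systemctl queries, nothing specific to our setup:
```
# Show what rpc-gssd.service pulls in, and what in turn pulls in the mount unit:
systemctl list-dependencies rpc-gssd.service
systemctl show -p Requires,After rpc-gssd.service
systemctl show -p Requires,After rpc_pipefs.target
```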
Neither `systemctl daemon-reload` nor `systemctl daemon-reexec` corrects systemd’s erroneous perception of the state of the /var/lib/nfs/rpc_pipefs mount.
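For completeness, this is roughly the sequence we tried; after both commands the unit still reports the stale failed state:
```
systemctl daemon-reload
systemctl daemon-reexec
# Still reports 'failed' even though /proc/mounts shows the filesystem mounted:
systemctl is-active var-lib-nfs-rpc_pipefs.mount
```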
The only way we have found to recover from this erroneous systemd state is to align the on-disk state with what systemd believes. We cannot do this by starting var-lib-nfs-rpc_pipefs.mount, because it will always fail when /var/lib/nfs/rpc_pipefs is already mounted. We briefly considered adjusting the mount options to add `remount`, but that will break in the case where /var/lib/nfs/rpc_pipefs is *not* already mounted. And there is no `silently_succeed_if_already_mounted` option to mount. So the only way we have found to correct the state is to manually unmount /var/lib/nfs/rpc_pipefs:
```
$ umount /var/lib/nfs/rpc_pipefs
```
(Doing this will break anything that depends on /var/lib/nfs/rpc_pipefs being mounted, but the main thing that requires /var/lib/nfs/rpc_pipefs to be mounted is rpc-gssd.service, which systemd refused to start because var-lib-nfs-rpc_pipefs.mount failed, so this is the least-bad way we have found to correct systemd’s erroneous state.)
At this point, systemd will correctly reflect the state of /var/lib/nfs/rpc_pipefs: if it is mounted, systemd will show the state of var-lib-nfs-rpc_pipefs.mount as active; if not, systemd will show the state as inactive. Starting/stopping var-lib-nfs-rpc_pipefs.mount will correctly mount/unmount /var/lib/nfs/rpc_pipefs; manually mounting/unmounting /var/lib/nfs/rpc_pipefs will toggle the var-lib-nfs-rpc_pipefs.mount systemd state between active/inactive:
```
$ mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs
$ awk '$2 == "/var/lib/nfs/rpc_pipefs" {print $0}' /proc/mounts
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
$ systemctl status var-lib-nfs-rpc_pipefs.mount
● var-lib-nfs-rpc_pipefs.mount - RPC Pipe File System
Loaded: loaded (/usr/lib/systemd/system/var-lib-nfs-rpc_pipefs.mount; static; vendor preset: disabled)
Active: active (mounted) (Result: exit-code) since Tue 2023-03-21 17:21:24 EDT; 2s ago
Where: /var/lib/nfs/rpc_pipefs
What: sunrpc
```
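As an aside, the "silently succeed if already mounted" behavior we wished `mount` offered can be approximated in plain shell; this is only an illustration and does not help the mount unit itself:
```
# Mount rpc_pipefs only if nothing is mounted there yet; otherwise do nothing.
mountpoint -q /var/lib/nfs/rpc_pipefs \
    || mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs
```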
We use Puppet to manage our RHEL hosts, and have automated this work-around via an exec resource:
```
exec { 'fix systemd state of /var/lib/nfs/rpc_pipefs':
  command => '/bin/sh -c "umount /var/lib/nfs/rpc_pipefs 1>/dev/null 2>&1"',
  unless  => '/bin/sh -c "systemctl is-active -q var-lib-nfs-rpc_pipefs.mount"',
  notify  => Service['rpc-gssd.service'],
}
```
In English: if var-lib-nfs-rpc_pipefs.mount is not active, attempt to manually unmount /var/lib/nfs/rpc_pipefs, then notify the rpc-gssd.service (which will cause Puppet to [re]start it).
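For readers who do not use Puppet, the resource above boils down to roughly the following shell logic (a sketch; in our environment Puppet runs it for us):
```
# If systemd thinks the mount unit is not active, unmount the stale rpc_pipefs
# mount so the two views agree again, then restart rpc-gssd.service.
if ! systemctl is-active -q var-lib-nfs-rpc_pipefs.mount; then
    umount /var/lib/nfs/rpc_pipefs >/dev/null 2>&1
    systemctl restart rpc-gssd.service
fi
```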
But this is a kludge, and it should not be necessary: systemd should not get confused about the state of the /var/lib/nfs/rpc_pipefs mount. That it does is a bug.
Version-Release number of selected component (if applicable):
systemd-239-68.el8_7.4.x86_64
How reproducible:
We have not found a way to reliably reproduce this bug. It sometimes happens at system boot; sometimes not.
We have a few hundred RHEL7 hosts, a few hundred RHEL8 hosts, and a handful of RHEL9 hosts; we have seen the issue only on our RHEL8 hosts. We have looked at the set of RHEL8 hosts that show the problem, but have been unable to find a common factor.
So, one of two things is true:
* This bug is specific to the RHEL8 systemd
* This bug is specific to RHEL8 and later systemd, but we do not yet have enough RHEL9 hosts to trigger the race condition and evince the bug.
An update: going back further in time in our log data, I was able to find an instance of this bug occurring on a RHEL9 host.
So, this bug is specific to RHEL8 and later systemd.
No matter how far back I looked, I was unable to find any instance of this bug occurring on a RHEL7 host, so whatever the issue is, it was clearly introduced between the RHEL7 and RHEL8 systemd.