Bug 1334573 - During shutdown, systemd causes the machine to hang indefinitely if /usr/local is a symlink to an automounted location, requiring someone to physically hit the power button to power cycle
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Michal Sekletar
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks: 1298243 1420851
Reported: 2016-05-10 05:02 UTC by Ashima Rawat
Modified: 2021-09-09 11:50 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-17 19:27:49 UTC
Target Upstream Version:
Embargoed:



Description Ashima Rawat 2016-05-10 05:02:59 UTC
Description of problem:

During shutdown, systemd causes the machine to hang indefinitely if /usr/local is a symlink to an automounted location, requiring someone to physically hit the power button to power cycle.

In the customer's environment, a set of software is hosted on a Solaris file server and served via NFSv3.  They use autofs to mount this software under /misc/linux in RHEL7.  They then turn /usr/local into a symlink to /misc/linux.  This /misc/linux subdirectory contains what looks like a typical /usr/local file tree (bin, lib, lib64, pkg, share, etc.).

In "out-of-the-box" RHEL7 (no deep customization, details below), the shutdown process hangs 100% of the time, to the point where the system becomes unreachable remotely and the only way to recover it is to power cycle the machine.  The final log entry in journalctl in the instances observed is, "network: Failed to get properties: Connection timed out."

When they remove this /usr/local symlink, this problem goes away.  It has never occurred under a condition where /usr/local is not symlinked to an NFS automounted location.
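
The triggering condition described above can be checked mechanically. The following is a minimal sketch, not a supported tool: the function name symlink_into_autofs and its optional second parameter (a mounts file, defaulting to /proc/self/mounts) are made up for illustration. It reads the link with a single readlink call rather than resolving it fully, so the check itself should not trigger the automount. It assumes mount points without spaces and does not normalize ".." in relative link targets.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if PATH is a symlink whose target lies at or
# below an autofs-managed mount point listed in MOUNTS_FILE.
symlink_into_autofs() {
    local path=$1 mounts_file=${2:-/proc/self/mounts}
    local target dev mnt fstype rest

    [ -L "$path" ] || return 1

    # Read the link itself; do not walk into the target.
    target=$(readlink -- "$path")
    case "$target" in
        /*) ;;                                        # absolute target, use as-is
        *)  target="$(dirname -- "$path")/$target" ;; # relative target (not normalized)
    esac

    # Each line of the mounts file is: device mountpoint fstype options ...
    while read -r dev mnt fstype rest; do
        [ "$fstype" = autofs ] || continue
        case "$target/" in
            "$mnt"/*)
                printf '%s -> %s (autofs mount point %s)\n' "$path" "$target" "$mnt"
                return 0
                ;;
        esac
    done < "$mounts_file"
    return 1
}

symlink_into_autofs /usr/local \
    || echo "/usr/local is not a symlink into an autofs mount point"
```

With the reproducer setup from this report (/mnt managed by autofs, /usr/local pointing to /mnt/testautofs), the function is expected to report a match.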

-------------------------------------------------------------------------------------

Issues with /usr/local being a symlink to an NFS automounted location and systemd have been identified and discussed before on online forums, but don't seem to have ever been resolved.  The most accurate example of someone else encountering this issue, that I can find, is here: https://lists.opensuse.org/opensuse-bugs/2014-02/msg02447.html  The user in that post makes the same observations as the above: when /usr/local is symlinked to an automounted NFS location, systemd inconsistently hangs.

How reproducible: Always, once /usr/local is turned into a symlink to an automounted location (/misc/linux in the customer environment)

Steps to Reproduce:

[root@arawatnfsclient ~]# grep /mnt /etc/auto.master
/mnt	/etc/auto.local

[root@arawatnfsclient ~]# cat /etc/auto.local
testautofs	10.65.5.106:/test1

[root@arawatnfsclient /]# cd /mnt/testautofs
[root@arawatnfsclient testautofs]# df -h
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/rhel_dhcp210--60-root  8.5G  1.3G  7.2G  16% /
devtmpfs                           490M     0  490M   0% /dev
tmpfs                              497M     0  497M   0% /dev/shm
tmpfs                              497M   50M  447M  11% /run
tmpfs                              497M     0  497M   0% /sys/fs/cgroup
/dev/vda1                          497M   96M  401M  20% /boot
10.65.5.106:/test1                 8.5G  1.8G  6.8G  21% /mnt/testautofs

[root@arawatnfsclient usr]# mv local local.orig
[root@arawatnfsclient testautofs]# ln -s /mnt/testautofs /usr/local
[root@arawatnfsclient testautofs]# cd /usr
[root@arawatnfsclient usr]# ll
total 116
dr-xr-xr-x.  2 root root 16384 May  9 04:14 bin
drwxr-xr-x.  2 root root     6 Mar 13  2014 etc
drwxr-xr-x.  2 root root     6 Mar 13  2014 games
drwxr-xr-x.  3 root root    22 Sep 23  2015 include
dr-xr-xr-x. 26 root root  4096 Sep 23  2015 lib
dr-xr-xr-x. 48 root root 24576 May  9 04:14 lib64
drwxr-xr-x. 14 root root  4096 Sep 23  2015 libexec
lrwxrwxrwx.  1 root root    15 May  9 04:20 local -> /mnt/testautofs
drwxr-xr-x. 12 root root  4096 May  9 04:19 local.orig
dr-xr-xr-x.  2 root root 16384 May  9 04:14 sbin
drwxr-xr-x. 80 root root  4096 Apr 16 10:46 share
drwxr-xr-x.  4 root root    32 Sep 23  2015 src
lrwxrwxrwx.  1 root root    10 Sep 23  2015 tmp -> ../var/tmp


Actual results: Reboot hangs indefinitely on every occurrence and the system has to be powered off manually. 

Expected results:


Additional info:

Comment 2 Lukáš Nykrýn 2016-05-10 10:08:33 UTC
I would say that this is an unsupported scenario. Systemd requires the full /usr to be prepared from the initrd. So definitely not an automount, and if that directory should be mounted from NFS, then it should be done inside the initrd.

Comment 3 Michael Ward 2016-05-10 16:02:23 UTC
(In reply to Lukáš Nykrýn from comment #2)
> I would say that this is an unsupported scenario. Systemd requires the full
> /usr to be prepared from the initrd. So definitely not an automount, and if
> that directory should be mounted from NFS, then it should be done inside the
> initrd.

Hi there.  I'm the one who originally reported this to Red Hat.  Ashima was then able to reproduce and confirm the issue on Red Hat's end and create this report for us.  (Great job Ashima, your patience through all the back-and-forth is greatly appreciated).

I can't find any documentation which states that /usr and its subdirectories must be in a certain state for systemd to function properly.  If such documentation exists, could you link me to it?  I could have missed it.

Because RHEL7 is capable of being rendered unusable for production environments with a few basic commands, if this can't be fixed, I think it needs to be documented somewhere that /usr and all of its subdirectories must exist in some specific state for systemd to function properly.

I don't want to get too much into non-technical, debatable discussion here, but the one thing I will say is that the use of /usr/local as an NFS mount is not too bizarre.  Our organization has been doing it for about 20+ years across various non-systemd operating systems, including RHEL5 and RHEL6.  A quick Google search for "/usr/local nfs" brings back an enormous amount of discussion on the use of /usr/local as an NFS mount.

One of the leading sources of documentation on Linux filesystem hierarchy practices (in general, not specific to any distribution), states that /usr/local "might be just mounted read-only from somewhere else": http://www.tldp.org/LDP/Linux-Filesystem-Hierarchy/html/usr.html

I realize that Red Hat is not bound by the practices set forth by third parties, but I just want to show that investigation and correction of the root cause of this issue would not just be for us; it would benefit the entire Linux community by keeping existing, pre-systemd practices compatible in the systemd world.  We are not the only ones with this issue, and it will continue to slowly affect organizations who haven't yet, but eventually will, make the jump to systemd.

Comment 4 Lukáš Nykrýn 2016-05-10 16:26:25 UTC
Sorry, I missed the part that this is shutdown related. I don't know why I thought you wrote that this happens during boot.

Do you use NetworkManager? If so, could you try adding After=dbus.service to its unit-file?
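
For reference, an ordering dependency like this can be added without editing the packaged unit file, by placing a drop-in under /etc/systemd/system/UNIT.d/. systemd stops units in the inverse of their start-up order, so After=dbus.service also means NetworkManager is stopped before dbus at shutdown. The sketch below is only an illustration: the function name write_after_dropin, the file name 10-ordering.conf, and the optional root-directory argument (used here for a dry run into a scratch directory) are assumptions, not something taken from this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a drop-in that adds "After=DEP" to UNIT.
# ROOT is an optional prefix so the result can be inspected without
# touching the real /etc.
write_after_dropin() {
    local unit=$1 dep=$2 root=${3:-}
    local dir="$root/etc/systemd/system/$unit.d"

    mkdir -p "$dir"
    # After= is a list setting, so the drop-in adds to the existing ordering.
    printf '[Unit]\nAfter=%s\n' "$dep" > "$dir/10-ordering.conf"
}

# Dry run into a scratch directory.
scratch=$(mktemp -d)
write_after_dropin NetworkManager.service dbus.service "$scratch"
cat "$scratch/etc/systemd/system/NetworkManager.service.d/10-ordering.conf"
```

On a real system the third argument would be omitted (as root), followed by `systemctl daemon-reload` so that systemd picks up the new drop-in.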

Comment 5 Michael Ward 2016-05-10 18:59:13 UTC
(In reply to Lukáš Nykrýn from comment #4)
> Do you use NetworkManager? If so, could you try adding After=dbus.service to
> its unit-file?

Thank you for your time and consideration, Lukáš.  Unfortunately "After=dbus.service" in the NetworkManager.service unit-file didn't show any improvement.  Just to eliminate NetworkManager from the equation, I disabled NetworkManager and the same problem occurs during shutdown.

Comment 6 Michael Ward 2016-05-16 20:01:18 UTC
At Ashima's suggestion in my open support case with Red Hat, I added "remote-fs.target" to the After= clause of autofs.service.  This did improve the situation, in that the reboot failure rate has gone down from 100% to about 20%-50%.  Whereas the failure was consistent before, it now seems to be random luck as to whether or not it hangs.  The first 5 reboots I did with that change were fine, and I was overjoyed thinking the problem had been solved.  But then over the next 5 reboots, it failed twice.  I went on to do about 20-30 additional reboots, and found the failure rate to be somewhere between 20%-50%.
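
When testing an ordering change like this, it can help to confirm that systemd actually sees it before counting reboots. `systemctl show -p After UNIT` prints the effective list as a single "After=..." line. The sketch below only parses such a line; the function name ordered_after is made up for illustration, and the sample line in the comment is an invented example, not output captured from the affected system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if WANTED appears in an "After=..." line
# as printed by `systemctl show -p After UNIT`.
ordered_after() {
    local after_line=$1 wanted=$2 dep

    # Strip the "After=" prefix and split the remainder on whitespace.
    for dep in ${after_line#After=}; do
        [ "$dep" = "$wanted" ] && return 0
    done
    return 1
}

# Intended use on a systemd host:
#   ordered_after "$(systemctl show -p After autofs.service)" remote-fs.target \
#       && echo "autofs.service is ordered after remote-fs.target"
#
# Example with an invented property line:
#   ordered_after 'After=network.target remote-fs.target' remote-fs.target
```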

Comment 8 Ashima Rawat 2016-05-27 09:40:45 UTC
Hi Lukáš,

Please let me know of any progress on this Bugzilla.
I suggested a workaround to the customer, but it does not seem to work.

Looking forward to your input on this,

Thanks
Ashima

Comment 17 Michal Szymanski 2016-10-17 18:12:15 UTC
I get a similar problem on an up-to-date CentOS 7 system: kernel 3.10.0-327.36.2.el7.x86_64, systemd-219-19.el7_2.13, autofs-5.0.7-54.el7, where /usr/local is automounted directly (by auto.direct) instead of being a symlink.

The shutdown process stops displaying:

Stopping LSB: Bring up/down networking...

It usually helps to manually stop autofs (systemctl stop autofs) before shutdown, but once or twice this command also hung. I could still log in to the machine remotely, but it was impossible to reboot it without a hard reset.

Adding "remote-fs.target" to the After= clause of autofs.service did not help at all. The shutdown process hung immediately, showing

(1 of 2) A start job is running for Restore of /run/initramfs (14 s/no limit)

regards, Michal

Comment 24 Kyle Walker 2018-03-14 18:05:01 UTC
There have been a number of alterations in recent systemd revisions to avoid hangs during the final stages of the shutdown process. One of these was a backport for another bug report, listed below.

    Bug 1519245 - hangs on reboot or shutdown when nfs file system mounted [rhel-7.4.z]

Including the following patchset:

    * Thu Dec 07 2017 Lukas Nykryn <lnykryn> - 219-42.5
    - unmount: Pass in mount options when remounting read-only (#1312002)
    - shutdown: don't remount,ro network filesystems. (#6588) (#1312002)
    - shutdown: fix incorrect fscanf() result check (#6806) (#1312002)

This may avoid the complete hang condition on shutdown by carefully avoiding NFS filesystems during remount operations. Would it be possible to verify if an update to the systemd version in the following errata resolves the condition?

    https://access.redhat.com/errata/RHBA-2018:0155

- Kyle Walker
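
The changelog entry "shutdown: don't remount,ro network filesystems" boils down to filtering the mount table by filesystem type before the read-only remount pass. The sketch below illustrates that idea only. It is not systemd's code, and the list of types in is_network_fs is a deliberately short, assumed subset. Both function names and the mounts-file parameter are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed, incomplete list of network filesystem types, for illustration only.
is_network_fs() {
    case "$1" in
        nfs|nfs4|cifs|smbfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the mount points that a shutdown pass would still remount read-only,
# skipping network filesystems. Each line of the mounts file is:
#   device mountpoint fstype options ...
remount_ro_candidates() {
    local mounts_file=${1:-/proc/self/mounts}
    local dev mnt fstype rest

    while read -r dev mnt fstype rest; do
        is_network_fs "$fstype" && continue
        printf '%s\n' "$mnt"
    done < "$mounts_file"
}

# Example: list candidates on the running system.
# remount_ro_candidates /proc/self/mounts
```

Applied to a mount table like the one in the reproducer, the NFS entry for /mnt/testautofs would be left out of the list, while local filesystems such as / and /boot would remain.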

Comment 25 Awez 2018-04-10 11:21:50 UTC
I have asked the customer to upgrade the systemd package and let us know the results.

Comment 27 Kyle Walker 2019-06-17 19:27:49 UTC
Based on my update in comment 24, and the lack of further reported instances, I am closing this bug as CURRENTRELEASE. If a new occurrence is suspected to be related to this issue, please open a new bug report and reference this one.

