Bug 2025963
| Summary: | autofs service has not proper limits set to be able to handle many mounts | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Renaud Métrich <rmetrich> | |
| Component: | autofs | Assignee: | Ian Kent <ikent> | |
| Status: | CLOSED ERRATA | QA Contact: | Kun Wang <kunwan> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 8.5 | CC: | plambri, xzhou | |
| Target Milestone: | rc | Keywords: | Triaged | |
| Target Release: | --- | Flags: | pm-rhel: mirror+ | |
| Hardware: | All | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | autofs-5.1.4-76.el8 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2028746 | Environment: | ||
| Last Closed: | 2022-05-10 15:29:06 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2028746 | |||
Stracing autofs (with only a few options, otherwise a deadlock occurs; "-fttT" is OK), we can see that once 1000 mounts are done, the workers cannot create new pipes to talk to the autofs main daemon:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
9797 16:09:18.536877 sendto(3, "<30>Nov 23 16:09:18 automount[1150]: mount_mount: mount(nfs): calling mkdir_path /net/vm-systemd8-autofs-nfsserver1/home/user369"..., 129, MSG_NOSIGNAL, NULL, 0) = 129 <0.000063>
9797 16:09:18.536974 stat("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 <0.000009>
9797 16:09:18.537007 stat("/net/vm-systemd8-autofs-nfsserver1", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 <0.000009>
9797 16:09:18.537035 stat("/net/vm-systemd8-autofs-nfsserver1/home", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 <0.000008>
9797 16:09:18.537062 stat("/net/vm-systemd8-autofs-nfsserver1/home/user3696", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 <0.000009>
9797 16:09:18.537093 getpid() = 1150 <0.000007>
9797 16:09:18.537117 sendto(3, "<30>Nov 23 16:09:18 automount[1150]: mount(nfs): calling mount -t nfs -s -o nosuid,nodev vm-systemd8-autofs-nfsserver1:/home/use"..., 182, MSG_NOSIGNAL, NULL, 0) = 182 <0.000055>
9797 16:09:18.537206 readlink("/etc/mtab", "../proc/self/mounts", 4096) = 19 <0.000010>
9797 16:09:18.537241 pipe2(0x7fb28ed34930, O_CLOEXEC) = -1 EMFILE (Too many open files) <0.000012>
---> HERE ^^^^
9797 16:09:18.537345 getpid() = 1150 <0.000016>
9797 16:09:18.537395 sendto(3, "<30>Nov 23 16:09:18 automount[1150]: mount(nfs): nfs: mount failure vm-systemd8-autofs-nfsserver1:/home/user3696 on /net/vm-syst"..., 164, MSG_NOSIGNAL, NULL, 0) = 164 <0.000084>
9797 16:09:18.537614 lstat("/net/vm-systemd8-autofs-nfsserver1/home/user3696", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 <0.000025>
9797 16:09:18.537731 getpid() = 1150 <0.000014>
9797 16:09:18.537775 sendto(3, "<30>Nov 23 16:09:18 automount[1150]: dev_ioctl_send_fail: token = 1005", 70, MSG_NOSIGNAL, NULL, 0) = 70 <0.000083>
9797 16:09:18.537919 ioctl(4, AUTOFS_DEV_IOCTL_FAIL, 0x7fb28ed3ea30) = 0 <0.000220>
9797 16:09:18.538637 ioctl(4, AUTOFS_DEV_IOCTL_CLOSEMOUNT, 0x7fb28ed3ead0) = 0 <0.000010>
9797 16:09:18.538667 getpid() = 1150 <0.000009>
9797 16:09:18.538700 sendto(3, "<30>Nov 23 16:09:18 automount[1150]: failed to mount /net/vm-systemd8-autofs-nfsserver1/home/user3696", 101, MSG_NOSIGNAL, NULL, 0) = 101 <0.000238>
9797 16:09:18.539021 madvise(0x7fb28eb00000, 2338816, MADV_DONTNEED) = 0 <0.000076>
9797 16:09:18.539130 exit(0) = ?
9797 16:09:18.539280 +++ exited with 0 +++
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
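The EMFILE result in the trace can be reproduced in isolation. The following is a minimal C sketch, not autofs code: the helper name errno_when_fds_run_out is hypothetical, and it uses pipe() rather than the pipe2() call seen in the trace, on the assumption that both report descriptor exhaustion the same way. It lowers the soft RLIMIT_NOFILE of the calling process, creates pipes until one fails, and returns the errno of that failure.

```c
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper (not part of autofs): lower the soft RLIMIT_NOFILE
 * to `soft` (never raising it), create pipes until pipe() fails, and
 * return the errno of that failure. Returns 0 if the limit could not be
 * read or changed, or if 256 pipes were created without any failure.
 * All pipes are closed and the original limit is restored on return. */
int errno_when_fds_run_out(rlim_t soft)
{
    struct rlimit saved, lowered;
    int fds[256][2];
    int n = 0, err = 0;

    if (getrlimit(RLIMIT_NOFILE, &saved) == -1)
        return 0;

    lowered = saved;
    if (soft > saved.rlim_cur)
        soft = saved.rlim_cur;          /* only ever lower the soft limit */
    lowered.rlim_cur = soft;
    if (setrlimit(RLIMIT_NOFILE, &lowered) == -1)
        return 0;

    /* Each pipe consumes two descriptors. */
    while (n < 256) {
        if (pipe(fds[n]) == -1) {
            err = errno;
            break;
        }
        n++;
    }

    /* Release everything we opened and put the old limit back. */
    while (n-- > 0) {
        close(fds[n][0]);
        close(fds[n][1]);
    }
    setrlimit(RLIMIT_NOFILE, &saved);
    return err;
}
```

Called with a small value such as 32, the helper should return EMFILE, which is the error the worker reports in the trace above.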
This happens because autofs keeps a file descriptor open for each existing mount, see /proc/<autofs PID>/fd:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[...]
lr-x------. 1 root root 64 Nov 23 16:11 100 -> /net/vm-systemd8-autofs-nfsserver1/home/user925
lr-x------. 1 root root 64 Nov 23 16:11 1000 -> /net/vm-systemd8-autofs-nfsserver1/home/user3715
lr-x------. 1 root root 64 Nov 23 16:11 1001 -> /net/vm-systemd8-autofs-nfsserver1/home/user3714
lr-x------. 1 root root 64 Nov 23 16:11 1002 -> /net/vm-systemd8-autofs-nfsserver1/home/user3713
lr-x------. 1 root root 64 Nov 23 16:11 1003 -> /net/vm-systemd8-autofs-nfsserver1/home/user3712
lr-x------. 1 root root 64 Nov 23 16:11 1004 -> /net/vm-systemd8-autofs-nfsserver1/home/user3711
lr-x------. 1 root root 64 Nov 23 16:11 1005 -> /net/vm-systemd8-autofs-nfsserver1/home/user3710
lr-x------. 1 root root 64 Nov 23 16:11 1006 -> /net/vm-systemd8-autofs-nfsserver1/home/user371
[...]
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
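To compare the number of descriptors in use against the nofile limit without reading the whole listing, the entries of the fd directory can simply be counted. This is a small C sketch under the assumption that /proc is mounted; the function name count_open_fds is hypothetical, and its argument is the PID as a decimal string (or "self").

```c
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: count the entries of /proc/<pid>/fd.
 * `pid` is a decimal PID string, or "self" for the calling process.
 * Returns -1 if the directory cannot be opened (no such process,
 * insufficient permission, /proc not mounted).
 * Note: when counting "self", the descriptor used by opendir() to read
 * the directory is itself included in the result. */
long count_open_fds(const char *pid)
{
    char path[64];
    DIR *dir;
    struct dirent *de;
    long count = 0;

    snprintf(path, sizeof(path), "/proc/%s/fd", pid);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL) {
        if (de->d_name[0] == '.')
            continue;               /* skip "." and ".." */
        count++;
    }
    closedir(dir);
    return count;
}
```

For the daemon in this report the call would be count_open_fds("1150"), run as root since the directory belongs to the automount process.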
So, even without a burst, as soon as the nofile limit is reached, I would expect no new mount to be possible at all.
autofs should be able to increase its limit, knowing that the hard nofile limit is very large.
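One way a process can raise its own soft limit, sketched in C. This is an illustration only and not the change that went into autofs: the function name raise_nofile_soft_limit is hypothetical. It raises the soft RLIMIT_NOFILE to the requested value, capped at the hard limit, and never lowers it. For comparison, the daemon/automount.c snippet quoted later in this thread tests rlim_max against MAX_OPEN_FILES, so as quoted it would leave a soft limit of 1024 unchanged whenever the hard limit is above 10240.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_NOFILE to `wanted`,
 * capped at the current hard limit. The soft limit is never lowered
 * and the hard limit is never touched, so no privilege is required.
 * Returns 0 on success (including "already high enough"), -1 on error. */
int raise_nofile_soft_limit(rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;

    if (wanted > rl.rlim_max)
        wanted = rl.rlim_max;       /* cannot exceed the hard limit */
    if (rl.rlim_cur >= wanted)
        return 0;                   /* nothing to do */

    rl.rlim_cur = wanted;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

A daemon would typically call this once at startup, before creating any mounts, and log a warning if it returns -1.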
Additionally, I think one improvement would be to print a proper message to help the admin find out what's going on:
Current journal entry:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Nov 23 16:09:18 vm-systemd8-autofs automount[1150]: mount(nfs): nfs options="nosuid,nodev", nobind=0, nosymlink=0, ro=0
Nov 23 16:09:18 vm-systemd8-autofs automount[1150]: mount_mount: mount(nfs): calling mkdir_path /net/vm-systemd8-autofs-nfsserver1/home/user3696
Nov 23 16:09:18 vm-systemd8-autofs automount[1150]: mount(nfs): calling mount -t nfs -s -o nosuid,nodev vm-systemd8-autofs-nfsserver1:/home/user3696 /net/vm-systemd8-autofs-nfsserver1/home/user3696
Nov 23 16:09:18 vm-systemd8-autofs automount[1150]: mount(nfs): nfs: mount failure vm-systemd8-autofs-nfsserver1:/home/user3696 on /net/vm-systemd8-autofs-nfsserver1/home/user3696
Nov 23 16:09:18 vm-systemd8-autofs automount[1150]: dev_ioctl_send_fail: token = 1005
Nov 23 16:09:18 vm-systemd8-autofs automount[1150]: handle_packet: type = 5
Nov 23 16:09:18 vm-systemd8-autofs automount[1150]: handle_packet_missing_direct: token 1006, name /net/vm-systemd8-autofs-nfsserver1/home/user3695, request pid 1756
Nov 23 16:09:18 vm-systemd8-autofs automount[1150]: failed to mount /net/vm-systemd8-autofs-nfsserver1/home/user3696
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Proposal:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
...: failed to mount /net/vm-systemd8-autofs-nfsserver1/home/user3696 (Too many open files, please consider increasing the NOFILE limit)
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
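The proposed message could be built along these lines. This is a hedged C sketch, not autofs code: format_mount_failure is a hypothetical name, and it assumes the errno of the failed operation is available at the point where the message is logged. It appends the NOFILE hint only for EMFILE.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a mount failure message into `buf`.
 * `err` is the errno of the failed operation, or 0 if unknown.
 * For EMFILE a hint about the NOFILE limit is appended.
 * Returns the value of snprintf(). */
int format_mount_failure(char *buf, size_t len, const char *path, int err)
{
    if (err == EMFILE)
        return snprintf(buf, len,
                        "failed to mount %s (%s, please consider "
                        "increasing the NOFILE limit)",
                        path, strerror(err));
    if (err != 0)
        return snprintf(buf, len, "failed to mount %s (%s)",
                        path, strerror(err));
    return snprintf(buf, len, "failed to mount %s", path);
}
```

The text inside the parentheses comes from strerror(), so its exact wording depends on the C library.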
(In reply to Renaud Métrich from comment #0)
> Description of problem:
>
> When autofs needs to mount a lot of file systems rapidly, e.g. when someone does a "ls /net/nfsserver/home" with lots of individual home dirs shared on the nfs server, the command returns with ENOENT and the following message is seen when autofs is in debug mode:
> -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
> Nov 23 10:19:54 vm-systemd8-autofs automount[1150]: mount(nfs): root=/net/vm-systemd8-autofs-nfsserver1/home/user894 name=/net/vm-systemd8-autofs-nfsserver1/home/user894 what=vm-systemd8-autofs-nfsserver1:/home/user894, fstype=nfs, options=nosuid,nodev
> Nov 23 10:19:54 vm-systemd8-autofs automount[1150]: mount(nfs): nfs options="nosuid,nodev", nobind=0, nosymlink=0, ro=0
> Nov 23 10:19:54 vm-systemd8-autofs automount[1150]: mount_mount: mount(nfs): calling mkdir_path /net/vm-systemd8-autofs-nfsserver1/home/user894
> Nov 23 10:19:54 vm-systemd8-autofs automount[1150]: mount(nfs): calling mount -t nfs -s -o nosuid,nodev vm-systemd8-autofs-nfsserver1:/home/user894 /net/vm-systemd8-autofs-nfsserver1/home/user894
> Nov 23 10:19:54 vm-systemd8-autofs automount[1150]: >> mount.nfs: mounting vm-systemd8-autofs-nfsserver1:/home/user894 failed, reason given by server: No such file or directory
> Nov 23 10:19:54 vm-systemd8-autofs automount[1150]: mount(nfs): nfs: mount failure vm-systemd8-autofs-nfsserver1:/home/user894 on /net/vm-systemd8-autofs-nfsserver1/home/user894
> Nov 23 10:19:54 vm-systemd8-autofs automount[1150]: dev_ioctl_send_fail: token = 5139
> Nov 23 10:19:54 vm-systemd8-autofs automount[1150]: failed to mount /net/vm-systemd8-autofs-nfsserver1/home/user894
> -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
>
> The real root cause is in fact a lack of file descriptors, which is configured as the default (soft: 1024) by systemd, and clearly not sufficient.
> I think the soft limit should be modified or autofs modified to increase the limit upon lacking file descriptors.
You mean like this in daemon/automount.c:
daemon/automount.c:98:#define MAX_OPEN_FILES 10240
    res = getrlimit(RLIMIT_NOFILE, &rlim);
    if (res == -1 || rlim.rlim_max <= MAX_OPEN_FILES) {
        rlim.rlim_cur = MAX_OPEN_FILES;
        rlim.rlim_max = MAX_OPEN_FILES;
    }
    res = setrlimit(RLIMIT_NOFILE, &rlim);
    if (res)
        printf("%s: can't increase open file limit - continuing", program);
I have recently been working with the internal hosts map using a server with 30k exports and don't see this problem. Whatever is happening here, there must be more to it.
(In reply to Renaud Métrich from comment #1)
> Stracing autofs (but with not much options or else a deadlock occurs: "-fttT" is ok), we can see that once 1000 mounts a done, the workers cannot create new pipes to talk to autofs main daemon:
> -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
> 9797 16:09:18.536877 sendto(3, "<30>Nov 23 16:09:18 automount[1150]: mount_mount: mount(nfs): calling mkdir_path /net/vm-systemd8-autofs-nfsserver1/home/user369"..., 129, MSG_NOSIGNAL, NULL, 0) = 129 <0.000063>
> 9797 16:09:18.536974 stat("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 <0.000009>
> 9797 16:09:18.537007 stat("/net/vm-systemd8-autofs-nfsserver1", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 <0.000009>
> 9797 16:09:18.537035 stat("/net/vm-systemd8-autofs-nfsserver1/home", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 <0.000008>
> 9797 16:09:18.537062 stat("/net/vm-systemd8-autofs-nfsserver1/home/user3696", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 <0.000009>
> 9797 16:09:18.537093 getpid() = 1150 <0.000007>
> 9797 16:09:18.537117 sendto(3, "<30>Nov 23 16:09:18 automount[1150]: mount(nfs): calling mount -t nfs -s -o nosuid,nodev vm-systemd8-autofs-nfsserver1:/home/use"..., 182, MSG_NOSIGNAL, NULL, 0) = 182 <0.000055>
> 9797 16:09:18.537206 readlink("/etc/mtab", "../proc/self/mounts", 4096) = 19 <0.000010>
> 9797 16:09:18.537241 pipe2(0x7fb28ed34930, O_CLOEXEC) = -1 EMFILE (Too many open files) <0.000012>
>
> ---> HERE ^^^^
>
> 9797 16:09:18.537345 getpid() = 1150 <0.000016>
> 9797 16:09:18.537395 sendto(3, "<30>Nov 23 16:09:18 automount[1150]: mount(nfs): nfs: mount failure vm-systemd8-autofs-nfsserver1:/home/user3696 on /net/vm-syst"..., 164, MSG_NOSIGNAL, NULL, 0) = 164 <0.000084>
> 9797 16:09:18.537614 lstat("/net/vm-systemd8-autofs-nfsserver1/home/user3696", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 <0.000025>
> 9797 16:09:18.537731 getpid() = 1150 <0.000014>
> 9797 16:09:18.537775 sendto(3, "<30>Nov 23 16:09:18 automount[1150]: dev_ioctl_send_fail: token = 1005", 70, MSG_NOSIGNAL, NULL, 0) = 70 <0.000083>
> 9797 16:09:18.537919 ioctl(4, AUTOFS_DEV_IOCTL_FAIL, 0x7fb28ed3ea30) = 0 <0.000220>
> 9797 16:09:18.538637 ioctl(4, AUTOFS_DEV_IOCTL_CLOSEMOUNT, 0x7fb28ed3ead0) = 0 <0.000010>
> 9797 16:09:18.538667 getpid() = 1150 <0.000009>
> 9797 16:09:18.538700 sendto(3, "<30>Nov 23 16:09:18 automount[1150]: failed to mount /net/vm-systemd8-autofs-nfsserver1/home/user3696", 101, MSG_NOSIGNAL, NULL, 0) = 101 <0.000238>
> 9797 16:09:18.539021 madvise(0x7fb28eb00000, 2338816, MADV_DONTNEED) = 0 <0.000076>
> 9797 16:09:18.539130 exit(0) = ?
> 9797 16:09:18.539280 +++ exited with 0 +++
> -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
>
> This happens because autofs keeps track of existing mounts, see /proc/<autofs/fd:
This is correct. Historically autofs had to do this because the file handle is needed to check a direct mount's expiration, but the trigger mount is covered by a real mount, so opening a file handle to the underlying trigger mount wasn't possible. But it has been possible for autofs to open a file handle on the trigger mount that is covered for a long time now.
That still isn't done because opening a file handle for each expire check is a fairly large overhead (and there's more that needs to be done to do this besides the open) and the check is done often. Even if we were to decide to change that, the assumption that mounted direct mounts have an open file handle on the trigger mount is made throughout the code, so it would be a fairly tricky change.
> -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
> [...]
> lr-x------. 1 root root 64 Nov 23 16:11 100 -> /net/vm-systemd8-autofs-nfsserver1/home/user925
> lr-x------. 1 root root 64 Nov 23 16:11 1000 -> /net/vm-systemd8-autofs-nfsserver1/home/user3715
> lr-x------. 1 root root 64 Nov 23 16:11 1001 -> /net/vm-systemd8-autofs-nfsserver1/home/user3714
> lr-x------. 1 root root 64 Nov 23 16:11 1002 -> /net/vm-systemd8-autofs-nfsserver1/home/user3713
> lr-x------. 1 root root 64 Nov 23 16:11 1003 -> /net/vm-systemd8-autofs-nfsserver1/home/user3712
> lr-x------. 1 root root 64 Nov 23 16:11 1004 -> /net/vm-systemd8-autofs-nfsserver1/home/user3711
> lr-x------. 1 root root 64 Nov 23 16:11 1005 -> /net/vm-systemd8-autofs-nfsserver1/home/user3710
> lr-x------. 1 root root 64 Nov 23 16:11 1006 -> /net/vm-systemd8-autofs-nfsserver1/home/user371
> [...]
> -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
>
> So, even without a burst, as soon as nofile limit is reached, I would expect no new mount to be possible at all.
> autofs should be able to increase its limit, knowing that hard nofile is very large.
Granted, the 10k limit is probably too small these days, but in the case here I'd say there must be a file handle leak or some other reason automount isn't able to increase the limit. We will need to investigate that first.
>
> Additionally I think one improvement would be to print a proper message to help the admin find out what's going on:
>
> Current journal entry:
> -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
> Nov 23 16:09:18 vm-systemd8-autofs automount[1150]: mount(nfs): nfs options="nosuid,nodev", nobind=0, nosymlink=0, ro=0
> Nov 23 16:09:18 vm-systemd8-autofs automount[1150]: mount_mount: mount(nfs): calling mkdir_path /net/vm-systemd8-autofs-nfsserver1/home/user3696
> Nov 23 16:09:18 vm-systemd8-autofs automount[1150]: mount(nfs): calling mount -t nfs -s -o nosuid,nodev vm-systemd8-autofs-nfsserver1:/home/user3696 /net/vm-systemd8-autofs-nfsserver1/home/user3696
> Nov 23 16:09:18 vm-systemd8-autofs automount[1150]: mount(nfs): nfs: mount failure vm-systemd8-autofs-nfsserver1:/home/user3696 on /net/vm-systemd8-autofs-nfsserver1/home/user3696
> Nov 23 16:09:18 vm-systemd8-autofs automount[1150]: dev_ioctl_send_fail: token = 1005
> Nov 23 16:09:18 vm-systemd8-autofs automount[1150]: handle_packet: type = 5
> Nov 23 16:09:18 vm-systemd8-autofs automount[1150]: handle_packet_missing_direct: token 1006, name /net/vm-systemd8-autofs-nfsserver1/home/user3695, request pid 1756
> Nov 23 16:09:18 vm-systemd8-autofs automount[1150]: failed to mount /net/vm-systemd8-autofs-nfsserver1/home/user3696
> -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
>
> Proposal:
> -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
> ...: failed to mount /net/vm-systemd8-autofs-nfsserver1/home/user3696 (Too many open files, please consider increasing the NOFILE limit)
> -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Well, yes, that would be good.
In fact I changed the kernel interface to accommodate exactly that a long time ago but didn't go through and make all the changes to get that error number at the point of reporting the error to the kernel. It's actually pretty difficult to do that because it will mean having to carry the error (and only those that make sense to report) back through the call stack to the location they are passed to the kernel so they can be reported to user space.
Ian
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (autofs bug fix and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:2084
Description of problem:
When autofs needs to mount a lot of file systems rapidly, e.g. when someone does a "ls /net/nfsserver/home" with lots of individual home dirs shared on the nfs server, the command returns with ENOENT and the following message is seen when autofs is in debug mode:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Nov 23 10:19:54 vm-systemd8-autofs automount[1150]: mount(nfs): root=/net/vm-systemd8-autofs-nfsserver1/home/user894 name=/net/vm-systemd8-autofs-nfsserver1/home/user894 what=vm-systemd8-autofs-nfsserver1:/home/user894, fstype=nfs, options=nosuid,nodev
Nov 23 10:19:54 vm-systemd8-autofs automount[1150]: mount(nfs): nfs options="nosuid,nodev", nobind=0, nosymlink=0, ro=0
Nov 23 10:19:54 vm-systemd8-autofs automount[1150]: mount_mount: mount(nfs): calling mkdir_path /net/vm-systemd8-autofs-nfsserver1/home/user894
Nov 23 10:19:54 vm-systemd8-autofs automount[1150]: mount(nfs): calling mount -t nfs -s -o nosuid,nodev vm-systemd8-autofs-nfsserver1:/home/user894 /net/vm-systemd8-autofs-nfsserver1/home/user894
Nov 23 10:19:54 vm-systemd8-autofs automount[1150]: >> mount.nfs: mounting vm-systemd8-autofs-nfsserver1:/home/user894 failed, reason given by server: No such file or directory
Nov 23 10:19:54 vm-systemd8-autofs automount[1150]: mount(nfs): nfs: mount failure vm-systemd8-autofs-nfsserver1:/home/user894 on /net/vm-systemd8-autofs-nfsserver1/home/user894
Nov 23 10:19:54 vm-systemd8-autofs automount[1150]: dev_ioctl_send_fail: token = 5139
Nov 23 10:19:54 vm-systemd8-autofs automount[1150]: failed to mount /net/vm-systemd8-autofs-nfsserver1/home/user894
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
The real root cause is in fact a lack of file descriptors: the limit is left at the default configured by systemd (soft: 1024), which is clearly not sufficient.
I think either the soft limit should be raised or autofs should be modified to increase the limit when it runs out of file descriptors.
Version-Release number of selected component (if applicable):
autofs-5.1.4-74.el8.x86_64
How reproducible:
Always
Steps to Reproduce:
1. Set up an NFS server ("nfsserver") with 4000 home dirs, all shared individually
# for i in $(seq 1 4000); do
    useradd user$i
    echo -e "/home/user$i\t*(rw,no_root_squash)" >> /etc/exports
  done
2. On the autofs client perform a simple "ls /net/nfsserver" (triggers "-hosts" maps)
# ls /net/nfsserver
3. On the autofs client perform a simple "ls /net/nfsserver/home" command (triggers the real NFS mounts)
# ls /net/nfsserver/home
Actual results:
Tons of "ls /net/nfsserver/home/userXXX: No such file or directory" errors
Expected results:
No error and mounts performed