Bug 1509088
Summary: | autofs hangs due to race condition in do_spawn | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Michael Fenn <Michael.Fenn> |
Component: | autofs | Assignee: | Ian Kent <ikent> |
Status: | CLOSED ERRATA | QA Contact: | xiaoli feng <xifeng> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.3 | CC: | ikent, xifeng, xzhou |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | autofs-5.0.7-80.el7 | Doc Type: | Bug Fix |
Doc Text: |
Cause: There can be a finite time between performing an open and setting the descriptor close on exec and during this window the opened file descriptor can leak into execv(3) executed commands.
Consequence: When this happens it leads to two problems, one is that file descriptors can leak into forked processes which selinux prohibits. The second is a leaked file descriptor can cause poll(2) to not return as it should leading to a hang when running such things as a mount.
Fix: Mutual exclusion has been added to ensure that file handle opens are either completed before or not started during the fork(2) when these processes are created.
Result: No file handle leaks into sub-processes and executed programs no longer occasionally hang.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2018-04-10 18:18:20 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1527815 | ||
Bug Blocks: | |||
Attachments: |
Description
Michael Fenn
2017-11-02 23:50:48 UTC
Created attachment 1347413 [details]
Proposed fix using pipe2
Attached a proposed patch to use pipe2(). We noticed that the preprocessor macros surrounding the pipe2() logic don't have the expected results on systems with pipe2(). This patch fixes that logic and puts in a workaround for systems that don't have pipe2().
(In reply to Michael Fenn from comment #0) > Created attachment 1347127 [details] > Core of stuck automount process > > Description of problem: > Automount is multi-threaded and can mount two filesystems at the same time > in different threads, and there is a race condition between these threads: > > Consider two threads in do_spawn, and one of them (T1) is in open_pipe, > *between* the call to pipe and fcntl(..., FD_CLOEXEC) and the other one (T2) > calls exec("mount", ..." > > At the moment that exec is called, the socketpair in T1 contains two open > file descriptors, neither of which is CLOEXEC. So those file descriptors > stay open in the mount corresponding to T2. Nobody in the T2 stack (mount, > which execs a FUSE mount helper, which daemonizes itself) ever closes those > file descriptors. > > Now return to T1. Because do_spawn gets fancy with capturing and logging > stderr, the parent branch of the fork() in T1 jumps through hoops to ignore > SIGCHLD, and to do 'timed_reads' on T1's mount's stdout/stderr. Those timed > reads keep reading until poll returns a POLLHUP. That's supposed to happen > when T1's mount process terminates (or when it closes stdout and stderr). > BUT now, T2 also has the other end of the pipe open. So T1's timed_read > never gets the POLLHUP that would tell it to quit. It's stuck forever. > > Normally this race is fairly harmless because the mount helper will exit, > taking all its file descriptors with it. However, in the case where the > mount helper forks off a daemon (e.g. a FUSE filesystem), the file > descriptors "inherited" by the daemon may never get closed, leading to a > defunct mount process. > > root 18819 1 0 Oct16 ? 00:00:00 /bin/bash --norc -c ... > root 18827 1279 0 Oct16 ? 00:00:00 [mount] <defunct> > root 18835 1 0 Oct16 ? 00:00:05 /sbin/mount.fusedaemon > > When automount is wedged in this way, it's even worse than just a zombie > mount process hanging around. The whole autofs subsystem is wedged. Here's > a stack trace for the stuck thread (mount.fs123p7 is the fuse daemon): > snip ... > > Note in frame 13, the wedged thread is in lookup_nss_mount. Unfortunately, > in frame 14, the kernel is waiting for us to send the ready or fail ioctls. > > info(ap->logopt, "attempting to mount entry %s", buf); > > set_tsd_user_vars(ap->logopt, mt.uid, mt.gid); > > status = lookup_nss_mount(ap, NULL, mt.name, mt.len); > pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &state); > if (status) { > ops->send_ready(ap->logopt, > ap->ioctlfd, mt.wait_queue_token); > info(ap->logopt, "mounted %s", buf); > } else { > /* TODO: get mount return status from lookup_nss_mount */ > ops->send_fail(ap->logopt, > ap->ioctlfd, mt.wait_queue_token, -ENOENT); > info(ap->logopt, "failed to mount %s", buf); > } > > If blocked thread never calls send_ready, the kernel side never releases the > lock that blocks *all* subsequent attempts to use autofs. So the race > condition between T1 and T2 locks up everything else on the system. That might be a separate problem. > > We have a local workaround (we think) by modifying our FUSE daemon to close > all open file descriptors. However, automount should be resilient to this. > Yes it should. These functions were added (contributed) to prevent these file descriptors leaking into exec'ed processes but as you describe there is a race in them. But the problem exists in all of the open_setmntent_r(), open_fopen_r(), open_fd_mode() and open_fd() as well as in open_pipe(). And I'm not sure that calling close(2) on several thousand file descriptors on every exec is a good approach to to resolving the problem either. I'm inclined to alter these functions to ensure the file handles are all properly set to close on exec instead. That probably means moving them out of the include file and adding a mutex to serialize accesses. Your description also appears to imply that __have_pipe2 no longer does what it's expected to do. I'm not sure about that either so perhaps there's a need to update configure.in in the way you describe to replace the usage of __have_pipe2. Ian > And I'm not sure that calling close(2) on several thousand > file descriptors on every exec is a good approach to to > resolving the problem either. > > I'm inclined to alter these functions to ensure the file > handles are all properly set to close on exec instead. That > probably means moving them out of the include file and adding > a mutex to serialize accesses. For RHEL specifically, both el6 and el7 have pipe2(2) support, so the close(2) loop could be dropped entirely. Wrapping them in a mutex would also work. > > Your description also appears to imply that __have_pipe2 no > longer does what it's expected to do. > > I'm not sure about that either so perhaps there's a need to > update configure.in in the way you describe to replace the > usage of __have_pipe2. > > Ian As far as I can tell, __have_pipe2 was introduced in glibc in 2008 but was really meant for glibc's internal use, testing for pipe2(2) should be done in the usual way with autoconf. The __have_pipe2 macro was changed in 2012, which is probably he source of the regression in el7, and was just recently removed altogether. The attached patch includes a change to configure.in which should test for pipe2(2). Glibc changelogs below: 2008-07-27 Ulrich Drepper <drepper> ... * sysdeps/unix/sysv/linux/syscalls.list: Add __pipe2 alias. ... 2012-08-18 Mike Frysinger <vapier> * include/sys/socket.h (__have_sock_cloexec): Add attribute_hidden. * include/unistd.h (__have_sock_cloexec): Likewise. (__have_pipe2): Likewise. (__have_dup3): Likewise. ... 2012-08-18 Mike Frysinger <vapier> [BZ #9685] * include/unistd.h (__have_pipe2): Change define into an extern int. ... 2017-04-18 Florian Weimer <fweimer> * include/unistd.h (__have_pipe2): Remove declaration. * socket/have_sock_cloexec.c (__have_pipe2): Remove definition. * libio/iopopen.c (_IO_new_proc_open): Assume that pipe2 is available. * posix/wordexp.c (exec_comm_child, exec_comm): Likewise. * sysdeps/unix/sysv/linux/kernel-features.h (__ASSUME_PIPE2): Remove definition. (In reply to Michael Fenn from comment #4) > > And I'm not sure that calling close(2) on several thousand > > file descriptors on every exec is a good approach to to > > resolving the problem either. > > > > I'm inclined to alter these functions to ensure the file > > handles are all properly set to close on exec instead. That > > probably means moving them out of the include file and adding > > a mutex to serialize accesses. > > For RHEL specifically, both el6 and el7 have pipe2(2) support, so the > close(2) loop could be dropped entirely. Wrapping them in a mutex would > also work. I think so and it should also satisfy the no leaked file descriptors on exec selinux requirement for other descriptors. I'm wrapping the entire functions with a mutex as fork(2) probably needs to either block if a descriptor in being opened or the open_*() functions need to be blocked until fork(2) is complete. > > > > > Your description also appears to imply that __have_pipe2 no > > longer does what it's expected to do. > > > > I'm not sure about that either so perhaps there's a need to > > update configure.in in the way you describe to replace the > > usage of __have_pipe2. > > > > Ian > > > As far as I can tell, __have_pipe2 was introduced in glibc in 2008 but was > really meant for glibc's internal use, testing for pipe2(2) should be done > in the usual way with autoconf. The __have_pipe2 macro was changed in 2012, > which is probably he source of the regression in el7, and was just recently > removed altogether. The attached patch includes a change to configure.in > which should test for pipe2(2). Yes, the then glibc maintainer that wrote these probably didn't expect these to go away, I'm adding the configure fragment. I'm working on this now and the only question I have (to myself) is whether I will need to use fork handlers but I think probably not as the child process should have already been cloned when fork(2) returns. Ian (In reply to Ian Kent from comment #5) > (In reply to Michael Fenn from comment #4) > I'm working on this now and the only question I have (to myself) > is whether I will need to use fork handlers but I think probably > not as the child process should have already been cloned when > fork(2) returns. > > Ian Great, thanks for taking a look! Created attachment 1364053 [details]
Patch - update configure to check for pipe2(2)
Created attachment 1364054 [details]
Patch - fix open calls not using open_xxxx() calls
Created attachment 1364055 [details]
Patch - move open_xxxx() functions to spawn.c
Created attachment 1364056 [details]
Patch - serialize calls to open_xxxx() functions
An autofs package with the above changes has been built and is available at: http://people.redhat.com/~ikent/autofs-5.0.7-77.bz1509088.1.el7/ I believe this package will resolve the problem reported in this bug. Could you test it and let me know how it goes please. Ian Thanks Ian. I've been working on a synthetic test to reproduce the race condition outside of our production environment, but so far I haven't been successful. I'll keep working on it. I deployed the patched version into our production environment on 2017-12-13 and reverted the workaround in our fuse filesystem. So far we haven't seen any more hangs. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0977 |