I've found a race condition between AIO read/write and setresuid(). The race results in AIO in Samba sometimes not completing, leading to stuck clients. You can grab a little standalone test program that demonstrates the bug here: http://samba.org/~tridge/junkcode/aio_uid.c Here is the description of the bug from that code: The race condition is in setresuid(), which in glibc tries to be smart about threads and change the euid of threads when the euid of the main program changes. The problem is that this makes setresuid() non-atomic, which means that if an IO completes during the complex series of system calls that setresuid() becomes, then the thread completing the IO may get -1/EPERM back from the rt_sigqueueinfo() call that it uses to notify its parent of the completing IO. In that case two things happen: 1) the signal is never delivered, so the caller never is told that the IO has completed 2) if the caller polls for completion using aio_error() then it will see a -1/EPERM result, rather than the real result of the IO The simplest fix in existing code that mixes uid changing with AIO (such as Samba) is to not use setresuid() and use setreuid() instead, which in glibc doesn't try to play any games with the euid of threads. That does mean that you will need to manually gain root privileges before calling aio_read() or aio_write() to ensure that the thread has permission to send signals to the main thread If you strace the above program then the bug should be fairly clear. I've reproduced this bug on RHEL 5.2, but I expect it will be on all current Linux distros. I've also reproduced it on Ubuntu Hardy. I've added a workaround to the Samba 3.2 tree, but it will take a while before this workaround gets to end users. I imagine there are probably other applications that are affected. Cheers, Tridge
It is not about glibc trying to be smart, it is simply that POSIX requires the call to affect the whole process, and kernel folks refusing to implement that behavior in the kernel.
*** Bug 555858 has been marked as a duplicate of this bug. ***
Andreas, Jakub, Do you know if there have been any kernel changes proposed upstream that would allow this to be fixed? We are wondering what can be done to move this issue to resolution. thanks
(In reply to comment #1) > > It is not about glibc trying to be smart, it is simply that POSIX requires the > call to affect the whole process, and kernel folks refusing to implement that > behavior in the kernel. I don't think we can change sys_setresuid() and break the current long-standing behaviour. Besides, it is not possible to affect the whole process atomically (from the check_kill_permission() pov) even in kernel. Of course, it _is_ possible with the additional locking, but imho this is not acceptable. Afaics, it is easy (but not nice) to change libc to fix this particular problem, __aio_sigqueue() could take lll_lock(stack_cache_lock) to prevent the race with __nptl_setxid() which takes this lock too. Or, we can add something like "seqcount_t xid_seqcount". __nptl_setxid() does write_seqcount_begin/end and __aio_sigqueue() does do { seq = read_seqcount_begin(&xid_seqcount); ret = INLINE_SYSCALL (rt_sigqueueinfo); if (!ret || errno != EPERM) break; } while (read_seqcount_retry(&xid_seqcount, seq); Or something else, we should ask libc developers. ---------------------------------------------------------------------------- On the other hand. Roland, David. Is there any security reason why tkill should fail when we send a signal to sub-thread? I don't see any reason. The thread can always send SIGKILL to all threads. It can block SIGXXX and do kill(getpid(), SIGXXX), the signal will be delivered to some another sub-thread which may have changed its credentials. It can even modify .text, etc. IOW. Can't we simply change check_kill_permission() --- x/kernel/signal.c +++ b/kernel/signal.c @@ -656,6 +656,9 @@ static int check_kill_permission(int sig if (error) return error; + if (same_thread_group(currant, t)) + return 0; + tcred = __task_cred(t); if ((cred->euid ^ tcred->suid) && (cred->euid ^ tcred->uid) && and simplify the life of libc/userspace developers?
(In reply to comment #4) > > Roland, David. Is there any security reason why tkill should fail when we send > a signal to sub-thread? I don't see any reason. The thread can always send > SIGKILL to all threads. It can block SIGXXX and do kill(getpid(), SIGXXX), > the signal will be delivered to some another sub-thread which may have changed > its credentials. It can even modify .text, etc. > > IOW. Can't we simply change check_kill_permission() > > --- x/kernel/signal.c > +++ b/kernel/signal.c > @@ -656,6 +656,9 @@ static int check_kill_permission(int sig > if (error) > return error; > > + if (same_thread_group(currant, t)) > + return 0; > + > tcred = __task_cred(t); > if ((cred->euid ^ tcred->suid) && > (cred->euid ^ tcred->uid) && > > and simplify the life of libc/userspace developers? It certainly would seem reasonable: as you point out, threads can usually modify one another anyway (but that's not an absolute requirement on Linux). What is notable is that if the thread was cloned with CLONE_THREAD, so that same_thread_group() returns true, then it must also have been cloned with CLONE_SIGHAND, so that the threads share signal handling... So you might want to rephrase your check to a more inclusive if two threads share the same signal handling, then they can splat each other with signals irrespective of security requirements.
(In reply to comment #5) > > > IOW. Can't we simply change check_kill_permission() > > > > --- x/kernel/signal.c > > +++ b/kernel/signal.c > > @@ -656,6 +656,9 @@ static int check_kill_permission(int sig > > if (error) > > return error; > > > > + if (same_thread_group(currant, t)) > > + return 0; > > + > > tcred = __task_cred(t); > > if ((cred->euid ^ tcred->suid) && > > (cred->euid ^ tcred->uid) && > > > > and simplify the life of libc/userspace developers? > > It certainly would seem reasonable: as you point out, threads can usually > modify one another anyway (but that's not an absolute requirement on Linux). OK, great. Lets move this discussion to lkml, I'll send the updated patch (we shouldn't miss security_task_kill). > What is notable is that if the thread was cloned with CLONE_THREAD, so that > same_thread_group() returns true, then it must also have been cloned with > CLONE_SIGHAND, so that the threads share signal handling... > > So you might want to rephrase your check to a more inclusive if two threads > share the same signal handling, then they can splat each other with signals > irrespective of security requirements. Hmm, I didn't think about this... Agreed, probably it makes more sense to check current->sighand == t->sighand to detect the CLONE_SIGHAND case. But for now I'm inclined to send the patch which checks same_thread_group(), because it is easier to explain this change. And, CLONE_SIGHAND without CLONE_THREAD is exotic, I'm afraid the subtleness of CLONE_SIGHAND change won't be tested at all, and then later we may have the "breaks my testcase" reports. IOW, same_thread_group() looks a bit safer and more straightforward to me, at least for now. OK?
The patch is already in -mm tree signals-check_kill_permission-dont-check-creds-if-same_thread_group.patch and it has acks. Hopefully it will be merged in this window, after that I'll forward it to rhkl.
[RHEL5 PATCH] bz#459901: signals: check_kill_permission(): don't check creds if same_thread_group() http://post-office.corp.redhat.com/archives/rhkernel-list/2010-May/msg03168.html
Switching component to kernel, assignee to Oleg.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-207.el5 You can download this test kernel from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcomed.
@Reporter, @Jeremy, did the reporter test the kernel in comment #20 yet? If so, please let us know the results here. Thanks!
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0017.html