+++ This bug was initially created as a clone of Bug #459901 +++
I've found a race condition between AIO read/write and setresuid(). The
race results in AIO in Samba sometimes not completing, leading to stuck clients.
You can grab a little standalone test program that demonstrates the bug here:
Here is the description of the bug from that code:
The race condition is in setresuid(), which in glibc tries to be
smart about threads and change the euid of threads when the euid of
the main program changes. The problem is that this makes setresuid()
non-atomic, which means that if an IO completes during the complex
series of system calls that setresuid() becomes, then the thread
completing the IO may get -1/EPERM back from the rt_sigqueueinfo()
call that it uses to notify its parent of the completing IO. In that
case two things happen:
1) the signal is never delivered, so the caller is never told that
the IO has completed
2) if the caller polls for completion using aio_error() then it
will see a -1/EPERM result, rather than the real result of the IO
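To make the two symptoms concrete, here is a minimal sketch of a caller that asks for signal notification and also polls aio_error(). It is not the test program mentioned above; the helper name aio_read_checked(), the handler on_aio_done() and the choice of SIGUSR1 are assumptions made for illustration only.

```c
#define _GNU_SOURCE
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Set by the handler when the completion signal arrives. */
static volatile sig_atomic_t aio_signalled = 0;

static void on_aio_done(int signo)
{
    (void)signo;
    aio_signalled = 1;
}

/*
 * Hypothetical helper: read up to len bytes from offset 0 of fd using
 * POSIX AIO with signal notification, then poll aio_error() as well.
 * Returns the byte count, or -1 with errno set to the AIO error.
 */
static ssize_t aio_read_checked(int fd, void *buf, size_t len)
{
    struct sigaction sa;
    struct aiocb cb;
    int err;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_aio_done;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;
    cb.aio_sigevent.sigev_notify = SIGEV_SIGNAL;
    cb.aio_sigevent.sigev_signo = SIGUSR1;

    if (aio_read(&cb) != 0)
        return -1;

    /* Symptom 1: do not wait on the signal alone, it may never arrive. */
    while ((err = aio_error(&cb)) == EINPROGRESS)
        usleep(1000);

    if (err != 0) {
        /* Symptom 2: in the race, this is where EPERM would show up. */
        errno = err;
        return -1;
    }
    return aio_return(&cb);
}
```

A caller that only waits for aio_signalled to become non-zero would hang in case 1; a caller that polls, as here, would instead get -1 with errno set to EPERM in case 2.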
The simplest fix in existing code that mixes uid changing with AIO
(such as Samba) is to not use setresuid() and use setreuid()
instead, which in glibc doesn't try to play any games with the euid
of threads. That does mean that you will need to manually gain root
privileges before calling aio_read() or aio_write() to ensure that
the thread has permission to send signals to the main thread.
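A rough sketch of that workaround, assuming the process switched users with setreuid() and kept real uid 0 so that it can raise its euid again. The wrapper name aio_read_as_root() is hypothetical and this is not the code that went into Samba; whether it is sufficient will depend on the glibc and kernel versions in use.

```c
#define _GNU_SOURCE
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: submit an AIO read with the effective uid
 * temporarily raised to root, then drop back to the previous euid.
 * Assumes the real uid is still 0, so setreuid(-1, 0) is permitted.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int aio_read_as_root(struct aiocb *cb)
{
    uid_t saved_euid = geteuid();
    int rc, saved_errno;

    /* Gain root first, using setreuid() rather than setresuid(). */
    if (saved_euid != 0 && setreuid((uid_t)-1, 0) != 0)
        return -1;

    rc = aio_read(cb);
    saved_errno = errno;

    /* Drop back to the user we were acting as. */
    if (saved_euid != 0 && setreuid((uid_t)-1, saved_euid) != 0)
        return -1; /* could not drop privileges again: treat as fatal */

    errno = saved_errno;
    return rc;
}
```

An aio_write() variant would look the same. If the process has no way back to root (real uid is not 0), the first setreuid() fails with EPERM and nothing is submitted.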
If you strace the above program then the bug should be fairly clear.
I've reproduced this bug on RHEL 5.2, but I expect it will be on all
current Linux distros. I've also reproduced it on Ubuntu Hardy.
I've added a workaround to the Samba 3.2 tree, but it will take a while
before this workaround gets to end users. I imagine there are probably
other applications that are affected.
lkml thread: http://lkml.org/lkml/fancy/2010/5/17/393
- moved to kernel component based on proposed fix
[RHEL6 PATCH] bz#595499: signals: check_kill_permission(): don't check creds if same_thread_group()
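For readers who do not want to open the patch, the idea in its title can be modelled in user space roughly as follows. This is a toy model and not the kernel code: struct task_model, creds_allow_kill() and may_signal() are made-up names, and the uid comparison is the usual Linux rule for kill permission as I understand it, not something copied from the patch.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for the fields the permission check looks at. */
struct task_model {
    pid_t tgid;     /* thread group (process) id */
    uid_t uid;      /* real uid */
    uid_t euid;     /* effective uid */
    uid_t suid;     /* saved uid */
    bool  cap_kill; /* sender holds CAP_KILL */
};

/* Credential rule: sender's real or effective uid must match the
 * target's real or saved uid, unless the sender has CAP_KILL. */
static bool creds_allow_kill(const struct task_model *sender,
                             const struct task_model *target)
{
    return sender->cap_kill ||
           sender->euid == target->suid ||
           sender->euid == target->uid ||
           sender->uid == target->suid ||
           sender->uid == target->uid;
}

/* With skip_creds_in_group set, threads of the same process may always
 * signal each other, which is the behaviour the patch title describes. */
static bool may_signal(const struct task_model *sender,
                       const struct task_model *target,
                       bool skip_creds_in_group)
{
    if (skip_creds_in_group && sender->tgid == target->tgid)
        return true;
    return creds_allow_kill(sender, target);
}
```

In this model, a helper thread with uid/euid 1 and saved uid 0 cannot signal a main thread in the same process that is at uid/euid/suid 0 when skip_creds_in_group is false, and can when it is true. That is the mid-setresuid() state that produces the EPERM described above.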
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release. Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release. This request is not yet committed for
Patch(es) available on kernel-2.6.32-32.el6
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.