Bug 459901

Summary:	race condition between AIO and setresuid()
Product:	Red Hat Enterprise Linux 5	Reporter:	tridge
Component:	kernel	Assignee:	David Howells <dhowells>
Status:	CLOSED ERRATA	QA Contact:	Igor Zhang <yugzhang>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	5.4	CC:	cward, dhowells, drepper, jakub, jeder, jmarchan, jthomas, jwest, kzhang, onestero, roland, tao, yugzhang
Target Milestone:	rc
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:
Clones:	595499 (view as bug list)		Environment:
Last Closed:	2011-01-13 20:43:05 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	595499

Description tridge 2008-08-24 06:32:25 UTC

I've found a race condition between AIO read/write and setresuid(). The
race results in AIO in Samba sometimes not completing, leading to stuck clients.

You can grab a little standalone test program that demonstrates the bug here:

  http://samba.org/~tridge/junkcode/aio_uid.c

Here is the description of the bug from that code:

  The race condition is in setresuid(), which in glibc tries to be
  smart about threads and change the euid of threads when the euid of
  the main program changes. The problem is that this makes setresuid()
  non-atomic, which means that if an IO completes during the complex
  series of system calls that setresuid() becomes, then the thread
  completing the IO may get -1/EPERM back from the rt_sigqueueinfo()
  call that it uses to notify its parent of the completing IO. In that
  case two things happen:

    1) the signal is never delivered, so the caller never is told that
    the IO has completed

    2) if the caller polls for completion using aio_error() then it
    will see a -1/EPERM result, rather than the real result of the IO

  The simplest fix in existing code that mixes uid changing with AIO
  (such as Samba) is to not use setresuid() and use setreuid()
  instead, which in glibc doesn't try to play any games with the euid
  of threads. That does mean that you will need to manually gain root
  privileges before calling aio_read() or aio_write() to ensure that
  the thread has permission to send signals to the main thread

If you strace the above program then the bug should be fairly clear.

I've reproduced this bug on RHEL 5.2, but I expect it will be on all
current Linux distros. I've also reproduced it on Ubuntu Hardy.

I've added a workaround to the Samba 3.2 tree, but it will take a while
before this workaround gets to end users. I imagine there are probably
other applications that are affected.

Cheers, Tridge

Comment 1 Jakub Jelinek 2008-08-25 07:16:17 UTC

It is not about glibc trying to be smart, it is simply that POSIX requires the call to affect the whole process, and kernel folks refusing to implement that behavior in the kernel.

Comment 2 Andreas Schwab 2010-02-22 17:43:42 UTC

*** Bug 555858 has been marked as a duplicate of this bug. ***

Comment 3 Jon Thomas 2010-03-01 22:59:49 UTC

Andreas, Jakub,

Do you know if there have been any kernel changes proposed upstream that would allow this to be fixed?

We are wondering what can be done to move this issue to resolution.

thanks

Comment 4 Oleg Nesterov 2010-05-16 16:11:42 UTC

(In reply to comment #1)
>
> It is not about glibc trying to be smart, it is simply that POSIX requires the
> call to affect the whole process, and kernel folks refusing to implement that
> behavior in the kernel.

I don't think we can change sys_setresuid() and break the current long-standing
behaviour. Besides, it is not possible to affect the whole process atomically
(from the check_kill_permission() pov) even in kernel. Of course, it _is_
possible with the additional locking, but imho this is not acceptable.

Afaics, it is easy (but not nice) to change libc to fix this particular problem,
__aio_sigqueue() could take lll_lock(stack_cache_lock) to prevent the race with
__nptl_setxid() which takes this lock too.

Or, we can add something like "seqcount_t xid_seqcount". __nptl_setxid() does
write_seqcount_begin/end and __aio_sigqueue() does

        do {
                seq = read_seqcount_begin(&xid_seqcount);
                ret = INLINE_SYSCALL (rt_sigqueueinfo);
                if (!ret || errno != EPERM)
                        break;
        } while (read_seqcount_retry(&xid_seqcount, seq);

Or something else, we should ask libc developers.

----------------------------------------------------------------------------
On the other hand.

Roland, David. Is there any security reason why tkill should fail when we send
a signal to sub-thread? I don't see any reason. The thread can always send
SIGKILL to all threads. It can block SIGXXX and do kill(getpid(), SIGXXX),
the signal will be delivered to some another sub-thread which may have changed
its credentials. It can even modify .text, etc.

IOW. Can't we simply change check_kill_permission()

        --- x/kernel/signal.c
        +++ b/kernel/signal.c
        @@ -656,6 +656,9 @@ static int check_kill_permission(int sig
                if (error)
                        return error;
        
        +       if (same_thread_group(currant, t))
        +               return 0;
        +
                tcred = __task_cred(t);
                if ((cred->euid ^ tcred->suid) &&
                    (cred->euid ^ tcred->uid) &&

and simplify the life of libc/userspace developers?

Comment 5 David Howells 2010-05-17 09:55:37 UTC

(In reply to comment #4)
>
> Roland, David. Is there any security reason why tkill should fail when we send
> a signal to sub-thread? I don't see any reason. The thread can always send
> SIGKILL to all threads. It can block SIGXXX and do kill(getpid(), SIGXXX),
> the signal will be delivered to some another sub-thread which may have changed
> its credentials. It can even modify .text, etc.
> 
> IOW. Can't we simply change check_kill_permission()
> 
>         --- x/kernel/signal.c
>         +++ b/kernel/signal.c
>         @@ -656,6 +656,9 @@ static int check_kill_permission(int sig
>                 if (error)
>                         return error;
> 
>         +       if (same_thread_group(currant, t))
>         +               return 0;
>         +
>                 tcred = __task_cred(t);
>                 if ((cred->euid ^ tcred->suid) &&
>                     (cred->euid ^ tcred->uid) &&
> 
> and simplify the life of libc/userspace developers?    

It certainly would seem reasonable: as you point out, threads can usually modify one another anyway (but that's not an absolute requirement on Linux).

What is notable is that if the thread was cloned with CLONE_THREAD, so that same_thread_group() returns true, then it must also have been cloned with CLONE_SIGHAND, so that the threads share signal handling...

So you might want to rephrase your check to a more inclusive if two threads share the same signal handling, then they can splat each other with signals irrespective of security requirements.

Comment 6 Oleg Nesterov 2010-05-17 17:51:03 UTC

(In reply to comment #5)
>
> > IOW. Can't we simply change check_kill_permission()
> > 
> >         --- x/kernel/signal.c
> >         +++ b/kernel/signal.c
> >         @@ -656,6 +656,9 @@ static int check_kill_permission(int sig
> >                 if (error)
> >                         return error;
> > 
> >         +       if (same_thread_group(currant, t))
> >         +               return 0;
> >         +
> >                 tcred = __task_cred(t);
> >                 if ((cred->euid ^ tcred->suid) &&
> >                     (cred->euid ^ tcred->uid) &&
> > 
> > and simplify the life of libc/userspace developers?    
> 
> It certainly would seem reasonable: as you point out, threads can usually
> modify one another anyway (but that's not an absolute requirement on Linux).

OK, great. Lets move this discussion to lkml, I'll send the updated
patch (we shouldn't miss security_task_kill).

> What is notable is that if the thread was cloned with CLONE_THREAD, so that
> same_thread_group() returns true, then it must also have been cloned with
> CLONE_SIGHAND, so that the threads share signal handling...
> 
> So you might want to rephrase your check to a more inclusive if two threads
> share the same signal handling, then they can splat each other with signals
> irrespective of security requirements.    

Hmm, I didn't think about this... Agreed, probably it makes more sense
to check current->sighand == t->sighand to detect the CLONE_SIGHAND case.

But for now I'm inclined to send the patch which checks same_thread_group(),
because it is easier to explain this change. And, CLONE_SIGHAND without
CLONE_THREAD is exotic, I'm afraid the subtleness of CLONE_SIGHAND change
won't be tested at all, and then later we may have the "breaks my testcase"
reports.

IOW, same_thread_group() looks a bit safer and more straightforward to me,
at least for now.

OK?

Comment 7 Oleg Nesterov 2010-05-21 15:58:57 UTC

The patch is already in -mm tree
signals-check_kill_permission-dont-check-creds-if-same_thread_group.patch
and it has acks.

Hopefully it will be merged in this window, after that I'll forward
it to rhkl.

Comment 8 Oleg Nesterov 2010-05-30 15:37:03 UTC

[RHEL5 PATCH] bz#459901: signals: check_kill_permission(): don't check creds if same_thread_group()
http://post-office.corp.redhat.com/archives/rhkernel-list/2010-May/msg03168.html

Comment 9 Jarod Wilson 2010-06-01 21:49:18 UTC

Switching component to kernel, assignee to Oleg.

Comment 13 RHEL Program Management 2010-06-16 14:38:42 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 20 Jarod Wilson 2010-07-19 21:13:03 UTC

in kernel-2.6.18-207.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 23 Chris Ward 2010-10-21 12:06:11 UTC

@Reporter, @Jeremy, did the reporter test the kernel in comment #20 yet? If so, please let us know the results here. Thanks!

Comment 27 errata-xmlrpc 2011-01-13 20:43:05 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html