Bug 180663 - CVE-2006-4814 Race condition in mincore can cause "ps -ef" to hang
CVE-2006-4814 Race condition in mincore can cause "ps -ef" to hang
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel (Show other bugs)
4.0
All Linux
medium Severity medium
: ---
: ---
Assigned To: Doug Chapman
Brian Brock
impact=moderate,source=redhat,reporte...
: Reopened, Security
Depends On:
Blocks: CVE-2006-4814
  Show dependency treegraph
 
Reported: 2006-02-09 14:56 EST by Doug Chapman
Modified: 2007-11-30 17:07 EST (History)
4 users (show)

See Also:
Fixed In Version: RHSA-2007-0014
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-01-30 09:25:07 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
reproducer for mincore race condition (981 bytes, text/plain)
2006-02-09 14:56 EST, Doug Chapman
no flags Details

  None (edit)
Description Doug Chapman 2006-02-09 14:56:44 EST
Description of problem:
There is a race condition that can be easily triggered by the mincore system
call that will cause "ps -ef" (and likley other things that need to lock
mmap_sem) to hang.  Since this can be used as a DOS attack I am considering this
a security issue.

I can easily reproduce this on my ia64 hardware.  I do not expect it to be
hardware specific however (but I will verify when I get access to other hardware).

The situation that causes this is:

1. thread A calls mincore with the *vec argument pointing to a page that is not
in core (i.e. a freshly mmapped page).  In the mincore system call it calls
down_read on mmap_sem.

2. a second thread in the same process calls mmap to map a new page, this calls
down_write on mmap_sem but blocks due to the reader in the first thread.

3. the first thread then calls copy_to_user to copy the info regarding which
pages are in core to the address specified by *vec.  Since this page is not yet
in core a page fault is triggered.  The page fault handler calls down_read on
mmap_sem even though this task already has a read lock on it.  Since there is a
waiting writer this call to down_read blocks.  Since this is the thread with the
original read_lock on mmap_sem we have deadlock.

4. the DOS issue then comes from any user doing a "ps -ef" which needs to do
down_read on mmap_sem for each process.  This blocks.


Version-Release number of selected component (if applicable):
Seen on the latest RHEL4 kernel (2.6.9-30) but is also broken up stream.  I will
check RHEL3 soon.

How reproducible:
All the time (on ia64 SMP however I expect this is not arch dependent)


Steps to Reproduce:
1. compile the attached reproducer with "cc mincore_hang.c -lpthread"
2. ./a.out
3. ps -ef
  
Actual results:
The a.out and ps -ef hang and are uninterruptable

Expected results:


Additional info:
Comment 1 Doug Chapman 2006-02-09 14:56:44 EST
Created attachment 124452 [details]
reproducer for mincore race condition
Comment 2 Dave Anderson 2006-02-09 16:06:43 EST
You got a kernel patch to fix it?  Or a suggestion how
to fix it?
Comment 3 Doug Chapman 2006-02-09 16:14:55 EST
I have done some thinking on how to fix this.  Since we need to be holding the
read lock on mmap_sem while walking the vma list perhaps instead of copying the
info into user space while holding the lock we could copy into some kmalloc'ed
memory temporarily and the copy_to_user after we drop the read lock.  I don't
really like this solution since it means allocating potentially large chunks via
kmalloc (i.e. some really big app on a 1TB system wants to check all of its
memory to see what is in core).

Perhaps a better solution would be to ensure the memory pointed to by *vec is in
core before we take the lock and mlock it temporarily.

Comment 4 Dave Anderson 2006-02-09 16:59:30 EST
As you surmise, it reproduces on a RHEL4 x86_64, and on
a UP ia64 RHEL3 box.  

Amazing nobody's ever run into this.

> Perhaps a better solution would be to ensure the memory pointed to by *vec
> is in core before we take the lock and mlock it temporarily.

It would seem to make sense to pre-read the memory pointed
to by vec, as opposed to only doing access_ok() on it.
Pinning the pages from inside the kernel of scares me though,
because nothing prevents the user from passing a ridiculously
large block of address space, even if it's a byte array.

Adding Larry to the cc: list in case he has any suggestions.
Comment 5 Dave Anderson 2006-02-10 09:19:51 EST
Doug,

Carrying your suggestion a bit further, perhaps a fault-and-pin-page
operation could be done a page a time, by enclosing the "while (vma)"
loop in sys_mincore() inside another while loop that does the operation
on a page-by-page basis (i.e., per page full of byte markers).  Even
though *vec at a minimum points to a single page, on a large user virtual
address space, a user could still seemingly cause some damage if all of
its *vec memory were locked down at once.  Doing it a page at time will make
sys_mincore() obfuscated, but wouldn't introduce a new problem to fix
the old one.
 
Comment 6 Doug Chapman 2006-02-10 10:57:53 EST
Dave,

I had to think this through and stare at the code a bit but it sounds like this
is probably the best solution.  The first pass through the loop would be special
since *vec likley isn't page aligned so we need to see how many bytes we have
before we hit the page boundary and only look at that many pages on the first
pass.  Most of the time it would likely only be 1 pass so I don't think
performance is impacted (not that mincore is probably a major performance
concern anyway but I bet oracle uses it).


Comment 7 Dave Anderson 2006-02-10 11:27:18 EST
Yep, I forgot about *vec not being on page boundary...

But you're right -- it still means taking the user's passed-in
start, length, and its base pointer to *vec, and keeping local,
progressive, versions of those values for each time through the
loop.

Re: performance, my only concern was memory page locking, i.e.,
that it shouldn't do more than one at a time.


Comment 10 Jay Turner 2006-08-15 22:32:59 EDT
QE ack for 4.5.
Comment 11 Dave Anderson 2006-08-16 09:39:38 EDT
Ok -- I took a quick look at what data is available in IT #98736.

Aside from the fact that there are a number of processes blocked in ps
in IT #98736, can you explain what that has to do with the mincore() issue
that this bugzilla was filed against?  In this bugzilla, Doug forced the
the hang with a multi-threaded process whose threads are doing mincore()
and mmap() operations on their address space simultaneously, leading to
a block on their common mmap_sem semaphore if somebody else then did
a ps on that task.

In IT #98736, the straces show a bunch of processes that mostly are blocked
doing read()'s of various /proc/<pid>/cmdline files.  Are there simultaneous
threads doing mincore() calls on the same pids, presuming that they are
multi-threaded processes?

Anyway, at a minimum, we need a forced crash-dump to figure out what's
going on in the IT's case.  A bunch of user-space "strace" outputs are pretty
much useless, especially those of the "ps" commands (as opposed to the pids
who cmdline's are being read.  

And if it has nothing to do with simultaneous mincore()/mmap() operations
going on in the same threads whose /proc/pid/cmdline files are being read,
then this should be opened as a seperate bugzilla, since it makes no sense
to continue in the context of this one.



  
Comment 12 Dave Anderson 2006-08-17 15:39:54 EDT
Looking at the files in IT #98736, there is zero evidence of
the multi-threaded/mincore/mmap interaction issue that is the
subject of this bugzilla.

The only relationship between the two situations is that something
is causing a task's mmap_sem to be deadlocked, and subsequent ps commands
hang when trying to take the semaphore.

In this bugzilla, it's Doug's program that forces simultaneous mincore()/mmap()
interaction between two threads of a process, and both threads end up blocking
on their own mmap_sem.  In #IT 98736, it's something completely different,
and that problem is what needs to be determined.

It makes no sense to tie that IT to this bugzilla; it needs its own bugzilla.




  
Comment 14 Dave Anderson 2006-09-05 09:31:18 EDT
> The first comment in this bug don't talk about an IT item. What about fixing
> that problem report and leaving any IT by side?

Doug Chapman (the reporter) was well on his way to fixing this with his
first patch.  Perhaps he can resurrect his work and post an updated
version?
Comment 21 Marcel Holtmann 2006-12-04 11:08:16 EST
Has this patch been proposed for upstream inclusion?
Comment 22 Doug Chapman 2006-12-04 11:22:30 EST
(In reply to comment #21)
> Has this patch been proposed for upstream inclusion?
> 

I have not proposed it upstream as I wasn't sure what the procedure was for
sensitive patches.  I sent it to rhkernel-list to get ack's but received none
(no nacks either however).
Comment 23 Marcel Holtmann 2006-12-04 12:07:58 EST
This bug is 10 month old and I would propose we make it public now and propose
the patch upstream. This one is a good candidate for the next RHEL4 async errata
and we should go with a patch that is acceptable by upstream. Any objections?
Comment 24 Marcel Holtmann 2006-12-07 03:53:46 EST
So no objections to lifting the embargo. I will communicate this through
vendor-sec and then open this Bugzilla.
Comment 25 Marcel Holtmann 2006-12-12 16:00:33 EST
And it seems this is not only ia64. Is 64-bit specific or can it also happen on
32-bit architectures?
Comment 26 Dave Anderson 2006-12-13 09:33:55 EST
Doug Chapman wrote:

> How reproducible:
> All the time (on ia64 SMP however I expect this is not arch dependent)

> I can easily reproduce this on my ia64 hardware.  I do not expect it to be
> hardware specific however (but I will verify when I get access to other
> hardware).

Comment 27 Marcel Holtmann 2006-12-15 13:54:04 EST
Created attachment 143802 [details]
Proposed patch from Andrew Morton
Comment 28 Marcel Holtmann 2006-12-15 14:00:08 EST
Created attachment 143803 [details]
Additional patch on-top of the previous one
Comment 29 Dave Anderson 2006-12-15 14:19:19 EST
Hi Doug,

Just in case you can't see private comments, I've un-privatized the
last two from Andrew Morton.  Can you apply his cleanups/fixes to 
your patch, verify it, and re-post to RHKL?

Thanks,
  Dave
Comment 32 Dave Anderson 2006-12-21 13:43:56 EST
(In reply to comment #30)
> looks like Linux re-worked this himself, please see:
> 
> http://marc.theaimsgroup.com/?l=git-commits-head&m=116629555506940&w=2
> http://marc.theaimsgroup.com/?l=git-commits-head&m=116638195203710&w=2
> http://marc.theaimsgroup.com/?l=git-commits-head&m=116638195213406&w=2


Hi Doug,

Jason meant "Linus", who re-worked your patch himself.
Can you apply Linus' patch, test it on your ia64, and
re-post to RHKL?

Thanks,
  Dave




Comment 33 Doug Chapman 2006-12-22 11:33:35 EST
patch posted to rhkernel-list for review.
Comment 34 Jason Baron 2007-01-03 17:41:44 EST
committed in stream U5 build 42.37. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 35 Doug Chapman 2007-01-04 12:49:20 EST
I have verified this in 2.6.9-42.37.EL.
Comment 37 Mike Gahagan 2007-01-16 17:16:04 EST
fix looks good in 42.0.6
Comment 39 Red Hat Bugzilla 2007-01-30 09:26:17 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2007-0014.html

Note You need to log in before you can comment on or make changes to this bug.