+++ This bug was initially created as a clone of Bug #180663, see that bug for
reproducer and patches. This is the public parent tracking bug +++
Description of problem:
There is a race condition that can be easily triggered by the mincore system
call that will cause "ps -ef" (and likely other things that need to lock
mmap_sem) to hang. Since this can be used as a DoS attack I am considering this
a security issue.
I can easily reproduce this on my ia64 hardware. I do not expect it to be
hardware specific however (but I will verify when I get access to other hardware).
The situation that causes this is:
1. thread A calls mincore with the *vec argument pointing to a page that is not
in core (i.e. a freshly mmapped page). In the mincore system call it calls
down_read on mmap_sem.
2. a second thread in the same process calls mmap to map a new page, this calls
down_write on mmap_sem but blocks due to the reader in the first thread.
3. the first thread then calls copy_to_user to copy the info regarding which
pages are in core to the address specified by *vec. Since this page is not yet
in core a page fault is triggered. The page fault handler calls down_read on
mmap_sem even though this task already has a read lock on it. Since there is a
waiting writer this call to down_read blocks. Since this is the thread with the
original read_lock on mmap_sem we have deadlock.
4. the DoS issue then comes from any user doing a "ps -ef" which needs to do
down_read on mmap_sem for each process. This blocks.
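Step 3 depends on the kernel's write to *vec taking a page fault. The following is a minimal user-space sketch of just that precondition, not the reproducer attached to the other bug and not an attempt to trigger the race. The helper names page_resident and mincore_into_fresh_vec are hypothetical; the sketch assumes Linux with anonymous mmap support.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the page at addr is resident, 0 if not, -1 on error. */
int page_resident(void *addr, size_t page_size)
{
    unsigned char status = 0;

    if (mincore(addr, page_size, &status) != 0)
        return -1;
    return status & 1;
}

/*
 * Hypothetical helper: call mincore() with vec pointing at a freshly
 * mmapped, never-touched page, and record whether that page was resident
 * before and after the call. Returns 0 on success, -1 on any error.
 */
int mincore_into_fresh_vec(int *before, int *after)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    size_t page_size = (size_t)ps;
    int rc;

    unsigned char *target = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    unsigned char *vec = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (target == MAP_FAILED || vec == MAP_FAILED)
        return -1;

    target[0] = 1;                             /* make the queried page resident */

    *before = page_resident(vec, page_size);   /* vec has not been touched yet */
    rc = mincore(target, page_size, vec);      /* kernel writes vec[0] here */
    *after = page_resident(vec, page_size);    /* that write faulted vec in */

    munmap(target, page_size);
    munmap(vec, page_size);
    return (rc == 0 && *before >= 0 && *after >= 0) ? 0 : -1;
}
```

Calling mincore_into_fresh_vec from a small main and printing both values should, on a typical Linux kernel, show the vec page becoming resident as a side effect of the mincore call. On an affected kernel that fault is taken while mmap_sem is still held for read.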
Version-Release number of selected component (if applicable):
Seen on the latest RHEL4 kernel (2.6.9-30) but is also broken upstream. I will
check RHEL3 soon.
How reproducible:
All the time (on ia64 SMP; however, I expect this is not arch dependent)
Steps to Reproduce:
1. compile the attached reproducer with "cc mincore_hang.c -lpthread"
2. run the resulting a.out
3. ps -ef
The a.out and ps -ef hang and are uninterruptible
I have done some thinking on how to fix this. Since we need to be holding the
read lock on mmap_sem while walking the vma list perhaps instead of copying the
info into user space while holding the lock we could copy into some kmalloc'ed
memory temporarily and then copy_to_user after we drop the read lock. I don't
really like this solution since it means allocating potentially large chunks via
kmalloc (i.e. some really big app on a 1TB system wants to check all of its
memory to see what is in core).
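One way to bound the allocation in the first idea is to work through the range in fixed-size chunks: fill a small bounce buffer under the read lock, drop the lock, then copy that chunk out. Below is a user-space sketch of that control flow only. It is not kernel code; toy_rwsem, toy_copy_out and report_in_chunks are hypothetical stand-ins for mmap_sem, copy_to_user and the mincore loop, and the one-page CHUNK size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CHUNK = 4096 };   /* assumed bounce-buffer size (one page) */

/* Toy stand-in for mmap_sem: it only counts current read holders. */
struct toy_rwsem { int readers; };

void toy_down_read(struct toy_rwsem *s) { s->readers++; }

void toy_up_read(struct toy_rwsem *s)
{
    assert(s->readers > 0);
    s->readers--;
}

/*
 * Stand-in for copy_to_user(): the real call may fault, so this
 * version insists that the read lock is not held when it runs.
 */
void toy_copy_out(struct toy_rwsem *s, unsigned char *dst,
                  const unsigned char *src, size_t n)
{
    assert(s->readers == 0);
    memcpy(dst, src, n);
}

/* Copy n status bytes from src to dst, at most CHUNK bytes per lock hold. */
void report_in_chunks(struct toy_rwsem *s, const unsigned char *src,
                      unsigned char *dst, size_t n)
{
    unsigned char tmp[CHUNK];
    size_t done = 0;

    while (done < n) {
        size_t len = (n - done < CHUNK) ? n - done : CHUNK;

        toy_down_read(s);
        memcpy(tmp, src + done, len);   /* stands in for walking the vma list */
        toy_up_read(s);

        toy_copy_out(s, dst + done, tmp, len);   /* may "fault" safely here */
        done += len;
    }
}
```

The trade-off in this sketch is that the lock is dropped between chunks, so the result is no longer a single consistent snapshot of the whole range. Since mincore results can already be stale by the time the caller reads them, that may be acceptable.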
Perhaps a better solution would be to ensure the memory pointed to by *vec is in
core before we take the lock and mlock it temporarily.
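The same idea can be approximated from the caller's side, which may be useful as a workaround on an unpatched kernel: touch the vec buffer and mlock it before calling mincore, so the kernel's write to it should not need to fault. This is only a user-space sketch, not the in-kernel change proposed above; mincore_with_pinned_vec is a hypothetical wrapper, and it assumes Linux, where mlock accepts an address that is not page aligned.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: query residency of [addr, addr + length) using a
 * vec buffer that has been touched and, where permitted, mlock()ed first.
 * vec must hold at least one byte per page of the queried range.
 * Returns whatever mincore() returns.
 */
int mincore_with_pinned_vec(void *addr, size_t length,
                            unsigned char *vec, size_t vec_len)
{
    int locked, rc;

    assert(vec != NULL);

    memset(vec, 0, vec_len);                 /* touch: fault the pages in now */
    locked = (mlock(vec, vec_len) == 0);     /* can fail under RLIMIT_MEMLOCK */

    rc = mincore(addr, length, vec);

    if (locked)
        munlock(vec, vec_len);               /* also drops any earlier mlock
                                                on these pages */
    return rc;
}
```

If mlock fails, the wrapper falls back to the touch alone, which narrows the window but does not close it, since the page could in principle be reclaimed before the mincore call.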
As you surmised, it reproduces on a RHEL4 x86_64 box, and on
a UP ia64 RHEL3 box.
This was addressed via:
Red Hat Enterprise Linux version 4 (RHSA-2007:0014)
Red Hat Enterprise Linux version 3 (RHSA-2008:0211)
Red Hat Linux Advanced Workstation 2.1 (RHSA-2008:0787)
Red Hat Enterprise Linux version 2.1 (RHSA-2009:0001)