+++ This bug was initially created as a clone of Bug #180663, see that bug for reproducer and patches. This is the public parent tracking bug +++

Description of problem:
There is a race condition, easily triggered through the mincore system call, that causes "ps -ef" (and likely anything else that needs to lock mmap_sem) to hang. Since this can be used as a DoS attack, I am considering this a security issue. I can easily reproduce this on my ia64 hardware. I do not expect it to be hardware specific, however (but I will verify when I get access to other hardware).

The situation that causes this is:

1. Thread A calls mincore with the *vec argument pointing to a page that is not in core (i.e. a freshly mmapped page). Inside the mincore system call it calls down_read on mmap_sem.

2. A second thread in the same process calls mmap to map a new page. This calls down_write on mmap_sem but blocks because of the reader in the first thread.

3. The first thread then calls copy_to_user to copy the information about which pages are in core to the address specified by *vec. Since this page is not yet in core, a page fault is triggered. The page fault handler calls down_read on mmap_sem even though this task already holds a read lock on it. Since there is a waiting writer, this call to down_read blocks. Since this is the thread that holds the original read lock on mmap_sem, we have a deadlock.

4. The DoS issue then comes from any user running "ps -ef", which needs to do down_read on mmap_sem for each process. This blocks.

Version-Release number of selected component (if applicable):
Seen on the latest RHEL4 kernel (2.6.9-30), but it is also broken upstream. I will check RHEL3 soon.

How reproducible:
All the time (on ia64 SMP; however, I expect this is not arch dependent).

Steps to Reproduce:
1. Compile the attached reproducer with "cc mincore_hang.c -lpthread"
2. ./a.out
3. ps -ef

Actual results:
The a.out and ps -ef hang and are uninterruptible.

I have done some thinking on how to fix this.
Since we need to be holding the read lock on mmap_sem while walking the vma list, perhaps instead of copying the info into user space while holding the lock we could copy it into some kmalloc'ed memory temporarily and then copy_to_user after we drop the read lock. I don't really like this solution, since it means allocating potentially large chunks via kmalloc (e.g. some really big app on a 1TB system wants to check all of its memory to see what is in core). Perhaps a better solution would be to ensure the memory pointed to by *vec is in core before we take the lock, and mlock it temporarily.

As you surmise, it reproduces on a RHEL4 x86_64 box, and on a UP ia64 RHEL3 box.
This was addressed via:

Red Hat Enterprise Linux version 4 (RHSA-2007:0014)
Red Hat Enterprise Linux version 3 (RHSA-2008:0211)
Red Hat Linux Advanced Workstation 2.1 (RHSA-2008:0787)
Red Hat Enterprise Linux version 2.1 (RHSA-2009:0001)