Red Hat Bugzilla – Bug 461871
processes holds mmap_sem
Last modified: 2009-08-11 07:08:28 EDT
Description of problem:
One thread of a single multi-threaded process writing to an ext3 file system obtains a semaphore (i_sem in dio_refill_pages()) and sits blocked trying to obtain the process's mmap_sem.
Another thread takes a page fault, acquires mmap_sem for read, and ends up sleeping indefinitely in start_this_handle().
A third thread, doing a sys_mmap or sys_unmap, is blocked waiting for access to the process's mmap_sem, which completes the deadlock.
The nesting order of paired locks must always be preserved, or deadlock can result. For i_sem and mmap_sem, i_sem must always be taken first.
My explanation is probably not the best, so below is an example trace of a process in the stuck state.
Thread 1 [pid: 31776]
generic_file_aio_write [SUCCESS: down(i_sem);]
Thread 2 [pid: 11951]
handle_mm_fault [SUCCESS: down_read(mmap_sem);]
start_this_handle [wait for j_wait_transaction_locked]
Thread 3 [pid: 11885]
Thread 4 [pid: 31790]
log_wait_commit [wait for j_wait_done_commit]
Thread 5 [pid: 2783]
journal_commit_transaction [wait for j_wait_updates]
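To illustrate the ordering rule stated above (i_sem before mmap_sem), here is a minimal user-space sketch. It is not kernel code: the two pthread mutexes, their names (i_sem_model, mmap_sem_model), and the helper run_ordered_demo() are stand-ins invented for this illustration. Every path takes the locks in the same order, so the two threads cannot deadlock against each other.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* User-space stand-ins for the kernel locks; the names are illustrative only. */
static pthread_mutex_t i_sem_model = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mmap_sem_model = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;

/* Every path takes i_sem_model first, then mmap_sem_model. */
static void locked_increment(void)
{
    pthread_mutex_lock(&i_sem_model);
    pthread_mutex_lock(&mmap_sem_model);
    shared_counter++;
    pthread_mutex_unlock(&mmap_sem_model);
    pthread_mutex_unlock(&i_sem_model);
}

static void *worker(void *arg)
{
    long n = *(long *)arg;

    for (long i = 0; i < n; i++)
        locked_increment();
    return NULL;
}

/* Runs two threads that both honour the ordering; returns the final count. */
long run_ordered_demo(long iterations)
{
    pthread_t a, b;
    int rc;

    shared_counter = 0;
    rc = pthread_create(&a, NULL, worker, &iterations);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, &iterations);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;
}
```

If one of the paths instead took mmap_sem_model first and i_sem_model second, each thread could end up holding one lock while waiting for the other, which is the kind of inversion described in this report.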
Version-Release number of selected component (if applicable):
How reproducible:
Very, very rare.
Steps to Reproduce:
1. Start a multi-threaded process that writes to disk using direct I/O.
2. Have one thread write to disk while another reads from /proc/<pid>/cmdline.
3. Wait about 30,000 hours; it should happen a few times.
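The steps above could be approximated with something like the following sketch. It is not the customer's workload: the function names, the 4096-byte block size, and the use of /proc/self/cmdline (in place of /proc/<pid>/cmdline for the same process) are assumptions made for illustration, and given how rare the hang is, a short run is not expected to trigger it.

```c
#define _GNU_SOURCE          /* for O_DIRECT */
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096      /* assumed alignment; adjust for the device */

/* Writes one aligned block with O_DIRECT. Returns 0 on success, -1 on error. */
int direct_write_once(const char *path)
{
    void *buf;
    int fd, ok;

    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0)
        return -1;
    memset(buf, 'x', BLOCK_SIZE);

    fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    ok = (write(fd, buf, BLOCK_SIZE) == BLOCK_SIZE);
    close(fd);
    free(buf);
    return ok ? 0 : -1;
}

/* Reads /proc/self/cmdline into buf. Returns bytes read, or -1 on error. */
ssize_t read_cmdline(char *buf, size_t len)
{
    int fd = open("/proc/self/cmdline", O_RDONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = read(fd, buf, len);
    close(fd);
    return n;
}

struct repro_args {
    const char *path;    /* file on the ext3 file system under test */
    long iterations;
};

static void *writer_thread(void *p)
{
    struct repro_args *a = p;

    for (long i = 0; i < a->iterations; i++)
        direct_write_once(a->path);
    return NULL;
}

static void *reader_thread(void *p)
{
    struct repro_args *a = p;
    char buf[256];

    for (long i = 0; i < a->iterations; i++)
        read_cmdline(buf, sizeof buf);
    return NULL;
}

/* Runs one writer and one reader concurrently. Returns 0 if both threads ran. */
int run_repro(const char *path, long iterations)
{
    struct repro_args args = { path, iterations };
    pthread_t w, r;

    if (pthread_create(&w, NULL, writer_thread, &args) != 0)
        return -1;
    if (pthread_create(&r, NULL, reader_thread, &args) != 0) {
        pthread_join(w, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return 0;
}
```

Calling run_repro() with a path on the affected ext3 mount and a very large iteration count gives the general shape of the workload. Individual write errors are ignored inside the loop, so check direct_write_once() separately if the file system rejects O_DIRECT.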
The process blocks on a read from a file in /proc.
The patch to be attached is an attempted backport of the 2.6.25 change "ext3: fix lock inversion in direct IO" (commit bd1939de9061dbc5cac44ffb4425aaf4c9b894f1).
I don't have the machine hours to test this in the same way that our customer has, although they believe that this modification solves the issue at hand.
hrm, i could have sworn i fixed this already, let me dig up the bz.
*** This bug has been marked as a duplicate of bug 381221 ***