Description of problem:
Page fault handling may end up in file system code with mmap_sem held, keeping mmap_sem locked for a long time and thus delaying other threads that need the mmap_sem.

For instance, we found threads blocked while owning the mmap_sem with the following stack (with the 2.6.21-57.el5rt kernel):

do_page_fault() grabs the mmap_sem, then calls:
  handle_mm_fault()
  __handle_mm_fault()
  handle_pte_fault()
  do_wp_page()
  file_update_time()
  mark_inode_dirty_sync()
  __mark_inode_dirty()
  ext3_dirty_inode()
  ext3_mark_inode_dirty()
  ext3_reserve_inode_write()
  ext3_journal_get_write_access()
  __ext3_journal_get_write_access()
  journal_get_write_access()
  do_get_write_access()
What workload are you seeing problems in? Typically paging is discouraged in RT workloads due to its non-deterministic character. Also, what contenders for the mmap_sem do you have, other page-faults, or something else?
The workload is a real-time Java program running on top of a real-time JVM. We are seeing an RT thread delayed by a non-real-time thread performing mmap I/O. When that happens, the RT thread is in the kernel, handling a write to a write-protected memory area (a mechanism we use to make the thread stop executing its Java code and then, from the SIGSEGV handler, do some work on behalf of the VM).
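For readers unfamiliar with the write-protection mechanism described above, here is a minimal sketch in C. It is not the JVM's actual code: the names poll_page, on_segv and trigger_write_protect_fault are hypothetical, and an anonymous private mapping is assumed for simplicity. It only shows why the RT thread enters the page fault path: the store to the read-only page faults, and the kernel has to look up the mapping (under mmap_sem) before it can deliver SIGSEGV.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *poll_page;                     /* hypothetical write-protected page */
static size_t page_size;
static volatile sig_atomic_t handler_hits;  /* how many times the handler ran */

/* SIGSEGV handler: entered when a thread stores to the protected page. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    char *addr = (char *)info->si_addr;

    (void)sig;
    (void)ctx;
    if (addr < poll_page || addr >= poll_page + page_size)
        _exit(1);                           /* a real crash, not our page */

    handler_hits++;                         /* stand-in for the VM-side work */

    /* Allow writes again so the faulting store is retried and succeeds.
     * mprotect() is not on the POSIX async-signal-safe list; calling it
     * here is a common practice that this sketch assumes is acceptable. */
    mprotect(poll_page, page_size, PROT_READ | PROT_WRITE);
}

/* Returns the number of handler invocations (expected: 1), or -1 on error. */
int trigger_write_protect_fault(void)
{
    struct sigaction sa;

    handler_hits = 0;
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    poll_page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (poll_page == MAP_FAILED)
        return -1;
    poll_page[0] = 0;                       /* populate the page first */

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    /* "Arm" the page: the next store to it will fault. */
    if (mprotect(poll_page, page_size, PROT_READ) != 0)
        return -1;

    /* This store enters the kernel's page fault path and ends in on_segv(). */
    *(volatile char *)poll_page = 1;

    munmap(poll_page, page_size);
    return (int)handler_hits;
}
```

Calling trigger_write_protect_fault() from main() should return 1 on Linux. In the scenario reported here, the latency of that single faulting store is what suffers when another thread holds mmap_sem for a long time.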