Bug 435734

Summary: page fault handling may keep mmap_sem for a long time
Product: Red Hat Enterprise MRG
Reporter: Roland Westrelin <roland.westrelin>
Component: realtime-kernel
Assignee: Peter Zijlstra <pzijlstr>
Status: CLOSED WONTFIX
Severity: medium
Priority: low
Version: 1.0
CC: bhu, lwang, williams
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-01-05 21:10:14 UTC

Description Roland Westrelin 2008-03-03 16:37:08 UTC
Description of problem:

Page fault handling may end up in file system code with mmap_sem held, keeping
mmap_sem locked for a long time and thus delaying other threads that need mmap_sem.

For instance, we found threads blocked while holding mmap_sem with the following
stack (on the 2.6.21-57.el5rt kernel):
do_page_fault() grabs the mmap_sem
then calls:
handle_mm_fault()
__handle_mm_fault()
handle_pte_fault()
do_wp_page()
file_update_time()
mark_inode_dirty_sync()
__mark_inode_dirty()
ext3_dirty_inode()
ext3_mark_inode_dirty()
ext3_reserve_inode_write()
ext3_journal_get_write_access()
__ext3_journal_get_write_access()
journal_get_write_access()
do_get_write_access()
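
For readers who want to see what kind of user-space access can lead into a path like the one above, here is a minimal sketch. It assumes the thread holding mmap_sem was writing through a MAP_SHARED file mapping, which the report does not state explicitly. The helper name dirty_shared_page() is hypothetical, and whether the write lands in do_wp_page() specifically depends on the kernel version and file system.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the report): dirty the first page of a file
 * through a shared mapping. Returns the byte read back from the file,
 * or -1 on error.
 */
int dirty_shared_page(const char *path, char value)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    volatile char *p = map;
    char seen = p[0];   /* read access: the page gets mapped into the process */
    (void)seen;
    p[0] = value;       /* first write: page fault taken with mmap_sem held;
                           the file system may be asked to update the inode */

    int rc = msync(map, (size_t)page, MS_SYNC);
    munmap(map, (size_t)page);

    char back = 0;
    if (rc != 0 || pread(fd, &back, 1, 0) != 1) {
        close(fd);
        return -1;
    }
    close(fd);
    return (unsigned char)back;
}
```

Calling dirty_shared_page() on a file that lives on a journaling file system, while another thread of the same process takes page faults, is one way to try to observe the contention. It is not a confirmed reproducer.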



Comment 1 Peter Zijlstra 2008-03-05 10:56:59 UTC
What workload are you seeing problems in? Typically paging is discouraged in RT
workloads due to its non-deterministic character. Also, what contenders for the
mmap_sem do you have, other page-faults, or something else?

Comment 2 Roland Westrelin 2008-03-19 15:08:05 UTC
The workload is a real-time Java program running on top of a real-time JVM. We
are seeing an RT thread delayed by a non-real-time thread performing mmap I/O.
When that happens, the RT thread is in the kernel, handling a write to a
write-protected memory area (a mechanism we use to make the thread stop
executing its Java code and then, from the SIGSEGV handler, do some work on behalf of the VM).
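
The write-protected-area mechanism described above can be sketched roughly as follows. This is an illustration only, not the JVM's actual code: the names poll_page, vm_work_done, and run_poll_demo() are hypothetical, and the "work on behalf of the VM" is reduced to setting a flag. The point is that the store to the protected page must go through the kernel's page fault path, which per the stack in the description takes mmap_sem, before the signal is delivered.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names, for illustration only. */
static volatile char *poll_page;           /* page the thread writes to */
static size_t page_size;
static volatile sig_atomic_t vm_work_done; /* stand-in for the VM's work */

/* SIGSEGV handler: runs after a write to the protected page has faulted. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)poll_page;

    if (addr < base || addr >= base + page_size)
        _exit(1);                 /* a genuine crash, not our page */

    vm_work_done = 1;             /* do the work on behalf of the VM */

    /* Unprotect the page so the faulting store succeeds when retried.
       mprotect() is not on the POSIX async-signal-safe list; calling it
       here is common practice on Linux but is an assumption of this sketch. */
    if (mprotect((void *)poll_page, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
}

/* Returns 0 if the write was intercepted and then completed, -1 otherwise. */
int run_poll_demo(void)
{
    struct sigaction sa, old;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    poll_page = p;
    vm_work_done = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    /* "Arm" the mechanism: the next write to the page will fault. */
    if (mprotect(p, page_size, PROT_READ) != 0)
        return -1;

    poll_page[0] = 42;            /* faults, handler runs, store is retried */

    int ok = (vm_work_done == 1 && poll_page[0] == 42);
    sigaction(SIGSEGV, &old, NULL);   /* restore the previous disposition */
    munmap(p, page_size);
    return ok ? 0 : -1;
}
```

In this sketch the faulting thread itself pays the cost of the kernel fault path. If another thread of the same process holds mmap_sem for a long time, as in the stack from the description, the store stalls before the handler ever runs.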