Bug 400981 - pthread mutexes/condvars process private attribute not passed to kernel futex calls
Status: NEW
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: distribution   
Version: 1.0
Hardware: i386
OS: Linux
Target Milestone: ---
Assignee: mrg-program-list
QA Contact: MRG Quality Engineering
Depends On:
Reported: 2007-11-27 14:14 UTC by Roland Westrelin
Modified: 2014-09-11 11:22 UTC (History)

Fixed In Version:
Doc Type: Bug Fix

Attachments
patch again 2.7 glibc (3.89 KB, application/octet-stream)
2007-11-27 14:14 UTC, Roland Westrelin

Description Roland Westrelin 2007-11-27 14:14:33 UTC
Description of problem:

pthread mutexes/condition variables have a process private attribute. Operations
performed on a futex through a system call can be marked as process private. The
2.5 libc that ships with RHEL 5 does not pass this pthread mutex/condvar
attribute to the kernel when a futex syscall is performed.
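
For reference, a minimal sketch of how an application marks a mutex and a
condition variable as process private through the standard POSIX attribute
calls. The helper name init_private_sync is hypothetical and used only for
illustration. PTHREAD_PROCESS_PRIVATE is already the POSIX default, so the
explicit calls here only make the intent visible.

```c
#include <pthread.h>

/* Hypothetical helper: initialize a mutex and a condvar that are explicitly
 * marked process private. Returns 0 on success or a pthread error code. */
int init_private_sync(pthread_mutex_t *m, pthread_cond_t *c)
{
    pthread_mutexattr_t ma;
    pthread_condattr_t ca;
    int rc;

    rc = pthread_mutexattr_init(&ma);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_PRIVATE);
    if (rc == 0)
        rc = pthread_mutex_init(m, &ma);
    pthread_mutexattr_destroy(&ma);
    if (rc != 0)
        return rc;

    rc = pthread_condattr_init(&ca);
    if (rc != 0) {
        pthread_mutex_destroy(m);
        return rc;
    }
    rc = pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_PRIVATE);
    if (rc == 0)
        rc = pthread_cond_init(c, &ca);
    pthread_condattr_destroy(&ca);
    if (rc != 0)
        pthread_mutex_destroy(m);   /* undo the mutex if the condvar failed */
    return rc;
}
```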

When not marked process private, the futex operation synchronizes with the mmap
syscall through a semaphore. This results in extra latency observed by realtime
threads on mutex/condvar operations when a non realtime thread performs
completely unrelated mmap calls.
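
At the syscall level the difference is a single flag on the futex operation.
The sketch below is not glibc code; futex_wake is a hypothetical wrapper that
issues a raw futex wake either as a shared operation (FUTEX_WAKE) or as a
process private one (FUTEX_WAKE_PRIVATE, which is FUTEX_WAKE combined with
FUTEX_PRIVATE_FLAG in <linux/futex.h>). It assumes a kernel that supports
private futexes.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: wake up to n waiters blocked on the futex word at
 * uaddr. A non-zero private_futex selects the process private operation.
 * Returns the number of waiters woken, or -1 with errno set. */
long futex_wake(uint32_t *uaddr, int n, int private_futex)
{
    int op = private_futex ? FUTEX_WAKE_PRIVATE : FUTEX_WAKE;

    return syscall(SYS_futex, uaddr, op, n, NULL, NULL, 0);
}
```

With no thread waiting on the word, both variants return 0; the two calls
differ only in how the kernel looks up the futex.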

The 2.7 glibc only partially fixes the problem: it passes the process private
attribute to the futex syscall for most of the pthread mutex/condvar
primitives, but the priority inheritance variant of mutexes does not pass the
process private attribute through its futex calls.
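
The affected case can be sketched as follows: a mutex that is both priority
inheriting and process private. The helper name init_private_pi_mutex is
hypothetical; the attribute calls are the standard POSIX ones. This only shows
how such a mutex is requested and does not demonstrate which futex operation
glibc ends up issuing for it.

```c
#define _GNU_SOURCE
#include <pthread.h>

/* Hypothetical helper: create a priority inheritance mutex that is also
 * marked process private. Returns 0 on success or a pthread error code. */
int init_private_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t a;
    int rc;

    rc = pthread_mutexattr_init(&a);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setprotocol(&a, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutexattr_setpshared(&a, PTHREAD_PROCESS_PRIVATE);
    if (rc == 0)
        rc = pthread_mutex_init(m, &a);
    pthread_mutexattr_destroy(&a);
    return rc;
}
```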

From the 2.7 glibc NEWS file:

* Handle private futexes in the NPTL implementation.
  Implemented by Jakub Jelinek and Ulrich Drepper.

The extra changes based on the 2.7 glibc that are needed as far as I can tell
are in the attached file.

Version-Release number of selected component (if applicable):

How reproducible:

Performance degradation is observed systematically for short runs of one of our
simple performance tests.

Steps to Reproduce:
Actual results:

Expected results:

Additional info:

Comment 1 Roland Westrelin 2007-11-27 14:14:33 UTC
Created attachment 269731
patch again 2.7 glibc

Comment 2 Tim Burke 2007-12-17 13:18:10 UTC
Can you please provide more justification for the need for this enhancement?  I
checked with our glibc guys who say that the behavior is currently correct. Was
there a question on whether this is a correctness issue vs a request for a
performance enhancement?

Comment 3 Roland Westrelin 2007-12-17 14:05:28 UTC
This is a request for a performance enhancement.

Comment 4 David Holmes 2007-12-17 23:04:55 UTC
As originally stated in the second paragraph:

"This results in extra latency observed by realtime
threads on mutex/condvar operations when a non realtime thread performs
completely unrelated mmap calls."

This is a real-time performance issue.

Comment 6 Tim Burke 2007-12-18 13:15:13 UTC
I inquired with our glibc guys who state that this change would be extremely
complicated to backport. Something to keep in mind is that we have the same
glibc for realtime as standard RHEL5 - there isn't a separate realtime glibc. 
Hence we need to abide by the acceptance criteria for standard RHEL, which
primarily means that stability is the main criterion.  IOW, we don't want to
introduce potentially destabilizing features.

While there may be some minor perf enhancement to this feature, it is not seen
as being worth the risk.

Can you provide more justification for the request?  Is this absolutely required
to pass RTSJ?  Have you measured the incremental observed performance benefit?

Thanks for any additional info.

Comment 7 David Holmes 2007-12-19 04:31:58 UTC
Tim, pending further info, might I suggest discussing the performance
implications of this with Thomas Gleixner (if you haven't already). He is very
aware of the latency impact of the mmap semaphore.

I understand the stability issues with backporting this kind of change -
particularly as the full fix requires changes that are not yet in glibc. But for
me, in a real-time system, latency is more than just a "minor perf enhancement".
If a full fix were to go in glibc 2.8 (say) what would be your normal time-frame
for advancing to that glibc version?


Comment 8 Tim Burke 2007-12-20 13:35:53 UTC
We don't wholesale update glibc in RHEL5 updates. Rather that would occur in a
major release, such as RHEL6.

Comment 10 Roland Westrelin 2007-12-21 15:04:57 UTC
I'd like to reemphasize that there are 2 issues here:

- the first one is that support for the pthread mutexes/condvars process private
attribute is only partial in glibc 2.7 as it is not implemented for PI mutexes.
That's certainly something we'd like to see fixed. If it was omitted on purpose,
we'd like to know why.

- the second one is that we'd like to see the full support for process private
pthread mutexes/condvars (including the fix we request for glibc 2.7) in RHEL RT.

Here is more data on the performance implications of the process private attribute:

First, the issue we observe is on a realtime (java) benchmark. So we are not
complaining about mean execution time but worst-case execution time, which is
what defines performance of a realtime system. The benchmark programs a realtime
thread so that it is woken up at a particular absolute time in the future.
Condition variables and mutexes are involved in the process of waking up the
thread when the absolute time is reached. We measure the absolute time at which
the thread is effectively woken up and back to executing java code. The
difference between the measured time and the requested time is called the latency.
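
The measurement can be approximated in C as below. This is a sketch under
stated assumptions, not the actual java benchmark: the function name
wakeup_latency_ns is hypothetical, the wait uses pthread_cond_timedwait
against an absolute CLOCK_MONOTONIC deadline, and nobody signals the condvar,
so the thread is woken only by the timeout.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define NS_PER_SEC 1000000000LL

static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NS_PER_SEC + ts->tv_nsec;
}

/* Hypothetical helper: block on a condvar until an absolute deadline that is
 * delay_ns (assumed >= 0) in the future, then return the latency in
 * nanoseconds (measured wake-up time minus requested time), or -1 on error. */
int64_t wakeup_latency_ns(int64_t delay_ns)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t c;
    pthread_condattr_t ca;
    struct timespec deadline, now;
    int64_t target;
    int rc;

    if (pthread_condattr_init(&ca) != 0)
        return -1;
    rc = pthread_condattr_setclock(&ca, CLOCK_MONOTONIC);
    if (rc == 0)
        rc = pthread_cond_init(&c, &ca);
    pthread_condattr_destroy(&ca);
    if (rc != 0)
        return -1;

    /* Requested absolute wake-up time. */
    clock_gettime(CLOCK_MONOTONIC, &now);
    target = ts_to_ns(&now) + delay_ns;
    deadline.tv_sec = target / NS_PER_SEC;
    deadline.tv_nsec = target % NS_PER_SEC;

    pthread_mutex_lock(&m);
    do {
        /* A return of 0 can only be a spurious wake-up here: wait again. */
        rc = pthread_cond_timedwait(&c, &m, &deadline);
    } while (rc == 0);
    pthread_mutex_unlock(&m);

    /* Time at which the thread is effectively running again. */
    clock_gettime(CLOCK_MONOTONIC, &now);
    pthread_cond_destroy(&c);

    if (rc != ETIMEDOUT)
        return -1;
    return ts_to_ns(&now) - target;
}
```

For worst-case figures the call would be repeated many times from a realtime
thread and the maximum returned value kept.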

The benchmark is run with and without some load, including some load that
triggers the garbage collector. Without the load, we measure a latency below 100
microseconds. With the load, the latency jumps above 1 millisecond. Again,
worst-case execution is all that matters to us in this case. The performance is
decreased by a factor of 10.

The drop in performance is due to mmaps performed by non-realtime activities in
the java VM. There are many reasons why a non-realtime activity would do an
mmap: malloc is one of them, growing or shrinking the java heap is another one.
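
A background load of this kind can be imitated with a loop that repeatedly
maps and unmaps anonymous memory. The sketch below is an assumption about what
such a load generator could look like, not the load used in the benchmark;
mmap_churn is a hypothetical name. It would typically run in a non-realtime
thread while a realtime thread measures its wake-up latency.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical load generator: map and unmap an anonymous region of len
 * bytes, iterations times. Returns 0 on success, -1 on the first failure. */
int mmap_churn(int iterations, size_t len)
{
    int i;

    for (i = 0; i < iterations; i++) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return -1;
        ((volatile char *)p)[0] = 1;   /* touch the mapping so it is used */
        if (munmap(p, len) != 0)
            return -1;
    }
    return 0;
}
```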
