Description of problem:

pthread mutexes and condition variables have a process-private attribute, and operations performed on a futex through a system call can likewise be marked as process-private. The glibc 2.5 that ships with RHEL 5 does not pass this pthread mutex/condvar attribute to the kernel when a futex syscall is performed. When not marked process-private, the futex operation synchronizes with the mmap syscall through a semaphore. This results in extra latency observed by realtime threads on mutex/condvar operations when a non-realtime thread performs completely unrelated mmap calls.

glibc 2.7 partially fixes the problem: it passes the process-private attribute to the futex syscall for most of the pthread mutex/condvar primitives. However, the fix is only partial because the priority-inheritance (PI) variant of mutexes still does not pass the process-private attribute through the futex calls.

From the glibc 2.7 NEWS file:

* Handle private futexes in the NPTL implementation.
  Implemented by Jakub Jelinek and Ulrich Drepper.

The extra changes needed on top of glibc 2.7, as far as I can tell, are in the attached file.

Version-Release number of selected component (if applicable):

How reproducible:
Performance degradation is observed systematically for short runs of one of our simple performance tests.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
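For illustration only (this sketch is ours, not part of the report or the attached patch), the two halves of the issue in minimal C: the POSIX process-shared attribute, which defaults to PTHREAD_PROCESS_PRIVATE, and the FUTEX_PRIVATE_FLAG that a private-futex-aware glibc passes to the kernel. FUTEX_PRIVATE_FLAG comes from <linux/futex.h> and needs a 2.6.22+ kernel; the raw syscall at the end mimics what a fixed glibc would issue on contention.

#include <pthread.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>   /* FUTEX_WAIT, FUTEX_PRIVATE_FLAG (kernel >= 2.6.22) */

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;

    pthread_mutexattr_init(&attr);
    /* PTHREAD_PROCESS_PRIVATE is already the default; made explicit here.
       A private-futex-aware glibc can translate it into FUTEX_*_PRIVATE
       operations; the RHEL 5 glibc 2.5 never does. */
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_PRIVATE);
    pthread_mutex_init(&lock, &attr);
    pthread_mutex_lock(&lock);
    pthread_mutex_unlock(&lock);
    pthread_mutex_destroy(&lock);
    pthread_mutexattr_destroy(&attr);

    /* Raw futex call with the private flag, as a fixed glibc would issue
       when a waiter must block. The word is 1 but the expected value is 0,
       so the kernel returns EWOULDBLOCK immediately instead of sleeping. */
    int word = 1;
    long r = syscall(SYS_futex, &word, FUTEX_WAIT | FUTEX_PRIVATE_FLAG,
                     0, NULL, NULL, 0);
    printf("futex returned %ld (errno %d)\n", r, errno);
    return 0;
}

Without FUTEX_PRIVATE_FLAG the kernel has to look up the mapping backing the futex word, which is the path that synchronizes with mmap through the semaphore.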
Created attachment 269731 [details] patch against glibc 2.7
Can you please provide more justification for the need for this enhancement? I checked with our glibc guys, who say that the current behavior is correct. Is this a correctness issue, or a request for a performance enhancement?
This is a request for a performance enhancement.
As originally stated in the second paragraph: "This results in extra latency observed by realtime threads on mutex/condvar operations when a non-realtime thread performs completely unrelated mmap calls." This is a real-time performance issue.
I inquired with our glibc guys, who state that this change would be extremely complicated to backport. Something to keep in mind is that we ship the same glibc for realtime as for standard RHEL5 - there isn't a separate realtime glibc. Hence we need to abide by the acceptance criteria for standard RHEL, which primarily means that stability is the main criterion. IOW, we don't want to introduce potentially destabilizing features. While there may be some minor perf enhancement from this feature, it is not seen as being worth the risk. Can you provide more justification for the request? Is this absolutely required to pass RTSJ? Have you measured the incremental observed performance benefit? Thanks for any additional info.
Tim, pending further info, might I suggest discussing the performance implications of this with Thomas Gleixner (if you haven't already)? He is very aware of the latency impact of the mmap semaphore. I understand the stability issues with backporting this kind of change, particularly as the full fix requires changes that are not yet in glibc. But for me, in a real-time system, latency is more than just a "minor perf enhancement". If a full fix were to go into glibc 2.8 (say), what would be your normal time frame for advancing to that glibc version? Thanks.
We don't wholesale update glibc in RHEL5 updates. Rather, that would occur in a major release, such as RHEL6.
I'd like to re-emphasize that there are two issues here:

- First, support for the pthread mutex/condvar process-private attribute is only partial in glibc 2.7, as it is not implemented for PI mutexes. That is certainly something we'd like to see fixed; if it was omitted on purpose, we'd like to know why.

- Second, we'd like to see full support for process-private pthread mutexes/condvars (including the fix we request for glibc 2.7) in RHEL RT.

Here is more data on the performance implications of the process-private attribute. The issue we observe is on a realtime (Java) benchmark, so we are not complaining about mean execution time but about worst-case execution time, which is what defines the performance of a realtime system.

The benchmark programs a realtime thread to be woken up at a particular absolute time in the future. Condition variables and mutexes are involved in waking the thread when the absolute time is reached. We measure the absolute time at which the thread is effectively woken up and back to executing Java code; the difference between the measured time and the requested time is the latency. (A simplified standalone sketch of this measurement follows at the end of this comment.)

The benchmark is run with and without load, including load that triggers the garbage collector. Without the load, we measure a latency below 100 microseconds. With the load, the latency jumps above 1 millisecond. Again, worst-case execution time is all that matters to us in this case: performance is degraded by a factor of 10. The drop is due to mmaps performed by non-realtime activities in the Java VM, and there are many reasons why a non-realtime activity would call mmap: malloc is one of them, growing or shrinking the Java heap is another.
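To make the measurement concrete, here is a minimal standalone C sketch of what the benchmark does (the names and the 1-second deadline are ours; the real benchmark is in Java):

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

int main(void)
{
    struct timespec deadline, woke;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1;                     /* wake 1 second from now */

    pthread_mutex_lock(&lock);
    /* The timed wait blocks in the kernel via a futex syscall; whether
       that call carries FUTEX_PRIVATE_FLAG is exactly the difference at
       issue. Loop until the absolute deadline, ignoring spurious wakeups. */
    while (pthread_cond_timedwait(&cond, &lock, &deadline) != ETIMEDOUT)
        ;
    pthread_mutex_unlock(&lock);

    clock_gettime(CLOCK_REALTIME, &woke);
    long latency_ns = (woke.tv_sec - deadline.tv_sec) * 1000000000L
                    + (woke.tv_nsec - deadline.tv_nsec);
    printf("wake-up latency: %ld ns\n", latency_ns);
    return 0;
}

On a RHEL 5-era toolchain this builds with something like "gcc -o lat lat.c -lpthread -lrt" (-lrt for clock_gettime). The blocking wait, and any contended lock operation around it, go through futex syscalls; without the private flag each of those can stall behind unrelated mmap calls on the mmap semaphore, which is where the >1 ms worst case comes from.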
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.