Description of problem: The futex syscall's FUTEX_WAIT command takes a relative time as its timeout argument. The kernel then converts that timeout to an absolute time, which is handled by the high-res timer subsystem. To use the futex syscall to wake up (release) a realtime thread at an absolute time T, the realtime thread must convert T to a relative time dt before making the syscall. The kernel then converts dt back to an absolute time T'. In most cases, T and T' are very similar. However, if a preemption occurs between the conversion from T to dt by the realtime thread and the conversion from dt to T' by the kernel, T' and T can be very different. The futex syscall with the FUTEX_WAIT command is thus unusable for achieving deterministic release of realtime threads. We are requesting a new futex command identical to FUTEX_WAIT except that it would take an absolute time as its timeout.
See also bug 429292.
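The race described above can be sketched in C as follows. This is a minimal illustration, not code from the reporter's application: the helper names abs_to_rel and wait_until_relative are hypothetical, and the deadline is assumed to be expressed on CLOCK_MONOTONIC.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: dt = deadline - now, clamped to zero if the
 * deadline has already passed. */
static struct timespec abs_to_rel(const struct timespec *deadline,
                                  const struct timespec *now)
{
    struct timespec dt;

    assert(deadline != NULL && now != NULL);
    dt.tv_sec = deadline->tv_sec - now->tv_sec;
    dt.tv_nsec = deadline->tv_nsec - now->tv_nsec;
    if (dt.tv_nsec < 0) {            /* borrow one second */
        dt.tv_sec -= 1;
        dt.tv_nsec += 1000000000L;
    }
    if (dt.tv_sec < 0) {             /* deadline already in the past */
        dt.tv_sec = 0;
        dt.tv_nsec = 0;
    }
    return dt;
}

/* Hypothetical wrapper: wait on *uaddr until the absolute deadline T,
 * using the relative-timeout FUTEX_WAIT command. */
static int wait_until_relative(uint32_t *uaddr, uint32_t expected,
                               const struct timespec *deadline)
{
    struct timespec now, dt;

    clock_gettime(CLOCK_MONOTONIC, &now);
    dt = abs_to_rel(deadline, &now);  /* T -> dt in user space */

    /* Race window: if the thread is preempted here for a time p, the
     * kernel computes T' = (its own "now") + dt, so T' is roughly T + p. */

    return (int)syscall(SYS_futex, uaddr, FUTEX_WAIT, expected,
                        &dt, NULL, 0);
}
```

A caller would pass the futex word, the value it expects to find there, and the intended release time T. Nothing in user space can close the window between computing dt and entering the kernel, which is why the request is for an absolute timeout.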
The upstream -rt kernels have a new FUTEX operation called FUTEX_WAIT_BITSET that will do what you want. We've backported the relevant bits from 2.6.26-rc9 to our 2.6.24.7-rt14 based kernel, specifically to allow absolute timeouts with a futex_wait operation. The calling changes required will be:
1. Use FUTEX_WAIT_BITSET instead of FUTEX_WAIT
2. Pass in an absolute timespec
3. Add an additional argument FUTEX_BITSET_MATCH_ANY to the syscall
The additional argument is the sixth arg to sys_futex (turns into val3 in do_futex). The upcoming -72 kernel will have this backport and should be available on Thursday (tomorrow).
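The three calling changes listed above might look like this in C. This is a sketch only: futex_wait_abs is a hypothetical wrapper name, the fallback definitions use the values from mainline linux/futex.h for systems whose headers predate the operation, and the deadline is assumed to be measured against CLOCK_MONOTONIC as on mainline kernels (the backported kernel's clock has not been verified here).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Fallbacks for older headers; values as defined in mainline linux/futex.h. */
#ifndef FUTEX_WAIT_BITSET
#define FUTEX_WAIT_BITSET 9
#endif
#ifndef FUTEX_BITSET_MATCH_ANY
#define FUTEX_BITSET_MATCH_ANY 0xffffffff
#endif

/* Hypothetical wrapper: block while *uaddr == expected, until woken or
 * until the absolute time *deadline is reached.
 * Returns 0 when woken, or -1 with errno set (EAGAIN if *uaddr did not
 * match expected, ETIMEDOUT if the deadline passed, ENOSYS if the kernel
 * lacks FUTEX_WAIT_BITSET). */
static int futex_wait_abs(uint32_t *uaddr, uint32_t expected,
                          const struct timespec *deadline)
{
    assert(uaddr != NULL && deadline != NULL);
    return (int)syscall(SYS_futex, uaddr,
                        FUTEX_WAIT_BITSET,        /* 1. new operation        */
                        expected,
                        deadline,                 /* 2. absolute timespec    */
                        NULL,
                        FUTEX_BITSET_MATCH_ANY);  /* 3. sixth argument, val3 */
}
```

Because the deadline is passed through unchanged, a preemption between computing T and entering the kernel no longer shifts the wake-up time. A caller that wants to fall back to FUTEX_WAIT on older kernels could check for a return of -1 with errno set to ENOSYS.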
Any thoughts on whether FUTEX_WAIT_BITSET will be useful to Sun's JVM?
It's too late in our release cycle for us to take advantage of the new futex argument. We'll add support in our next update. It will then be useful for troubleshooting. Ultimately, what we really need is the fix to propagate to the libc (429292).
Created attachment 313643
program to test availability of FUTEX_WAIT_BITSET

Build with:
gcc -o futex-wait-bitset futex-wait-bitset.c
and then run. Success prints "operaton successful" with 0 exit value.
An advisory has been issued which should help resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2008-0585.html
I'm working on taking advantage of the FUTEX_WAIT_BITSET futex command. When I run your attached test case, it hangs on one configuration: a 32-bit binary of the test on a 32-bit OS works OK, and a 64-bit binary on a 64-bit OS works OK, but a 32-bit binary on a 64-bit OS hangs. This is with the -108 kernels.