Bug 429290

Summary: provide a futex syscall command similar to FUTEX_WAIT that takes an absolute timeout
Product: Red Hat Enterprise MRG
Reporter: Roland Westrelin <roland.westrelin>
Component: realtime-kernel
Assignee: Clark Williams <williams>
Status: CLOSED ERRATA
Severity: low
Priority: low
Version: 1.0
CC: bhu, mmcallis, tglx
Target Milestone: 1.0.1   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-08-26 19:57:48 UTC
Bug Blocks: 429292    
Attachments:
program to test availability of FUTEX_WAIT_BITSET (flags: none)

Description Roland Westrelin 2008-01-18 15:42:29 UTC
Description of problem:

The futex syscall's command FUTEX_WAIT takes a relative time as a timeout
argument. The timeout is then converted to an absolute time and handled by the
high-res timer subsystem.

Using the futex syscall to wake up (release) a realtime thread at an absolute
time T requires the realtime thread to convert T to a relative time dt
before making the futex syscall. The kernel then converts dt back to an
absolute time T'. In most cases, T and T' are very close. If a preemption occurs
between the conversion from T to dt by the realtime thread and the conversion
from dt to T' by the kernel, then T' is later than T by roughly the length of
the preemption, so the two can differ significantly. The FUTEX_WAIT command is
therefore unusable for achieving deterministic release of realtime threads.

We are requesting a new futex command identical to FUTEX_WAIT except it would
take an absolute time as a timeout.
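
The pattern described above can be sketched as follows. This is only an
illustration of the problem, not code from the JVM: the helper names
remaining_until() and wait_until_relative() are hypothetical, and T is assumed
to be expressed on CLOCK_MONOTONIC.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: dt = deadline - now, clamped at zero. */
static struct timespec remaining_until(const struct timespec *deadline,
                                       const struct timespec *now)
{
    struct timespec dt;

    dt.tv_sec = deadline->tv_sec - now->tv_sec;
    dt.tv_nsec = deadline->tv_nsec - now->tv_nsec;
    if (dt.tv_nsec < 0) {            /* borrow one second */
        dt.tv_sec -= 1;
        dt.tv_nsec += 1000000000L;
    }
    if (dt.tv_sec < 0) {             /* deadline already passed */
        dt.tv_sec = 0;
        dt.tv_nsec = 0;
    }
    return dt;
}

/*
 * Hypothetical helper: block on *uaddr (while it holds `expected`) until
 * the absolute time T, using plain FUTEX_WAIT. Returns 0 when woken,
 * otherwise the errno value (ETIMEDOUT, EAGAIN, EINTR, ...).
 */
static int wait_until_relative(uint32_t *uaddr, uint32_t expected,
                               const struct timespec *T)
{
    struct timespec now, dt;

    clock_gettime(CLOCK_MONOTONIC, &now);
    dt = remaining_until(T, &now);   /* user space: T -> dt */

    /*
     * If the thread is preempted here, the kernel later computes
     * T' = (its own "now") + dt, which lands after T.
     */
    if (syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, &dt, NULL, 0) == -1)
        return errno;
    return 0;
}
```

Nothing in user space can close the window between clock_gettime() and the
syscall, which is why the request is for a command that takes T directly.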

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Roland Westrelin 2008-01-18 15:45:26 UTC
See also bug 429292.

Comment 3 Clark Williams 2008-07-09 19:19:35 UTC
The upstream -rt kernels have a new FUTEX operation called FUTEX_WAIT_BITSET
that will do what you want. We've backported the relevant bits from 2.6.26-rc9
to our 2.6.24.7-rt14 based kernel, specifically to allow absolute timeouts with
a futex_wait operation. 

The calling changes required will be:
  1. Use FUTEX_WAIT_BITSET instead of FUTEX_WAIT
  2. Pass in an absolute timespec
  3. Add an additional argument FUTEX_BITSET_MATCH_ANY to the syscall

The additional argument is the sixth arg to sys_futex (turns into val3 in
do_futex). 
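
A minimal sketch of the three calling changes listed above, assuming a kernel
and headers that define FUTEX_WAIT_BITSET and assuming the absolute timespec is
measured on CLOCK_MONOTONIC (the default clock for this operation). The helper
names deadline_in_ms() and wait_until_absolute() are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: absolute CLOCK_MONOTONIC time `ms` milliseconds from now. */
static struct timespec deadline_in_ms(long ms)
{
    struct timespec t;

    clock_gettime(CLOCK_MONOTONIC, &t);
    t.tv_sec += ms / 1000;
    t.tv_nsec += (ms % 1000) * 1000000L;
    if (t.tv_nsec >= 1000000000L) {  /* carry into seconds */
        t.tv_sec += 1;
        t.tv_nsec -= 1000000000L;
    }
    return t;
}

/*
 * Hypothetical helper: block on *uaddr (while it holds `expected`) until
 * the absolute time *deadline. Returns 0 when woken, otherwise the errno
 * value (ETIMEDOUT, EAGAIN, EINTR, ...).
 */
static int wait_until_absolute(uint32_t *uaddr, uint32_t expected,
                               const struct timespec *deadline)
{
    if (syscall(SYS_futex, uaddr,
                FUTEX_WAIT_BITSET,        /* 1. instead of FUTEX_WAIT */
                expected,
                deadline,                 /* 2. absolute timespec */
                NULL,
                FUTEX_BITSET_MATCH_ANY)   /* 3. sixth argument (val3) */
        == -1)
        return errno;
    return 0;
}
```

With this form the caller passes T unchanged, so a preemption before the
syscall no longer shifts the wake-up time.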

The upcoming -72 kernel will have this backport and should be available on
Thursday (tomorrow).



Comment 4 Clark Williams 2008-07-16 17:44:35 UTC
Any thoughts on whether FUTEX_WAIT_BITSET will be useful to Sun's JVM?

Comment 5 Roland Westrelin 2008-07-17 07:51:02 UTC
It's too late in our release cycle for us to take advantage of the new futex
argument. We'll add support in our next update. It will then be useful for
troubleshooting. Ultimately, what we really need is the fix to propagate to the
libc (429292).

Comment 8 Clark Williams 2008-08-06 21:07:03 UTC
Created attachment 313643 [details]
program to test availability of FUTEX_WAIT_BITSET

build with:

gcc -o futex-wait-bitset futex-wait-bitset.c

and then run. Success prints "operaton successful" with 0 exit value.
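
For readers without access to the attachment, one way such an availability
check could be written is sketched below. This is not the attached program: the
function name futex_wait_bitset_available() is hypothetical, and the approach
(waiting with a deliberately mismatched value so the call returns immediately)
is an assumption about how to probe without blocking.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical probe: returns 1 if the kernel accepts FUTEX_WAIT_BITSET,
 * 0 if it reports ENOSYS, and -1 on any other unexpected result.
 */
static int futex_wait_bitset_available(void)
{
    uint32_t word = 0;
    long rc;

    /*
     * The futex word holds 0 but we claim to expect 1, so a kernel that
     * implements the operation returns EAGAIN at once instead of sleeping.
     */
    rc = syscall(SYS_futex, &word, FUTEX_WAIT_BITSET, 1u,
                 NULL, NULL, FUTEX_BITSET_MATCH_ANY);
    if (rc == -1 && errno == EAGAIN)
        return 1;
    if (rc == -1 && errno == ENOSYS)
        return 0;
    return -1;
}
```

A caller could fall back to FUTEX_WAIT with a relative timeout when the probe
returns 0.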

Comment 12 errata-xmlrpc 2008-08-26 19:57:48 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0585.html

Comment 13 Roland Westrelin 2009-06-12 15:47:36 UTC
I'm working on taking advantage of the FUTEX_WAIT_BITSET futex command. When I
run your attached testcase, it hangs on one configuration: a 32-bit binary of
the test on a 32-bit OS works OK, and a 64-bit binary on a 64-bit OS works OK,
but a 32-bit binary on a 64-bit OS hangs. This is with the -108 kernels.