Bug 452693

Summary: POSIX timer set to fire immediately does not fire
Product: Red Hat Enterprise MRG
Reporter: Roland Westrelin <roland.westrelin>
Component: realtime-kernel
Assignee: Steven Rostedt <srostedt>
Status: CLOSED ERRATA
QA Contact:
Severity: high
Docs Contact:
Priority: low
Version: 1.0
CC: bhu, David.Holmes, pzijlstr, srostedt, tglx
Target Milestone: 1.0.1
Target Release: ---
Hardware: i386
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-08-26 19:57:46 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                                          Flags
reproducer for timer_settime(3p) bug                 none
updated reproducer for timer_settime(3p) problem     none
shell script to run reproducer until it fails        none
hrtimer: prevent migration for raising CPU           none

Description Roland Westrelin 2008-06-24 14:52:53 UTC
Description of problem:

We sometimes see strange behaviour with POSIX timers: we program
a timer to fire immediately (get the current time with clock_gettime()
and program the timer with timer_settime()), but the timer's signal is
never delivered.

This behaviour can be observed on all 2.6.24.7 kernels but not on the
2.6.24.4 kernels, which makes me suspect a kernel bug.
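
For illustration, here is a minimal sketch of the scenario described above. It is not the attached reproducer. It assumes the timer is armed with TIMER_ABSTIME on CLOCK_REALTIME, and the function name immediate_timer_fires, the choice of SIGRTMIN and the timeout are hypothetical. The signal is blocked and collected with sigtimedwait(), so a missing signal shows up as a timeout instead of a hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <time.h>

/* Arms a one-shot CLOCK_REALTIME timer whose absolute expiry is "now" and
 * waits up to timeout_sec seconds for its signal.
 * Returns 1 if the signal was received, 0 if it was not, -1 on setup error.
 * (Hypothetical helper; names and signal choice are assumptions.) */
int immediate_timer_fires(int timeout_sec)
{
    sigset_t set;
    struct sigevent sev = {0};
    struct itimerspec its = {0};   /* it_interval stays zero: one-shot */
    struct timespec timeout = { .tv_sec = timeout_sec, .tv_nsec = 0 };
    timer_t timer;
    int signo;

    /* Block the signal so it stays pending until sigtimedwait() collects it. */
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    if (sigprocmask(SIG_BLOCK, &set, NULL) == -1)
        return -1;

    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIGRTMIN;
    if (timer_create(CLOCK_REALTIME, &sev, &timer) == -1)
        return -1;

    /* Absolute expiry equal to the current time: already due when armed. */
    if (clock_gettime(CLOCK_REALTIME, &its.it_value) == -1 ||
        timer_settime(timer, TIMER_ABSTIME, &its, NULL) == -1) {
        timer_delete(timer);
        return -1;
    }

    /* sigtimedwait() returns the signal number, or -1 if the wait timed out. */
    signo = sigtimedwait(&set, NULL, &timeout);
    timer_delete(timer);
    return signo == SIGRTMIN ? 1 : 0;
}
```

Calling immediate_timer_fires(2) from main() should return 1 on a kernel that delivers the signal; a return of 0 would correspond to the behaviour reported here. Older glibc versions need -lrt at link time.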

Comment 1 Clark Williams 2008-06-27 21:11:35 UTC
Created attachment 310481 [details]
reproducer for timer_settime(3p) bug

Test program that shows the timer_settime(3p) bug

Comment 2 Clark Williams 2008-06-30 15:50:37 UTC
Created attachment 310594 [details]
updated reproducer for timer_settime(3p) problem

Changed to just detect signal fire rather than use pause

Comment 3 Clark Williams 2008-06-30 15:53:01 UTC
When I wrote the above reproducer, I noticed that I was seeing the printf from
the signal handler before entering pause(). Luis changed the test case to just
set a variable, usleep for a bit after setting the timer, then check if the
signal handler had modified the variable.

So far we have not seen a case on our -rt kernels where the signal has not been
delivered. Do you have a different reproducer we could try?
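
The flag-based check described in this comment could look roughly like the sketch below. It is not the attached reproducer: the name check_timer_flag, the use of SIGALRM and the TIMER_ABSTIME arming are assumptions, and nanosleep() stands in for usleep() to stay within POSIX.1-2008.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <time.h>

/* Set by the handler; sig_atomic_t keeps the write async-signal-safe. */
static volatile sig_atomic_t timer_fired;

static void on_timer(int signo)
{
    (void)signo;
    timer_fired = 1;
}

/* Arms a timer that is already due, sleeps for wait_ms milliseconds, then
 * reports whether the handler ran.
 * Returns 1 if it ran, 0 if it did not, -1 on setup error.
 * (Hypothetical helper; names and signal choice are assumptions.) */
int check_timer_flag(long wait_ms)
{
    struct sigaction sa;
    struct sigevent sev;
    struct itimerspec its;
    struct timespec nap = { .tv_sec = wait_ms / 1000,
                            .tv_nsec = (wait_ms % 1000) * 1000000L };
    timer_t timer;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_timer;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIGALRM;
    if (timer_create(CLOCK_REALTIME, &sev, &timer) == -1)
        return -1;

    timer_fired = 0;
    memset(&its, 0, sizeof its);
    if (clock_gettime(CLOCK_REALTIME, &its.it_value) == -1 ||
        timer_settime(timer, TIMER_ABSTIME, &its, NULL) == -1) {
        timer_delete(timer);
        return -1;
    }

    /* The handler may run before or during the sleep (nanosleep then returns
     * early with EINTR); either way the flag is checked afterwards. */
    nanosleep(&nap, NULL);
    timer_delete(timer);
    return timer_fired ? 1 : 0;
}
```

Because the flag is only inspected after the sleep, this variant does not depend on the signal arriving while the process is blocked, which is the race that pause() is exposed to.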

Comment 4 Clark Williams 2008-06-30 16:40:47 UTC
Created attachment 310598 [details]
shell script to run reproducer until it fails

Shell script to run the reproducer as either SCHED_OTHER or SCHED_FIFO until it
fails or until Ctrl-C is hit

Comment 5 Clark Williams 2008-06-30 16:42:09 UTC
Update since I last posted: we have not seen this behavior on a SCHED_OTHER
thread, but can reliably reproduce it on a SCHED_FIFO thread.

Comment 6 Luis Claudio R. Goncalves 2008-06-30 17:59:51 UTC
After running different tests, I also noticed that when the reproducer runs as
SCHED_FIFO we see occasional failures. It doesn't matter whether it runs at
priority 2, 30 or 97, as long as it runs as SCHED_FIFO.
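
As a rough sketch of how a test process can put itself under SCHED_FIFO at a given priority before arming the timer: the helper name run_as_fifo is hypothetical, and this is not taken from the attached reproducer or shell script.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>

/* Switches the calling process to SCHED_FIFO at the given priority.
 * Returns 0 on success or an errno value on failure.
 * (Hypothetical helper, shown only to illustrate the policy switch.) */
int run_as_fifo(int priority)
{
    struct sched_param sp = { .sched_priority = priority };
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    /* Reject priorities outside the range the system reports for SCHED_FIFO. */
    if (priority < lo || priority > hi)
        return EINVAL;

    /* pid 0 means "the calling process". */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
        return errno;

    return 0;
}
```

A caller would try, for example, run_as_fifo(2), run_as_fifo(30) or run_as_fifo(97) before running the timer test. The call typically fails with EPERM unless the process has the required privileges (root, CAP_SYS_NICE, or a suitable RLIMIT_RTPRIO on Linux).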

I started a new set of tests to narrow this issue down.

As a side note, I was unable to reproduce this behavior with the rt-vanilla kernel.

Comment 7 Clark Williams 2008-07-03 20:54:22 UTC
Created attachment 310963 [details]
hrtimer: prevent migration for raising CPU

From: Steven Rostedt <srostedt>
Subject: hrtimer: prevent migration for raising CPU

Due to a possible deadlock, the waking of the softirq was pushed outside
of the hrtimer base locks. Unfortunately this allows the task to migrate
after setting up the softirq and raising it. Since softirqs run a queue that
is per-cpu we may raise the softirq on the wrong CPU and this will keep
the queued softirq task from running.

To solve this issue, this patch disables preemption around the releasing
of the hrtimer lock and raising of the softirq.

Comment 8 Roland Westrelin 2008-07-15 08:13:06 UTC
I confirm this is fixed in -72.

Comment 12 errata-xmlrpc 2008-08-26 19:57:46 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0585.html