Red Hat Bugzilla – Bug 452693
POSIX timer set to fire immediately does not fire
Last modified: 2008-08-26 15:57:46 EDT
Description of problem: We sometimes see strange behaviour with POSIX timers: we program a timer to fire immediately (get the current time with clock_gettime() and program the timer with timer_settime()), but the timer's signal is never delivered. This behaviour can be observed on all 2.6.24.7 kernels but not on the 2.6.24.4 kernels, which makes me suspect a kernel bug.
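For readers without access to the attachments, the scenario described above can be sketched roughly as follows. This is not the attached reproducer: the function name `timer_fires_immediately`, the choice of `CLOCK_MONOTONIC`, the use of `SIGRTMIN`, the `TIMER_ABSTIME` flag and the one-second timeout are all assumptions made for illustration. The signal is blocked and collected with `sigtimedwait()` so that a lost signal shows up as a timeout rather than a hang.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: returns 1 if the timer signal arrived,
 * 0 if it was not delivered within one second, -1 on a setup error. */
int timer_fires_immediately(void)
{
    sigset_t set;
    struct sigevent sev;
    struct itimerspec its;
    struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };
    timer_t timerid;
    int sig, saved_errno;

    /* Block the signal so it stays pending until sigtimedwait() collects it. */
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    if (sigprocmask(SIG_BLOCK, &set, NULL) == -1)
        return -1;

    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIGRTMIN;
    if (timer_create(CLOCK_MONOTONIC, &sev, &timerid) == -1)
        return -1;

    /* One-shot absolute timer whose expiry is "now", so it is already due. */
    memset(&its, 0, sizeof(its));
    if (clock_gettime(CLOCK_MONOTONIC, &its.it_value) == -1 ||
        timer_settime(timerid, TIMER_ABSTIME, &its, NULL) == -1) {
        timer_delete(timerid);
        return -1;
    }

    /* Wait for the signal; retry if interrupted by an unrelated signal. */
    do {
        sig = sigtimedwait(&set, NULL, &timeout);
    } while (sig == -1 && errno == EINTR);
    saved_errno = errno;

    timer_delete(timerid);
    if (sig == SIGRTMIN)
        return 1;
    /* EAGAIN means the timeout expired with no signal pending. */
    return (saved_errno == EAGAIN) ? 0 : -1;
}
```

Calling this helper in a loop from `main` and stopping on the first return value of 0 would approximate the failure being reported. On older glibc versions the timer functions require linking with `-lrt`.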
Created attachment 310481: reproducer for timer_settime(3p) bug. Test program that shows the timer_settime(3p) bug.
Created attachment 310594: updated reproducer for timer_settime(3p) problem. Changed to just detect that the signal fired rather than use pause().
When I wrote the above reproducer, I noticed that I was seeing the printf from the signal handler before the program entered pause(). Luis changed the test case to just set a variable, usleep for a bit after setting the timer, then check whether the signal handler had modified the variable. So far we have not seen a case on our -rt kernels where the signal was not delivered. Do you have a different reproducer we could try?
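The flag-based check described in this comment avoids the race where the signal arrives before pause() is entered and pause() then blocks forever. A minimal sketch of that idea follows; it is not the updated attachment itself. The names `timer_signal_seen`, `on_timer` and `fired`, the use of `SIGALRM` and `CLOCK_MONOTONIC`, and the caller-chosen sleep length are assumptions for illustration.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Set by the handler; sig_atomic_t keeps the write async-signal-safe. */
static volatile sig_atomic_t fired;

static void on_timer(int signo)
{
    (void)signo;
    fired = 1;
}

/* Hypothetical helper: arms an already-due timer, sleeps up to wait_us
 * microseconds, then reports whether the handler ran.
 * Returns 1 if it ran, 0 if it did not, -1 on a setup error. */
int timer_signal_seen(useconds_t wait_us)
{
    struct sigaction sa;
    struct sigevent sev;
    struct itimerspec its;
    timer_t timerid;

    fired = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_timer;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIGALRM;
    if (timer_create(CLOCK_MONOTONIC, &sev, &timerid) == -1)
        return -1;

    /* Absolute expiry equal to the current time: the timer is due at once. */
    memset(&its, 0, sizeof(its));
    if (clock_gettime(CLOCK_MONOTONIC, &its.it_value) == -1 ||
        timer_settime(timerid, TIMER_ABSTIME, &its, NULL) == -1) {
        timer_delete(timerid);
        return -1;
    }

    /* The sleep may end early if the signal interrupts it; either way
     * the flag tells us whether the handler ran. */
    usleep(wait_us);

    timer_delete(timerid);
    return fired ? 1 : 0;
}
```

A call such as `timer_signal_seen(100000)` gives the handler 100 ms to run. A return value of 0 would correspond to the lost signal described in this report.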
Created attachment 310598: shell script to run reproducer until it fails. Shell script to run the reproducer as either SCHED_OTHER or SCHED_FIFO until it fails or until Ctrl-C is hit.
Update since I last posted: we have not seen this behavior on a SCHED_OTHER thread, but can reliably reproduce it on a SCHED_FIFO thread.
After running different tests, I also noticed that when the reproducer runs as SCHED_FIFO we see occasional failures. It doesn't matter whether it runs at priority 2, 30 or 97, as long as it runs as SCHED_FIFO. I started a new set of tests to narrow this issue down. As a side note, I was unable to reproduce this behavior with the rt-vanilla kernel.
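Since the failures only show up under SCHED_FIFO, a test program needs to switch its scheduling policy before arming the timer. The sketch below shows one way to do that with `sched_setscheduler()`. The helper name `run_as_fifo` is hypothetical, and the attached script may well set the policy differently.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: switch the caller to SCHED_FIFO at the given
 * priority. Returns 0 on success, -1 on failure (typically EPERM when
 * the process lacks the privilege to use real-time policies). */
int run_as_fifo(int priority)
{
    struct sched_param param;

    memset(&param, 0, sizeof(param));
    param.sched_priority = priority;

    /* A pid of 0 means "the calling thread" on Linux. */
    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
        fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

Calling `run_as_fifo(2)`, `run_as_fifo(30)` or `run_as_fifo(97)` before the timer test would match the priorities mentioned above. This normally requires root or an appropriate real-time priority limit.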
Created attachment 310963: hrtimer: prevent migration for raising CPU
From: Steven Rostedt <srostedt@redhat.com>
Subject: hrtimer: prevent migration for raising CPU

Due to a possible deadlock, the waking of the softirq was pushed outside of the hrtimer base locks. Unfortunately this allows the task to migrate after setting up the softirq and raising it. Since softirqs run a queue that is per-cpu we may raise the softirq on the wrong CPU and this will keep the queued softirq task from running. To solve this issue, this patch disables preemption around the releasing of the hrtimer lock and raising of the softirq.
I confirm this is fixed in -72.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2008-0585.html