Bug 1064059
| Summary: | clock_nanosleep returns early with TIMER_ABSTIME | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Patsy Griffin <pfrankli> | ||||
| Component: | kernel | Assignee: | Stanislaw Gruszka <sgruszka> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Qiao Zhao <qzhao> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 7.0 | CC: | ashankar, lilu, mnewsome, ovasik, pfrankli, qzhao | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | kernel-3.10.0-290.el7 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2015-11-19 20:02:57 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
Description
Patsy Griffin
2014-02-11 22:21:23 UTC
I haven't seen this failure on recent rawhide builds, which led me to check whether a recent kernel bug had since been fixed. This looks quite related: https://lkml.org/lkml/2014/6/24/20

Testing to see if I am right.

Nope, that wasn't it. In fact, I checked manually on rawhide and the test can still fail:

    live thread clock ffffffffffff98ee resolution 0.000000001
    live thread before sleep => 0.000657509
    self thread before sleep => 0.016908503
    live thread after sleep => 0.490654902
    self thread after sleep => 0.017012290
    clock_nanosleep on process slept 99885847 (outside reasonable range)

Looking deeper, it seems that clock_nanosleep may be returning early and that clock_gettime is probably just reporting what it sees. The clock_nanosleep wrapper in glibc is also quite minimal, so it still looks like a kernel bug to me. Still working on it.

Created attachment 936152: Reduced test case
Compile with:
cc -o tst-cpuclock2 -std=gnu99 tst-cpuclock2.c -g -pthread -lrt -Wall
and run it like so:
while ./tst-cpuclock2; do true; done
Reassigning to kernel because this does not look like a glibc bug.

The first thing I verified is that clock_gettime(CLOCK_PROCESS_CPUTIME_ID) is monotonic, using the following simple program:
~~~
#include <time.h>
#include <stdint.h>
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>

pthread_barrier_t barrier;

/* Help advance the clock: spin forever so the process keeps
   accumulating CPU time.  */
static void *
chew_cpu (void *u)
{
  (void) u;
  pthread_barrier_wait (&barrier);
  while (1);
  return NULL;
}

/* Report if the later sample A is earlier than the previous sample B.  */
static void
verify_time (struct timespec *b, struct timespec *a)
{
  unsigned long long bi = 1000000000ULL * b->tv_sec + b->tv_nsec;
  unsigned long long ai = 1000000000ULL * a->tv_sec + a->tv_nsec;
  if (ai < bi)
    {
      /* tv_sec is a time_t and tv_nsec is a long, so cast and use
         matching conversions instead of %llu for both.  */
      printf ("clock went backwards from %lld.%09ld to %lld.%09ld\n",
              (long long) b->tv_sec, b->tv_nsec,
              (long long) a->tv_sec, a->tv_nsec);
    }
}

int
main (void)
{
  struct timespec before, after;
  clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &before);
  after = before;

  pthread_t th;
  pthread_barrier_init (&barrier, NULL, 2);
  if (pthread_create (&th, NULL, chew_cpu, NULL) != 0)
    {
      perror ("pthread_create");
      return 1;
    }

  /* Wait until the helper thread is running before sampling.  */
  pthread_barrier_wait (&barrier);

  /* Sample the process CPU clock forever; each reading must be at
     least as large as the previous one.  */
  while (1)
    {
      clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &after);
      verify_time (&before, &after);
      before = after;
    }
}
~~~
This runs for hours without printing any errors. The only remaining possibility in the attached test case, then, is that the clock_nanosleep() syscall incorrectly returns early in the absolute (TIMER_ABSTIME) case.
*** Bug 1163507 has been marked as a duplicate of this bug. ***

Fix for the bug: https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/commit/?h=sched/urgent&id=a7fa360f736d209d6fd6cfc98132b14eff62dc51

Patch(es) available on kernel-3.10.0-290.el7

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2152.html