Bug 1064059
| Summary: | clock_nanosleep returns early with TIMER_ABSTIME | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Patsy Griffin <pfrankli> |
| Component: | kernel | Assignee: | Stanislaw Gruszka <sgruszka> |
| Status: | CLOSED ERRATA | QA Contact: | Qiao Zhao <qzhao> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.0 | CC: | ashankar, lilu, mnewsome, ovasik, pfrankli, qzhao |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-3.10.0-290.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-11-19 20:02:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Patsy Griffin
2014-02-11 22:21:23 UTC
I haven't seen this failure on recent rawhide builds, which led me to check whether a recent kernel fix might explain it. This looks closely related: https://lkml.org/lkml/2014/6/24/20

Testing to see if I am right.

Nope, that wasn't it. In fact, I checked manually on rawhide and the test can still fail:

live thread clock ffffffffffff98ee resolution 0.000000001
live thread before sleep => 0.000657509
self thread before sleep => 0.016908503
live thread after sleep => 0.490654902
self thread after sleep => 0.017012290
clock_nanosleep on process slept 99885847 (outside reasonable range)

Looking deeper, it seems that clock_nanosleep may be returning early, and that clock_gettime is probably just reporting what it sees. The clock_nanosleep wrapper in glibc is also quite minimal, so this still looks like a kernel bug to me. Still working on it.

Created attachment 936152
Reduced test case
Compile with:
cc -o tst-cpuclock2 -std=gnu99 tst-cpuclock2.c -g -pthread -lrt -Wall
and run it in a loop until it fails:
while ./tst-cpuclock2; do true; done
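To make the suspected failure mode concrete, here is a minimal sketch of the kind of check involved. It is not the attached tst-cpuclock2.c, and the names `spin_until_stopped`, `ts_to_ns` and `abstime_sleep_woke_early` are hypothetical. It assumes Linux, where clock_nanosleep() accepts CLOCK_PROCESS_CPUTIME_ID (but not CLOCK_THREAD_CPUTIME_ID). A spinning thread keeps the process CPU-time clock advancing, the main thread sleeps until an absolute deadline with TIMER_ABSTIME, and then reads the same clock: if the clock is still before the deadline, the sleep returned early.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical flag telling the spinner thread to exit. */
static atomic_int stop_spinning;

/* Burn CPU so the process CPU-time clock keeps advancing while main sleeps. */
static void *
spin_until_stopped (void *arg)
{
  (void) arg;
  while (!atomic_load (&stop_spinning))
    ;
  return NULL;
}

/* Convert a timespec to a single nanosecond count. */
static long long
ts_to_ns (const struct timespec *ts)
{
  return 1000000000LL * ts->tv_sec + ts->tv_nsec;
}

/* Sleep sleep_ns of process CPU time using an absolute deadline.
   Returns 1 if the clock was still before the deadline when
   clock_nanosleep() returned (an early wakeup), 0 if not, -1 on error. */
static int
abstime_sleep_woke_early (long sleep_ns)
{
  pthread_t th;
  struct timespec deadline, now;
  int result = -1;

  atomic_store (&stop_spinning, 0);
  if (pthread_create (&th, NULL, spin_until_stopped, NULL) != 0)
    return -1;

  if (clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &deadline) == 0)
    {
      /* Build the absolute deadline, normalizing tv_nsec into [0, 1e9). */
      long long target = ts_to_ns (&deadline) + sleep_ns;
      deadline.tv_sec = (time_t) (target / 1000000000LL);
      deadline.tv_nsec = (long) (target % 1000000000LL);

      /* clock_nanosleep() returns an error number instead of setting errno;
         an interrupted absolute sleep can be restarted with the same deadline. */
      int err;
      do
        err = clock_nanosleep (CLOCK_PROCESS_CPUTIME_ID, TIMER_ABSTIME,
                               &deadline, NULL);
      while (err == EINTR);

      if (err == 0 && clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &now) == 0)
        result = ts_to_ns (&now) < target;
    }

  atomic_store (&stop_spinning, 1);
  pthread_join (th, NULL);
  return result;
}
```

A caller would treat a return value of 1 as a reproduction of the behavior reported here; repeating the call in a loop mirrors the shell loop above. Build it with -pthread (plus -lrt on older glibc, as in the compile line above).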
Reassigning to kernel because this does not look like a glibc bug.

The first thing I verified is that clock_gettime(CLOCK_PROCESS_CPUTIME_ID) is monotonic, using the following simple program:

~~~
#include <time.h>
#include <stdint.h>
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>

pthread_barrier_t barrier;

/* Help advance the clock. */
static void *
chew_cpu (void *u)
{
  pthread_barrier_wait (&barrier);
  while (1);   /* Spin forever. */
  return NULL;
}

static void
verify_time (struct timespec *b, struct timespec *a)
{
  unsigned long long bi = 1000000000ULL * b->tv_sec + b->tv_nsec;
  unsigned long long ai = 1000000000ULL * a->tv_sec + a->tv_nsec;

  if (ai < bi)
    {
      /* Cast to types matching the format; zero-pad the nanoseconds. */
      printf ("clock went backwards from %lld.%09ld to %lld.%09ld\n",
              (long long) b->tv_sec, (long) b->tv_nsec,
              (long long) a->tv_sec, (long) a->tv_nsec);
    }
}

int
main (void)
{
  struct timespec before, after;
  clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &before);
  after = before;

  pthread_t th;
  pthread_barrier_init (&barrier, NULL, 2);
  if (pthread_create (&th, NULL, chew_cpu, NULL) != 0)
    {
      perror ("pthread_create");
      return 1;
    }
  pthread_barrier_wait (&barrier);

  /* Sample the process CPU clock forever and report any backward step. */
  while (1)
    {
      clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &after);
      verify_time (&before, &after);
      before = after;
    }
}
~~~

This runs for hours without printing any errors. The only remaining possibility in the attached test case, then, is that the clock_nanosleep() syscall incorrectly returns early in the absolute (TIMER_ABSTIME) case.

*** Bug 1163507 has been marked as a duplicate of this bug. ***

Fix for the bug: https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/commit/?h=sched/urgent&id=a7fa360f736d209d6fd6cfc98132b14eff62dc51

Patch(es) available on kernel-3.10.0-290.el7

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2152.html