Red Hat Bugzilla – Bug 1064059
clock_nanosleep returns early with TIMER_ABSTIME
Last modified: 2015-12-01 02:57:27 EST
Description of problem:
glibc test suite - tst-cpuclock2 test fails on: i686, ppc, ppc64, x86_64

Version-Release number of selected component (if applicable):
2.17-48.el7

How reproducible:
Fails consistently on RHEL 7.0

Steps to Reproduce:
1. See build log
I haven't seen this failure on recent rawhide builds, which led me to check whether a kernel bug might have been fixed recently. This looks quite related: https://lkml.org/lkml/2014/6/24/20

Testing to see if I am right.
Nope, that wasn't it. In fact, I checked manually on rawhide and the test can still fail:

live thread clock ffffffffffff98ee resolution 0.000000001
live thread before sleep => 0.000657509
self thread before sleep => 0.016908503
live thread after sleep => 0.490654902
self thread after sleep => 0.017012290
clock_nanosleep on process slept 99885847 (outside reasonable range)

Looking deeper, it seems that clock_nanosleep may be returning early and that clock_gettime is probably just reporting what it sees. The clock_nanosleep wrapper in glibc is also quite minimal, so it still looks like a kernel bug to me. Still working on it.
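To make the "slept 99885847" figure concrete, here is a minimal sketch of the kind of arithmetic behind such a check. The helper names `tsdiff_ns` and `slept_too_short` are hypothetical, and the 100 ms requested interval is an assumption inferred from the reported figure being just under 100000000 ns; the report itself does not state the requested sleep time.

```c
#include <time.h>

/* Elapsed nanoseconds from *before to *after.
   Assumes *after is not earlier than *before.  */
static unsigned long long
tsdiff_ns (const struct timespec *before, const struct timespec *after)
{
  long long sec = (long long) after->tv_sec - (long long) before->tv_sec;
  long long nsec = (long long) after->tv_nsec - (long long) before->tv_nsec;
  return (unsigned long long) (sec * 1000000000LL + nsec);
}

/* Nonzero if the clock advanced by less than the requested interval,
   i.e. the sleep returned early as measured on that clock.  */
static int
slept_too_short (unsigned long long slept_ns, unsigned long long requested_ns)
{
  return slept_ns < requested_ns;
}
```

Under that assumption, 99885847 ns is about 114 microseconds short of the requested interval, which is consistent with the sleep returning early rather than with the clock misreporting.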
Created attachment 936152 [details]
Reduced test case

Compile with:

cc -o tst-cpuclock2 -std=gnu99 tst-cpuclock2.c -g -pthread -lrt -Wall

and run it like so:

while ./tst-cpuclock2; do true; done
Reassigning to kernel because this does not look like a glibc bug.

The first thing I verified is that clock_gettime(CLOCK_PROCESS_CPUTIME_ID) was monotonic, using the following simple program:

~~~
#include <time.h>
#include <stdint.h>
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>

pthread_barrier_t barrier;

/* Help advance the clock.  */
static void *
chew_cpu (void *u)
{
  pthread_barrier_wait (&barrier);
  while (1);
  return NULL;
}

/* Report if the later reading A is smaller than the earlier reading B.  */
static void
verify_time (struct timespec *b, struct timespec *a)
{
  unsigned long long bi = 1000000000ULL * b->tv_sec + b->tv_nsec;
  unsigned long long ai = 1000000000ULL * a->tv_sec + a->tv_nsec;

  if (ai < bi)
    {
      /* Cast so the arguments match the format specifiers.  */
      printf ("clock went backwards from %lld.%09ld to %lld.%09ld\n",
              (long long) b->tv_sec, b->tv_nsec,
              (long long) a->tv_sec, a->tv_nsec);
    }
}

int
main (void)
{
  struct timespec before, after;

  clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &before);
  after = before;

  pthread_t th;
  pthread_barrier_init (&barrier, NULL, 2);

  if (pthread_create (&th, NULL, chew_cpu, NULL) != 0)
    {
      perror ("pthread_create");
      return 1;
    }

  pthread_barrier_wait (&barrier);

  /* Read the process CPU clock forever, checking each pair of readings.  */
  while (1)
    {
      clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &after);
      verify_time (&before, &after);
      before = after;
    }
}
~~~

This runs for hours without printing any errors. The only remaining possibility in the attached test case, then, is that the clock_nanosleep() syscall incorrectly returns early in the absolute (TIMER_ABSTIME) case.
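For illustration, the absolute-sleep case can be checked directly along these lines. This is a rough sketch and not the attached reduced test case: the helper name `abs_sleep_shortfall_ns` and its return convention are made up here, and it assumes a non-negative delay. It computes a deadline on the given clock, sleeps until it with TIMER_ABSTIME, and then reports how far short of the deadline the clock is when the call returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: sleep until (now + delay_ns) on CLOCK using
   TIMER_ABSTIME, then re-read the clock.  Returns 0 if the clock has
   reached the deadline, the shortfall in nanoseconds if the sleep
   returned early, or -1 if any call failed (including interruption
   by a signal).  Assumes delay_ns >= 0.  */
static long long
abs_sleep_shortfall_ns (clockid_t clock, long delay_ns)
{
  struct timespec deadline, after;

  if (clock_gettime (clock, &deadline) != 0)
    return -1;

  /* Turn the current reading into an absolute deadline.  */
  deadline.tv_sec += delay_ns / NSEC_PER_SEC;
  deadline.tv_nsec += delay_ns % NSEC_PER_SEC;
  if (deadline.tv_nsec >= NSEC_PER_SEC)
    {
      deadline.tv_sec += 1;
      deadline.tv_nsec -= NSEC_PER_SEC;
    }

  /* clock_nanosleep returns an error number directly, not via errno.  */
  if (clock_nanosleep (clock, TIMER_ABSTIME, &deadline, NULL) != 0)
    return -1;

  if (clock_gettime (clock, &after) != 0)
    return -1;

  long long short_ns
    = (long long) (deadline.tv_sec - after.tv_sec) * NSEC_PER_SEC
      + (deadline.tv_nsec - after.tv_nsec);
  return short_ns > 0 ? short_ns : 0;
}
```

To exercise the case discussed in this bug, one would call it with CLOCK_PROCESS_CPUTIME_ID while another thread (such as chew_cpu above) burns CPU, since the process CPU clock does not advance while the only thread is asleep. A positive return value would then mean the sleep returned before the clock reached the deadline.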
*** Bug 1163507 has been marked as a duplicate of this bug. ***
Fix for the bug: https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/commit/?h=sched/urgent&id=a7fa360f736d209d6fd6cfc98132b14eff62dc51
Patch(es) available on kernel-3.10.0-290.el7
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-2152.html