Bug 1064059

Summary: clock_nanosleep returns early with TIMER_ABSTIME
Product: Red Hat Enterprise Linux 7 Reporter: Patsy Griffin <pfrankli>
Component: kernel Assignee: Stanislaw Gruszka <sgruszka>
Status: CLOSED ERRATA QA Contact: Qiao Zhao <qzhao>
Severity: unspecified Docs Contact:
Priority: medium    
Version: 7.0 CC: ashankar, lilu, mnewsome, ovasik, pfrankli, qzhao
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Fixed In Version: kernel-3.10.0-290.el7 Doc Type: Bug Fix
Last Closed: 2015-11-19 20:02:57 UTC Type: Bug
Attachments:
Description: Reduced test case Flags: none

Description Patsy Griffin 2014-02-11 22:21:23 UTC
Description of problem:
glibc test suite - tst-cpuclock2 test fails on:
i686, ppc, ppc64, x86_64

Version-Release number of selected component (if applicable):
2.17-48.el7

How reproducible:
Fails consistently on RHEL 7.0


Steps to Reproduce:
1. See build log

Comment 4 Siddhesh Poyarekar 2014-09-09 11:39:34 UTC
I haven't seen this failure on recent rawhide builds, so I went looking for a recently fixed kernel bug that could explain it.  This one looks closely related:

https://lkml.org/lkml/2014/6/24/20

Testing to see if I am right.

Comment 5 Siddhesh Poyarekar 2014-09-10 05:58:57 UTC
Nope, that wasn't it.  In fact, I checked manually on rawhide and the test can still fail:

live thread clock ffffffffffff98ee resolution 0.000000001
live thread before sleep => 0.000657509
self thread before sleep => 0.016908503
live thread after sleep => 0.490654902
self thread after sleep => 0.017012290
clock_nanosleep on process slept 99885847 (outside reasonable range)

Looking deeper, it seems that clock_nanosleep may be returning early and that clock_gettime is probably just reporting what it sees.  The clock_nanosleep wrapper in glibc is also quite minimal, so this still looks like a kernel bug to me.

Still working on it.

Comment 6 Siddhesh Poyarekar 2014-09-10 13:30:08 UTC
Created attachment 936152 [details]
Reduced test case

Compile with:

cc -o tst-cpuclock2 -std=gnu99 tst-cpuclock2.c -g -pthread -lrt -Wall

and run it like so:

while ./tst-cpuclock2; do true; done

Comment 7 Siddhesh Poyarekar 2014-09-10 13:32:48 UTC
Reassigning to kernel because this does not look like a glibc bug.  The first thing I verified is that clock_gettime(CLOCK_PROCESS_CPUTIME_ID) was monotonic using the following simple program:

~~~
#include <time.h>
#include <stdint.h>
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>

pthread_barrier_t barrier;

/* Help advance the clock.  */
static void *
chew_cpu (void *u)
{
  pthread_barrier_wait (&barrier);
  while (1);

  return NULL;
}

static void
verify_time (struct timespec *b, struct timespec *a)
{
  unsigned long long bi = 1000000000ULL * b->tv_sec + b->tv_nsec;
  unsigned long long ai = 1000000000ULL * a->tv_sec + a->tv_nsec;

  if (ai < bi)
    {
      /* tv_sec is time_t and tv_nsec is long, so cast and pad explicitly.  */
      printf ("clock went backwards from %lld.%09ld to %lld.%09ld\n",
              (long long) b->tv_sec, b->tv_nsec,
              (long long) a->tv_sec, a->tv_nsec);
    }
}

int
main (void)
{
  struct timespec before, after;
  clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &before);
  after = before;

  pthread_t th;   

  pthread_barrier_init (&barrier, NULL, 2);

  if (pthread_create (&th, NULL, chew_cpu, NULL) != 0)
    {
      perror ("pthread_create");
      return 1;
    }

  pthread_barrier_wait (&barrier);

  while (1)
    {
      clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &after);
      verify_time (&before, &after);

      before = after;
    }
}
~~~

This runs for hours without printing any errors.  The only remaining possibility in the attached test case, then, is that the clock_nanosleep() syscall incorrectly returns early in the absolute (TIMER_ABSTIME) case.

Comment 10 Siddhesh Poyarekar 2014-11-13 05:32:54 UTC
*** Bug 1163507 has been marked as a duplicate of this bug. ***

Comment 16 Rafael Aquini 2015-07-03 14:17:09 UTC
Patch(es) available on kernel-3.10.0-290.el7

Comment 22 errata-xmlrpc 2015-11-19 20:02:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2152.html