RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 742037 - clock_gettime(CLOCK_MONOTONIC, ...) is not monotonic
Summary: clock_gettime(CLOCK_MONOTONIC, ...) is not monotonic
Keywords:
Status: CLOSED DUPLICATE of bug 750201
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: rc
: ---
Assignee: Rik van Riel
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-09-28 20:21 UTC by Andrew Charles
Modified: 2014-06-20 13:06 UTC (History)
14 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-01-11 22:46:08 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
guest /var/log/dmesg (17.52 KB, application/octet-stream)
2011-10-19 21:29 UTC, Andrew Charles
no flags Details
host /var/log/dmesg (48.46 KB, application/octet-stream)
2011-10-20 13:20 UTC, Andrew Charles
no flags Details

Description Andrew Charles 2011-09-28 20:21:54 UTC
Description of problem:
RHEL.6 x86 guest running under KVM (that is running under RHEL6.1
on an x86 host).

Version-Release number of selected component (if applicable):
libvirt-0.8.7-18.el6.x86_64.rpm

How reproducible:
Easily.


Steps to Reproduce:

gcc -lrt clktest.c -o clktest

In the guest run as
   clktest
or
   clktest <milliseconds delay>

Longer delays between clock_gettime
calls reduce the frequency of the error.

I suspect, but have not proved, that the error is more
likely when the host (or at least whatever cores the
guest VM is running on) is busy.

++++++++++++++++++++++
#include <time.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

int
main(int argc, char *argv[])
{
struct timespec         cur;
struct timespec         prev;
unsigned int            ticks;
int                     delay;

        if(argc > 1)
        {
                delay = atoi(argv[1]);
        }
        else
        {
                delay = 1;
        }

        printf("Using delay=%u milliseconds between calls.\n", delay);

        if(clock_getres(CLOCK_MONOTONIC, &cur) != 0)
        {
                printf("Failed clock resolution read errno=%d [%s].\n",
                        errno, strerror(errno));
                return(0);
        }

        printf("Clock resolution sec=%lu nsec=%lu\n",
                cur.tv_sec, cur.tv_nsec);

        if(clock_gettime(CLOCK_MONOTONIC, &cur) != 0)
        {
                printf("Failed initial read errno=%d [%s].\n",
                        errno, strerror(errno));
                return(0);
        }

        printf("Initial time sec=%lu nsec=%lu\n",
                cur.tv_sec, cur.tv_nsec);

        ticks = 0;

        while(1)
        {
                prev = cur;

                poll(0, 0, delay);

                if(clock_gettime(CLOCK_MONOTONIC, &cur) != 0)
                {
                        printf("Failed subsequent read errno=%d [%s].\n",
                                errno, strerror(errno));
                        return(0);
                }

                if((cur.tv_sec <= prev.tv_sec) &&
                   (cur.tv_nsec < prev.tv_nsec))
                {
                        printf("Time ran backward:\n\tcur:\t%lu %lu\n\tprev:\t%lu %lu\nInterval is >= %u milliseconds).\n",
                                cur.tv_sec, cur.tv_nsec,
                                prev.tv_sec, prev.tv_nsec,
                                delay);
                }

        }

        return(0);
}

++++++++++++++++++++++

Run the following function:

  
Actual results:

Time ran backward:
        cur:    74936 797858224
        prev:   74936 797859224
Interval is >= 1 milliseconds).

Expected results:
No output.

Additional info:
The 1 microsecond difference is not that serious but if
the "cur" value has enough 9's in it, the time reversal
can appear arbitrarily large. Here is a 1 millisecons example:

        cur:    75756 534999224
        prev:   75756 535000224

Comment 3 FuXiangChun 2011-09-29 08:53:17 UTC
(In reply to comment #0)
> Description of problem:
> RHEL.6 x86 guest running under KVM (that is running under RHEL6.1
> on an x86 host).
> 
> Version-Release number of selected component (if applicable):
> libvirt-0.8.7-18.el6.x86_64.rpm
> 
> How reproducible:
> Easily.
> 
> 
> Steps to Reproduce:
> 
> gcc -lrt clktest.c -o clktest
> 
> In the guest run as
>    clktest
> or
>    clktest <milliseconds delay>
> 
> Longer delays between clock_gettime
> calls reduce the frequency of the error.
> 
> I suspect, but have not proved, that the error is more
> likely when the host (or at least whatever cores the
> guest VM is running on) is busy.
> 
> ++++++++++++++++++++++
> #include <time.h>
> #include <stdio.h>
> #include <errno.h>
> #include <string.h>
> 
> int
> main(int argc, char *argv[])
> {
> struct timespec         cur;
> struct timespec         prev;
> unsigned int            ticks;
> int                     delay;
> 
>         if(argc > 1)
>         {
>                 delay = atoi(argv[1]);
>         }
>         else
>         {
>                 delay = 1;
>         }
> 
>         printf("Using delay=%u milliseconds between calls.\n", delay);
> 
>         if(clock_getres(CLOCK_MONOTONIC, &cur) != 0)
>         {
>                 printf("Failed clock resolution read errno=%d [%s].\n",
>                         errno, strerror(errno));
>                 return(0);
>         }
> 
>         printf("Clock resolution sec=%lu nsec=%lu\n",
>                 cur.tv_sec, cur.tv_nsec);
> 
>         if(clock_gettime(CLOCK_MONOTONIC, &cur) != 0)
>         {
>                 printf("Failed initial read errno=%d [%s].\n",
>                         errno, strerror(errno));
>                 return(0);
>         }
> 
>         printf("Initial time sec=%lu nsec=%lu\n",
>                 cur.tv_sec, cur.tv_nsec);
> 
>         ticks = 0;
> 
>         while(1)
>         {
>                 prev = cur;
> 
>                 poll(0, 0, delay);
> 
>                 if(clock_gettime(CLOCK_MONOTONIC, &cur) != 0)
>                 {
>                         printf("Failed subsequent read errno=%d [%s].\n",
>                                 errno, strerror(errno));
>                         return(0);
>                 }
> 
>                 if((cur.tv_sec <= prev.tv_sec) &&
>                    (cur.tv_nsec < prev.tv_nsec))
>                 {
>                         printf("Time ran backward:\n\tcur:\t%lu
> %lu\n\tprev:\t%lu %lu\nInterval is >= %u milliseconds).\n",
>                                 cur.tv_sec, cur.tv_nsec,
>                                 prev.tv_sec, prev.tv_nsec,
>                                 delay);
>                 }
> 
>         }
> 
>         return(0);
> }
> 
> ++++++++++++++++++++++
> 
> Run the following function:
> 
> 
> Actual results:
> 
> Time ran backward:
>         cur:    74936 797858224
>         prev:   74936 797859224
> Interval is >= 1 milliseconds).
> 
> Expected results:
> No output.
> 
> Additional info:
> The 1 microsecond difference is not that serious but if
> the "cur" value has enough 9's in it, the time reversal
> can appear arbitrarily large. Here is a 1 millisecons example:
> 
>         cur:    75756 534999224
>         prev:   75756 535000224


cann't reproduce this bug with below environment.

1.rhel6.2 host 
# uname -r
  2.6.32-202.el6.x86_64

2.rhel6.2 guest 64bit and rhel6.1 32bit

3. qemu info
  # rpm -qa|grep qemu
  qemu-kvm-0.12.1.2-2.192.el6.x86_64

4. run above script about 3 hours


Hi Andrew, 

    can you tell me what are version of qemu and host kernel? 
    
    how long can reproduce this issue with above script?
   
    "is running under RHEL6.1 on an x86 host" in description, your mean is that host is rhel6.1 x86_64,right?

Comment 4 Suqin Huang 2011-09-29 09:03:25 UTC
Andrew,
Can you provide kernel version, seems it's a duplicate of bug 595225 (bug 613904)

Thanks
Suqin

Comment 5 Andrew Charles 2011-09-29 14:31:56 UTC
Suquin Huang:
Apparently I am "not authorized to access bug 595225" and so
can't comment on whether it is the same issue.

FuXiangChun:
In my test, the process is a 32 bit process running on a RHEL5.6
x86-64 linux kernel (version 2.6.18-238.19.1.el5).

The host is running RHEL6.1 on an x86-64 kernel (version
2.6.32-131.6.1.el6.x86_64) with qemu-kvm-0.12.1.2-2.160.el6_1.2.x86_64.

Thanks,

Andrew

Comment 6 Dor Laor 2011-10-02 14:44:41 UTC
What happens w/ a newer guest like 6.1 or 5.7?
Can you please provide the host details from /proc/cpuinfo?
We like to see whether it has stable tsc

Comment 7 Andrew Charles 2011-10-03 01:12:17 UTC
I will try to arrange a test with a guest running a more recent
version of RHEL but don't have that today.

Here is /proc/cpuinfo from the guest. The guest kernel is configured
with constant TSC.

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 13
model name      : QEMU Virtual CPU version (cpu64-rhel6)
stepping        : 3
cpu MHz         : 2666.594
cache size      : 4096 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 4
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 lahf_lm
bogomips        : 5333.18
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

[...trimmed 8 intermediate identical cores...]

processor       : 9
vendor_id       : GenuineIntel
cpu family      : 6
model           : 13
model name      : QEMU Virtual CPU version (cpu64-rhel6)
stepping        : 3
cpu MHz         : 2666.594
cache size      : 4096 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 4
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 lahf_lm
bogomips        : 5337.05
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

Comment 8 Dor Laor 2011-10-04 14:50:21 UTC
(In reply to comment #7)
> I will try to arrange a test with a guest running a more recent
> version of RHEL but don't have that today.
> 
> Here is /proc/cpuinfo from the guest. The guest kernel is configured
> with constant TSC.


I was asking about the host. Also the guest shows only 'tsc' and not 'constant_tsc' and 'nonstop_tsc' as I expect the host to have.

Can you please also report what the content of 
cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
both in the guest and the host?

Comment 9 Andrew Charles 2011-10-04 15:04:26 UTC
host current_clocksource:
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

guest current_clocksource:
#  cat /sys/devices/system/clocksource/clocksource0/current_clocksource
jiffies 

Here is /proc/cpuinfo from the host:'

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz
stepping        : 2
cpu MHz         : 2666.456
cache size      : 12288 KB
physical id     : 1
siblings        : 12
core id         : 0
cpu cores       : 6
apicid          : 32
initial apicid  : 32
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb tpr_shadow vnmi flexpriority ept vpid
bogomips        : 5332.91
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

[...trimmed redundant entries...]

processor       : 23
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz
stepping        : 2
cpu MHz         : 2666.456
cache size      : 12288 KB
physical id     : 0
siblings        : 12
core id         : 1
cpu cores       : 6
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb tpr_shadow vnmi flexpriority ept vpid
bogomips        : 5333.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

Comment 10 Rik van Riel 2011-10-04 15:20:48 UTC
Could we please get the dmesg output from the guest?

Chances are the dmesg output will contain some useful info on the guest clock...

Comment 11 Andrew Charles 2011-10-04 15:44:08 UTC
demsg > outfile resulted in 2600+ lines that look like this:

LCP-ICMP:IN=eth0.800 OUT= MAC=52:54:00:50:20:10:52:54:00:50:21:30:08:00:45:00:00:54 SRC=10.52.33.48 DST=10.52.32.16 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=37403 SEQ=1
LCP-ICMP:IN=eth0.800 OUT= MAC=52:54:00:50:20:10:52:54:00:50:21:30:08:00:45:00:00:54 SRC=10.52.33.48 DST=10.52.32.16 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=14108 SEQ=1
LCP-ICMP:IN=eth0.800 OUT= MAC=52:54:00:50:20:10:52:54:00:50:21:30:08:00:45:00:00:54 SRC=10.52.33.48 DST=10.52.32.16 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=60444 SEQ=1
LCP-ICMP:IN=eth0.800 OUT= MAC=52:54:00:50:20:10:52:54:00:50:21:30:08:00:45:00:00:54 SRC=10.52.33.48 DST=10.52.32.16 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=44829 SEQ=1
LCP-ICMP:IN=eth0.800 OUT= MAC=52:54:00:50:20:10:52:54:00:50:21:30:08:00:45:00:00:54 SRC=10.52.33.48 DST=10.52.32.16 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=21790 SEQ=1
LCP-ICMP:IN=eth0.800 OUT= MAC=52:54:00:50:20:10:52:54:00:50:21:30:08:00:45:00:00:54 SRC=10.52.33.48 DST=10.52.32.16 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=61726 SEQ=1
LCP-ICMP:IN=eth0.800 OUT= MAC=52:54:00:50:20:10:52:54:00:50:21:30:08:00:45:00:00:54 SRC=10.52.33.48 DST=10.52.32.16 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=38687 SEQ=1
LCP-ICMP:IN=eth0.800 OUT= MAC=52:54:00:50:20:10:52:54:00:50:21:30:08:00:45:00:00:54 SRC=10.52.33.48 DST=10.52.32.16 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=17184 SEQ=1
LCP-ICMP:IN=eth0.800 OUT= MAC=52:54:00:50:20:10:52:54:00:50:21:30:08:00:45:00:00:54 SRC=10.52.33.48 DST=10.52.32.16 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=60960 SEQ=1
LCP-ICMP:IN=eth0.800 OUT= MAC=52:54:00:50:20:10:52:54:00:50:21:30:08:00:45:00:00:54 SRC=10.52.33.48 DST=10.52.32.16 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=37921 SEQ=1
LCP-ICMP:IN=eth0.800 OUT= MAC=52:54:00:50:20:10:52:54:00:50:21:30:08:00:45:00:00:54 SRC=10.52.33.48 DST=10.52.32.16 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=12578 SEQ=1
LCP-ICMP:IN=eth0.800 OUT= MAC=52:54:00:50:20:10:52:54:00:50:21:30:08:00:45:00:00:54 SRC=10.52.33.48 DST=10.52.32.16 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=55074 SEQ=1
LCP-ICMP:IN=eth0.800 OUT= MAC=52:54:00:50:20:10:52:54:00:50:21:30:08:00:45:00:00:54 SRC=10.52.33.48 DST=10.52.32.16 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=33059 SEQ=1
LCP-ICMP:IN=eth0.800 OUT= MAC=52:54:00:50:20:10:52:54:00:50:21:30:08:00:45:00:00:54 SRC=10.52.33.48 DST=10.52.32.16 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=13092 SEQ=1

Comment 12 Dor Laor 2011-10-09 13:19:06 UTC
(In reply to comment #9)
> host current_clocksource:
> # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> tsc
> 
> guest current_clocksource:
> #  cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> jiffies 
> 

It should have been kvm-clock (like on my guest).
Getting the relevant dmesg part (you can get it from /var/log/messages) or special setting the guest kernel cmd line have will help.

It is expected to get a drift for jiffies w/o kvmclock

Comment 13 Andrew Charles 2011-10-09 14:38:55 UTC
According to http://www.linux-kvm.org/page/KVMClock:

> Older Linux guests with backported versions of kvmclock may need different ways
> to check for kvmclock. This is due to the fact that those guests might lack 
> total support for the clocksource subsystem
>
> * RHEL/CentOS 5.x: for the x86_64 version, the "current_clocksource" file
> exists, but will always return "jiffies", no matter what. One should search
> for "KVM" in  dmesg | grep time.c

That is the case here:
> dmesg | grep time.c
> time.c: Using 1.193182 MHz WALL KVM GTOD KVM timer.
> time.c: Detected 2666.594 MHz processor.

Comment 14 Rik van Riel 2011-10-17 15:35:01 UTC
Old dmesg messages are preserved in /var/log/dmesg.  I would appreciate it if you could attach that file so we can figure out if there is anything "interesting" on the host side.

So far I have not reproduced the bug on my test system.

Comment 15 Rik van Riel 2011-10-19 21:12:33 UTC
I am still running the test case and am occasionally poking at it (adding/reducing background load, adding/reducing load inside the guest, etc) and have not observed time going backwards yet.

I am running on a two socket (two numa node) Intel system, but maybe I need hardware where the CPUs are less tightly coupled?

On what hardware was the bug originally reproduced?

Could we get the full dmesg log from /var/log/dmesg?

Comment 16 Andrew Charles 2011-10-19 21:29:07 UTC
Created attachment 529110 [details]
guest /var/log/dmesg

Comment 17 Andrew Charles 2011-10-19 21:53:09 UTC
The systems we're seeing this on are shared by several users so I had
to wait a bit to collect the dmesg log. I've attached the current log
file and confirmed that the issue was still present on that host.

As for the hardware, the /proc/cpuinfo for each of the host and
guest is already in the ticket. The host is a very recent 12 core
(24 core with hyperthreading) HP blade. The guest is a 10 core X86.

FWIW, I tried disabling the VDSO on the guest to see if that affected
the result...it didn't.

Drew

Comment 18 Andrew Charles 2011-10-19 23:59:11 UTC
Made a major breakthrough on this. We noticed that guests on hosts that
had this problem also had large negative ntp drift values
(from /var/lib/ntp/drift).

I tried removing the drift file from one such host and rebooting
it. When it came back up and the guest was running I logged in
and ran the test script and did not have any errors (before they
were coming out steadily).

Comment 19 Andrew Charles 2011-10-20 00:15:51 UTC
I ran the next obvious experiment...echo'd -100 into the drift file
on the host, rebooted it and ran the script on the guest when it
came back. Once again the errors are coming out steadily.

Comment 20 Rik van Riel 2011-10-20 03:41:16 UTC
Thank you for finding that NTP thing. I will try that out on my lab system tomorrow.

Do you know if the host in question actually requires that large negative ntp drift value to keep the clock in sync, or if that large drift value came about through some accident?   (say, one host having a clock that's off, and having the disk image from that host used on all the others).

I will go through the kvmclock code carefully to see how it deals with ntp time corrections (if at all). It is entirely possible that ntp time adjustments, which happen very gradually over time, could collide with kvmclock (which might be built on the unadjusted system time - but I do not remember this for sure and will check tomorrow).

For completeness, what is the model of that HP blade?

Also, it would help if I could get the host /var/log/dmesg as well, since that may show some more info about the host hardware (which appears to be causing/triggering this issue).

Comment 21 Rik van Riel 2011-10-20 03:48:35 UTC
One last question: are you running ntp just in the host, or also in the guest?

If ntp is running in both host and guest, you get two duelling clock correction loops, which could get ugly...

Comment 22 Andrew Charles 2011-10-20 13:15:40 UTC
NTP is only running on the host.

We are trying to understand why the drift is so bad. We have other systems
based off other hardware using the same clock source that are fine.
That issue is being investigated separately. But it's happening on enough
different blades in different places that it's not just due to an
incident on one blade.

The blades are HP BL460c G6s.

I will provide the host /var/log/dmesg shortly.

Comment 23 Andrew Charles 2011-10-20 13:20:05 UTC
Created attachment 529267 [details]
host /var/log/dmesg

Comment 24 Andrew Charles 2011-10-20 15:22:56 UTC
I have to take back what I said previously about NTP in the guest.
We have NTP enabled in both the host and the guest as recommended
by RedHat:

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization/chap-Virtualization-KVM_guest_timing_management.htm

I also wanted to pass along some additional information which may
or may not be relevant when you are looking at kvm_clock code.

We have had a second issue where sometimes when we reboot a guest
it takes a long time (up to hours) to complete. Once the reboot
completes it runs normally again. It seemed more likely to happen
on guests that had been running a long time but was consistently
reproducible.

We now suspect that there is a correlation between with the
large host NTP drift values and long guest boot times. We are
exploring that right now.

We also discovered something else. First a little background:
we have a kernel module that collects CPU usage for threads,
IRQs etc. that we install by default in the guests. It uses
the KVM-supplied X86 kernel API "monotonic_clock()" to get
the timestamps it uses to track the start/end of various
events. The API is called a LOT.

This module is normally installed after udev...around the time
that slow-rebooting guests return to normal operation. Yesterday
we discovered that if we don't install it on slow-rebooting
guests that they don't return to normal. Then we tried installing
it before udev and udev execution time returned to normal.

We also tried replacing the monotonic_clock() call with a
TSC read (which is what the module uses on real X86 hardware).
With that change in place there was no change in behavior
with or without the module installed.

I have no idea why (frequent) calls to monotonic_clock() would
improve guest performance but thought I should pass it along
to you in case it triggered any ideas (especially if the slow
reboot time is correlated with large NTP drift values).

Finally, a minor note: monotonic_clock() is exported but not
externed. I had to add the extern to my kernel module in order
to build.

Comment 25 Rik van Riel 2011-10-20 17:54:19 UTC
Running ntp in the guest makes sense when the guest maintains its own clock.

However, with kvmclock, the guest "piggybacks" on the host clock and having a second instance of ntpd running could conceivably cause problems.

Could you try disabling ntp in one of the trouble guests, to see if the issue continues or if it goes away?

At this point I can see two possible causes for the problem:
1) somehow kvmclock is not enough (or too much) adjusted for ntp time drift and the guest sees too fast/slow a clock
2) kvmclock IS adjusted for ntp time drift, and the guest is, for some reason, trying to impose its own correction on top and causing issues
3) the combination of a monotonically increasing kvmclock time and a somewhat more irregular timer interrupt in the guest is confusing the guest's ntpd

Stopping ntpd in the guest can rule out (2) and (3). It would also be good to verify that the guest time stays in sync with the host time when ntpd is disabled (it seems to stay in sync on my systems...).

I will continue to investigate the source code to understand whether and if (1) could happen.

Comment 26 Andrew Charles 2011-10-20 18:08:57 UTC
FYI, we tried running RHEL5.6 on the HP host with the same clock
source as before and do NOT see the high levels of NTP drift.

So the source of the NTP drift *may* be a problem in RHEL6.1.

Comment 27 Andrew Charles 2011-10-20 18:51:38 UTC
We turned NTP off in one of the guests that was demonstrating
the issue. Although the host still has a large NTP drift
(-99 right now), clock_gettime() in the guest is working
correctly.

Comment 30 Rik van Riel 2012-01-03 16:40:13 UTC
Possibly related to bug 750201

Comment 31 Jeremy West 2012-01-11 22:46:08 UTC
I confirmed with Andrew and Jack (Alcatel) that this bug is indeed resolved via the Prarit's patch for BZ 750201.  I'm closing this bug out as a dup.

Thanks
Jeremy

*** This bug has been marked as a duplicate of bug 750201 ***

Comment 32 Nick Adams 2014-06-19 01:42:33 UTC
Can access or additional information please be provided about bug 750201? We are experiencing the same issue and cannot access whatever information is in that bug.

Comment 33 Paolo Bonzini 2014-06-19 15:36:26 UTC
The bug should be fixed in 6.3.

Comment 34 Nick Adams 2014-06-19 21:15:30 UTC
Hi Paolo. Thanks but I don't use RedHat and I still have no idea what the fix was/is to see if it effects my OS. Can you provide more information or open the other bug please?

Comment 35 Paolo Bonzini 2014-06-20 13:06:48 UTC
The relevant patches upstream are:

- 7d2f944a2b836c69a9d260a0a5f0d1720d57fdff
- 39fe05e58c5e448601ce46e6b03900d5bf31c4b0
- d7e81c269db899b800e0963dc4aceece1f82a680
- 08ec0c58fb8a05d3191d5cb6f5d6f81adb419798


Note You need to log in before you can comment on or make changes to this bug.