Bug 706373

Summary: skewed clock for some 500 seconds and ntpd not fixing the time in all the kvm guests running in a kvm host
Product: Red Hat Enterprise Linux 6 Reporter: Ioannis Aslanidis <aslanidis>
Component: kernel Assignee: Rik van Riel <riel>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.0 CC: riel, syeghiay, tburke
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-04-12 19:34:02 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Ioannis Aslanidis 2011-05-20 10:10:50 UTC
Description of problem:

A few days ago our monitoring system detected that all the kvm guests running in the same kvm host were desynchronized. The hosts were both 32 and 64 bit. The offset kept growing until it reached some 500 seconds and beyond, which is when we detected the issue. Note that the kvm host itself was correctly synchronized at all times.

The ntp daemon was correctly configured and running the whole time in all the guests; however, it did not synchronize the clock even after several hours. We could see that it was sending requests, but it never corrected the clock.

The panic threshold was set at default levels, that is 1000 seconds, so it should not have been the problem.

Here's a sample of the alert:

nagios-05-12-2011-00.log:[1305129856] SERVICE ALERT: admin1.chg.fdcs;ntp;CRITICAL;HARD;4;NTP CRITICAL: Offset 562.3206676 secs
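
For anyone who wants to pull the offsets out of a batch of such alerts, here is a minimal sketch. The field layout (epoch timestamp in brackets, then host, service, state, state type, attempt, and plugin output separated by semicolons) is inferred from the single sample above, and `parse_alert` is a hypothetical helper name, not part of nagios.

```python
import re
from datetime import datetime, timezone

# Layout inferred from the one sample line in this report (an assumption).
ALERT_RE = re.compile(
    r"\[(?P<epoch>\d+)\] SERVICE ALERT: "
    r"(?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[^;]+);"
    r"(?P<state_type>[^;]+);(?P<attempt>\d+);(?P<output>.*)"
)
OFFSET_RE = re.compile(r"Offset (-?\d+(?:\.\d+)?) secs")


def parse_alert(line):
    """Return the fields of one SERVICE ALERT line as a dict."""
    match = ALERT_RE.search(line)
    if match is None:
        raise ValueError("not a SERVICE ALERT line: %r" % line)
    offset = OFFSET_RE.search(match.group("output"))
    return {
        # The bracketed number is a Unix epoch timestamp.
        "when": datetime.fromtimestamp(int(match.group("epoch")), tz=timezone.utc),
        "host": match.group("host"),
        "service": match.group("service"),
        "state": match.group("state"),
        # None when the plugin output carries no offset.
        "offset_s": float(offset.group(1)) if offset else None,
    }


if __name__ == "__main__":
    sample = (
        "nagios-05-12-2011-00.log:[1305129856] SERVICE ALERT: "
        "admin1.chg.fdcs;ntp;CRITICAL;HARD;4;NTP CRITICAL: Offset 562.3206676 secs"
    )
    alert = parse_alert(sample)
    print(alert["when"].isoformat(), alert["host"], alert["offset_s"])
```

The timestamp in the sample decodes to 2011-05-11 16:04:16 UTC.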

The processors of the kvm host are 2 AMD Opteron 6128 with the 'constant_tsc' feature. The clocksource was set to 'tsc'.

All kvm guests had 'kvm-clock' set as clocksource.
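
The clocksource and CPU flag can be checked on the host and in each guest with something like the sketch below. It assumes the usual Linux sysfs location for the clocksource files and the `flags` line of /proc/cpuinfo; `read_clocksource` and `has_cpu_flag` are hypothetical helper names.

```python
from pathlib import Path

# Usual sysfs location on Linux; adjust if your kernel exposes it elsewhere.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current clocksource, list of available clocksources)."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def has_cpu_flag(flag, cpuinfo_text):
    """Check the first 'flags' line of /proc/cpuinfo content for a flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return flag in value.split()
    return False
```

Calling `read_clocksource()` in a guest should report 'kvm-clock' as current, and `has_cpu_flag("constant_tsc", Path("/proc/cpuinfo").read_text())` on the host should return True if the setup is as described here.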

In the end, our setup matched the one prescribed in the documentation: http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization/chap-Virtualization-KVM_guest_timing_management.html

Version-Release number of selected component (if applicable):

ntp-4.2.4p8-2.el6.i686

How reproducible:
We do not know how to reproduce the problem yet.

Steps to Reproduce:
1. Start a kvm guest with ntpd running with the default configuration (ntp server of your choice).
2. Wait an undefined amount of time.
3. The clock starts skewing.
4. The ntp daemon does not fix the skew.
  
Actual results:

The ntp daemon does not fix the time of the machine when it is desynchronized under unexplained circumstances.

Expected results:

The ntp daemon should always fix the time of the machine.


Additional info:

Restarting the ntp daemon with the default configuration did not fix the issue; however, enabling iburst and decreasing minpoll to 4 and maxpoll to 6, and also setting 'disable kernel', then restarting ntpd, did synchronize the clock properly.
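
To put the poll settings in perspective: minpoll and maxpoll are exponents of two, in seconds. The sketch below compares the values we set against the ntpd defaults of minpoll 6 and maxpoll 10; `poll_interval_seconds` is just an illustrative helper.

```python
def poll_interval_seconds(exponent):
    """ntpd poll settings are powers of two, expressed in seconds."""
    return 2 ** exponent


settings = [
    ("default minpoll", 6),
    ("default maxpoll", 10),
    ("tuned minpoll", 4),
    ("tuned maxpoll", 6),
]

if __name__ == "__main__":
    for name, exponent in settings:
        print("%-16s %2d -> %4d s" % (name, exponent, poll_interval_seconds(exponent)))
```

So with the tuned values the server is polled every 16 to 64 seconds instead of every 64 to 1024 seconds. As far as we understand it, 'disable kernel' turns off the kernel time discipline so that ntpd applies the corrections itself, but we have not confirmed which of the three changes made the difference.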

Comment 2 Miroslav Lichvar 2011-05-20 10:43:43 UTC
This is most likely a kernel/kvm problem which makes the clock drift at a higher rate than ntp is able to correct (500 ppm).
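
A rough illustration of what the 500 ppm limit means in practice. This is back-of-the-envelope arithmetic only, not a model of ntpd's clock discipline (by default ntpd steps the clock for offsets above 128 ms rather than slewing them), and the function names and the 600 ppm drift figure are made up for the example.

```python
MAX_CORRECTION_PPM = 500.0  # frequency correction limit mentioned above


def accumulated_offset(drift_ppm, elapsed_s, correction_ppm=MAX_CORRECTION_PPM):
    """Offset left over when the clock drifts faster than it can be corrected."""
    net_ppm = max(abs(drift_ppm) - correction_ppm, 0.0)
    return net_ppm * 1e-6 * elapsed_s


def seconds_to_slew(offset_s, slew_ppm=MAX_CORRECTION_PPM):
    """Time needed to remove an offset by slewing alone at the given rate."""
    return abs(offset_s) / (slew_ppm * 1e-6)


if __name__ == "__main__":
    # Hypothetical guest drifting at 600 ppm: 100 ppm stay uncorrected.
    print("offset after one day: %.2f s" % accumulated_offset(600.0, 86400))
    # Offset taken from the nagios alert in the description.
    days = seconds_to_slew(562.3206676) / 86400
    print("slewing 562.32 s at 500 ppm: %.1f days" % days)
```

In other words, an offset of some 500 seconds cannot be removed by frequency correction alone in any reasonable time; slewing it at the full 500 ppm would take about 13 days.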

To confirm that, please enable the peerstats and loopstats statistics in ntp.conf, restart ntpd, let it run for a few hours and attach the logs here.
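
Once the logs exist, a quick check for a saturated frequency correction could look like the sketch below. It assumes the usual loopstats layout of seven whitespace-separated fields (MJD, seconds past midnight, offset in seconds, frequency in ppm, jitter, wander, time constant); the sample lines are invented for illustration and are not data from this bug.

```python
from dataclasses import dataclass


@dataclass
class LoopSample:
    mjd: int              # modified Julian day
    seconds: float        # seconds past midnight UTC
    offset_s: float       # clock offset in seconds
    freq_ppm: float       # frequency correction in ppm
    jitter_s: float
    wander_ppm: float
    time_constant: int


def parse_loopstats_line(line):
    """Parse one loopstats record (assumed seven-field layout)."""
    fields = line.split()
    if len(fields) < 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    return LoopSample(
        int(fields[0]), float(fields[1]), float(fields[2]), float(fields[3]),
        float(fields[4]), float(fields[5]), int(fields[6]),
    )


def frequency_saturated(samples, limit_ppm=500.0, margin_ppm=1.0):
    """True if any sample sits at (or within a margin of) the ppm limit."""
    return any(abs(s.freq_ppm) >= limit_ppm - margin_ppm for s in samples)


if __name__ == "__main__":
    # Invented sample records, NOT taken from the reporter's machines.
    lines = [
        "55701 36000.123 0.012345 123.456 0.001 0.01 6",
        "55701 36064.123 0.250000 500.000 0.002 0.05 6",
    ]
    samples = [parse_loopstats_line(l) for l in lines]
    print("frequency pinned at limit:", frequency_saturated(samples))
```

If the frequency column stays pinned near +500 or -500 while the offset keeps growing, that would support the drift explanation.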

Comment 3 Ioannis Aslanidis 2011-05-20 13:29:13 UTC
I have enabled the logging of peerstats and loopstats as you requested. I will give you the results on Monday.

Comment 4 RHEL Program Management 2011-07-06 01:35:36 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 5 RHEL Program Management 2011-10-07 15:35:51 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 6 Rik van Riel 2011-12-12 17:25:02 UTC
Ioannis,

what are the values of /var/lib/ntp/drift on the host and in the guests?

Comment 7 Dor Laor 2011-12-21 13:20:56 UTC
Ioannis, can we please get the guest xml, qemu command line, and guest kernel cmdline?

Comment 8 Dor Laor 2012-02-08 09:32:29 UTC
No feedback received; we'll have to close this bug if we don't get any.

Comment 10 Karen Noel 2012-04-12 19:34:02 UTC
We believe this bug was fixed by fixes for another, related bug. However, we do not have enough data to know that for sure. If the bug persists, please reopen this bug.