Description of problem:
A few days ago our monitoring system detected that all the kvm guests running in the same kvm host were desynchronized. The hosts were both 32 and 64 bit. The offset started growing and growing to some 500 seconds and beyond. That's when we detected the issue. Notice that the kvm host was correctly synchronized at all times.

The ntp daemon was correctly configured and running all the time in all the guests; however, it would not synchronize the clock after waiting for several hours. We did see that it was launching requests, but would not fix the clock. The panic threshold was set at default levels, that is 1000 seconds, so it should not have been the problem.

Here's a sample of the alert:

nagios-05-12-2011-00.log:[1305129856] SERVICE ALERT: admin1.chg.fdcs;ntp;CRITICAL;HARD;4;NTP CRITICAL: Offset 562.3206676 secs

The processors of the kvm host are 2 AMD Opteron 6128 with the 'constant_tsc' feature. The clocksource was set to 'tsc'. All kvm guests had 'kvm-clock' set as clocksource. In the end, our setup matched the one prescribed in the documentation:
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization/chap-Virtualization-KVM_guest_timing_management.html

Version-Release number of selected component (if applicable):
ntp-4.2.4p8-2.el6.i686

How reproducible:
We do not know how to reproduce the problem yet.

Steps to Reproduce:
1. Start a kvm guest with ntpd running with the default configuration (ntp server of your choice).
2. Wait an undefined amount of time.
3. The clock starts skewing.
4. The ntp daemon does not fix the skew.

Actual results:
The ntp daemon does not fix the time of the machine when it is desynchronized under unexplained circumstances.

Expected results:
The ntp daemon should always fix the time of the machine.

Additional info: Restarting the ntp daemon with the default configuration did not fix the issue; however, enabling iburst and decreasing minpoll to 4 and maxpoll to 6, and also setting 'disable kernel', then restarting ntpd, did synchronize the clock properly.
This is most likely a kernel/kvm problem which makes the clock drift at a higher rate than ntp is able to correct (500 ppm). To confirm that, please enable the peerstats and loopstats statistics in ntp.conf, restart ntpd, let it run for a few hours and attach the logs here.
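To put the 500 ppm figure mentioned above in perspective, here is a minimal Python sketch of the arithmetic, plus a helper for reading loopstats entries. It assumes the usual loopstats layout (MJD day, seconds past midnight, offset in seconds, frequency in ppm, then further fields). All function names are hypothetical, and the one-day elapsed time in the usage note is an assumption, since the report does not say how long the offset took to build up.

```python
from typing import NamedTuple

MAX_FREQ_PPM = 500.0  # correction limit quoted in the comment above


def drift_per_day(rate_ppm: float) -> float:
    """Seconds of clock error accumulated per day at a given frequency error."""
    return rate_ppm * 1e-6 * 86400


def implied_drift_ppm(offset_s: float, elapsed_s: float) -> float:
    """Average frequency error (ppm) needed to build up offset_s in elapsed_s."""
    return offset_s / elapsed_s * 1e6


class LoopSample(NamedTuple):
    mjd: int          # modified Julian day
    second: float     # seconds past midnight
    offset_s: float   # clock offset in seconds
    freq_ppm: float   # frequency correction in ppm


def parse_loopstats_line(line: str) -> LoopSample:
    """Parse the first four fields of one loopstats line (assumed layout)."""
    f = line.split()
    return LoopSample(int(f[0]), float(f[1]), float(f[2]), float(f[3]))


def is_pegged(sample: LoopSample, limit_ppm: float = MAX_FREQ_PPM,
              margin_ppm: float = 1.0) -> bool:
    """True if the frequency correction sits at (or within a margin of) the limit."""
    return abs(sample.freq_ppm) >= limit_ppm - margin_ppm
```

For example, `drift_per_day(500.0)` gives roughly 43.2 seconds per day, so a guest clock drifting faster than that cannot be held by frequency correction alone. If the 562 second offset from the alert had built up over one day (an assumed interval), `implied_drift_ppm(562.3206676, 86400)` would be well above the limit. A loopstats file in which `is_pegged` is true for most samples would support this explanation.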
I have enabled the logging of peerstats and loopstats as you requested. I will give you the results on Monday.
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.
Since RHEL 6.2 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Ioannis, what are the values of /var/lib/ntp/drift on the host and in the guests?
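For anyone collecting these values, a small Python sketch for reading the drift file and checking how close it is to the 500 ppm limit. It assumes the drift file holds a single frequency value in ppm as its first token. The function names and the 5 ppm margin are hypothetical choices for illustration.

```python
from pathlib import Path


def parse_drift(text: str) -> float:
    """Return the frequency offset in ppm from the drift file contents."""
    return float(text.split()[0])


def read_drift_ppm(path: str = "/var/lib/ntp/drift") -> float:
    """Read the drift value from the path asked about above."""
    return parse_drift(Path(path).read_text())


def near_limit(ppm: float, limit_ppm: float = 500.0,
               margin_ppm: float = 5.0) -> bool:
    """True if the stored frequency is at or close to the correction limit."""
    return abs(ppm) >= limit_ppm - margin_ppm
```

Running `read_drift_ppm()` on the host and in each guest and comparing the results shows whether the guests are being disciplined at a very different rate from the host. A guest value for which `near_limit` is true would suggest the guest clock is drifting faster than ntpd can compensate.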
Ioannis, can we please get the guest xml, qemu command line, and guest kernel cmdline?
No feedback received; we'll have to close this bug if we don't get feedback.
We believe this bug was resolved by the fixes for another, related bug. However, we do not have enough data to know that for sure. If the problem persists, please reopen this bug.