Bug 624572

Summary:	time drift after guest running for more than 12 hours
Product:	Red Hat Enterprise Linux 6	Reporter:	Shirley Zhou <szhou>
Component:	qemu-kvm	Assignee:	Zachary Amsden <zamsden>
Status:	CLOSED NOTABUG	QA Contact:	Virtualization Bugs <virt-bugs>
Severity:	medium	Docs Contact:
Priority:	low
Version:	6.0	CC:	bcao, lihuang, mkenneth, mshao, syeghiay, tburke, virt-maint, zamsden
Target Milestone:	beta	Keywords:	Reopened, RHELNAK
Target Release:	6.1
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2011-04-07 16:59:01 UTC	Type:	---
Bug Blocks:	580951

Description Shirley Zhou 2010-08-17 02:24:57 UTC

Description of problem:
time drift after guest running for more than 12 hours

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.109.el6.x86_64
kernel-2.6.32-63.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Sync time on the host:
ntpdate -b clock.redhat.com
2. Run a RHEL 6 guest with the following time option:
-rtc base=utc,clock=host,driftfix=slew
3. Sync time on the guest:
ntpdate -b clock.redhat.com
4. Query time on the guest:
ntpdate -q clock.redhat.com
The offset is -0.299896.
5. Run the guest for 14 hours, then query time again:
ntpdate -q clock.redhat.com
server 66.187.233.4, stratum 1, offset -9.975898, delay 0.34035
17 Aug 09:44:46 ntpdate[4619]: step time server 66.187.233.4 offset -9.975898 sec
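
For anyone repeating steps 4 and 5, the offset can be extracted from the "server ..." summary line of ntpdate -q instead of being read by eye. This is a minimal Python sketch, assuming the line has the same shape as the one quoted above. The name parse_ntpdate_query and the regular expression are illustrative and not part of ntpdate.

```python
import re

# Matches the summary line printed by `ntpdate -q`, for example:
# "server 66.187.233.4, stratum 1, offset -9.975898, delay 0.34035"
SERVER_LINE = re.compile(
    r"server\s+(?P<server>[^,\s]+),\s*stratum\s+(?P<stratum>\d+),\s*"
    r"offset\s+(?P<offset>[-+]?\d+(?:\.\d+)?),\s*"
    r"delay\s+(?P<delay>\d+(?:\.\d+)?)"
)


def parse_ntpdate_query(line):
    """Return (server, offset_seconds, delay_seconds), or None if no match."""
    match = SERVER_LINE.search(line)
    if match is None:
        return None
    return (
        match.group("server"),
        float(match.group("offset")),
        float(match.group("delay")),
    )


# Sample line copied from step 5 of this report.
sample = "server 66.187.233.4, stratum 1, offset -9.975898, delay 0.34035"
print(parse_ntpdate_query(sample))
```

Logging this tuple at regular intervals on both host and guest would show whether the offset grows steadily or in jumps.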

Actual results:
After the guest has run for a long time, the clock drift increases substantially.

Expected results:
There should not be a large increase in time drift after the guest has run for a long time.

Additional info:

Comment 2 RHEL Program Management 2010-08-17 02:58:39 UTC

This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 3 Dor Laor 2010-11-21 22:34:26 UTC

Not sure this is that bad, but it is worth investigating.

Comment 4 Zachary Amsden 2011-03-29 18:35:06 UTC

This bug is really old, and fixes have gone into the kvmclock and TSC code since.  I suggest we re-test to verify the bug still exists.

Comment 10 Zachary Amsden 2011-03-30 22:05:38 UTC

I don't have access to Bug 682613, was this last update posted in the wrong bug?

Comment 12 Dor Laor 2011-03-31 11:18:23 UTC

*** Bug 682613 has been marked as a duplicate of this bug. ***

Comment 13 Zachary Amsden 2011-03-31 19:26:03 UTC

So... as for the duplicate, same comment applies.


Host clock drifting is not a virt bug... we can't stop guest clocks drifting
when the host isn't even stable.  Re-assigning component to kernel.

If the host clock drift isn't a regression, it's possible this is just a
drifting or unstable host.  It's also possible the NTP server measurement is
being affected by network latency.  These are rather difficult things to rule
out, but the first step would be to see if the drift still exists with a 6.0
kernel on the same host.


This bug is kind of a mess.  I suggest we re-test to make sure it's really a bug.  Also, the attachment showing the drift results was attached to the other bug.

I'm going to close THIS bug as a duplicate and re-open the other one which is under the proper component.

*** This bug has been marked as a duplicate of bug 682613 ***

Comment 14 Zachary Amsden 2011-03-31 19:36:16 UTC

Okay, pending further investigation, I am re-opening this bug.

PLEASE DO NOT CLOSE EITHER THIS BUG OR 682613 as a duplicate.  There are two separate issues being reported.

One is a very small drift reported on a system which apparently has a drifting host clock (682613).  Not sure this is a real bug or can even be fixed.  That bug is not a virt issue, but a kernel issue.

This bug report (624572) concerns a virt guest running for over 14 hours having a "huge" drift of about -10 seconds (measured offset -9.975898).  Quantitatively, that is roughly a 200 parts per million error, which isn't actually huge, and is within the threshold of NTP-correctable error.
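
The parts-per-million figure can be checked with a short calculation. This is a sketch in Python, assuming the drift is simply the change in ntpdate offset divided by the elapsed wall-clock time and that the elapsed time was exactly 14 hours. The helper name drift_ppm is illustrative. The offsets are the ones quoted in the description.

```python
def drift_ppm(offset_start_s, offset_end_s, elapsed_hours):
    """Average drift rate in parts per million over the interval."""
    elapsed_s = elapsed_hours * 3600.0
    return abs(offset_end_s - offset_start_s) / elapsed_s * 1e6


# Final offset alone, treating the guest as perfectly synced at the start.
raw = drift_ppm(0.0, -9.975898, 14)

# Net of the -0.299896 s offset already present at step 4.
net = drift_ppm(-0.299896, -9.975898, 14)

print(f"raw: {raw:.1f} ppm, net of initial offset: {net:.1f} ppm")
```

Both values are a little under 200 ppm, so the rounded figure holds whether or not the initial offset is subtracted.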

Can we please also verify whether or not the host clock is drifting on the same machine for which this bug was reported?

If that does indeed turn out to be the case, then we can dismiss one of these as a duplicate, but for now, in the absence of any known drift on the host clock, we still cannot rule out a virt bug on this one.

Can we also double check which clocksource was being used in the guest here?  Kvmclock or something else?

Thanks,

Zach

Comment 15 Zachary Amsden 2011-04-05 21:55:48 UTC

Need to verify whether this is indeed a drifting host or something else.

Comment 16 Mike Cao 2011-04-07 03:06:48 UTC

Tried on kernel-2.6.32-128.el6.

steps:
1. Run a guest on the host
2. Load the host CPU
3. Check the time drift after 12 hours


Actual Results:
On AMD host:
after 14 hours, time drifted 6.6 sec
On Intel host:
after 13 hours, time drifted 0.3 sec

Comment 17 Zachary Amsden 2011-04-07 16:59:01 UTC

6.6 seconds in 14 hours is less than 140 ppm error, which is within hardware expectation and within the NTP adjustable tolerance of 500 ppm.

Differing drift on AMD and Intel platforms confirms it is a platform clock stability issue and not a systemic kernel or virtualization problem, so I'm closing the bug.
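
The comparison behind the closing comment can be reproduced from the numbers in Comment 16. This is a sketch in Python, assuming the reported drifts and run times are exact. The 500 ppm limit is the tolerance quoted in Comment 17, and the names used here are illustrative.

```python
NTP_TOLERANCE_PPM = 500.0  # adjustable tolerance quoted in Comment 17


def ppm(drift_s, hours):
    """Average drift rate in parts per million."""
    return drift_s / (hours * 3600.0) * 1e6


# (drift in seconds, run time in hours) as reported in Comment 16.
measurements = {"AMD": (6.6, 14), "Intel": (0.3, 13)}

results = {}
for host, (drift_s, hours) in measurements.items():
    rate = ppm(drift_s, hours)
    results[host] = rate
    verdict = "within" if rate <= NTP_TOLERANCE_PPM else "exceeds"
    print(f"{host}: {rate:.1f} ppm ({verdict} the {NTP_TOLERANCE_PPM:.0f} ppm tolerance)")

# For scale: the drift per day that the tolerance itself corresponds to.
seconds_per_day_at_limit = NTP_TOLERANCE_PPM * 1e-6 * 86400
print(f"{NTP_TOLERANCE_PPM:.0f} ppm is about {seconds_per_day_at_limit:.1f} s per day")
```

Both hosts fall well inside the tolerance, and the AMD rate is about twenty times the Intel rate, which matches the platform-dependent reading given above.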