Bug 606246

Summary:	TSC is not synchronized between VCPUs
Product:	Red Hat Enterprise Linux 5	Reporter:	YangFeng <fyang>
Component:	kvm	Assignee:	Zachary Amsden <zamsden>
Status:	CLOSED WONTFIX	QA Contact:	Virtualization Bugs <virt-bugs>
Severity:	medium	Docs Contact:
Priority:	low
Version:	5.7	CC:	gcosta, llim, michen, mkenneth, mtosatti, riel, shuang, tburke, virt-maint, ykaul
Target Milestone:	rc	Keywords:	Triaged
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2011-02-07 14:02:04 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	562808, 580946, 580948

Description YangFeng 2010-06-21 08:26:47 UTC

Description of problem:
The autotest TSC test case always fails in a RHEL guest OS.

One failing result for the autotest TSC test case:
13:34:46 INFO | START   ----    ----    timestamp=1274679286    localtime=May 24 13:34:46
13:34:46 INFO |         START   tsc     tsc     timestamp=1274679286    localtime=May 24 13:34:46
13:34:50 INFO |                 FAIL    tsc     tsc     timestamp=1274679290    localtime=May 24 13:34:50       Latency CPU 1 - CPU 0 =   940 exceeds threshold 650
13:34:50 INFO |         END FAIL        tsc     tsc     timestamp=1274679290    localtime=May 24 13:34:50

Version-Release number of selected component (if applicable):
rpm -qa|grep kvm
etherboot-zroms-kvm-5.4.4-13.el5
kvm-qemu-img-83-183.el5
kmod-kvm-83-183.el5
kvm-83-183.el5
kvm-tools-83-183.el5

kernel in guest:
kernel-PAE-2.6.18-200.el5


How reproducible:
Always on test machine

Steps to Reproduce:
1. Start a VM with two virtual CPUs, e.g.:
qemu-kvm -name 'vm1' -drive file=RHEL-Server-5.5-64.qcow2,if=ide,cache=none,boot=on -net nic,vlan=0,model=e1000,macaddr=02:77:09:7F:8f:43 -net tap,vlan=0,ifname=virtio_0_6001,script=qemu-ifup-switch,downscript=no -m 4096 -smp 2 -soundhw ac97 -redir tcp:5000::22 -vnc :0 -spice port=8000,disable-ticketing -usbdevice tablet -rtc-td-hack -cpu qemu64,+sse2 -no-kvm-pit-reinjection -serial unix:/tmp/serial-20100618-131722-88J3,server,nowait -no-hpet

2. Run the TSC autotest test case in the guest.

  
Actual results:
The autotest TSC test fails.

Expected results:
The autotest TSC test should pass.

Additional info:
One result of running './checktsc -t 650 -v' in the guest:
CPU 0 - CPU 1
loop  0: roundtrip =   427
loop  1: roundtrip =   428
loop  2: roundtrip =   399
loop  3: roundtrip =   399
loop  4: roundtrip =   399
loop  5: roundtrip =   399
loop  6: roundtrip =   399
loop  7: roundtrip =   399
loop  8: roundtrip =   399
loop  9: roundtrip =   399
CPU 0 - CPU 1 =  10674
CPU 1 - CPU 0
loop  0: roundtrip =   418
loop  1: roundtrip =   428
loop  2: roundtrip =   399
loop  3: roundtrip =   399
loop  4: roundtrip =   399
loop  5: roundtrip =   399
loop  6: roundtrip =   399
loop  7: roundtrip =   399
loop  8: roundtrip =   399
loop  9: roundtrip =   399
CPU 1 - CPU 0 = -10627
FAIL
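
The summary lines in this output can be checked mechanically. The following is a minimal Python sketch, not part of autotest or checktsc: parse_deltas, exceeds, and cycles_to_us are hypothetical helper names, the absolute-value comparison against the threshold is an assumption about how the check works, and the conversion to microseconds assumes the guest TSC ticks at the nominal 3.16 GHz of the host CPU.

```python
import re

# Summary lines copied from the './checktsc -t 650 -v' output in this report.
SAMPLE = """\
CPU 0 - CPU 1 =  10674
CPU 1 - CPU 0 = -10627
"""

# Matches summary lines such as "CPU 0 - CPU 1 =  10674".
# Header lines ("CPU 0 - CPU 1") and "loop N: roundtrip = ..." lines do not match.
SUMMARY_RE = re.compile(r"^CPU\s+(\d+)\s+-\s+CPU\s+(\d+)\s+=\s+(-?\d+)\s*$")


def parse_deltas(text):
    """Return {(cpu_a, cpu_b): delta_in_cycles} for each summary line."""
    deltas = {}
    for line in text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match:
            cpu_a, cpu_b, delta = (int(group) for group in match.groups())
            deltas[(cpu_a, cpu_b)] = delta
    return deltas


def exceeds(deltas, threshold):
    """Assumption: a CPU pair fails when |delta| is above the threshold."""
    return {pair: d for pair, d in deltas.items() if abs(d) > threshold}


def cycles_to_us(cycles, tsc_hz):
    """Convert a cycle count to microseconds for a given TSC frequency."""
    return cycles / tsc_hz * 1e6


deltas = parse_deltas(SAMPLE)
failing = exceeds(deltas, threshold=650)
for pair, delta in sorted(failing.items()):
    # 3.16e9 Hz is an assumed TSC rate taken from the host CPU model name.
    print(pair, delta, round(cycles_to_us(abs(delta), 3.16e9), 2), "us")
```

The two deltas are close to equal and opposite (their sum is 47 cycles), which is consistent with a steady offset of roughly 10650 cycles between the two VCPUs rather than measurement noise.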


CPU info on the host:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz
stepping        : 10
cpu MHz         : 2000.000
cache size      : 6144 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips        : 6317.30
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

Comment 1 Zachary Amsden 2010-11-09 23:58:53 UTC

*** Bug 647106 has been marked as a duplicate of this bug. ***

Comment 4 RHEL Program Management 2011-01-11 19:56:51 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 5 RHEL Program Management 2011-01-11 22:52:12 UTC

This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 6 Zachary Amsden 2011-01-13 22:22:01 UTC

The fixes for this are upstream and should be backported to RHEL 5.

Comment 7 Zachary Amsden 2011-01-13 23:36:43 UTC

Note that this is not fixed by TSC trapping.  This is related to TSC getting out of sync in the guest during CPU power on, reset, and when the guest attempts to reprogram the TSC itself.
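
As a rough illustration of how a guest TSC write can cause skew: with hardware TSC offsetting, a guest read returns the host TSC plus a per-VCPU offset. The toy Python model below is only a sketch under that assumption and is not KVM code. ToyVcpu, reset_naive, reset_matched, and the host TSC values are all invented for illustration, and reset_matched shows one hypothetical form of compensation, not the upstream patches.

```python
class ToyVcpu:
    """Toy model: guest-visible TSC = host TSC + per-VCPU offset."""

    def __init__(self):
        self.offset = 0

    def write_tsc(self, host_tsc, value):
        # The guest asks for its TSC to read `value` at this host time.
        self.offset = value - host_tsc

    def read_tsc(self, host_tsc):
        return host_tsc + self.offset


def reset_naive(vcpus, host_times):
    """Each VCPU writes 0 to its TSC at a different host time."""
    for vcpu, host_tsc in zip(vcpus, host_times):
        vcpu.write_tsc(host_tsc, 0)


def reset_matched(vcpus, host_times):
    """Hypothetical compensation: later identical writes reuse the first offset."""
    first = vcpus[0]
    first.write_tsc(host_times[0], 0)
    for vcpu in vcpus[1:]:
        vcpu.offset = first.offset


def skew(vcpus, host_now):
    """Difference between the two guest TSCs when read at the same host time."""
    return vcpus[0].read_tsc(host_now) - vcpus[1].read_tsc(host_now)


# Invented host TSC values at which VCPU 0 and VCPU 1 perform their reset.
HOST_TIMES = [1_000_000, 1_010_650]

naive = [ToyVcpu(), ToyVcpu()]
reset_naive(naive, HOST_TIMES)
print("naive skew:", skew(naive, 2_000_000))

matched = [ToyVcpu(), ToyVcpu()]
reset_matched(matched, HOST_TIMES)
print("matched skew:", skew(matched, 2_000_000))
```

In the naive case the skew equals the host cycles that elapsed between the two writes and stays constant afterwards; when the offsets are matched, both VCPUs read the same value.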

Comment 8 Zachary Amsden 2011-02-03 15:09:45 UTC

Cc-ing reviewers of my RHEL6 TSC patches. After attempting to port this patch series back to RHEL5, it became apparent that more infrastructure is missing. One problem is that RHEL5 does not detect CPU frequency changes and reflect them in kvmclock at all. As a result, the patch series would need to introduce more changes.

While it is possible to do, there is added risk involved, and the benefits relative to upstream are mixed.

One benefit is that one of the changes (fixing a possible backwards warp of kvmclock) actually has much greater benefit on RHEL5: due to the lack of fine-grained clocksources there, the bug window is larger. However, the infrastructure work done in these patches in preparation for future changes (TSC trapping / scaling) is not needed; again due to the lack of fine-grained clocksources, those changes would not be very effective on a RHEL5 kernel base. It's not really possible to bring the fine-grained clocksources to RHEL5, since that requires a significant and risky kernel change that I'm not willing to make on a stable release.
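
For readers unfamiliar with the term, a "backwards warp" means that a later clock read returns a smaller value than an earlier one. The Python sketch below is not the upstream kvmclock fix; it is a generic guard, with the hypothetical name MonotonicGuard, that only shows what "never going backwards" means for a clock consumer.

```python
import threading


class MonotonicGuard:
    """Generic guard: never return a value lower than one already returned."""

    def __init__(self):
        self._last = 0
        self._lock = threading.Lock()

    def read(self, raw_ns):
        # raw_ns stands in for a raw clock sample that may warp backwards.
        with self._lock:
            if raw_ns < self._last:
                return self._last  # clamp instead of going backwards
            self._last = raw_ns
            return raw_ns


guard = MonotonicGuard()
samples = [100, 105, 103, 110]  # invented raw readings; 103 is a backwards warp
print([guard.read(sample) for sample in samples])
```

A guard like this hides the warp from callers but does not correct the underlying clock, so time appears to stand still until the raw source catches up.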

So there is mixed benefit, some risks, and as far as I know, not a lot of complaints about the TSC or kvmclock on RHEL5.

My suggestion is to not backport these changes to RHEL5 unless we really think they are needed. One exception might be that backwards warp fix, which could provide some benefit; other than that, I'd like to avoid unnecessary changes here. I'm seeking feedback from the patch reviewers on that, as they are in a better position than others to assess risk.

Comment 9 Rik van Riel 2011-02-03 17:44:12 UTC

Agreed, we need to limit the risk in RHEL 5, even if it means leaving certain bugs in place.  The few people who are affected by unstable TSCs on certain multi-core AMD systems have the option of migrating to RHEL 6.  To risk breaking the timer for everybody else just is not worth it, IMHO.

Comment 10 Zachary Amsden 2011-02-07 14:02:04 UTC

There's not much we can do to fix this problem in RHEL5 without substantial risk.

If we ever see the TSC going backwards on RHEL5 in a UP environment, there may be some steps we can take to correct that, but without a reported bug, I'm reluctant to cherry-pick the one fix needed for that, especially as it only helps in very narrow cases (unstable host TSC and migration) which already have other, even deeper issues with the TSC (SMP stability and scaling problems).

Since KVM clock should address all of those problems on RHEL5 and we have an upgrade path in place for RHEL6, our exposure on this issue is very limited.

Closing wontfix.