Description of problem:
The autotest TSC test case always fails in a RHEL guest OS.

One failing result for the autotest TSC case:
13:34:46 INFO | START ---- ---- timestamp=1274679286 localtime=May 24 13:34:46
13:34:46 INFO | START tsc tsc timestamp=1274679286 localtime=May 24 13:34:46
13:34:50 INFO | FAIL tsc tsc timestamp=1274679290 localtime=May 24 13:34:50 Latency CPU 1 - CPU 0 = 940 exceeds threshold 650
13:34:50 INFO | END FAIL tsc tsc timestamp=1274679290 localtime=May 24 13:34:50

Version-Release number of selected component (if applicable):
rpm -qa|grep kvm
etherboot-zroms-kvm-5.4.4-13.el5
kvm-qemu-img-83-183.el5
kmod-kvm-83-183.el5
kvm-83-183.el5
kvm-tools-83-183.el5

kernel in guest: kernel-PAE-2.6.18-200.el5

How reproducible:
Always on test machine

Steps to Reproduce:
1. Start a VM with two virtual CPUs, e.g.
qemu-kvm -name 'vm1' -drive file=RHEL-Server-5.5-64.qcow2,if=ide,cache=none,boot=on -net nic,vlan=0,model=e1000,macaddr=02:77:09:7F:8f:43 -net tap,vlan=0,ifname=virtio_0_6001,script=qemu-ifup-switch,downscript=no -m 4096 -smp 2 -soundhw ac97 -redir tcp:5000::22 -vnc :0 -spice port=8000,disable-ticketing -usbdevice tablet -rtc-td-hack -cpu qemu64,+sse2 -no-kvm-pit-reinjection -serial unix:/tmp/serial-20100618-131722-88J3,server,nowait -no-hpet
2. Run the TSC autotest test case in the guest.

Actual results:
The autotest TSC test fails.

Expected results:
The autotest TSC test passes.

Additional info:
One result for './checktsc -t 650 -v' in the guest:

CPU 0 - CPU 1
loop 0: roundtrip = 427
loop 1: roundtrip = 428
loop 2: roundtrip = 399
loop 3: roundtrip = 399
loop 4: roundtrip = 399
loop 5: roundtrip = 399
loop 6: roundtrip = 399
loop 7: roundtrip = 399
loop 8: roundtrip = 399
loop 9: roundtrip = 399
CPU 0 - CPU 1 = 10674
CPU 1 - CPU 0
loop 0: roundtrip = 418
loop 1: roundtrip = 428
loop 2: roundtrip = 399
loop 3: roundtrip = 399
loop 4: roundtrip = 399
loop 5: roundtrip = 399
loop 6: roundtrip = 399
loop 7: roundtrip = 399
loop 8: roundtrip = 399
loop 9: roundtrip = 399
CPU 1 - CPU 0 = -10627
FAIL

cpu info in host:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
stepping : 10
cpu MHz : 2000.000
cache size : 6144 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips : 6317.30
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
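
For readers unfamiliar with the test, the pass/fail decision in the output above can be illustrated with a small sketch. It assumes that the test fails whenever the magnitude of a reported inter-CPU offset exceeds the value passed with -t; this is inferred from the "exceeds threshold 650" message, not taken from the checktsc source, and the function name exceeds_threshold is hypothetical.

```python
# Sketch only: assumes the check is "abs(offset) > threshold".
# The real checktsc logic may differ.

def exceeds_threshold(offset_cycles: int, threshold: int) -> bool:
    """Return True if the signed TSC offset is outside +/- threshold."""
    return abs(offset_cycles) > threshold

# Offsets copied from the './checktsc -t 650 -v' output in this report.
reported = {"CPU 0 - CPU 1": 10674, "CPU 1 - CPU 0": -10627}
threshold = 650  # value passed with -t

failures = [pair for pair, offset in reported.items()
            if exceeds_threshold(offset, threshold)]
print("FAIL" if failures else "PASS", failures)
```

Under that assumption both directions fail here, while the roundtrip times themselves (about 400 cycles) are below the threshold.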
*** Bug 647106 has been marked as a duplicate of this bug. ***
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.
The fixes for this are upstream and should be backported to RHEL 5.
Note that this is not fixed by TSC trapping. This is related to TSC getting out of sync in the guest during CPU power on, reset, and when the guest attempts to reprogram the TSC itself.
Cc-ing reviewers of my RHEL6 TSC patches.

After attempting to port this patch series back to RHEL5, it became apparent that more infrastructure was missing than expected. One problem is that RHEL5 does not detect CPU frequency changes and reflect them in kvmclock at all. As a result, the patch series would need to introduce more changes. While this is possible, it involves added risk, and the benefits relative to upstream are mixed.

One of the changes (fix a possible backwards warp of kvmclock) actually has a much greater benefit on RHEL5: due to the lack of fine-grained clocksources in RHEL5, the bug window is larger. However, the infrastructure work done in these patches in preparation for future changes (TSC trapping / scaling) is not needed; again due to the lack of fine-grained clocksources, those changes will not be very effective on a RHEL5 kernel base. It's not really possible to bring the fine-grained clocksources to RHEL5; that requires a significant and risky kernel change that I'm not willing to make on a stable release.

So there is mixed benefit, some risk, and, as far as I know, not a lot of complaints about the TSC or kvmclock on RHEL5. My suggestion is to not backport these changes to RHEL5 unless we really think they are needed. One exception might be the backwards warp fix, which could provide some benefit; other than that, I'd like to avoid unnecessary changes here. I'm seeking feedback from the patch reviewers on that, as they are in a better position than others to assess risk.
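
To illustrate what a "backwards warp" means for a clock consumer, here is a generic sketch of clamping a clock so it never returns a value lower than one already observed. This is not the upstream kvmclock patch and does not reflect how the host computes guest time; the class name MonotonicClock and the sample values are hypothetical and chosen only for illustration.

```python
import threading

class MonotonicClock:
    """Hypothetical wrapper: never return a reading below a previous one."""

    def __init__(self, read_raw):
        self._read_raw = read_raw      # callable returning the raw clock value
        self._last = 0                 # highest value handed out so far
        self._lock = threading.Lock()  # protects _last across threads

    def read(self):
        raw = self._read_raw()
        with self._lock:
            if raw < self._last:
                # The raw source warped backwards; hold at the last value.
                return self._last
            self._last = raw
            return raw

# Made-up raw readings in which the third sample goes backwards.
samples = iter([100, 105, 103, 110])
clock = MonotonicClock(lambda: next(samples))
readings = [clock.read() for _ in range(4)]
print(readings)
```

The clamp hides the backwards step from callers at the cost of time appearing to stand still briefly; it does nothing about the underlying cause of the warp.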
Agreed, we need to limit the risk in RHEL 5, even if it means leaving certain bugs in place. The few people who are affected by unstable TSCs on certain multi-core AMD systems have the option of migrating to RHEL 6. To risk breaking the timer for everybody else just is not worth it, IMHO.
There's not much we can do to fix this problem in RHEL5 without substantial risk. If we ever see TSC going backwards on RHEL5 in a UP environment, there may be some steps we can take to correct that, but without a reported bug, I'm reluctant to cherry-pick the one fix needed for that, especially as it only helps in very narrow cases (unstable host TSC and migration) which already have other, even deeper issues with TSC (SMP stability and scaling problems). Since KVM clock should address all of those problems on RHEL5 and we have an upgrade path in place for RHEL6, our exposure on this issue is very limited. Closing as WONTFIX.