Description of problem:
When running the monotonic_time test (found in kvm-autotest) in the guest
during several rounds of ping-pong migration, guest time warps. The
monotonic_time test checks the monotonicity of time and contains three
sub-tests:
1) gettimeofday() test
2) clock_gettime(CLOCK_MONOTONIC) test
3) TSC test

Version-Release number of selected component (if applicable):
etherboot-zroms-kvm-5.4.4-10.el5
kmod-kvm-83-90.el5
kvm-83-90.el5
kvm-tools-83-90.el5
kvm-qemu-img-83-90.el5
kvm-debuginfo-83-90.el5

How reproducible:
100%

Steps to Reproduce:
1. Boot the vm on the src and dst machines
2. Run the monotonic_time test in the guest
3. Do some rounds of ping-pong migration
4. Check the results

Actual results:
All three tests failed with the following output:

START ---- ---- timestamp=1248184157 localtime=Jul 21 09:49:17
  START monotonic_time.gtod monotonic_time.gtod timestamp=1248184157 localtime=Jul 21 09:49:17
    FAIL monotonic_time.gtod monotonic_time.gtod timestamp=1248184460 localtime=Jul 21 09:54:20 FAIL: gtod-worst-warp=-13521
  END FAIL monotonic_time.gtod monotonic_time.gtod timestamp=1248184460 localtime=Jul 21 09:54:20
  START monotonic_time.clock monotonic_time.clock timestamp=1248184460 localtime=Jul 21 09:54:20
    FAIL monotonic_time.clock monotonic_time.clock timestamp=1248184760 localtime=Jul 21 09:59:20 FAIL: clock-worst-warp=-13520000
  END FAIL monotonic_time.clock monotonic_time.clock timestamp=1248184760 localtime=Jul 21 09:59:20
  START monotonic_time.tsc monotonic_time.tsc timestamp=1248184761 localtime=Jul 21 09:59:21
    FAIL monotonic_time.tsc monotonic_time.tsc timestamp=1248185061 localtime=Jul 21 10:04:21 FAIL: tsc-worst-warp=-24470226
  END FAIL monotonic_time.tsc monotonic_time.tsc timestamp=1248185061 localtime=Jul 21 10:04:21
END GOOD ---- ---- timestamp=1248185061 localtime=Jul 21 10:04:21

Expected results:
The tests should pass; no clock warp.

Additional info:
1. Cannot be reproduced with 1 vcpu.
2. Guest platform: reproducible in all Linux guests. The test platform
   above is RHEL-Server-5.3-64.
   Clocksource: jiffies (the only available clocksource in
   RHEL-Server-5.3-64)
3. Both src and dst hosts are Intel(R) Xeon(R) CPU E5310 @ 1.60GHz.
4. Running the monotonic_time test without migration, gtod and clock
   pass, and the TSC warp stays within a small range (<= 5000).
5. qemu-kvm command line:
/usr/local/staf/test/RHEV/kvm/kvm-test/tests/kvm_runtest_2/qemu -name 'vm1' -monitor tcp:0:5400,server,nowait -drive file=/usr/local/staf/test/RHEV/kvm/kvm-test/tests/kvm_runtest_2/images/RHEL-Server-5.3-64.0.qcow2,if=ide,boot=on -uuid 872a99e3-da51-4e25-a3ab-f52c66d6d115 -net nic,vlan=0,macaddr=00:11:22:33:D3:00,model=e1000 -net tap,vlan=0,ifname=AUTOTEST_303,script=/etc/qemu-ifup-switch,downscript=no -m 16384 -usbdevice tablet -rtc-td-hack -no-hpet -cpu qemu64,+sse2 -smp 8 -vnc :0
6. Not a regression; also reproducible with the -87.el5 and -77.el5 builds.
7. Reproducible on both AMD and Intel machines.
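For reference, the gtod/clock sub-tests follow roughly this pattern (a
minimal single-threaded sketch, not the actual kvm-autotest code - the
real test samples on multiple CPUs in parallel and compares timestamps
across them; this only shows how a "worst warp" is measured):

    /* Build with: gcc -O2 -o warp warp.c -lrt  (librt needed on RHEL 5) */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec prev, cur;
        long long worst_warp = 0;
        long i;

        clock_gettime(CLOCK_MONOTONIC, &prev);
        for (i = 0; i < 10000000L; i++) {
            long long delta;

            clock_gettime(CLOCK_MONOTONIC, &cur);
            delta = (long long)(cur.tv_sec - prev.tv_sec) * 1000000000LL
                  + (cur.tv_nsec - prev.tv_nsec);
            if (delta < worst_warp)   /* negative delta: time went backwards */
                worst_warp = delta;
            prev = cur;
        }
        printf("worst warp: %lld ns\n", worst_warp);
        return 0;
    }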
Interesting problem!

Question 1:
Without migration, but with added load on the host, can you see it
happening?

Question 2:
Can you re-do it with offline migration (stop the vm before the
migration command, do the migration, 'cont' the vm on the destination)?

Maybe we should also cancel cpu frequency scaling (not sure) -
/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq. Marcelo?
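For concreteness, by offline migration I mean a monitor sequence along
these lines (host name and port are placeholders; the destination qemu
is started with the same command line plus -incoming):

    # destination host: same qemu command line, plus: -incoming tcp:0:5200
    # source monitor:
    (qemu) stop
    (qemu) migrate tcp:<dst-host>:5200
    # destination monitor, once migration completes:
    (qemu) cont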
(In reply to comment #1)
> Question 1:
> Without migration, but with added load on the host, can you see it
> happening?

Running unixbench and monotonic_time together:

AMD platform: AMD Phenom(tm) 8750 Triple-Core Processor, -smp 3
  gettimeofday()  PASS
  clock_gettime() PASS
  TSC             FAIL with a big warp value
  An obvious time drift could also be noticed (easily seen with
  'watch -n 0 date').

Intel platform: Intel(R) Xeon(R) CPU E5310 @ 1.60GHz
  TSC             FAIL with a really small warp (something like -2010)
  Time drift could also be noticed (not as obvious as on the AMD
  processor, but still visible within a minute).

> Question 2:
> Can you re-do it with offline migration (stop the vm before the
> migration command, do the migration, 'cont' the vm on the destination)?

The problem still reproduces:
Intel(R) Xeon(R) CPU E5310 @ 1.60GHz, -smp 8
All three tests failed.

> Maybe we should also cancel cpu frequency scaling (not sure) -
> /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq. Marcelo?
The problem is that the TSCs for the different vcpus are not saved at
exactly the same (real) time point, so the TSCs go out of sync between
the vcpus on the destination.

This is similar to what happens during guest initialization, which was
"fixed" by:

http://mirror.celinuxforum.org/gitstat//commit-detail.php?commit=53f658b3c33616a4997ee254311b335e59063289

It can probably be fixed in a similar way (looking into that). Note,
however, that the handling of the TSC during migration suffers from
other issues (even after this bug is fixed).

Dor, regarding cpufreq: if the host TSC does not tick at a constant
rate, or if the host TSC stops for some reason (say, ACPI deep sleep),
Linux guests are likely to encounter problems.
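Roughly, the idea behind that kind of fix looks like this (a sketch
only - host_rdtsc(), vcpu_read_guest_tsc() and vcpu_write_tsc_offset()
are hypothetical helpers, not the actual kernel or kvm-userspace
interfaces): normalize each vcpu's TSC to one common host-TSC reference
on save, and compute all offsets against one reference on load, so the
skew from touching the vcpus at different times drops out.

    #include <stdint.h>

    /* Hypothetical helpers; not real KVM APIs. */
    uint64_t host_rdtsc(void);
    uint64_t vcpu_read_guest_tsc(int vcpu);
    void vcpu_write_tsc_offset(int vcpu, int64_t offset);

    /* Save: record each vcpu's guest TSC as of a single reference point,
     * subtracting the host-TSC ticks that elapsed between the reference
     * and the moment that vcpu was actually read. */
    void save_tsc(int nr_vcpus, uint64_t *saved_guest_tsc)
    {
        uint64_t ref = host_rdtsc();
        int i;

        for (i = 0; i < nr_vcpus; i++) {
            uint64_t now = host_rdtsc();
            saved_guest_tsc[i] = vcpu_read_guest_tsc(i) - (now - ref);
        }
    }

    /* Load: compute every vcpu's offset against a single new reference
     * on the destination, so all vcpus resume mutually in sync. */
    void load_tsc(int nr_vcpus, const uint64_t *saved_guest_tsc)
    {
        uint64_t ref = host_rdtsc();
        int i;

        for (i = 0; i < nr_vcpus; i++)
            vcpu_write_tsc_offset(i, (int64_t)(saved_guest_tsc[i] - ref));
    }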
Marcelo, can you add a release note for this? How does one disable the
cpufreq changes and the deep sleep states?
Release note added. If any revisions are required, please set the
"requires_release_notes" flag to "?" and edit the "Release Notes" field
accordingly. All revisions will be proofread by the Engineering Content
Services team.

New Contents:
Guest time may warp when the TSC is not stable on the host; causes
include cpufreq changes, deep C states, and migration to a host with a
faster TSC. To stop deep C states, in which the TSC can stop, add
"processor.max_cstate=1" as a host kernel boot option. To disable
cpufreq scaling (only necessary on hosts that lack the constant_tsc
flag in the "flags" field of /proc/cpuinfo), edit the MIN_SPEED and
MAX_SPEED variables in /etc/sysconfig/cpuspeed to the highest available
frequency.
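Concretely, something like the following (kernel version, root device
and the frequency value are examples only; the available frequencies
can be read from
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies):

    # /boot/grub/grub.conf, on the host kernel line:
    kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/VolGroup00/LogVol00 processor.max_cstate=1

    # /etc/sysconfig/cpuspeed (values in kHz):
    MIN_SPEED=1600000
    MAX_SPEED=1600000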
Created attachment 355338 [details] kvm-userspace-rhel5-savevm-tsc-synchronization.patch
Jason,

The failure with the big TSC warp - is it during migration of an SMP
guest? How many vcpus?
Ah. Do you mean that the processor.max_cstate=1 boot option eliminates the big tsc warp problem?
(In reply to comment #10)
> The failure with the big TSC warp - is it during migration of an SMP
> guest? How many vcpus?

Forgot to mention: 4 vcpus were used in all tests.
(In reply to comment #11)
> Ah. Do you mean that the processor.max_cstate=1 boot option eliminates
> the big tsc warp problem?

Only UP guests were tested with processor.max_cstate=1 so far, so I
will retest the SMP guests with processor.max_cstate=1.
(In reply to comment #11)
> Ah. Do you mean that the processor.max_cstate=1 boot option eliminates
> the big tsc warp problem?

Re-tested with smp=4 and processor.max_cstate=1 five times:
  gettimeofday()  all PASS
  clock_gettime() all PASS
  TSC             all FAIL, with big warps
(In reply to comment #14)
> Re-tested with smp=4 and processor.max_cstate=1 five times:
>   gettimeofday()  all PASS
>   clock_gettime() all PASS
>   TSC             all FAIL, with big warps

I don't see the big TSC warps here, with an Intel host. Can you try the
attached patch?

At least the warp on the system clock (which is what applications
should be using) is reduced with the patch (it will increase as the
number of vcpus increases).
(In reply to comment #17)
> I don't see the big TSC warps here, with an Intel host. Can you try the
> attached patch?
>
> At least the warp on the system clock (which is what applications
> should be using) is reduced with the patch (it will increase as the
> number of vcpus increases).

Marcelo:

Tested on two Intel Xeon E5310 hosts, each with eight physical cpus.
Used kvm-autotest to do the ping-pong migration (about 10 rounds in
this case) until monotonic_time finished. Tested both with and without
processor.max_cstate=1, and with 2-vcpu and 8-vcpu guests.

All tests FAIL with big warps (gettimeofday, clock_gettime, TSC). The
TSC warp value is relatively small (for example -465834 or -612732)
compared to gtod/clock; the gtod/clock warp values are very big.

Additional information:

1. cat /proc/cpuinfo
...
processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU E5310 @ 1.60GHz
stepping        : 11
cpu MHz         : 1595.927
cache size      : 4096 KB
physical id     : 1
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx tm2 cx16 xtpr lahf_lm
bogomips        : 3191.89
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

2. The invariant TSC bit is zero on this host (Xeon E5310), but it is
set on the AMD host used previously (Quad-Core AMD Opteron(tm)
Processor 1352).
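For reference, the invariant TSC bit (CPUID leaf 0x80000007, EDX bit 8)
can be checked from userspace with a small program like this sketch
(x86 only):

    #include <stdio.h>

    static void cpuid(unsigned int op, unsigned int *eax, unsigned int *ebx,
                      unsigned int *ecx, unsigned int *edx)
    {
        __asm__ __volatile__("cpuid"
                             : "=a"(*eax), "=b"(*ebx), "=c"(*ecx), "=d"(*edx)
                             : "a"(op));
    }

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        cpuid(0x80000007, &eax, &ebx, &ecx, &edx);
        printf("invariant TSC: %s\n", (edx & (1u << 8)) ? "yes" : "no");
        return 0;
    }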
Jason,

Can you please test migration with the "clocksource=acpi_pm notsc"
options passed to the guest kernel?
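That is, something like this on the guest kernel line in
/boot/grub/grub.conf (kernel version and root device are examples
only):

    kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/VolGroup00/LogVol00 clocksource=acpi_pm notsc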
The TSC is not recommended as a clocksource in RHEL 5 guests. Use
either the kvm paravirtual clock or the PIT.
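Where the guest kernel exposes the generic clocksource interface, the
selection can be verified with:

    cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    cat /sys/devices/system/clocksource/clocksource0/current_clocksource

(kvm-clock only appears in guest kernels with paravirtual clock
support.)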