Bug 513138

Summary: Time of SMP guest warps after migration
Product: Red Hat Enterprise Linux 5
Reporter: jason wang <jasowang>
Component: kvm
Assignee: Zachary Amsden <zamsden>
Status: CLOSED WONTFIX
QA Contact: Lawrence Lim <llim>
Severity: high
Docs Contact:
Priority: low
Version: 5.4
CC: azarembo, fschwarz, herrold, rlerch, tburke, tools-bugs, virt-maint, wnefal+redhatbugzilla, ykaul, zamsden
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The TSC may not be stable on the host: cpufreq changes, deep C states, or migration to a host with a faster TSC can all cause guest time to warp. To prevent the deep C states in which the TSC can stop, add "processor.max_cstate=1" as a host kernel boot option. To disable cpufreq (only necessary on hosts that lack the constant_tsc flag in the "flags" field of /proc/cpuinfo), edit /etc/sysconfig/cpuspeed and set the MIN_SPEED and MAX_SPEED variables to the highest frequency available.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-10-29 22:05:28 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 481013    
Bug Blocks: 495630, 513501    
Attachments:
kvm-userspace-rhel5-savevm-tsc-synchronization.patch (flags: none)

Description jason wang 2009-07-22 07:51:50 UTC
Description of problem:
When the monotonic_time test (found in kvm-autotest) is run in the guest during several rounds of ping-pong migration, guest time warps.
monotonic_time tests the monotonicity of time and contains three sub-tests:
1) gettimeofday() test
2) clock_gettime(CLOCK_MONOTONIC) test
3) TSC test
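The warp check these sub-tests perform can be sketched as follows (a minimal model of the idea, not the actual kvm-autotest code): sample a clock repeatedly and record the most negative backward step.

```python
import time

def worst_warp(samples):
    """Return the most negative backward step in a timestamp sequence
    (0 if the sequence is monotonic)."""
    worst = 0
    for prev, cur in zip(samples, samples[1:]):
        worst = min(worst, cur - prev)
    return worst

def sample_clock(n=100000):
    """Collect CLOCK_MONOTONIC readings in nanoseconds (sub-test 2);
    the gettimeofday() and RDTSC sub-tests follow the same pattern
    with a different clock source."""
    return [time.clock_gettime_ns(time.CLOCK_MONOTONIC) for _ in range(n)]
```

On a healthy host worst_warp(sample_clock()) is 0; the failing runs below report values like -13520000.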

Version-Release number of selected component (if applicable):
etherboot-zroms-kvm-5.4.4-10.el5
kmod-kvm-83-90.el5
kvm-83-90.el5
kvm-tools-83-90.el5
kvm-qemu-img-83-90.el5
kvm-debuginfo-83-90.el5

How reproducible:
100%

Steps to Reproduce:
1. Boot the vm on the src and dst machines
2. Run the monotonic_time test in the guest
3. Do several rounds of ping-pong migration
4. Check the results
  
Actual results:
1. All three tests failed with the following output:
START	----	----	timestamp=1248184157	localtime=Jul 21 09:49:17	
	START	monotonic_time.gtod	monotonic_time.gtod	timestamp=1248184157	localtime=Jul 21 09:49:17	
		FAIL	monotonic_time.gtod	monotonic_time.gtod	timestamp=1248184460	localtime=Jul 21 09:54:20	FAIL: gtod-worst-warp=-13521
	END FAIL	monotonic_time.gtod	monotonic_time.gtod	timestamp=1248184460	localtime=Jul 21 09:54:20	
	START	monotonic_time.clock	monotonic_time.clock	timestamp=1248184460	localtime=Jul 21 09:54:20	
		FAIL	monotonic_time.clock	monotonic_time.clock	timestamp=1248184760	localtime=Jul 21 09:59:20	FAIL: clock-worst-warp=-13520000
	END FAIL	monotonic_time.clock	monotonic_time.clock	timestamp=1248184760	localtime=Jul 21 09:59:20	
	START	monotonic_time.tsc	monotonic_time.tsc	timestamp=1248184761	localtime=Jul 21 09:59:21	
		FAIL	monotonic_time.tsc	monotonic_time.tsc	timestamp=1248185061	localtime=Jul 21 10:04:21	FAIL: tsc-worst-warp=-24470226
	END FAIL	monotonic_time.tsc	monotonic_time.tsc	timestamp=1248185061	localtime=Jul 21 10:04:21	
END GOOD	----	----	timestamp=1248185061	localtime=Jul 21 10:04:21	

Expected results:
1. The tests should pass, with no clock warp.

Additional info:
1. Cannot be reproduced with 1 vcpu
2. Guest platform: reproducible in all Linux guests
   The test platform above is RHEL-Server-5.3-64
   Clocksource: jiffies (the only available clocksource in RHEL-Server-5.3-64)
3. Both src and dst hosts are Intel(R) Xeon(R) CPU E5310 @ 1.60GHz
4. Running the monotonic_time test without migration, gtod and clock pass, and the TSC warp is limited to a small range ( <=5000 )
5. qemu-kvm command line: /usr/local/staf/test/RHEV/kvm/kvm-test/tests/kvm_runtest_2/qemu -name 'vm1' -monitor tcp:0:5400,server,nowait -drive file=/usr/local/staf/test/RHEV/kvm/kvm-test/tests/kvm_runtest_2/images/RHEL-Server-5.3-64.0.qcow2,if=ide,boot=on -uuid 872a99e3-da51-4e25-a3ab-f52c66d6d115 -net nic,vlan=0,macaddr=00:11:22:33:D3:00,model=e1000 -net tap,vlan=0,ifname=AUTOTEST_303,script=/etc/qemu-ifup-switch,downscript=no -m 16384 -usbdevice tablet -rtc-td-hack -no-hpet -cpu qemu64,+sse2 -smp 8 -vnc :0
6. Not a regression; could be reproduced in 87el5, 77el5
7. Could be reproduced on both AMD and Intel machines

Comment 1 Dor Laor 2009-07-22 21:03:02 UTC
Interesting problem!

Question 1:
Without migration, but with added load on the host, can you see it happening?

Question 2:
Can you re-do it with offline migration (stop the vm before the migration command, do the migration, 'cont' the vm on the destination)?

Maybe we should cancel cpu scaling (not sure) - /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq. Marcelo?
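For reference, the offline sequence (stop, migrate, cont) can be driven over the monitor socket opened by -monitor tcp:0:5400,server,nowait in the command line above. A sketch; monitor_cmd is a hypothetical helper, not part of kvm-autotest:

```python
import socket

def monitor_cmd(host, port, cmd, timeout=5.0):
    """Send one command to a qemu monitor listening on TCP and return
    whatever reply arrives within the timeout."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall((cmd + "\n").encode())
        return s.recv(4096).decode(errors="replace")
```

Offline migration would then issue "stop" and "migrate -d <uri>" on the source monitor, and "cont" on the destination monitor once migration completes.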

Comment 2 jason wang 2009-07-23 07:58:32 UTC
(In reply to comment #1)
> Interesting problem!
> 
> Questions 1:
> Without migration, but with adding load on the host, can you see it happening?
> 
Running unixbench and monotonic_time:
AMD platform: AMD Phenom(tm) 8750 Triple-Core Processor
-smp 3
gettimeofday() PASS
clock() PASS
TSC() FAIL with a big warp value
An obvious time drift could also be noticed (easily seen with "watch -n 0 date").
Intel platform: Intel(R) Xeon(R) CPU E5310 @ 1.60GHz
TSC() FAIL with a really small warp (something like -2010)
Time drift could also be noticed (not as obvious as on AMD processors, but still visible within a minute).
> Question 2:
> Can you re-do it with offline migration (stop the vm before the migration
> command, do the migration, 'cont' the vm on the destination.
The problem still reproduces: Intel(R) Xeon(R) CPU E5310 @ 1.60GHz, -smp 8.
All three tests failed.
> Maybe we should cancel cpu scaling (not sure) -
> /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq. Marcelo?

Comment 3 Marcelo Tosatti 2009-07-24 16:34:39 UTC
The problem is that the TSC for different vcpus is not saved at exactly the same
(real) time point, so the TSCs go out of sync between the vcpus on the destination.

This is similar to what happens during guest initialization, which was "fixed" by:

http://mirror.celinuxforum.org/gitstat//commit-detail.php?commit=53f658b3c33616a4997ee254311b335e59063289

It can probably be fixed in a similar way (looking into that).

However note that handling of TSC during migration suffers from other issues (even
after this bug is fixed).

Dor, regarding cpufreq, if the host's TSC does not tick at a constant rate, or if the host TSC stops for some reason (say ACPI deep sleep), it's likely that Linux guests will encounter problems.
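The save-time skew described above can be illustrated with a toy model (assumed numbers, not measured data): if each vcpu's TSC is snapshotted at a slightly different real time but all snapshots are restored at one instant on the destination, the vcpus end up offset by the save-time skew times the TSC rate.

```python
def restored_tsc_skew(save_times_s, tsc_hz):
    """A counter value is captured for each vcpu at its own save time;
    the spread between the restored counters is the inter-vcpu warp."""
    snapshots = [t * tsc_hz for t in save_times_s]
    return max(snapshots) - min(snapshots)

# e.g. 1 ms of save-time skew on a 1.6 GHz TSC leaves roughly a
# 1.6M-cycle offset between vcpus.
```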

Comment 5 Dor Laor 2009-07-26 14:30:04 UTC
Marcelo, can you add a release note for it? How do you disable the cpufreq changes and the deep sleep states?

Comment 6 Dor Laor 2009-07-27 21:41:59 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
The TSC may not be stable on the host: cpufreq changes, deep C states, or migration to a host with a faster TSC can all cause guest time to warp. To prevent the deep C states in which the TSC can stop, add
"processor.max_cstate=1" as a host kernel boot option.

To disable cpufreq (only necessary on hosts that lack the constant_tsc flag in the "flags" field of /proc/cpuinfo), edit /etc/sysconfig/cpuspeed
and set the MIN_SPEED and MAX_SPEED variables to the highest frequency available.
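Whether the cpuspeed change is needed can be determined from /proc/cpuinfo. A sketch (parsing only; reading the file is left to the caller):

```python
def has_constant_tsc(cpuinfo_text):
    """True if every 'flags' line in /proc/cpuinfo advertises
    constant_tsc, i.e. the TSC rate does not follow cpufreq and the
    cpuspeed workaround is unnecessary."""
    flag_lines = [l for l in cpuinfo_text.splitlines()
                  if l.startswith("flags")]
    return bool(flag_lines) and all(
        "constant_tsc" in l.split(":", 1)[1].split() for l in flag_lines)
```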

Comment 8 Marcelo Tosatti 2009-07-28 00:06:19 UTC
Created attachment 355338 [details]
kvm-userspace-rhel5-savevm-tsc-synchronization.patch

Comment 10 Marcelo Tosatti 2009-07-29 05:27:51 UTC
jason,

The failure with the big TSC warp, is it during migration of an SMP guest? How many vcpus?

Comment 11 Marcelo Tosatti 2009-07-29 05:33:56 UTC
Ah. Do you mean that the processor.max_cstate=1 boot option eliminates the big tsc warp problem?

Comment 12 jason wang 2009-07-29 05:36:58 UTC
(In reply to comment #10)
> jason,
> 
> The failure with big tsc warp, is it during migration of SMP guest? how many
> vcpus?  

Forgot to mention: 4 vcpus were used in all tests.

Comment 13 jason wang 2009-07-29 07:53:15 UTC
(In reply to comment #11)
> Ah. Do you mean that the processor.max_cstate=1 boot option eliminates the big
> tsc warp problem?  

Only UP (1 vcpu) was used in the processor.max_cstate=1 test.
I will retest the SMP guests with processor.max_cstate=1.

Comment 14 jason wang 2009-07-29 14:21:50 UTC
(In reply to comment #11)
> Ah. Do you mean that the processor.max_cstate=1 boot option eliminates the big
> tsc warp problem?  

Re-tested with smp=4 and processor.max_cstate=1 five times.
All gettimeofday() PASS
All clock_gettime() PASS
All TSC FAIL with big warps.

Comment 17 Marcelo Tosatti 2009-07-30 19:24:21 UTC
(In reply to comment #14)
> (In reply to comment #11)
> > Ah. Do you mean that the processor.max_cstate=1 boot option eliminates the big
> > tsc warp problem?  
> 
> Re-test in smp=4 wich processor.max_cstate=1 for five times.
> ALL gettimeofday() PASS
> ALL clock_gettime() PASS
> ALL TSC FAIL with big wraps.  

I don't see the big TSC warps here, with Intel host. Can you try that? 

At least the warp on system clock (which is what applications should be using) is reduced with the patch (it will increase as the number of vcpus increases).

Comment 18 jason wang 2009-07-31 17:02:13 UTC
(In reply to comment #17)
> (In reply to comment #14)
> > (In reply to comment #11)
> > > Ah. Do you mean that the processor.max_cstate=1 boot option eliminates the big
> > > tsc warp problem?  
> > 
> > Re-test in smp=4 wich processor.max_cstate=1 for five times.
> > ALL gettimeofday() PASS
> > ALL clock_gettime() PASS
> > ALL TSC FAIL with big wraps.  
> 
> I don't see the big TSC warps here, with Intel host. Can you try that? 
> 
> At least the warp on system clock (which is what applications should be using)
> is reduced with the patch (it will increase as the number of vcpus increases).  
Marcelo:
  Tested with two Intel Xeon E5310 hosts, eight physical cpus each. Used kvm-autotest to do the ping-pong migration (about 10 rounds in this case) until monotonic_time finished.

  Tested both with and without processor.max_cstate=1.
  Tested the guest with 2 vcpus and 8 vcpus.
  All tests FAIL with big warps (gettimeofday(), clock_gettime(), TSC).

  The warp value of TSC is relatively small (for example -465834 or -612732) compared to gtod/clock; the warp values of gtod/clock are very big.
 

Additional Information:
1 cat /proc/cpuinfo
...
processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz
stepping        : 11
cpu MHz         : 1595.927
cache size      : 4096 KB
physical id     : 1
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx tm2 cx16 xtpr lahf_lm
bogomips        : 3191.89
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

2. The invariant TSC bit is zero on this host (Xeon E5310),
but invariant TSC is set on the previous AMD host (Quad-Core AMD Opteron(tm) Processor 1352).

Comment 19 Marcelo Tosatti 2009-08-05 21:50:45 UTC
jason,

Can you please test migration with the "clocksource=acpi_pm notsc" option passed to the guest kernel?

Comment 21 Dor Laor 2009-10-29 22:05:28 UTC
TSC is not recommended in RHEL 5 guests. Use either the kvm pv clock or the PIT.
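A quick check of which clocksource a guest actually selected, sketched as a pure parsing helper (the inputs are the contents of current_clocksource and available_clocksource under /sys/devices/system/clocksource/clocksource0/; "kvm-clock" and "pit" are assumed to be the sysfs names of the recommended sources):

```python
RECOMMENDED = ("kvm-clock", "pit")  # per the recommendation above

def check_guest_clocksource(current_text, available_text):
    """Return (ok, alternatives): ok is True if the current clocksource
    is a recommended one; alternatives lists the recommended sources
    the guest could switch to."""
    current = current_text.strip()
    available = available_text.split()
    alternatives = [cs for cs in RECOMMENDED if cs in available]
    return current in RECOMMENDED, alternatives
```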