Created attachment 396485 [details]
info of cpu and mem on host

Description of problem:
Boot up a VM and do ping-pong migration; the time drift is too large (more than 300s).

Version-Release number of selected component (if applicable):
(guest)# rpm -qa | grep virtio
kmod-virtio-0.1-17.el3

(guest)# ntpdate -v
26 Feb 14:35:42 ntpdate[1297]: ntpdate 4.1.2 Thu Feb 15 04:30:51 EST 2007 (1)
26 Feb 14:35:42 ntpdate[1297]: no servers can be used, exiting

(host)# rpm -qa | grep kvm
etherboot-zroms-kvm-5.4.4-13.el5
kvm-83-160.el5
kvm-qemu-img-83-160.el5
kmod-kvm-83-160.el5
kvm-tools-83-160.el5
kvm-debuginfo-83-160.el5

How reproducible:
Reproducible every time

Steps to Reproduce:
1. Boot up a VM and update the guest time with ntp.
(guest)# ntpdate clock.redhat.com
26 Feb 14:09:44 ntpdate[1277]: adjust time server 66.187.233.4 offset -0.027956 sec
2. Do ping-pong migration.
(monitor)# migrate tcp:0:5200
..
(monitor)# migrate tcp:0:5200
3. Check the time drift with ntp.
# ntpdate -q clock.redhat.com
server 66.187.233.4, stratum 1, offset 337.587340, delay 0.31523
26 Feb 14:20:12 ntpdate[1281]: step time server 66.187.233.4 offset 337.587340 sec

Actual results:
The time drift is more than 300s.

Expected results:
The time drift is less than 10s.

Additional info:
(host)# uname -a
Linux amd-9600b-4-2.englab.nay.redhat.com 2.6.18-189.el5 #1 SMP Tue Feb 16 11:10:22 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

(guest)# uname -a
Linux dhcp-66-82-241.nay.redhat.com 2.4.21-63.EL #1 SMP Wed Oct 28 23:13:12 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

Command line of src VM1:
# qemu-kvm -drive file=/tmp/kvm_autotest_root/images/RHEL-3.9-64-virtio.raw,if=virtio,cache=writethrough,boot=on -net nic,vlan=0,model=virtio,macaddr=00:19:30:25:46:00 -net tap,vlan=0,ifname=e1000_0_6001,script=/usr/local/staf/test/RHEV/kvm-new/kvm-test/tests/kvm/scripts/qemu-ifup-switch,downscript=no -m 4096 -smp 4 -soundhw ac97 -usbdevice tablet -rtc-td-hack -no-hpet -cpu qemu64,+sse2 -no-kvm-pit-reinjection -rtc-td-hack -vnc :0

Command line of dest VM2:
# qemu-kvm -drive file=/tmp/kvm_autotest_root/images/RHEL-3.9-64-virtio.raw,if=virtio,cache=writethrough,boot=on -net nic,vlan=0,model=virtio,macaddr=00:19:30:25:46:00 -net tap,vlan=0,ifname=e1000_0_6002,script=/usr/local/staf/test/RHEV/kvm-new/kvm-test/tests/kvm/scripts/qemu-ifup-switch,downscript=no -m 4096 -smp 4 -soundhw ac97 -usbdevice tablet -rtc-td-hack -no-hpet -cpu qemu64,+sse2 -no-kvm-pit-reinjection -rtc-td-hack -vnc :1 -incoming tcp:0:5200
Amos, can you remove the -no-kvm-pit-reinjection flag?
Err, ignore comment #2.

Can you confirm if the problem is present with -smp 1?
(In reply to comment #3)
> Err, ignore comment #2.
>
> Can you confirm if the problem is present with -smp 1?

This problem also exists with -smp 1.

After ping migration:
[root@localhost root]# ntpdate -q clock.redhat.com
server 66.187.233.4, stratum 1, offset 4.507959, delay 0.30522
 2 Mar 21:42:34 ntpdate[1314]: step time server 66.187.233.4 offset 4.507959 sec

After pong migration:
[root@localhost root]# ntpdate -q clock.redhat.com
server 66.187.233.4, stratum 1, offset 233.897184, delay 0.30522
 2 Mar 21:51:05 ntpdate[1321]: step time server 66.187.233.4 offset 233.897184 sec
Just an update: I'm blocked on getting ISO images over a slow link... but I'm 90% there now.
Amos, it might be an issue with the RTC emulation too; what's the time drift when you reboot a rhel3 VM?
It would also be most helpful to see /proc/cpuinfo from the host; can we get this information please?

Also, can you check ntpdate without doing any migration? Just use a reliable clock, measure 30 seconds, check the guest date, wait 30 seconds, check again - and see whether the guest advances by 30 seconds or jumps ahead and then back.
(In reply to comment #6)
> Amos, it might be an issue with the RTC emulation too; what's the time drift
> when you reboot a rhel3 VM?

I have rebooted the rhel3 VM 5 times; the time drift is less than 1 sec each time.

Result:
offset -0.876304 sec
offset -0.526459 sec
offset -0.797713 sec
offset -0.525171 sec
offset -0.869227 sec
(In reply to comment #7)
> It would also be most helpful to see /proc/cpuinfo from the host; can we get
> this information please?

I have uploaded the cpuinfo and meminfo of the host here:
https://bugzilla.redhat.com/attachment.cgi?id=396485

> Also, can you check ntpdate without doing any migration? Just use a reliable
> clock, measure 30 seconds, check the guest date, wait 30 seconds, check again
> - and see whether the guest advances by 30 seconds or jumps ahead and then
> back.

The drift is very small without migration.

Steps:
1. Boot a rhel3 VM.
2. Get the guest IP address.
3. Execute the script below.
-----------------------------------------
#!/bin/bash
time=$1
guest_ip="10.66.83.68"

sync_guest_ntpdate() {
    ssh $guest_ip ntpdate clock.redhat.com
}

get_guest_ntpdate() {
    ssh $guest_ip ntpdate -q clock.redhat.com
}

ssh-copy-id -i ~/.ssh/id_rsa.pub $guest_ip > /dev/null

sync_guest_ntpdate
sync_guest_ntpdate
echo "---------------"
get_guest_ntpdate
sleep $time
echo "---------------"
get_guest_ntpdate
sleep $time
echo "---------------"
get_guest_ntpdate
-----------------------------------------

Result:
1) Sleep time set to 30 sec
# ./timedrift.sh 30
10 Mar 23:59:17 ntpdate[2089]: adjust time server 66.187.233.4 offset -0.040228 sec
10 Mar 23:59:19 ntpdate[2098]: adjust time server 66.187.233.4 offset -0.036673 sec
---------------
server 66.187.233.4, stratum 1, offset -0.034486, delay 0.29512
10 Mar 23:59:20 ntpdate[2107]: adjust time server 66.187.233.4 offset -0.034486 sec
---------------
server 66.187.233.4, stratum 1, offset -0.024299, delay 0.30476
10 Mar 23:59:52 ntpdate[2116]: adjust time server 66.187.233.4 offset -0.024299 sec
---------------
server 66.187.233.4, stratum 1, offset -0.005983, delay 0.30484
11 Mar 00:00:23 ntpdate[2130]: adjust time server 66.187.233.4 offset -0.005983 sec

2) Sleep time set to 300 sec
# ./timedrift.sh 300
11 Mar 00:03:32 ntpdate[2159]: adjust time server 66.187.233.4 offset 0.008697 sec
11 Mar 00:03:33 ntpdate[2168]: adjust time server 66.187.233.4 offset 0.003830 sec
---------------
server 66.187.233.4, stratum 1, offset 0.007391, delay 0.29539
11 Mar 00:03:35 ntpdate[2177]: adjust time server 66.187.233.4 offset 0.007391 sec
---------------
server 66.187.233.4, stratum 1, offset -0.014029, delay 0.30519
11 Mar 00:08:36 ntpdate[2188]: adjust time server 66.187.233.4 offset -0.014029 sec
---------------
server 66.187.233.4, stratum 1, offset -0.027170, delay 0.30522
11 Mar 00:13:37 ntpdate[2202]: adjust time server 66.187.233.4 offset -0.027170 sec
I can't reproduce this on Intel hardware with kvm-83-164.el5. Can you try upgrading your kvm and see if it still happens?

I am not using -rtc-td-hack, however; let me try it and see if it matters.
-rtc-td-hack makes no difference. I don't see any drift after migration.
Does this still happen? I have not been able to reproduce the bug in any of my attempts.
(In reply to comment #12)
> Does this still happen? I have not been able to reproduce the bug in any of
> my attempts.

I'm running the test 50 times with kvm-83-170.el5 (Intel & AMD). The results will be reported later.
(In reply to comment #12)
> Does this still happen? I have not been able to reproduce the bug in any of
> my attempts.

Kernel: 2.6.18-194.el5
# rpm -qa | grep kvm
kvm-83-170.el5
etherboot-zroms-kvm-5.4.4-13.el5
kvm-qemu-img-83-170.el5
kvm-debuginfo-83-170.el5
kmod-kvm-83-170.el5
kvm-tools-83-170.el5

Result:
----
Host mem:4G smp:4 "AMD Phenom(tm) 9600B Quad-Core Processor"
Guest mem:4G smp:4 RHEL-39-64: reproduced every time
Guest mem:4G smp:4 RHEL-39-32: reproduced every time
----
Host mem:4G smp:4 "Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz"
Guest mem:4G smp:4 RHEL-39-64: never reproduced
Guest mem:4G smp:4 RHEL-39-32: reproduced every time
----

I also tested on other machines; the bug is easy to reproduce when the guest's memory size is the same as the host's.

cpuinfo and meminfo attached.
# tree
.
|-- amd
|   |-- cpuinfo
|   `-- meminfo
`-- intel
    |-- cpuinfo
    `-- meminfo
Created attachment 407521 [details]
cpuinfo and meminfo of host (AMD & Intel)

This is the cpuinfo & meminfo of the hosts (AMD & Intel).
(In reply to comment #14)
> I also tested on other machines; the bug is easy to reproduce when the
> guest's memory size is the same as the host's.

That's almost certainly why the bug happens; we are likely hitting swap on the host, and one of the qemu processes gets hosed.

It's also probable that this scenario is restricted to self migration. Note that self migration requires:

1) reading all memory from the old VM
2) passing the data over a network socket
3) writing all memory to the new VM

This strides over the entire host memory and easily requires > 2X the host memory, plus resources for the network and processes. You will swap, and performance will be bad.

I suggest this is not a migration bug at all. Solving this problem is exactly equivalent to solving the problem where a user issues SIGSTOP to the qemu process for ~90 seconds.

Does anyone know if RHEL 3.9 has tick compensation or not? (It is 2.4, so I guess it does not.)

Perhaps dropping -no-kvm-pit-reinjection is all that is needed to solve the bug.

Can you try that and see if the bug goes away?
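For reference, a minimal sketch of that equivalent experiment, assuming passwordless ssh to the guest and a single qemu-kvm process on the host (the guest IP below is just a placeholder):
-----------------------------------------
#!/bin/bash
# Sketch: freeze the guest's qemu-kvm process for ~90 seconds, roughly what
# heavy host swapping does to it, then query how far the guest clock drifted.
guest_ip="10.66.83.68"          # placeholder guest address
qemu_pid=$(pgrep -o qemu-kvm)   # assumes only one qemu-kvm process on the host

ssh $guest_ip ntpdate clock.redhat.com      # sync the guest clock first

kill -STOP $qemu_pid                        # stop the guest
sleep 90
kill -CONT $qemu_pid                        # resume the guest

sleep 5                                     # let the guest settle
ssh $guest_ip ntpdate -q clock.redhat.com   # check the resulting offset
-----------------------------------------
If the offset after SIGCONT is comparable to what is seen after the ping-pong migration, that would support the theory that this is a host-overcommit problem rather than a migration bug.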
(In reply to comment #16) > (In reply to comment #14) > > I also test on other machines, it's easy to reproduce when guest's memory size > > is same as host's. > > That's almost certainly why the bug happens; likely that we are hitting swap on > the host and one of the qemu processes gets hosed. > > It's also probable that this scenario is restricted to self migration. Note > that self migration requires > > 1) reading all memory from old VM > 2) passing data over network socket > 3) writing all memory to new VM > > this strides over the entire host memory and requires easily > 2X the host > memory plus resources for network and processes. You will swap, and > performance will be bad. yes, the performance is very bad. > I suggest this is not a migration bug at all. Solving this problem is exactly > equivalent to solving the problem where a user issues SIGSTOP to the qemu > process for ~90 seconds. > > Does anyonne know if RHEL 3.9 has tick compensation or not? (It is 2.4, so I > guess it does not). > > Perhaps dropping -kvm-no-pit-reinjection is all that is needed to solve the > bug. > > Can you try that and see if the bug goes away? Do offline migration, RESULT: with '-kvm-no-pit-reinjection' : bug exists without '-kvm-no-pit-reinjection' : bug exists host kernel: 2.6.18-194.el5 # rpm -qa |grep kvm kvm-83-170.el5 etherboot-zroms-kvm-5.4.4-13.el5 kvm-qemu-img-83-170.el5 kvm-debuginfo-83-170.el5 kmod-kvm-83-170.el5 kvm-tools-83-170.el5
(In reply to comment #17)
> (In reply to comment #16)
> > (In reply to comment #14)
> > > I also tested on other machines; the bug is easy to reproduce when the
> > > guest's memory size is the same as the host's.
> >
> > That's almost certainly why the bug happens; we are likely hitting swap on
> > the host, and one of the qemu processes gets hosed.
> >
> > It's also probable that this scenario is restricted to self migration. Note
> > that self migration requires:
> >
> > 1) reading all memory from the old VM
> > 2) passing the data over a network socket
> > 3) writing all memory to the new VM
> >
> > This strides over the entire host memory and easily requires > 2X the host
> > memory, plus resources for the network and processes. You will swap, and
> > performance will be bad.
>
> Yes, the performance is very bad.

Can you make sure the host does not swap? We know for a fact that without pvclock, timekeeping is not 100% reliable. If the host is loaded or swaps the guests, it will greatly influence timekeeping.
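A simple way to watch for host swapping while the migration runs might be something along these lines (a sketch, run on the host; it only uses /proc/meminfo and vmstat):
-----------------------------------------
#!/bin/bash
# Sketch: print overall swap consumption and per-second swap-in/swap-out
# activity roughly once per second while the migration is in progress
# (Ctrl-C to stop).
while true; do
    date
    grep -E 'SwapTotal|SwapFree' /proc/meminfo              # total swap used so far
    vmstat 1 2 | tail -1 | awk '{print "si="$7" so="$8}'    # pages swapped in/out over the last second
    echo "----"
done
-----------------------------------------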
(In reply to comment #18)
> Can you make sure the host does not swap? We know for a fact that without
> pvclock, timekeeping is not 100% reliable. If the host is loaded or swaps
> the guests, it will greatly influence timekeeping.

This scenario is nearly 100% guaranteed to introduce swap, and not just a little swap, but heavy, heavy swapping. I don't think it's necessarily a migration or timekeeping bug; however, swapping during migration could be particularly bad for timekeeping.

There are possibly some hints we could provide to the mm layer using madvise; the access pattern here is going to run over 4G of memory with MADV_SEQUENTIAL semantics.

However, a 4 gigabyte memory self-migration on a 4 gigabyte host is not going to be a good performance scenario in any case, and it isn't a realistic usage pattern for a real deployment. If the problem doesn't exist on a host-to-host migration and doesn't exist when the memory sizes are within the capacity of the host, I suggest it's not a high priority bug.

I will still try an experiment with inducing swapping on the host to see if the effect is the same; perhaps we can do better catch-up of timekeeping afterwards, but it isn't likely we can address this at all in RHEL 5.
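For that experiment, one rough way to induce comparable memory pressure on the host without migrating at all (a sketch only; the hog size, hold time, and guest IP are placeholders to be adjusted to the host's memory):
-----------------------------------------
#!/bin/bash
# Sketch: push the host into swap while the guest keeps running, then check
# the guest clock drift. tail buffers the whole newline-free stream, so it
# holds roughly 3G of memory until the subshell's sleep ends and the pipe
# closes.
guest_ip="10.66.83.68"      # placeholder guest address

ssh $guest_ip ntpdate clock.redhat.com       # sync the guest clock first

( head -c 3072m /dev/zero; sleep 120 ) | tail > /dev/null &   # hold ~3G for ~2 minutes
wait                                         # let the memory pressure play out

ssh $guest_ip ntpdate -q clock.redhat.com    # check the resulting offset
-----------------------------------------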
(In reply to comment #19)
> (In reply to comment #18)
> > Can you make sure the host does not swap? We know for a fact that without
> > pvclock, timekeeping is not 100% reliable. If the host is loaded or swaps
> > the guests, it will greatly influence timekeeping.
>
> This scenario is nearly 100% guaranteed to introduce swap, and not just a
> little swap, but heavy, heavy swapping. I don't think it's necessarily a
> migration or timekeeping bug; however, swapping during migration could be
> particularly bad for timekeeping.

Yes, this bug can only be reproduced when swapping is heavy. If the guest memory is set to half of the host memory, the drift is very small.

I've also tested migration between two hosts; the drift is very small (0.28 sec, which can be ignored). The memory size of the guest and of each host is 4G.

> There are possibly some hints we could provide to the mm layer using madvise;
> the access pattern here is going to run over 4G of memory with
> MADV_SEQUENTIAL semantics.
>
> However, a 4 gigabyte memory self-migration on a 4 gigabyte host is not going
> to be a good performance scenario in any case, and it isn't a realistic usage
> pattern for a real deployment. If the problem doesn't exist on a host-to-host
> migration and doesn't exist when the memory sizes are within the capacity of
> the host, I suggest it's not a high priority bug.
>
> I will still try an experiment with inducing swapping on the host to see if
> the effect is the same; perhaps we can do better catch-up of timekeeping
> afterwards, but it isn't likely we can address this at all in RHEL 5.
In this case I'll close the bug as won't fix