Bug 531268 - Timedrift on VM with pv_clock enabled, causing system hangs and sporadic time behaviour
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Glauber Costa
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Duplicates: 521517
Depends On:
Blocks: 528898 531025 537027 570824
Reported: 2009-10-27 14:27 UTC by Dan Yasny
Modified: 2018-12-06 14:34 UTC (History)

Fixed In Version: kernel-2.6.18-176.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 570824 (view as bug list)
Environment:
Last Closed: 2010-03-30 06:53:12 UTC
Target Upstream Version:
Embargoed:


Attachments
gettimeofday (7.03 KB, application/octet-stream), 2010-01-29 07:34 UTC, Qunfang Zhang
gettimeofday.c (595 bytes, text/x-csrc), 2010-01-29 07:34 UTC, Qunfang Zhang
time go backwards log. (208.00 KB, application/octet-stream), 2010-02-01 05:08 UTC, Qunfang Zhang
32bitrhel5.5-AMD-timeback.txt (18.00 KB, text/plain), 2010-03-02 09:37 UTC, lihuang
32bitrhel5.5-Intel-timeback.txt (17.90 KB, text/plain), 2010-03-02 09:38 UTC, lihuang
64bitrhel5.5-Intel-timeback.txt (18.01 KB, text/plain), 2010-03-02 09:39 UTC, lihuang


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0178 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update 2010-03-29 12:18:21 UTC

Description Dan Yasny 2009-10-27 14:27:59 UTC
Description of problem:
RHEL5.4 VM running on RHEV-H host showing inconsistent time, hangs during script execution

Version-Release number of selected component (if applicable):
Host:
[root@beta-vdsa ~]# uname -a
Linux beta-vdsa.gss.lab.tlv.redhat.com 2.6.18-164.2.1.el5 #1 SMP Mon Sep 21 04:37:42 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@beta-vdsa ~]# rpm -qa |grep -i kvm
kmod-kvm-83-105.el5_4.8
kvm-tools-83-105.el5_4.8
etherboot-zroms-kvm-5.4.4-10.el5
kvm-debuginfo-83-105.el5_4.8
kvm-qemu-img-83-105.el5_4.8
kvm-83-105.el5_4.8

[root@beta-vdsa ~]# cat /proc/cpuinfo                                                                                       
processor       : 0                                                                                                         
vendor_id       : GenuineIntel                                                                                              
cpu family      : 6                                                                                                         
model           : 23                                                                                                        
model name      : Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz                                                           
stepping        : 6                                                                                                         
cpu MHz         : 2493.750                                                                                                  
cache size      : 6144 KB                                                                                                   
physical id     : 0                                                                                                         
siblings        : 4                                                                                                         
core id         : 0                                                                                                         
cpu cores       : 4                                                                                                         
apicid          : 0                                                                                                         
fpu             : yes                                                                                                       
fpu_exception   : yes                                                                                                       
cpuid level     : 10                                                                                                        
wp              : yes                                                                                                       
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm                                                                                                                                                 
bogomips        : 4987.50                                                                                                                                                             
clflush size    : 64                                                                                                                                                                  
cache_alignment : 64                                                                                                                                                                  
address sizes   : 38 bits physical, 48 bits virtual                                                                                                                                   
power management:                                                                                                                                                                     

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6           
model           : 23          
model name      : Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz
stepping        : 6                                              
cpu MHz         : 2493.750                                       
cache size      : 6144 KB                                        
physical id     : 0                                              
siblings        : 4                                              
core id         : 1                                              
cpu cores       : 4                                              
apicid          : 1                                              
fpu             : yes                                            
fpu_exception   : yes                                            
cpuid level     : 10                                             
wp              : yes                                            
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm                                                                                                                                                 
bogomips        : 4987.47                                                                                                                                                             
clflush size    : 64                                                                                                                                                                  
cache_alignment : 64                                                                                                                                                                  
address sizes   : 38 bits physical, 48 bits virtual                                                                                                                                   
power management:                                                                                                                                                                     

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6           
model           : 23          
model name      : Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz
stepping        : 6                                              
cpu MHz         : 2493.750                                       
cache size      : 6144 KB                                        
physical id     : 0                                              
siblings        : 4                                              
core id         : 2                                              
cpu cores       : 4                                              
apicid          : 2                                              
fpu             : yes                                            
fpu_exception   : yes                                            
cpuid level     : 10                                             
wp              : yes                                            
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm                                                                                                                                                 
bogomips        : 4987.50                                                                                                                                                             
clflush size    : 64                                                                                                                                                                  
cache_alignment : 64                                                                                                                                                                  
address sizes   : 38 bits physical, 48 bits virtual                                                                                                                                   
power management:                                                                                                                                                                     

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6           
model           : 23
model name      : Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz
stepping        : 6
cpu MHz         : 2493.750
cache size      : 6144 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips        : 4987.49
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

_________________________________________

VM:
[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.18-164.2.1.el5 #1 SMP Mon Sep 21 04:37:42 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

[root@localhost ~]# dmesg|grep -i kvm
kvm-clock: cpu 0, msr 7eff:80433401, boot clock
kvm-clock: cpu 0, msr 0:1575401, primary cpu clock
kvm_get_tsc_khz: cpu 0, msr 0:1602001
kvm-clock: cpu 1, msr 0:157da81, secondary cpu clock
time.c: Using 1.193182 MHz WALL KVM GTOD KVM timer.



How reproducible:  
This is exactly one minute of the script running. Note that stdout does not show one line per second for the actual 60 seconds, and that 5 seconds are lost overall (the timestamps span only 55 seconds):
[root@localhost ~]# while true; do sleep 1; date; done
Wed Oct 28 01:14:36 IST 2009
Wed Oct 28 01:14:43 IST 2009
Wed Oct 28 01:14:49 IST 2009
Wed Oct 28 01:14:56 IST 2009
Wed Oct 28 01:15:02 IST 2009
Wed Oct 28 01:15:09 IST 2009
Wed Oct 28 01:15:10 IST 2009
Wed Oct 28 01:15:17 IST 2009
Wed Oct 28 01:15:23 IST 2009
Wed Oct 28 01:15:24 IST 2009
Wed Oct 28 01:15:31 IST 2009


This is another run of the script; note the time jumping back:
[root@localhost ~]# while true; do sleep 1; date; done
Wed Oct 28 01:17:56 IST 2009
Wed Oct 28 01:18:02 IST 2009
Wed Oct 28 01:18:09 IST 2009
Wed Oct 28 01:18:15 IST 2009
Wed Oct 28 01:18:16 IST 2009
Wed Oct 28 01:18:12 IST 2009
Wed Oct 28 01:18:18 IST 2009
Wed Oct 28 01:18:25 IST 2009
Wed Oct 28 01:18:31 IST 2009
Wed Oct 28 01:18:32 IST 2009
Wed Oct 28 01:18:28 IST 2009
Wed Oct 28 01:18:34 IST 2009
Wed Oct 28 01:18:41 IST 2009
Wed Oct 28 01:18:47 IST 2009
Wed Oct 28 01:18:43 IST 2009
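
The backward jumps in the run above can be picked out mechanically. The following is a minimal sketch, not part of the original report: it takes the time-of-day fields copied from that output and computes the step between consecutive lines. The helper names `step_deltas` and `backward_steps` are made up for illustration.

```python
from datetime import datetime

# Time-of-day fields copied from the second run above
# (the "Wed Oct 28" prefix and "IST 2009" suffix are dropped).
SAMPLES = [
    "01:17:56", "01:18:02", "01:18:09", "01:18:15", "01:18:16",
    "01:18:12", "01:18:18", "01:18:25", "01:18:31", "01:18:32",
    "01:18:28", "01:18:34", "01:18:41", "01:18:47", "01:18:43",
]


def step_deltas(samples):
    """Return the difference in seconds between each consecutive pair."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in samples]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


def backward_steps(samples):
    """Return (previous, current, delta) for every step where time decreased."""
    deltas = step_deltas(samples)
    return [(samples[i], samples[i + 1], d) for i, d in enumerate(deltas) if d < 0]


if __name__ == "__main__":
    print("deltas:", step_deltas(SAMPLES))
    for prev, cur, delta in backward_steps(SAMPLES):
        print(f"time went backwards: {prev} -> {cur} ({delta} s)")
```

On this data the steps between lines range from 1 to 7 seconds even though the loop only sleeps for 1 second, and three of the steps are negative (each 4 seconds backwards).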



  
Actual results:
Adding the kernel parameters clock=pmtmr divider=10 made things a bit better.

Expected results:
No time drift and no hangs of the running script; every second, or at least almost every second, should appear in stdout while running: while true; do sleep 1; date; done


Additional info:
A VM exists on which this is reproducible. Ping me on #gss-rhev or via email if you require access.

Comment 1 Dan Yasny 2009-10-27 14:56:37 UTC
After a reboot the script ran for 60 seconds, showing every second.
However, it counted only 50 seconds during an actual 60 seconds.
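
To put the numbers above in perspective, here is a small back-of-the-envelope sketch (not from the original report). It assumes the drift rate stays constant, which the report does not establish; `drift_ratio` is a hypothetical helper name.

```python
def drift_ratio(guest_elapsed_s, real_elapsed_s):
    """Fraction of real time the guest clock failed to count (positive = guest runs slow)."""
    return (real_elapsed_s - guest_elapsed_s) / real_elapsed_s


if __name__ == "__main__":
    # Guest counted 50 s while 60 s of real time passed (figures from this comment).
    ratio = drift_ratio(50, 60)
    print(f"guest lost {ratio:.1%} of real time")
    # Projection only holds if the drift rate is constant (an assumption).
    print(f"projected loss per hour: {ratio * 3600:.0f} s")
```

At that rate the guest would fall roughly ten minutes behind for every hour of real time, far more than ordinary clock adjustment could be expected to absorb.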

Comment 2 Glauber Costa 2009-10-30 08:37:02 UTC
This is probably fixed by the last series I sent out.

Dan, I gave you a kernel with the fix included. Can you please confirm that it fixes the issue for you?

Thanks!

Comment 3 Dan Yasny 2009-10-30 09:04:45 UTC
The VM with the new kernel is running. I'll test it again on Monday and update the BZ with the results.

Comment 4 Dan Yasny 2009-11-02 10:07:36 UTC
Checked the VM today - looks like the problem is solved by the new kernel - no visible drift at all.

Comment 14 Don Zickus 2009-12-02 21:14:52 UTC
in kernel-2.6.18-176.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 16 Glauber Costa 2009-12-28 12:56:18 UTC
*** Bug 521517 has been marked as a duplicate of this bug. ***

Comment 17 Qunfang Zhang 2010-01-29 07:31:11 UTC
I ran a rhel5.5 guest on a rhel5.5 Intel host; time inside the guest went backwards after 4 hours. I also ran the guest on an AMD host for a whole night, and the problem did not occur there.

host: kernel-2.6.18-185.el5
      kvm-83-154.el5
guest:kernel-2.6.18-185.el5 (rhel5.5-64bit)

Steps:
1. Boot a rhel5.5 guest
/usr/libexec/qemu-kvm -drive file=RHEL-Server-5.4-64-virtio.qcow2,if=virtio,boot=on -no-hpet -rtc-td-hack -usbdevice tablet -startdate now -smp 2 -m 2G -net nic,model=virtio,macaddr=20:20:20:11:23:5f,vlan=0 -net tap,vlan=0,script=/etc/qemu-ifup -cpu qemu64,+sse2 -name 64 -monitor stdio -vnc :8 -no-kvm-pit-reinjection

2. Run ./gettimeofday inside the guest ("gettimeofday" will be attached).

Result: Time went backwards after 4 hours.
[root@localhost ~]# ./gettimeofday
time went backwards:
tv.tv_sec = 1264747084, tv.tv_usec = 940012
lasttv.tv_sec = 1264747084, lasttv.tv_usec = 940013
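
The source of the attached gettimeofday.c is not reproduced in this report, so the following is only a sketch of the kind of check its output suggests, written under that assumption: poll the wall clock in a loop and report when a reading is earlier than the previous one. All function names are made up for illustration, and `time.time_ns()` stands in for the C `gettimeofday()` call.

```python
import time


def went_backwards(last, now):
    """True if the (sec, usec) pair `now` is earlier than `last`."""
    return now < last  # tuple comparison: seconds first, then microseconds


def read_timeval():
    """Read wall-clock time as a (tv_sec, tv_usec) pair, like struct timeval."""
    ns = time.time_ns()
    return ns // 1_000_000_000, (ns // 1_000) % 1_000_000


def watch(iterations):
    """Poll the clock `iterations` times; return the first backward step or None."""
    last = read_timeval()
    for _ in range(iterations):
        now = read_timeval()
        if went_backwards(last, now):
            return last, now
        last = now
    return None


if __name__ == "__main__":
    # The pair reported above: the new reading is 1 microsecond behind the last one.
    print(went_backwards((1264747084, 940013), (1264747084, 940012)))
    print(watch(100_000))
```

Comparing the seconds and microseconds as a pair matters here: in the reported failure the seconds field is identical and only the microseconds field decreased.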

In guest:
[root@localhost ~]# dmesg | grep time.c
time.c: Using tsc for timekeeping HZ 1000
time.c: Using 1.193182 MHz WALL KVM GTOD KVM timer.
time.c: Detected 2826.230 MHz processor.

Host info (4 CPUs; only one is listed here):
processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz
stepping	: 10
cpu MHz		: 2826.231
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips	: 5652.50
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

Comment 18 Qunfang Zhang 2010-01-29 07:34:31 UTC
Created attachment 387520 [details]
gettimeofday

Comment 19 Qunfang Zhang 2010-01-29 07:34:59 UTC
Created attachment 387521 [details]
gettimeofday.c

Comment 23 Qunfang Zhang 2010-02-01 05:08:01 UTC
Created attachment 387970 [details]
time go backwards log.

Comment 24 Dor Laor 2010-02-10 15:18:29 UTC
Do you want to reopen the bug?

Comment 25 Chris Ward 2010-02-11 10:34:09 UTC
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.

Comment 26 lihuang 2010-03-02 09:36:07 UTC
(In reply to comment #24)
> Do you want to reopen the bug?    

The guests have been running for 48 hours, and we have reproduced it again.
Both guest and host are installed from tree RHEL5.5-Server-20100217.0/
kernel-2.6.18-189.el5
kvm-83-157.el5

--------------------------------------------
guest          host     result
--------------------------------------------
64bitrhel5.5   amd*     pass--no time back
64bitrhel5.5   Intel*   failed
32bitrhel5.5   amd      failed
32bitrhel5.5   Intel    failed
--------------------------------------------

Command lines:
/usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -smp 2 -m 2G -drive file=rhel5.5-64-virtio.qcow2,if=virtio,boot=on -net nic,vlan=0,macaddr=20:88:99:11:20:86 -net tap,vlan=0,script=/etc/qemu-ifup -uuid eb8a5d04-feae-480c-989c-edc431f3363f -cpu qemu64,+sse2 -vnc :10 -monitor stdio -notify all -name 64_Intel -startdate now

/usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -smp 2 -m 2G -drive file=rhel5.5-32-virtio.qcow2,if=virtio,boot=on -net nic,vlan=0,macaddr=20:88:99:11:20:59 -net tap,vlan=0,script=/etc/qemu-ifup -uuid babd64c0-cd07-46fd-bcfb-3f8f6015bbf3 -cpu qemu64,+sse2 -vnc :11 -monitor stdio -notify all -M rhel5.5.0 -startdate now -name 32_intel

/usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -smp 2 -m 2G -drive file=rhel5.5-64-virtio.qcow2,if=virtio,boot=on -net nic,vlan=0,macaddr=20:88:99:11:20:69 -net tap,vlan=0,script=/etc/qemu-ifup -uuid c431b0cd-ddba-4a5c-9aa2-edccf9048316 -cpu qemu64,+sse2 -vnc :10 -monitor stdio -notify all -name 64_amd -startdate now

/usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -smp 2 -m 2G -drive file=rhel5.5-32-virtio.qcow2,if=virtio,boot=on -net nic,vlan=0,macaddr=20:88:99:11:20:56 -net tap,vlan=0,script=/etc/qemu-ifup -uuid 91a76cfe-8df4-466a-a177-5f80ed7ecf91 -cpu qemu64,+sse2 -vnc :11 -monitor stdio -notify all -M rhel5.5.0 -startdate now -name 32_amd


host cpuinfo:
processor : 3
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : AMD Phenom(tm) 9600B Quad-Core Processor
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm

Comment 27 lihuang 2010-03-02 09:37:55 UTC
Created attachment 397265 [details]
32bitrhel5.5-AMD-timeback.txt

Comment 28 lihuang 2010-03-02 09:38:36 UTC
Created attachment 397266 [details]
32bitrhel5.5-Intel-timeback.txt

Comment 29 lihuang 2010-03-02 09:39:06 UTC
Created attachment 397267 [details]
64bitrhel5.5-Intel-timeback.txt

Comment 30 lihuang 2010-03-02 09:39:41 UTC
reopen

Comment 36 errata-xmlrpc 2010-03-30 06:53:12 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

