Description of problem:
I created several HVM/PV DomUs on 3 RHEL-5.2-x86_64 Dom0 boxes and set them with vcpus=2. All of them hit the same problem: if I ping another box from inside the DomU, negative or huge RTTs appear. For example, from a Windows 2003 x64 DomU:

Reply from 10.10.63.22: bytes=32 time=-843420ms TTL=128
Reply from 10.10.63.22: bytes=32 time=-843420ms TTL=128
Reply from 10.10.63.22: bytes=32 time=-843420ms TTL=128
Reply from 10.10.63.22: bytes=32 time=-843420ms TTL=128
Reply from 10.10.63.22: bytes=32 time=843424ms TTL=128
Reply from 10.10.63.22: bytes=32 time<1ms TTL=128
Reply from 10.10.63.22: bytes=32 time<1ms TTL=128
Reply from 10.10.63.22: bytes=32 time=843428ms TTL=128
Reply from 10.10.63.22: bytes=32 time<1ms TTL=128
Reply from 10.10.63.22: bytes=32 time<1ms TTL=128
Reply from 10.10.63.22: bytes=32 time=843433ms TTL=128
Reply from 10.10.63.22: bytes=32 time=843433ms TTL=128
Reply from 10.10.63.22: bytes=32 time=843433ms TTL=128
Reply from 10.10.63.22: bytes=32 time<1ms TTL=128
Reply from 10.10.63.22: bytes=32 time=843436ms TTL=128
Reply from 10.10.63.22: bytes=32 time=843436ms TTL=128

Version-Release number of selected component (if applicable):
[@63.17 ~]# yum list | fgrep xen | fgrep installed
This system is not registered with RHN.  RHN support will be disabled.
kernel-xen.x86_64        2.6.18-92.1.13.el5     installed
xen.x86_64               3.0.3-64.el5_2.1       installed
xen-libs.i386            3.0.3-64.el5_2.1       installed
xen-libs.x86_64          3.0.3-64.el5_2.1       installed
[@63.17 ~]# yum list | fgrep virt
This system is not registered with RHN.  RHN support will be disabled.
libvirt.i386             0.3.3-7.el5            installed
libvirt.x86_64           0.3.3-7.el5            installed
libvirt-python.x86_64    0.3.3-7.el5            installed
python-virtinst.noarch   0.300.2-8.el5          installed
virt-manager.x86_64      0.5.3-8.el5            installed
virt-viewer.x86_64       0.0.2-2.el5            installed

How reproducible:

Steps to Reproduce:
1. Set up an HVM DomU with vcpus=2; Windows and Linux guest OSes both show the problem.
2. Ping another machine from the DomU.
3. Check the output of ping.

Actual results:
ping reports negative or huge RTTs, like below:

Reply from 10.10.63.22: bytes=32 time=-843420ms TTL=128
Reply from 10.10.63.22: bytes=32 time=-843420ms TTL=128
Reply from 10.10.63.22: bytes=32 time=-843420ms TTL=128
Reply from 10.10.63.22: bytes=32 time=-843420ms TTL=128
Reply from 10.10.63.22: bytes=32 time=843424ms TTL=128
Reply from 10.10.63.22: bytes=32 time<1ms TTL=128
Reply from 10.10.63.22: bytes=32 time<1ms TTL=128
Reply from 10.10.63.22: bytes=32 time=843428ms TTL=128
Reply from 10.10.63.22: bytes=32 time<1ms TTL=128
Reply from 10.10.63.22: bytes=32 time<1ms TTL=128
Reply from 10.10.63.22: bytes=32 time=843433ms TTL=128
Reply from 10.10.63.22: bytes=32 time=843433ms TTL=128
Reply from 10.10.63.22: bytes=32 time=843433ms TTL=128
Reply from 10.10.63.22: bytes=32 time<1ms TTL=128
Reply from 10.10.63.22: bytes=32 time=843436ms TTL=128
Reply from 10.10.63.22: bytes=32 time=843436ms TTL=128

Expected results:
ping reports normal RTTs.

Additional info:
I have not tried an i386 host or guest; they may be broken too. This looks like http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=895 . I have tried 'timer_mode=1', but nothing changed.
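For completeness, the relevant lines of the DomU configuration used to reproduce this look roughly like the following (a trimmed sketch; the MAC and disk path here are placeholders, and the full config of one affected guest is pasted later in this bug):

vcpus   = 2
builder = "hvm"
kernel  = "/usr/lib/xen/boot/hvmloader"
vif     = [ "mac=00:16:3e:xx:xx:xx,bridge=xenbr0" ]
disk    = [ "phy:/path/to/guest.img,hda,w" ]

The test itself is simply running 'ping 10.10.63.22' (or any other reachable host) from inside the guest.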
*** Bug 491014 has been marked as a duplicate of this bug. ***
This event sent from IssueTracker by mmahudha issue 275099
What timer source does the HVM domain in comment #0 use? PIT? TSC? HPET? PMTIMER?
I have tried timer_mode=1 and also leaving timer_mode out entirely. What do you suggest?

[@63.36 xen]# cat win2k3-rd-bdc04
name = "win2k3-rd-bdc04"
uuid = "ae5cb128-e29c-4791-099f-74ca35f4f8cc"
maxmem = 1024
memory = 768
vcpus = 2
builder = "hvm"
kernel = "/usr/lib/xen/boot/hvmloader"
boot = "c"
pae = 1
acpi = 1
apic = 0
timer_mode = 1
localtime = 1
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
device_model = "/usr/lib64/xen/bin/qemu-dm"
usbdevice = "tablet"
sdl = 0
vnc = 1
vncunused = 1
disk = [ "phy:/opt/vmware/win2k3-rd-bdc04/main.img,hda,w"]
vif = [ "mac=00:16:3e:20:19:44,bridge=xenbr0" ]
serial = "pty"
[@63.36 xen]#
Additional messages:

[@63.36 xen]# rpm -qa | egrep 'xen|virt|qemu'
python-virtinst-0.300.2-12.el5
libvirt-python-0.3.3-14.el5
kernel-xen-devel-2.6.18-128.1.1.el5
xen-libs-3.0.3-80.el5
kernel-xen-2.6.18-128.1.1.el5
virt-manager-0.5.3-10.el5
xen-libs-3.0.3-80.el5
libvirt-0.3.3-14.el5
xen-3.0.3-80.el5
libvirt-0.3.3-14.el5
virt-viewer-0.0.2-2.el5
[@63.36 xen]# uname -a
Linux 63.36 2.6.18-128.1.1.el5xen #1 SMP Mon Jan 26 14:19:09 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
[@63.36 xen]# xm info
host                   : 63.36
release                : 2.6.18-128.1.1.el5xen
version                : #1 SMP Mon Jan 26 14:19:09 EST 2009
machine                : x86_64
nr_cpus                : 4
nr_nodes               : 1
sockets_per_node       : 2
cores_per_socket       : 2
threads_per_core       : 1
cpu_mhz                : 1995
hw_caps                : bfebfbff:20100800:00000000:00000140:0004e33d:00000000:00000001
total_memory           : 8191
free_memory            : 2392
node_to_cpu            : node0:0-3
xen_major              : 3
xen_minor              : 1
xen_extra              : .2-128.1.1.el5
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
cc_compiler            : gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)
cc_compile_by          : mockbuild
cc_compile_domain      : redhat.com
cc_compile_date        : Mon Jan 26 13:53:58 EST 2009
xend_config_format     : 2
Xen makes several timer sources available to OSes inside HVM domains. What timer source does the guest OS inside the HVM domain in comment #0 use?
Dear Rik van Riel, how do I check which timer source the guest is using? I have already pasted my xen DomU configuration file in comment #9; is that not enough?
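(A side note, in case it helps: for a Linux HVM guest, one rough way to check from inside the guest, assuming the guest kernel exposes the generic clocksource interface in sysfs, is:

cat /sys/devices/system/clocksource/clocksource0/current_clocksource
cat /sys/devices/system/clocksource/clocksource0/available_clocksource

For a Windows guest there is no equivalent runtime query; the next comment covers the Windows default.)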
The answer to Rik's question is which timer source the Windows guest is using. The default timer source in Windows is the TSC, if you have not changed it, so the answer is TSC. Unfortunately, the problem reported in this bugzilla does not appear to be with the underlying virtualization, but with the QueryPerformanceCounter function in Windows; the same problem shows up when Hyper-V is used as the virtualization mechanism as well. The solution is to use the /usepmtimer flag in boot.ini to set the timer source to the ACPI PM timer on Windows. See more details at the link below.

http://blogs.msdn.com/tvoellm/archive/2008/06/05/negative-ping-times-in-windows-vm-s-whats-up.aspx

Rik, is there anything we have to do on this?
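For reference, a boot.ini entry with the switch added looks roughly like the following (a sketch only; the ARC path, description, and other switches will differ per guest):

[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect /usepmtimer

After editing boot.ini the guest has to be rebooted for the change to take effect.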
Another Blog: http://blogs.msdn.com/mikekol/archive/2008/10/15/problems-with-queryperformancecounter-on-windows-server-2003-multi-processor-hyper-v-guests-here-s-why.aspx
Kirby, if you boot Windows with the /usepmtimer boot flag, does the bug still occur?
Thanks to all, it seems OK now. At least, there are no negative or very large times in the output of ping any more. In my opinion, setting the /usepmtimer boot flag should become part of the xen PV driver setup program. BTW: is there any Linux kernel parameter to adjust the timer for a similar problem inside a KVM or Hyper-V based guest machine?
(In reply to comment #17)
> BTW: Is there any linux kernel parameter to adjust the timer for some similar
> problem inside a KVM or Hyper-V based guest machine?

You should be able to use the pmtmr boot parameter. The kbase article http://kbase.redhat.com/faq/docs/DOC-3117, though not discussing virtualized systems, does have an example.
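As an illustration (an assumption on my part about what the kbase article describes; the exact parameter depends on the guest kernel), this means appending a clock source parameter to the kernel line in the guest's /boot/grub/grub.conf, for example:

title Red Hat Enterprise Linux Server (2.6.18-128.1.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-128.1.1.el5 ro root=/dev/VolGroup00/LogVol00 clocksource=acpi_pm
        initrd /initrd-2.6.18-128.1.1.el5.img

Newer kernels take clocksource=acpi_pm, while older i386 kernels use clock=pmtmr; a reboot of the guest is needed either way.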
This bug can be caused by a combination of two main factors:
- while doing disk IO, one VCPU of an HVM guest can miss timer ticks
- Xen did not re-deliver those missed timer ticks later on, causing clock skew between VCPUs inside an HVM guest

Both of these issues should be resolved with the backport of the AIO disk handling code and the upstream Xen 'no missed-tick accounting' timer code.

Please test the test RPMs from http://people.redhat.com/riel/.xenaiotime/ and let us know if those (experimental!) test packages resolve the issue.
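For reference, the 'no missed-tick accounting' behaviour corresponds to one of the additional timer_mode values in the HVM guest config. The list below follows the upstream Xen documentation; whether the experimental packages above accept all of these values is an assumption on my part:

timer_mode = 0   # delay_for_missed_ticks
timer_mode = 1   # no_delay_for_missed_ticks
timer_mode = 2   # no_missed_ticks_pending ('no missed-tick accounting')
timer_mode = 3   # one_missed_tick_pending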
Rik, do you have a pointer to the 'missed-tick accounting' code so that I can take this from you?
Redelivery of missed ticks is already done in our current Xen code base. What I did not do when backporting those patches was backport the HPET fixes and improvements, since those seemed to cause a regression in clalance's tests.
This is either fixed or anyway a dup of bug 525752. Since the reporter was satisfied by /usepmtimer, and the reproduction instructions for the other bug are clear, closing this one.