Bug 1107835
Summary: | Windows XP VM hangs after live migration
---|---|---|---
Product: | [Retired] oVirt | Reporter: | Markus Stockhausen <mst>
Component: | ovirt-engine-webadmin | Assignee: | Francesco Romani <fromani>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Ilanit Stein <istein>
Severity: | urgent | Docs Contact: |
Priority: | unspecified
Version: | 3.4 | CC: | bugs, ecohen, fromani, gklein, iheim, mavital, mgoldboi, michal.skrivanek, mst, rbalakri, yeylon
Target Milestone: | --- | Keywords: | Triaged
Target Release: | 3.5.0
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: | virt
Fixed In Version: | ovirt-3.5.0-beta2 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2014-10-17 12:35:13 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 1083529
Attachments: |
Description
Markus Stockhausen
2014-06-10 18:26:56 UTC
Created attachment 907354 [details]
vdsm.log
Created attachment 907359 [details]
video
Video attached. The VM has an XP telnet server installed. As the video shows, the machine still responds to network packets. In its stalled state, however, it does not accept logins to the telnet server (which works when the machine is in a normal state).

A few further tests revealed a situation where at least the task manager was still active and responsive in the SPICE console. Nevertheless, it did not show any updates.

Created attachment 907818 [details]
taskmanager of stalled vm
My observations led to the conclusion that the hang must somehow be related to guest clock handling. So I installed the old Windows 98 executable choice.exe. It prompts for input and ends without input after a predefined time. E.g.

    choice.exe /t:y,3 InputSomething

- Will prompt "InputSomething[Y,N]?"
- Allows the user to input the Y or N key
- Will end after 3 seconds if no input is given

After the VM starts, the program behaves as expected: it ends after 3 seconds. Once the machine has gone into the pathological state (after X migrations), the time trigger no longer works. The program still reacts to user input, though.

To further nail things down I followed several pieces of advice about guest timing issues. My XP SP3 VM has the following additional settings active:

- Parameter /usepmtimer has been added to the boot.ini entry for system start
- HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\Processor\Start has been set to a value of 4. This avoids processor idling, and in oVirt the VM CPU is always at 100%

Neither setting helps to mitigate the problem.

I don't really know why, but inspired by the threads linked below I added a hook that modifies the libvirt XML to

    ... <timer name='rtc' track='guest'/> ...

This can be achieved with the following Python script:

    #!/usr/bin/python
    import os
    import sys
    import hooking
    import traceback

    # Read the libvirt domain XML handed to the hook by VDSM
    domxml = hooking.read_domxml()
    # Take the first <timer> element and let it track the guest clock
    t = domxml.getElementsByTagName('timer')[0]
    t.setAttribute('track', 'guest')
    # Hand the modified XML back
    hooking.write_domxml(domxml)

This calls qemu with the extended command line "-rtc clock=vm,...". Afterwards I wrote a test script on the engine that migrates the VM every 30 seconds:

    #!/bin/bash
    i=0
    while true; do
        ovirt-shell -c -E "action vm colvm36 migrate"
        i=$((i + 1))
        echo "Machine migrated $i times."
        sleep 30
    done

While I'm writing these lines my XP VM has passed 57 consecutive online migrations without any problem. This is a stellar jump compared with the usual hangs after 5-10 online migrations in the default configuration.
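The hook above changes whichever timer element happens to come first in the domain XML. As a minimal sketch, and not the change that was eventually merged into VDSM, the same edit can select the rtc timer by name. The sample XML and the function name set_rtc_track_guest are hypothetical and exist only for illustration. The sketch uses xml.dom.minidom directly so it runs outside a VDSM hook environment; inside a hook, the document would come from hooking.read_domxml() as in the original script.

```python
from xml.dom import minidom

# Hypothetical domain XML for illustration only; a real VDSM-generated
# document contains many more elements.
SAMPLE_DOMXML = """<domain type='kvm'>
  <name>example-vm</name>
  <clock offset='variable' adjustment='0'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
</domain>"""


def set_rtc_track_guest(domxml):
    """Set track='guest' on the rtc timer, creating the timer if absent.

    Returns True if the document was changed, False if it has no <clock>.
    """
    clocks = domxml.getElementsByTagName('clock')
    if not clocks:
        return False  # nothing to attach a timer to; leave the XML untouched
    clock = clocks[0]
    for timer in clock.getElementsByTagName('timer'):
        if timer.getAttribute('name') == 'rtc':
            timer.setAttribute('track', 'guest')
            return True
    # No rtc timer present: add one under the existing <clock>
    timer = domxml.createElement('timer')
    timer.setAttribute('name', 'rtc')
    timer.setAttribute('track', 'guest')
    clock.appendChild(timer)
    return True


doc = minidom.parseString(SAMPLE_DOMXML)
changed = set_rtc_track_guest(doc)
timers = {t.getAttribute('name'): t for t in doc.getElementsByTagName('timer')}
print(timers['rtc'].toxml())
```

Selecting by name keeps the hook from silently modifying a different timer (for example pit) if the element order in the generated XML ever changes.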
Links:
http://stackoverflow.com/questions/17784178/qemu-failed-in-loadvm-while-the-guest-system-is-windows-xp
http://lists.gnu.org/archive/html/qemu-devel/2009-10/msg00762.html

P.S. This XP is a domain member.

Similar bug where qemu parametrization could be enhanced: BZ1110305

Do you see the same improvement in bug 1110305 after your modification?

In contrast to BZ1110305, I can confirm that setting the track=guest option dramatically improves the stability of the VM during live migrations. Sorry for having no other technical explanations. What makes this bug different from BZ1110305:
- the relax option of the hypervisor will not affect Windows XP VM behaviour (as far as I understand)
- this bug is about live migrations
- BZ1110305 is about running VMs that give a BSOD due to high load.

VDSM patch posted for review

Markus, thanks for the extensive investigation and for the excellent BZ entry!

Thanks for the quick fix. Just to understand it right: the patches will allow setting a "hyperv" flag. With that, two switches are enabled:
- qemu relax_hv option => to improve Windows 7 stability
- qemu clock track=guest option => to improve Windows XP migration stability.
So we make no distinction between XP and Win7 and simply activate these switches for all Windows VMs?

This is correct. These patches are part of a series which will collectively improve the hyperv support. This is why all the settings are toggled by the new 'hypervEnable' boolean. All the new settings move toward the libvirt recommended settings, so they should be good for all Windows versions. Moreover, although I need to check, I'm not sure Engine distinguishes between Windows releases, e.g. WinXP vs Win7.

Thanks for the clarification. Are the patches VDSM only? If yes, should I file a new BZ to enable those features in the engine?

Engine support is required (and already in the works) to fully resolve this BZ. I don't think you need a separate one.

VDSM support merged.
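The exchange above describes one boolean that turns on two switches together. The following is a rough sketch of that idea only, not the merged VDSM code: the function name build_windows_tuning is hypothetical, and mapping the "relax" option to libvirt's <relaxed state='on'/> element under <hyperv> is an assumption based on libvirt's domain XML format rather than on the patches themselves.

```python
import xml.etree.ElementTree as ET


def build_windows_tuning(hyperv_enable):
    """Return (<features>, <clock>) elements for a guest.

    Hypothetical helper: when hyperv_enable is true, both switches from
    the discussion are applied together; otherwise neither is.
    """
    features = ET.Element('features')
    clock = ET.Element('clock')
    rtc = ET.SubElement(clock, 'timer', name='rtc')
    if hyperv_enable:
        # Assumed libvirt form of the "relax" option
        hyperv = ET.SubElement(features, 'hyperv')
        ET.SubElement(hyperv, 'relaxed', state='on')
        # The clock setting from this bug: rtc follows the guest clock
        rtc.set('track', 'guest')
    return features, clock


features, clock = build_windows_tuning(True)
print(ET.tostring(features, encoding='unicode'))
print(ET.tostring(clock, encoding='unicode'))
```

A single flag keeps the guest configuration simple, at the cost of applying both settings to every Windows release, which is the trade-off raised in the question above.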
The new code will be transparently enabled for Windows guests once this patch gets merged: http://gerrit.ovirt.org/#/c/29238/

Engine support merged in both master and 3.5:
engine master: http://gerrit.ovirt.org/#/c/29238/
engine 3.5.0: http://gerrit.ovirt.org/#/c/30188/
VDSM master: http://gerrit.ovirt.org/#/c/27619/ http://gerrit.ovirt.org/#/c/29233/

It turns out the VDSM patch was merged after 3.5 branched. Posted backports:
http://gerrit.ovirt.org/#/c/30254/
http://gerrit.ovirt.org/#/c/30255/

Verified on ovirt-engine 3.5 - rc1
vdsm vdsm-4.16.1-6.gita4a4614.el6.x86_64

oVirt 3.5 has been released and should include the fix for this issue.