Bug 1110305
Summary: BSOD - CLOCK_WATCHDOG_TIMEOUT_2 - Win 7SP1 guest, need to set hv_relaxed
Product: [Retired] oVirt
Reporter: Markus Stockhausen <mst>
Component: vdsm
Assignee: Francesco Romani <fromani>
Status: CLOSED CURRENTRELEASE
QA Contact: Pavel Novotny <pnovotny>
Severity: high
Priority: unspecified
Version: 3.4
CC: amit.shah, bazulay, berrange, cfergeau, crobinso, dougsland, dwmw2, fromani, fsimonce, gklein, iheim, itamar, mavital, mgoldboi, michal.skrivanek, pbonzini, rbalakri, rjones, scottt.tw, virt-maint, yeylon
Keywords: Triaged
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: virt
Fixed In Version: ovirt-3.5.0-beta2
Doc Type: Bug Fix
Last Closed: 2014-10-17 12:40:40 UTC
Type: Bug
oVirt Team: Virt
Bug Blocks: 1073943, 1083529
Description
Markus Stockhausen
2014-06-17 12:11:33 UTC
The crash occurred again on several VMs. It happened while a single VM was being started. We run that node in an oVirt NFS environment and collect OS data, so I am attaching graphs of everything:
1) CPU of node colovn04 (the hypervisor node)
2) Time drift of node colovn04 (just in case that helps)
3) Memory usage of node colovn04: yellow are KSM pages, the black line shows "uncompressed" KSM pages
4) InfiniBand interface bytes (NFS resides on that interface)
5) NFS server I/O bytes
6) NFS server IOs
7) NFS server average I/O times
8) NFS server CPU usage

Created attachment 909635: 1 cpu hypervisor
Created attachment 909636: 2 timedrift hypervisor
Created attachment 909638: 3 memory hypervisor
Created attachment 909639: 4 infiniband/NFS hypervisor
Created attachment 909640: 5 - io bytes NFS server
Created attachment 909641: 6 IOs NFS
Created attachment 909642: 7 io times NFS
Created attachment 909643: 8 cpu nfs server
Created attachment 909644: 9 swap io hypervisor
Created attachment 909645: 10 swap usage hypervisor
Attachments 9 and 10 show swap I/O and swap usage on the hypervisor node. The kernel on the hypervisor is 3.14.4-200.fc20.x86_64.

There's a kbase article about this:

https://access.redhat.com/site/solutions/755943
https://bugzilla.redhat.com/show_bug.cgi?id=990824

The suggested solution is to pass this with libvirt:

<domain ...>
  <features>
    <hyperv>
      <relaxed state='on'/>
    </hyperv>
  </features>
</domain>

So oVirt should be doing that for Windows 7 guests; reassigning.

Similar bug where the QEMU parametrization could be enhanced: BZ1107835

Francesco, can we handle this?

(In reply to Cole Robinson from comment #13)
> [...]
> So ovirt should be doing that for windows 7 guests, reassigning

Yes, there are already plans and a patch floating around:
https://bugzilla.redhat.com/show_bug.cgi?id=1083529
http://gerrit.ovirt.org/#/c/27619/3
However, a few details still need to be sorted out to have proper support.

(fixing product)

(In reply to Francesco Romani from comment #16)
We may try to expedite the hv_relaxed part… that's the simplest one.

Since it's not a regression, AFAIK, I would not block 3.5 for now.

A short update. So far I cannot tell whether the bug still occurs with the "relaxed" setting. The errors were sporadic (about once every two weeks), so there is no direct before/after comparison yet. For setting the parameter, I am simply relying on Cole Robinson's comment 13.

VDSM patch posted for review.

VDSM patch merged, Engine patch posted.

It turns out the VDSM patch was merged after 3.5 branched. Posted backports:
http://gerrit.ovirt.org/#/c/30254/
http://gerrit.ovirt.org/#/c/30255/

Verified in vdsm-4.16.0-42.git3bfad86.el6.x86_64 (oVirt 3.5 beta2). Windows guests now have the hv_relaxed flag enabled, i.e., the QEMU process now looks like:

10774 ? Sl 0:10 /usr/libexec/qemu-kvm -name win7 -S -M rhel6.5.0 -cpu Nehalem,hv_relaxed -enable-kvm -m 1024 ...

oVirt 3.5 has been released and should include the fix for this issue.
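The libvirt snippet suggested in comment 13 only shows the target XML. As a minimal sketch, assuming the domain XML is available as a string, the hypothetical Python helper below (enable_hyperv_relaxed is an illustrative name, not code from the VDSM or Engine patches) adds or updates the features/hyperv/relaxed element with state='on' using the standard-library xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def enable_hyperv_relaxed(domain_xml: str) -> str:
    """Return domain_xml with <features><hyperv><relaxed state='on'/> present.

    Hypothetical helper for illustration only; it is not the VDSM patch.
    """
    root = ET.fromstring(domain_xml)  # the <domain> element

    # Reuse existing <features>/<hyperv>/<relaxed> elements if present,
    # otherwise create them, so repeated calls do not duplicate elements.
    features = root.find("features")
    if features is None:
        features = ET.SubElement(root, "features")
    hyperv = features.find("hyperv")
    if hyperv is None:
        hyperv = ET.SubElement(features, "hyperv")
    relaxed = hyperv.find("relaxed")
    if relaxed is None:
        relaxed = ET.SubElement(hyperv, "relaxed")

    # Force the enlightenment on, even if it was previously 'off'.
    relaxed.set("state", "on")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    minimal = "<domain type='kvm'><name>win7</name></domain>"
    print(enable_hyperv_relaxed(minimal))
```

Reusing existing elements keeps the helper idempotent; how the real oVirt code generates its domain XML may differ.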
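The verification above boils down to checking that the -cpu argument of the running QEMU process carries the hv_relaxed flag. A small sketch of that check, assuming you already have the command line as a string (for example copied from ps output as quoted above); cpu_flags and has_hv_relaxed are hypothetical helpers, not part of VDSM.

```python
import shlex


def cpu_flags(cmdline: str) -> list[str]:
    """Return the comma-separated parts of the -cpu value, or [] if absent."""
    args = shlex.split(cmdline)
    # Stop one short of the end so "-cpu" always has a following value.
    for i, arg in enumerate(args[:-1]):
        if arg == "-cpu":
            return args[i + 1].split(",")
    return []


def has_hv_relaxed(cmdline: str) -> bool:
    # The first part is the CPU model (e.g. "Nehalem"); the rest are flags.
    return "hv_relaxed" in cpu_flags(cmdline)[1:]


if __name__ == "__main__":
    cmd = ("/usr/libexec/qemu-kvm -name win7 -S -M rhel6.5.0 "
           "-cpu Nehalem,hv_relaxed -enable-kvm -m 1024")
    print(has_hv_relaxed(cmd))
```

Skipping the first element avoids misreading a CPU model name as a flag.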