Bug 1292922

Summary: Windows 2012 R2 systems periodically lock up.
Product: Red Hat Enterprise Virtualization Manager Reporter: Robert McSwain <rmcswain>
Component: vdsm Assignee: Dan Kenigsberg <danken>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Nisim Simsolo <nsimsolo>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.5.6 CC: bazulay, ghammer, gklein, lsurette, michal.skrivanek, rmcswain, ycui, yeylon, ykaul
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Windows   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-02-17 12:24:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Robert McSwain 2015-12-18 18:37:15 UTC
Description of problem:
Windows 2012 R2 systems periodically lock up and become unreachable. The issue does not occur with Windows 2008 servers. The latest guest tools are installed.

Version-Release number of selected component (if applicable):
rhevm-3.5.6.2-0.1.el6ev.noarch

How reproducible:
Unknown, only seen with this customer so far.

Actual results:
Windows 2012 R2 VMs periodically become inaccessible

Expected results:
VMs running Windows 2012 R2 are always accessible

Additional info:
Data and analysis coming shortly

Comment 3 Dan Kenigsberg 2015-12-18 22:05:04 UTC
It is highly unlikely to be a Vdsm issue, as it is oblivious of which OS is being run on the VM.

Robert, can you specify versions of libvirt, qemu-kvm and kernel on the host running the VM?

Gal, can you take a look or ask a relevant question?
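
One way to gather the versions requested above is sketched below. It assumes an RPM-based hypervisor with Python 3 available, and the package names in `PACKAGES` are assumptions (RHEV hosts may ship `qemu-kvm-rhev` rather than `qemu-kvm`), so adjust them to what the host actually has installed.

```python
import subprocess

# Assumed package names; adjust to what the hypervisor actually ships.
PACKAGES = ["libvirt", "qemu-kvm", "qemu-kvm-rhev", "kernel"]


def parse_rpm_output(output):
    """Split `rpm -q` output into installed package strings and missing names."""
    installed, missing = [], []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("package ") and line.endswith("is not installed"):
            # rpm reports a missing package as "package NAME is not installed"
            missing.append(line.split()[1])
        else:
            installed.append(line)
    return installed, missing


def query_versions(packages=PACKAGES):
    """Run `rpm -q` for the given packages and parse the result."""
    # rpm exits non-zero when any package is missing, so the exit code is ignored.
    result = subprocess.run(
        ["rpm", "-q"] + list(packages),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_rpm_output(result.stdout)


if __name__ == "__main__":
    try:
        installed, missing = query_versions()
    except FileNotFoundError:
        print("rpm not found; this sketch only applies to RPM-based hosts")
    else:
        for pkg in installed:
            print(pkg)
        for name in missing:
            print("not installed:", name)
```

The running kernel can differ from the newest installed `kernel` package, so `uname -r` on the host is worth reporting alongside this output.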

Comment 5 Dan Kenigsberg 2015-12-19 22:36:21 UTC
I'm guessing it's qemu, kernel, or a Windows guest driver, and not Vdsm.

Comment 6 Gal Hammer 2015-12-20 08:16:13 UTC
(In reply to Dan Kenigsberg from comment #3)
> It is highly unlikely to be a Vdsm issue, as it is oblivious of which OS is
> being run on the VM.
> 
> Robert, can you specify versions of libvirt, qemu-kvm and kernel on the host
> running the VM?
> 
> Gal, can you take a look or ask a relevant question?

Robert, is it possible to ask the customer to use NMI and provide a windows crash dump while this hang occurs? See https://support.microsoft.com/en-us/kb/927069 for instructions. Thanks.

Comment 7 Robert McSwain 2015-12-31 17:11:10 UTC
Dan, here's what the customer found. It seems like this isn't a hard lock based on his description:

No dump files were generated in the dump file location (%SystemRoot%\MEMORY.DMP) on either VM.  I suspect there could be an underlying storage issue when we run into this problem.  I should note that the systems still respond to ICMP ping when this problem occurs.  However, remote desktop cannot establish a socket connection, and the logon screen is still displayed via VNC but is non-responsive (locked up/frozen).
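
The symptom described above (ICMP answers but remote desktop cannot connect) can be timestamped with a small TCP probe run from another machine. This is a minimal sketch, assuming the guest listens on the default RDP port 3389; the function name and timeout are illustrative only.

```python
import socket

RDP_PORT = 3389  # default RDP port; an assumption if the guest uses a custom one


def tcp_port_open(host, port=RDP_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `tcp_port_open("guest-address")` periodically and logging the result with a timestamp would show when the guest stops accepting connections, which can then be compared against storage or host logs from the same period.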

Comment 8 Michal Skrivanek 2016-01-08 17:33:15 UTC
Perhaps the hyperv enlightenment makes a difference? It's only partially in 3.5, but try disabling it for that OS type in the osinfo config.
Do they have a RHEL 7 based hypervisor? It's worth a try; it may not happen there.

Comment 9 Robert McSwain 2016-01-11 19:56:52 UTC
Michal, I'll be glad to ask the customer to test under these conditions. Can you confirm how the customer would disable it? Is this under the Edit->Operating System options for the VM or is this in the /etc/ovirt-engine/osinfo.conf.d/00-defaults.properties file on the manager? And if it's in that file, what exactly are we setting and/or changing?

Comment 10 Michal Skrivanek 2016-01-11 22:46:45 UTC
The file.
See also the issues mentioned in bug 1163828 and the associated patch. It is supposedly disabled in 3.5.6 already; just switch it regardless of what it is now. There were some contradictory reports about the effect.
Trying on RHEL 7 might make a big difference too.
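
A minimal sketch of "switch it regardless of what it is now" follows. The property key pattern `os.<os_id>.devices.hyperv.enabled.value` and the OS id `windows_2012R2x64` are assumptions that should be checked against the 00-defaults.properties file on the manager; the helper names are hypothetical. The sketch only produces the override line and does not write to the engine configuration.

```python
def read_property(text, key):
    """Return the last value assigned to key in .properties text, or None."""
    value = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, raw = line.partition("=")
        if name.strip() == key:
            value = raw.strip()
    return value


def flipped_override(defaults_text, os_id):
    """Build an override line that inverts the current hyperv setting for os_id."""
    # Assumed key pattern; verify it against the defaults file before use.
    key = "os.{}.devices.hyperv.enabled.value".format(os_id)
    current = read_property(defaults_text, key)
    if current is None:
        # The OS entry may inherit the value from a parent entry instead.
        raise KeyError(key)
    new_value = "false" if current.lower() == "true" else "true"
    return "{} = {}\n".format(key, new_value)
```

The resulting line would normally go into a separate, higher-numbered file in /etc/ovirt-engine/osinfo.conf.d/ rather than into 00-defaults.properties itself, and the engine service most likely has to be restarted before the change takes effect.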

Comment 13 Michal Skrivanek 2016-01-23 06:10:37 UTC
Thanks. Well, time to try enabling the hyperv flag or (better) running on a 7.2 hypervisor.

Comment 14 Red Hat Bugzilla 2023-09-14 03:15:05 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days