Bug 1094882

Summary: Running VMs are showing paused and cannot be migrated.
Product: Red Hat Enterprise Virtualization Manager
Reporter: James W. Mills <jamills>
Component: vdsm
Assignee: Francesco Romani <fromani>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: meital avital <mavital>
Severity: high
Docs Contact:
Priority: high
Version: 3.3.0
CC: bazulay, iheim, jamills, jhunsaker, lpeer, michal.skrivanek, mkalinin, ofrenkel, yeylon
Target Milestone: ---
Target Release: 3.4.2
Hardware: All
OS: Linux
Whiteboard: virt
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-23 12:35:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description James W. Mills 2014-05-06 16:45:24 UTC
Description of problem:

After a redundant storage path outage, the customer has two hypervisors whose VMs are reported as paused but are in fact still running.


Version-Release number of selected component (if applicable):

* vdsm-4.13.2-0.13.el6ev.x86_64
* Red Hat Enterprise Virtualization Hypervisor release 6.5 (20140407.0.el6ev)

How reproducible:

Not reproducible at this point. We have had to rely on the customer's environment.

Steps to Reproduce:

* Disabled one of the redundant switch network paths.

* RHEV detected a fault and started automatically migrating everything at once, overloading the network.

* Many virtual machines are now shown as paused although they are not actually paused; we cannot unpause or migrate them, only shut them down.

Actual results:

VMs are in the paused state according to RHEVM and vdsClient, but are in fact running, and are reported as running by virsh directly on the host.

Expected results:

vdsm reads the VM state from qemu/libvirt correctly.

Additional info:

Logs will be attached soon.

Here are the steps we used to attempt to get the status updated:

* Unpause from RHEVM - no change

* Set state to up in DB directly - changed back by vdsm

* Shut down supervdsm, vdsm, and libvirtd on the host, then restarted all of them - VMs still reported as paused

* Via virsh directly, all VMs report "running", and are reachable on the network.
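The comparison in the last step (libvirt says "running", vdsm says paused) can be automated. The following is a minimal sketch, not the procedure used on the customer hosts. It assumes the libvirt view is the text printed by `virsh list --all` and that the vdsm view has already been collected into a dict mapping VM name to status; the VM names, the dict, and both function names are hypothetical.

```python
def parse_virsh_list(text):
    """Map VM name -> state from `virsh list --all` style output."""
    states = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the column header, and the dashed separator.
        if not line or line.startswith("Id") or line.startswith("---"):
            continue
        # Columns are: id, name, state. The state may contain a space
        # ("shut off"), so split at most twice. Assumes names have no spaces.
        parts = line.split(None, 2)
        if len(parts) == 3:
            _, name, state = parts
            states[name] = state.strip()
    return states


def find_mismatches(libvirt_states, vdsm_states):
    """Return VMs that libvirt reports as running but vdsm reports as paused."""
    return sorted(
        name
        for name, state in libvirt_states.items()
        if state == "running" and vdsm_states.get(name, "").lower() == "paused"
    )


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the customer environment.
    sample = """\
 Id    Name                           State
----------------------------------------------------
 1     vm-a                           running
 2     vm-b                           running
 -     vm-c                           shut off
"""
    vdsm_view = {"vm-a": "Paused", "vm-b": "Up", "vm-c": "Down"}
    print(find_mismatches(parse_virsh_list(sample), vdsm_view))  # prints ['vm-a']
```

Any VM listed by `find_mismatches` is one where the two layers disagree, which is the symptom described in this bug.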

Comment 2 Omer Frenkel 2014-05-07 06:10:45 UTC
First, we need vdsm.log, libvirt.log and engine.log covering the time of the failure and the un-pause attempts.

Were the paused VMs migrated? If so, are they paused on the original source host or on the destination?

Comment 5 Francesco Romani 2014-05-12 09:51:26 UTC
taking the bug

Comment 12 Francesco Romani 2014-06-04 13:32:34 UTC
sorry for the noise, wrong browser tab when adding blocked bug.

Comment 15 Francesco Romani 2014-06-23 12:35:44 UTC
The supplied VDSM logs did not cover the incident time window, so it is not possible to understand what happened.
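Before attaching logs, it may help to check that they span the incident. The sketch below is only an illustration: it assumes nothing beyond each relevant log line containing a `YYYY-MM-DD HH:MM:SS` timestamp, and the function names, sample lines, and incident window are hypothetical placeholders rather than real vdsm output.

```python
import re
from datetime import datetime

# Matches a "YYYY-MM-DD HH:MM:SS" timestamp anywhere in a line.
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def log_time_range(lines):
    """Return (earliest, latest) timestamps found in lines, or (None, None)."""
    first = last = None
    for line in lines:
        match = TS_RE.search(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
        if first is None or ts < first:
            first = ts
        if last is None or ts > last:
            last = ts
    return first, last


def covers(lines, start, end):
    """True if the log's time range fully contains the window [start, end]."""
    first, last = log_time_range(lines)
    return first is not None and first <= start and end <= last


if __name__ == "__main__":
    # Placeholder lines and a made-up incident window, for illustration only.
    lines = [
        "2014-05-06 10:00:00,001 placeholder entry",
        "2014-05-06 11:00:00,002 placeholder entry",
    ]
    window_start = datetime(2014, 5, 6, 12, 0, 0)
    window_end = datetime(2014, 5, 6, 13, 0, 0)
    print(covers(lines, window_start, window_end))  # prints False
```

The same check can be run against each attached log file by passing the open file object as `lines`.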