Bug 1072282 - VM split brain caused by network outage
Summary: VM split brain caused by network outage
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.4.0
Assignee: Roy Golan
QA Contact: Artyom
URL:
Whiteboard: virt
Duplicates: 1090536
Depends On:
Blocks: 1074578 rhev3.4beta 1142926
Reported: 2014-03-04 09:53 UTC by Roman Hodain
Modified: 2019-04-28 10:42 UTC
CC List: 16 users

Fixed In Version: av3
Doc Type: Bug Fix
Doc Text:
Previously, the GetVmStats request failed. This caused high-availability virtual machines to be incorrectly listed as "down". They were then rescheduled on another host, causing two instances of the same virtual machine to be running at once. Now, the system ignores virtual machine updates until the next monitoring cycle, allowing the system to retrieve the virtual machine stats.
Clone Of:
Clones: 1074578
Environment:
Last Closed: 2014-06-09 15:05:03 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:




Links
System                 | ID             | Private | Priority | Status       | Summary                                                          | Last Updated
Red Hat Product Errata | RHSA-2014:0506 | 0       | normal   | SHIPPED_LIVE | Moderate: Red Hat Enterprise Virtualization Manager 3.4.0 update | 2014-06-09 18:55:38 UTC
oVirt gerrit           | 25547          | 0       | None     | None         | None                                                             | Never
oVirt gerrit           | 25548          | 0       | None     | None         | None                                                             | Never

Description Roman Hodain 2014-03-04 09:53:50 UTC
Description of problem:
	After a network outage, some VMs are started on more than one hypervisor
	because they are incorrectly considered to be in the Down state.

Version-Release number of selected component (if applicable):
	rhevm-3.3.0-0.46.el6ev.noarch

How reproducible:
	Not clear yet

Steps to Reproduce:
	Not clear yet, but the scenario could be:
		1. Install more than one hypervisor
		2. Prevent RHEV-M from connecting to those VMs (power management
		   is also defunct due to the network outage)
		3. Let one hypervisor be reachable by RHEV-M

Actual results:
	Some VMs are considered down and are started on another hypervisor

Expected results:
	VMs are marked as being in the Unknown state

Comment 7 Roy Golan 2014-03-06 12:32:00 UTC
Roman, can you get the logs from host bl460-282 covering the time after the attached ones, i.e. from 2014-02-15 15:01:01 onward?

I want to see what this host reported to the backend as its internal VM list;
that might explain this.

If the host reported that VM svcz0plgfa50 is not currently in its list,
then it's only natural to see this:

2014-02-15 15:09:59,229 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (DefaultQuartzScheduler_Worker-61) [5989b7e9] Highly Available VM went down. Attempting to restart. VM Name: svcz0plgfa50-bwd00, VM Id:1f91022e-ef39-4120-877a-05d15432dfac

Comment 9 Michal Skrivanek 2014-03-09 12:30:12 UTC
Roy, I'm all for option 1

Comment 11 Artyom 2014-03-18 17:19:30 UTC
Verified on av3
Until the host on which the VMs run changes its status to Up, the VMs stay in Unknown status.
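
The verified behaviour can be illustrated with a minimal sketch. This is not the ovirt-engine code (which is Java); the names VmStatus, reconcile_vm_statuses and vms_to_restart are hypothetical, and the sketch assumes that a failed stats poll is represented as None. The point it shows is that a failed poll must not be read as "the VM is gone", so no highly available VM is restarted elsewhere on the strength of missing data.

```python
from enum import Enum


class VmStatus(Enum):
    UP = "Up"
    DOWN = "Down"
    UNKNOWN = "Unknown"


def reconcile_vm_statuses(known, reported):
    """Return the new status of each VM the engine believes runs on one host.

    known:    dict of vm_id -> VmStatus as last recorded by the engine
    reported: dict of vm_id -> VmStatus from the host's latest stats poll,
              or None if the poll failed (assumption: e.g. a network outage)
    """
    if reported is None:
        # Poll failed: nothing is known about the VMs, so do not infer Down.
        return {vm_id: VmStatus.UNKNOWN for vm_id in known}
    # Poll succeeded: a VM missing from the host's own list is treated as Down.
    return {vm_id: reported.get(vm_id, VmStatus.DOWN) for vm_id in known}


def vms_to_restart(statuses, highly_available):
    """Only HA VMs confirmed Down are candidates for a restart elsewhere."""
    return sorted(
        vm_id
        for vm_id, status in statuses.items()
        if status is VmStatus.DOWN and vm_id in highly_available
    )
```

With a failed poll every VM on the host becomes Unknown and vms_to_restart returns an empty list; only a successful poll that omits a VM makes it a restart candidate.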

Comment 18 Michal Skrivanek 2014-04-24 14:39:27 UTC
*** Bug 1090536 has been marked as a duplicate of this bug. ***

Comment 19 errata-xmlrpc 2014-06-09 15:05:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0506.html

