Bug 1099846

Summary: Not handling VM that crashed correctly
Product: [Retired] oVirt
Reporter: Arik <ahadas>
Component: ovirt-engine-core
Assignee: Roy Golan <rgolan>
Status: CLOSED DUPLICATE
QA Contact: Pavel Stehlik <pstehlik>
Severity: unspecified
Docs Contact:
Priority: high
Version: 3.5
CC: acathrow, bugs, gklein, iheim, yeylon
Target Milestone: ---
Keywords: Regression
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: virt
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-08-15 19:30:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Arik 2014-05-21 10:49:45 UTC
Description of problem:
A VM that went down is handled twice by the hosts/VMs monitoring:
1. as a VM that switched to DOWN (because VDSM reports it as DOWN)
2. as a VM that was not returned by VDSM (while still marked as running in the DB)

Obviously, #2 should not happen. This is a regression introduced by http://gerrit.ovirt.org/#/c/25547: in VdsUpdateRunTimeInfo#removeVmsFromCache we skip a VM only if its status did not change, instead of skipping it whenever it was reported by VDSM at all.
As a result, VmPoolHandler#processVmPoolOnStopVm is called twice, which is wrong.
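The double handling can be illustrated with a minimal sketch. This is not the actual VdsUpdateRunTimeInfo code; the class, method, and data-structure names below are simplified stand-ins. It models two monitoring passes over a VM that VDSM now reports as DOWN, and shows how the regressed skip condition ("status unchanged") lets the same VM be queued a second time as "not returned", whereas skipping any VM that VDSM reported queues it only once:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the double-handling bug; not real oVirt code.
public class VmMonitorSketch {
    // Returns how many times the pool-stop handling would run for a single
    // VM that the DB still considers Up but VDSM now reports as Down.
    public static int poolStopCalls(boolean regressedSkipCondition) {
        Map<String, String> dbVms = Map.of("vm1", "Up");      // status per engine DB
        Map<String, String> vdsmVms = Map.of("vm1", "Down");  // status per VDSM report

        List<String> vmsMovedToDown = new ArrayList<>();

        // Pass 1: any VM reported DOWN by VDSM is moved to down.
        for (var e : vdsmVms.entrySet()) {
            if ("Down".equals(e.getValue())) {
                vmsMovedToDown.add(e.getKey());
            }
        }

        // Pass 2 (removeVmsFromCache analogue): detect VMs "not returned by
        // VDSM". The regression skipped a VM only when its status did not
        // change; a VM whose status just changed to Down therefore fell
        // through and was queued a second time.
        for (String vm : dbVms.keySet()) {
            boolean reportedByVdsm = vdsmVms.containsKey(vm);
            boolean statusUnchanged =
                    reportedByVdsm && dbVms.get(vm).equals(vdsmVms.get(vm));
            boolean skip = regressedSkipCondition ? statusUnchanged : reportedByVdsm;
            if (!skip) {
                vmsMovedToDown.add(vm);
            }
        }

        // The pool handler runs once per queued entry.
        return vmsMovedToDown.size();
    }
}
```

With the regressed condition the VM is queued by both passes, so the handler fires twice; with the pre-regression condition (skip anything VDSM reported) it fires once.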

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. kill qemu process

Actual results:
VmPoolHandler#processVmPoolOnStopVm is called twice

Expected results:
VmPoolHandler#processVmPoolOnStopVm should be called once

Additional info:
Changing _vmsMovedToDown to a Set, or otherwise ensuring we don't add the same VM to it more than once, is not the right solution; we should fix the logic (and the documentation) properly.

Comment 1 Arik 2014-08-15 19:30:06 UTC
Eventually, as part of bz 1098791, I changed _vmsMovedToDown to be a Set.
Working on a better solution isn't worth the time, as the work on the refactored monitoring is already in progress.
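The stopgap amounts to swapping the collection type so duplicate adds of the same VM id collapse into one entry. A minimal sketch, with illustrative names (the real change lives in the engine's monitoring code):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of the bz 1098791 workaround: a Set deduplicates a VM id that
// both monitoring passes try to queue, so the pool-stop handling runs
// once per distinct VM. Names are illustrative, not actual oVirt code.
public class SetDedupSketch {
    public static int distinctDownVms(List<String> queuedDownVmIds) {
        // LinkedHashSet keeps insertion order while enforcing uniqueness.
        Set<String> vmsMovedToDown = new LinkedHashSet<>(queuedDownVmIds);
        return vmsMovedToDown.size();
    }
}
```

This masks the double-add rather than fixing the cache-removal logic, which is why it was accepted only as a stopgap pending the monitoring refactor.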

*** This bug has been marked as a duplicate of bug 1098791 ***