Bug 1099846

Summary: Not handling VM that crashed correctly
Product: [Retired] oVirt
Reporter: Arik <ahadas>
Component: ovirt-engine-core
Assignee: Roy Golan <rgolan>
Status: CLOSED DUPLICATE
QA Contact: Pavel Stehlik <pstehlik>
Severity: unspecified
Docs Contact:
Priority: high
Version: 3.5
CC: acathrow, bugs, gklein, iheim, yeylon
Target Milestone: ---
Keywords: Regression
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: virt
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-08-15 19:30:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Arik 2014-05-21 10:49:45 UTC
Description of problem:
A VM that went down is handled twice by the hosts/VMs monitoring:
1. as a VM that switched to DOWN (because VDSM reports it as DOWN)
2. as a VM that was not returned by VDSM (while still marked as running in the DB)

Obviously, #2 should not happen. This is a regression introduced by http://gerrit.ovirt.org/#/c/25547: in VdsUpdateRunTimeInfo#removeVmsFromCache we skip a VM only if its status did not change, instead of skipping it whenever it was reported by VDSM at all.
As a result, VmPoolHandler#processVmPoolOnStopVm is called twice, which is wrong.
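The double handling can be illustrated with a minimal sketch. This is not the actual VdsUpdateRunTimeInfo code; the class, method, and data-structure names below are simplified stand-ins. It models two monitoring passes over a VM that VDSM now reports as DOWN, and shows how the regressed skip condition ("status unchanged") lets the same VM be queued a second time as "not returned", whereas skipping any VM that VDSM reported queues it only once:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the double-handling bug; not real oVirt code.
public class VmMonitorSketch {
    // Returns how many times the pool-stop handling would run for a single
    // VM that the DB still considers Up but VDSM now reports as Down.
    public static int poolStopCalls(boolean regressedSkipCondition) {
        Map<String, String> dbVms = Map.of("vm1", "Up");      // status per engine DB
        Map<String, String> vdsmVms = Map.of("vm1", "Down");  // status per VDSM report

        List<String> vmsMovedToDown = new ArrayList<>();

        // Pass 1: any VM reported DOWN by VDSM is moved to down.
        for (var e : vdsmVms.entrySet()) {
            if ("Down".equals(e.getValue())) {
                vmsMovedToDown.add(e.getKey());
            }
        }

        // Pass 2 (removeVmsFromCache analogue): detect VMs "not returned by
        // VDSM". The regression skipped a VM only when its status did not
        // change; a VM whose status just changed to Down therefore fell
        // through and was queued a second time.
        for (String vm : dbVms.keySet()) {
            boolean reportedByVdsm = vdsmVms.containsKey(vm);
            boolean statusUnchanged =
                    reportedByVdsm && dbVms.get(vm).equals(vdsmVms.get(vm));
            boolean skip = regressedSkipCondition ? statusUnchanged : reportedByVdsm;
            if (!skip) {
                vmsMovedToDown.add(vm);
            }
        }

        // The pool handler runs once per queued entry.
        return vmsMovedToDown.size();
    }
}
```

With the regressed condition the VM is queued by both passes, so the handler fires twice; with the pre-regression condition (skip anything VDSM reported) it fires once.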

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. kill qemu process

Actual results:
VmPoolHandler#processVmPoolOnStopVm is called twice

Expected results:
VmPoolHandler#processVmPoolOnStopVm should be called once

Additional info:
Changing _vmsMovedToDown to a Set, or otherwise ensuring we don't add the same VM to it more than once, is not the right solution; we should fix the logic (and the documentation) properly.

Comment 1 Arik 2014-08-15 19:30:06 UTC
Eventually, as part of bz 1098791, I changed _vmsMovedToDown to be a Set.
Working on a better solution isn't worth the time, as the work on the refactored monitoring is already in progress.
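The stopgap amounts to swapping the collection type so duplicate adds of the same VM id collapse into one entry. A minimal sketch, with illustrative names (the real change lives in the engine's monitoring code):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of the bz 1098791 workaround: a Set deduplicates a VM id that
// both monitoring passes try to queue, so the pool-stop handling runs
// once per distinct VM. Names are illustrative, not actual oVirt code.
public class SetDedupSketch {
    public static int distinctDownVms(List<String> queuedDownVmIds) {
        // LinkedHashSet keeps insertion order while enforcing uniqueness.
        Set<String> vmsMovedToDown = new LinkedHashSet<>(queuedDownVmIds);
        return vmsMovedToDown.size();
    }
}
```

This masks the double-add rather than fixing the cache-removal logic, which is why it was accepted only as a stopgap pending the monitoring refactor.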

*** This bug has been marked as a duplicate of bug 1098791 ***