Bug 1558709 - VM remains migrating forever with no Host (actually doesn't exist) after StopVmCommand fails to DestroyVDS
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.2.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ovirt-4.3.0
Target Release: 4.3.0
Assignee: Arik
QA Contact: Polina
URL:
Whiteboard:
Depends On:
Blocks: 1584885
Reported: 2018-03-20 20:45 UTC by Polina
Modified: 2019-02-13 07:46 UTC (History)

Fixed In Version: ovirt-engine-4.3.0_alpha
Clone Of:
Clones: 1584885
Environment:
Last Closed: 2019-02-13 07:46:35 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.3+


Attachments
logs and screenshot (2.78 MB, application/x-gzip)
2018-03-20 20:45 UTC, Polina


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 90938 0 master MERGED core: cleanup in destroy-vm parameters 2020-07-28 07:14:21 UTC
oVirt gerrit 90939 0 master MERGED core: do not remove vm async running command on power-off 2020-07-28 07:14:20 UTC
oVirt gerrit 90940 0 master MERGED core: remove async vm running command only on successful destroy 2020-07-28 07:14:20 UTC
oVirt gerrit 90941 0 master MERGED core: omit misleading log during vm migration 2020-07-28 07:14:21 UTC
oVirt gerrit 90942 0 master MERGED core: fix power-off migrating vm 2020-07-28 07:14:20 UTC
oVirt gerrit 91059 0 master MERGED core: ignore noVM when monitoring destroys migrating vm 2020-07-28 07:14:20 UTC
oVirt gerrit 91199 0 ovirt-engine-4.2 MERGED core: cleanup in destroy-vm parameters 2020-07-28 07:14:19 UTC
oVirt gerrit 91200 0 ovirt-engine-4.2 MERGED core: do not remove vm async running command on power-off 2020-07-28 07:14:19 UTC
oVirt gerrit 91201 0 ovirt-engine-4.2 MERGED core: remove async vm running command only on successful destroy 2020-07-28 07:14:19 UTC
oVirt gerrit 91202 0 ovirt-engine-4.2 MERGED core: omit misleading log during vm migration 2020-07-28 07:14:19 UTC
oVirt gerrit 91203 0 ovirt-engine-4.2 MERGED core: fix power-off migrating vm 2020-07-28 07:14:19 UTC
oVirt gerrit 91204 0 ovirt-engine-4.2 MERGED core: ignore noVM when monitoring destroys migrating vm 2020-07-28 07:14:18 UTC

Description Polina 2018-03-20 20:45:03 UTC
Created attachment 1410813
logs and screenshot

Description of problem: VM remains migrating forever with no Host (the VM actually doesn't exist) after StopVmCommand fails to DestroyVDS


Version-Release number of selected component (if applicable): rhv-release-4.2.2-6-001.noarch


How reproducible: 30%


Steps to Reproduce:
1. Create an affinity group under the given cluster with:
{'name': 'affinity_enforcement_01', 'cluster_name': 'golden_env_mixed_1', 
'vms_rule': {'enabled': False}, 
'hosts_rule': {'positive': True, 'enforcing': True}, 
'hosts': ['host_mixed_1'], 
'vms': ['golden_env_mixed_virtio_1_0']}
2. Start the VM with RunOnce on host host_mixed_2.
3. Wait for balancing to migrate the VM to host_mixed_1.
4. Stop the VM.

Actual result: sometimes this scenario ends with a failure:
               'org.ovirt.engine.core.bll.StopVmCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine does not exist: {'vmId': '8114111b-dc24-4321-ab50-97f9b6fa1f6b'}, code = 1 (Failed with error noVM and code 1)
               ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-24) [vms_syncAction_0af46fa7-91b9-450c] EVENT_ID: USER_FAILED_STOP_VM(56), Failed to power off VM golden_env_mixed_virtio_1_0 (Host: host_mixed_2, User: admin@internal-authz).   
               StopVmCommand] (default task-1) [vms_syncAction_2d443c2e-323e-4f67] Strange, according to the status 'MigratingTo' virtual machine '8114111b-dc24-4321-ab50-97f9b6fa1f6b' should be running in a host but it isn't.
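
When triaging engine logs for this failure, the VM ID and error code can be extracted from the DestroyVDS message quoted above. This is a minimal sketch, assuming the message keeps exactly the format shown in this report; the function name `parse_destroy_failure` and the regular expression are illustrative and not part of ovirt-engine.

```python
import re

# Pattern derived only from the log line quoted in this report (an assumption);
# other engine versions may word the message differently.
DESTROY_FAILURE = re.compile(
    r"Failed to DestroyVDS, error = (?P<error>.*?): "
    r"\{'vmId': '(?P<vm_id>[0-9a-f-]+)'\}, code = (?P<code>\d+)"
)


def parse_destroy_failure(line):
    """Return the error text, VM ID and code of a DestroyVDS failure, or None."""
    match = DESTROY_FAILURE.search(line)
    if match is None:
        return None
    return {
        "error": match.group("error"),
        "vm_id": match.group("vm_id"),
        "code": int(match.group("code")),
    }


# Sample text copied from the failure message in this report.
sample = (
    "VDSGenericException: VDSErrorException: Failed to DestroyVDS, "
    "error = Virtual machine does not exist: "
    "{'vmId': '8114111b-dc24-4321-ab50-97f9b6fa1f6b'}, code = 1 "
    "(Failed with error noVM and code 1)"
)
print(parse_destroy_failure(sample))
```

Lines that do not contain a DestroyVDS failure return `None`, so the function can be applied to every line of a log file and the results filtered.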

Expected result: The VM is stopped with no errors.
Logs and a screenshot from the UI are attached.

Additional info: logs attached

Comment 1 Michal Skrivanek 2018-03-21 06:28:06 UTC
This should be reproducible with any migration

Comment 2 Arik 2018-05-08 12:51:09 UTC
Increasing the severity, as the user has no way of recovering from this state (VM in MigratingTo state while run_on_vds=NULL) without manually changing the database.
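
The inconsistent state described in this comment can be expressed as a simple predicate. The sketch below is illustrative only: it operates on plain dictionaries with hypothetical `name`, `status` and `run_on_vds` keys that mirror the names mentioned above, and it does not query the real engine database or API.

```python
def is_stuck_migrating(vm):
    """True if the VM claims to be migrating but has no host assigned."""
    return vm.get("status") == "MigratingTo" and vm.get("run_on_vds") is None


def find_stuck_vms(vms):
    """Return the names of VMs that are in the inconsistent state."""
    return [vm["name"] for vm in vms if is_stuck_migrating(vm)]


# Hypothetical records; only the first one matches the state from this bug.
vms = [
    {"name": "golden_env_mixed_virtio_1_0", "status": "MigratingTo", "run_on_vds": None},
    {"name": "healthy_vm", "status": "Up", "run_on_vds": "host_mixed_1"},
    {"name": "migrating_vm", "status": "MigratingTo", "run_on_vds": "host_mixed_2"},
]
print(find_stuck_vms(vms))
```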

Comment 3 Polina 2018-05-31 15:12:07 UTC
The bug is verified in version rhvm-4.2.4-0.1.el7.noarch.
Verified by following the steps in the description and by running the automation tests (TestEnforcementUnderHostAffinity01/02 classes in art/tests/rhevmtests/compute/sla/scheduler_tests/affinity_host_to_vm/affinity_host_to_vm_test.py).

Comment 4 Polina 2019-01-01 21:32:41 UTC
Verified on ovirt-engine-4.3.0-0.4.master.20181231193012.git1f27a84.el7.noarch , vdsm-4.30.4-84.gita708fe4.el7.x86_64, libvirt-4.5.0-10.el7_6.3.x86_64.

Verified by following the steps in the description and by running the automation test art/tests/rhevmtests/compute/sla/scheduler_tests/affinity_host_to_vm/affinity_host_to_vm_test.py.

Comment 5 Sandro Bonazzola 2019-02-13 07:46:35 UTC
This bugzilla is included in the oVirt 4.3.0 release, published on February 4th 2019.

Since the problem described in this bug report should be resolved in the oVirt 4.3.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

