Bug 1519289 - If migration of HE VM failed because of timeout, source host will have hanged state "EngineMigratingAway"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: Agent
Version: 2.2.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ovirt-4.2.1
Target Release: ---
Assignee: Andrej Krejcir
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks: 1458711
Reported: 2017-11-30 14:38 UTC by Artyom
Modified: 2021-09-09 12:55 UTC

Fixed In Version: ovirt-hosted-engine-ha-2.2.4
Clone Of:
Environment:
Last Closed: 2018-02-22 10:01:08 UTC
oVirt Team: SLA
Embargoed:
rule-engine: ovirt-4.2+


Attachments
logs from engine and from source host (1.86 MB, application/zip)
2017-11-30 14:38 UTC, Artyom


Links
System                  ID         Private  Priority  Status  Summary                           Last Updated
Red Hat Issue Tracker   RHV-43466  0        None      None    None                              2021-09-09 12:55:22 UTC
oVirt gerrit            86166      0        master    MERGED  agent: Detect canceled migration  2020-10-08 16:50:20 UTC

Description Artyom 2017-11-30 14:38:41 UTC
Created attachment 1360964
logs from engine and from source host

Description of problem:
If migration of the HE VM fails because of a timeout, the source host stays stuck in the "EngineMigratingAway" state.
Example of the timeout traceback from vdsm.log:
2017-11-30 16:12:19,020+0200 ERROR (migsrc/30e333df) [virt.vm] (vmId='30e333df-79e9-4749-af8a-37e3c68ddce5') Failed to migrate (migration:455)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 437, in _regular_run
    self._startUnderlyingMigration(time.time())
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 510, in _startUnderlyingMigration
    self._perform_with_downtime_thread(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 579, in _perform_with_downtime_thread
    self._perform_migration(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 528, in _perform_migration
    self._migration_flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 125, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 586, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1679, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: operation aborted: migration job: canceled by client
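
To find this failure in a large vdsm.log, the "Failed to migrate" line and the closing libvirtError line can be matched together. The following is only a rough sketch: the regular expression is derived from the single log line above, the function name find_canceled_migrations is hypothetical, and other VDSM versions may format the line differently.

```python
import re

# Pattern derived from the one vdsm.log line quoted in this report;
# the exact layout is an assumption and may differ between versions.
FAILED_MIGRATION = re.compile(
    r"^(?P<timestamp>\S+ \S+) ERROR \(migsrc/[^)]*\) \[virt\.vm\] "
    r"\(vmId='(?P<vm_id>[0-9a-f-]+)'\) Failed to migrate"
)
CANCELED = "migration job: canceled by client"


def find_canceled_migrations(lines):
    """Return (timestamp, vm_id) for each failed migration whose
    traceback ends with a client-side cancel."""
    results = []
    pending = None  # last "Failed to migrate" header still awaiting its error line
    for line in lines:
        match = FAILED_MIGRATION.match(line)
        if match:
            pending = (match.group("timestamp"), match.group("vm_id"))
        elif pending and line.startswith("libvirtError:"):
            if CANCELED in line:
                results.append(pending)
            pending = None
    return results
```

It could be used as find_canceled_migrations(open("/var/log/vdsm/vdsm.log")), assuming the default log location.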

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-2.2.0-0.2.master.gitcbe3c76.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Configure HE environment with at least two hosts
2. Put the host running the HE VM into maintenance
3. 

Actual results:
If the migration takes too long, VDSM cancels it, but the host still reports the HE state "EngineMigratingAway".

Expected results:
I believe that if VDSM cancels the HE VM migration, the host HE state must change back to EngineUp.
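
The expected transition can be sketched as a small state function. This is an illustration only, not the agent's actual code: the state names EngineUp and EngineMigratingAway come from this report, while next_state and the migration status values are hypothetical.

```python
# State names as they appear in this report.
ENGINE_UP = "EngineUp"
ENGINE_MIGRATING_AWAY = "EngineMigratingAway"

# Hypothetical migration status values, invented for this sketch.
IN_PROGRESS = "in_progress"
CANCELED = "canceled"


def next_state(current_state, migration_status):
    """Return the HE state the host should report next.

    A canceled migration means the VM is still running on the source
    host, so the host should go back to EngineUp instead of staying
    in EngineMigratingAway.
    """
    if current_state == ENGINE_MIGRATING_AWAY and migration_status == CANCELED:
        return ENGINE_UP
    return current_state
```

With this rule, next_state(ENGINE_MIGRATING_AWAY, CANCELED) yields EngineUp, while a migration that is still in progress leaves the state unchanged.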

Additional info:
The bug exists in both 4.1 and 4.2, but it is more critical in 4.1.
In 4.2 you can enable global maintenance, which resets the host state, but in 4.1 this does not work (restarting the HE VM helped me in this case).

Comment 1 Nikolai Sednev 2018-02-15 14:46:53 UTC
I've migrated back and forth at least 10 times between a pair of HA hosts.
I did not see this bug reproduce, hence moving to verified.
Works for me with these components on the host:
rhvm-appliance-4.2-20180202.0.el7.noarch
ovirt-hosted-engine-ha-2.2.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.9-1.el7ev.noarch
Linux 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

On engine:
ovirt-engine-setup-4.2.1.5-0.1.el7.noarch
Linux 3.10.0-693.19.1.el7.x86_64 #1 SMP Thu Feb 1 12:34:44 EST 2018 x86_64 x86_64 x86_64 GNU/Linux

Comment 2 Sandro Bonazzola 2018-02-22 10:01:08 UTC
This bugzilla is included in oVirt 4.2.1 release, published on Feb 12th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

