Bug 1640467 - [RHHI] Hosted Engine migration fails in gluster storage domain
Summary: [RHHI] Hosted Engine migration fails in gluster storage domain
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhhi
Version: rhhi-1.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: RHHI-V 1.5.z Async
Assignee: Sahina Bose
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On: 1641798
Blocks: 1548985 RHHIV-1.5.z-Backlog-BZs
 
Reported: 2018-10-18 07:12 UTC by bipin
Modified: 2019-05-20 04:55 UTC
CC: 8 users

Fixed In Version: RHEL 7.6 batch update1
Doc Type: Known Issue
Doc Text:
Cause: A change in libvirt that verifies shared storage is in use before migration does not take symlinks in the storage path into account.
Consequence: The Hosted Engine VM cannot be migrated.
Workaround (if any): The HE VM is restarted automatically by ha-agent monitoring. It can also be stopped and started manually with the hosted-engine CLI tool:
# hosted-engine --vm-shutdown  (stops the HE VM)
# hosted-engine --vm-start  (executed from the host where the VM should be started)
Result: The Hosted Engine is restarted on a different host.
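The workaround above can also be scripted. The following is a minimal, hypothetical sketch: the hosted-engine commands are taken from the Doc Text, while the ssh wrapper and the "target-host" name are illustrative assumptions (--vm-start must run on the host that should take over the VM).

import subprocess

# Hypothetical automation of the documented workaround. The hosted-engine
# commands come from the Doc Text above; the ssh wrapper and the
# "target-host" name are illustrative assumptions.
subprocess.check_call(['hosted-engine', '--vm-shutdown'])  # stop the HE VM here
# --vm-start must be executed on the host where the VM should be started:
subprocess.check_call(['ssh', 'root@target-host', 'hosted-engine', '--vm-start'])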
Clone Of: 1640465
Environment:
Last Closed: 2019-05-20 04:55:40 UTC
Embargoed:



Description bipin 2018-10-18 07:12:11 UTC
+++ This bug was initially created as a clone of Bug #1640465 +++

Description of problem:
======================
While migrating the Hosted Engine VM to any of the other hosts available in the cluster, the migration fails.

Version-Release number of selected component (if applicable):
============================================================
redhat-release-virtualization-host-4.2-7.3
libvirt-4.5.0-10.el7_6.2.x86_64
ovirt-hosted-engine-ha-2.2.18-1.el7ev.noarch

How reproducible:
================
100%

Steps to Reproduce:
==================
1. Deploy RHHI (Hosted Engine setup with 3 hosts)
2. The hosts are capable of hosting the HE VM
3. Migrate the HE VM to any of the other hosts within the cluster

Actual results:
==============
The migration to the other hosts fails

Expected results:
================
The migration should be successful


Additional info:
===============

This issue looks similar to bug 1632711.
Also note that the application VMs are able to migrate, but the Hosted Engine VM fails to migrate.

PS: The libvirt changes are applied
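For context on the suspected root cause (the Doc Text above notes that libvirt's shared-storage check does not account for symlinks), below is a minimal, hypothetical Python illustration of how a check that compares raw paths against a mount prefix misclassifies symlinked disk paths, while resolving symlinks first does not. The /var/run/vdsm and glusterSD paths are assumptions based on a typical vdsm layout, not taken from these logs, and this is not libvirt's actual implementation.

import os

# Hypothetical illustration only -- not libvirt's actual code.
# Assumed gluster storage-domain mount prefix on a RHHI host:
SHARED_MOUNTS = ['/rhev/data-center/mnt/glusterSD']

def is_on_shared_storage(disk_path, resolve_symlinks=True):
    # Resolving symlinks maps e.g. /var/run/vdsm/storage/... links back
    # to the real gluster mount; skipping this step makes the disk look
    # like local (non-shared) storage.
    path = os.path.realpath(disk_path) if resolve_symlinks else disk_path
    return any(path.startswith(m) for m in SHARED_MOUNTS)

# With resolve_symlinks=False, a symlinked HE disk path fails the check,
# and the migration is rejected as "unsafe" even though storage is shared.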

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-10-18 03:02:53 EDT ---

This request has been proposed as a blocker, but a release flag has not been requested. Please set a release flag to ? to ensure we may track this bug against the appropriate upcoming release, and reset the blocker flag to ?.

--- Additional comment from bipin on 2018-10-18 03:03:40 EDT ---

engine log:
==========
2018-10-18 11:46:09,417+05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-706) [] START, MigrateStatusVDSCommand(HostName = rhsqa-grafton7-nic2.lab.eng.blr.redhat.com, MigrateStatusVDSCommandParameters:{hostId='ba522bcc-96e2-46e7-9bcb-db6305f7c82c', vmId='3338d78c-c2a0-4f16-a79b-d2fdff5ae2d6'}), log id: 7958da7b
2018-10-18 11:46:09,422+05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-706) [] FINISH, MigrateStatusVDSCommand, log id: 7958da7b
2018-10-18 11:46:09,433+05 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-706) [] EVENT_ID: VM_MIGRATION_TRYING_RERUN(128), Failed to migrate VM HostedEngine to Host rhsqa-grafton8-nic2.lab.eng.blr.redhat.com . Trying to migrate to another Host.
2018-10-18 11:46:09,512+05 WARN  [org.ovirt.engine.core.bll.MigrateVmCommand] (EE-ManagedThreadFactory-engine-Thread-706) [] Validation of action 'MigrateVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__MIGRATE,VAR__TYPE__VM,VAR__ACTION__MIGRATE,VAR__TYPE__VM,VAR__ACTION__MIGRATE,VAR__TYPE__VM,VAR__ACTION__MIGRATE,VAR__TYPE__VM,SCHEDULING_NO_HOSTS
2018-10-18 11:46:09,534+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-706) [] EVENT_ID: VM_MIGRATION_FAILED(65), Migration failed  (VM: HostedEngine, Source: rhsqa-grafton7-nic2.lab.eng.blr.redhat.com).




vdsm log:
========
2018-10-18 11:46:09,031+0530 INFO  (migsrc/3338d78c) [virt.vm] (vmId='3338d78c-c2a0-4f16-a79b-d2fdff5ae2d6') starting migration to qemu+tls://rhsqa-grafton8-nic2.lab.eng.blr.redhat.com/system with miguri tcp://10.70.36.242 (migration:502)
2018-10-18 11:46:09,036+0530 ERROR (migsrc/3338d78c) [virt.vm] (vmId='3338d78c-c2a0-4f16-a79b-d2fdff5ae2d6') Unsafe migration: Migration without shared storage is unsafe (migration:290)
2018-10-18 11:46:09,256+0530 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:573)
2018-10-18 11:46:09,259+0530 INFO  (jsonrpc/3) [vdsm.api] START repoStats(domains=[u'e8cf4b06-2e59-4828-8a5d-243d6574896f']) from=::1,46260, task_id=7d309403-34e3-4254-b064-993bc54d1ef2 (api:46)
2018-10-18 11:46:09,259+0530 INFO  (jsonrpc/3) [vdsm.api] FINISH repoStats return={u'e8cf4b06-2e59-4828-8a5d-243d6574896f': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000750338', 'lastCheck': '2.1', 'valid': True}} from=::1,46260, task_id=7d309403-34e3-4254-b064-993bc54d1ef2 (api:52)
2018-10-18 11:46:09,259+0530 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Host.getStorageRepoStats succeeded in 0.00 seconds (__init__:573)
2018-10-18 11:46:09,366+0530 ERROR (migsrc/3338d78c) [virt.vm] (vmId='3338d78c-c2a0-4f16-a79b-d2fdff5ae2d6') Failed to migrate (migration:455)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 437, in _regular_run
    self._startUnderlyingMigration(time.time())
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 509, in _startUnderlyingMigration
    self._perform_with_conv_schedule(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 587, in _perform_with_conv_schedule
    self._perform_migration(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 529, in _perform_migration
    self._migration_flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1779, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: Unsafe migration: Migration without shared storage is unsafe
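The failure can be narrowed down outside vdsm with a minimal libvirt-python sketch that exercises the same migrateToURI3() call as the traceback above. The destination URI and migration URI are taken from the vdsm log; the flag set is illustrative, not vdsm's exact configuration.

import libvirt

# Minimal reproduction sketch; run on the source host. Assumes the
# libvirt-python bindings and a locally defined HostedEngine domain.
conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('HostedEngine')
params = {libvirt.VIR_MIGRATE_PARAM_URI: 'tcp://10.70.36.242'}
flags = libvirt.VIR_MIGRATE_LIVE | libvirt.VIR_MIGRATE_PEER2PEER
try:
    dom.migrateToURI3(
        'qemu+tls://rhsqa-grafton8-nic2.lab.eng.blr.redhat.com/system',
        params, flags)
except libvirt.libvirtError as e:
    # On affected builds this raises the same error as above:
    # "Unsafe migration: Migration without shared storage is unsafe"
    print(e)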

--- Additional comment from bipin on 2018-10-18 03:10:58 EDT ---

Logs @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1640465/

Comment 6 SATHEESARAN 2018-10-24 08:44:05 UTC
The dependent RHEL 7.6 bug is in POST state; moving this bug to the correct state as well

Comment 7 SATHEESARAN 2018-10-25 05:28:43 UTC
Removing the qa_ack from this bug, as it will not be fixed for RHHI 2.0 (now RHHI 1.5)

Comment 8 SATHEESARAN 2018-10-25 05:29:28 UTC
Resetting the pm_ack & blocker flags for the same reason as comment 7

Comment 10 SATHEESARAN 2018-11-27 06:29:43 UTC
The bug was claimed to be fixed with RHEL 7.6 batch update1. But when tested with RHVH based on RHEL 7.6 batch update1, the issue now manifests differently, and a new libvirt bug was raised for it

Comment 11 SATHEESARAN 2019-01-09 09:04:16 UTC
(In reply to SATHEESARAN from comment #10)
> The bug was claimed to be fixed with RHEL 7.6 batch update1. But when tested
> with RHVH based on RHEL 7.6 batch update1, the issue now manifests
> differently, and a new libvirt bug was raised for it

The issue is no longer observed. One possible explanation is that the earlier failure was a setup issue. In any case, with RHEL 7.6 batch update1 this issue is no longer seen.

This bug should be targeted for RHHI-V 1.6

Comment 14 SATHEESARAN 2019-01-09 09:57:50 UTC
The relevant RHEL 7.6 bug - https://bugzilla.redhat.com/show_bug.cgi?id=1641798 - is already verified.
Moving this bug to ON_QA

Comment 15 SATHEESARAN 2019-01-09 09:59:06 UTC
Tested with RHVH-4.2.8 & RHGS 3.4.3 (glusterfs-3.12.2-35.el7rhgs):

1. Create a RHHI-V setup
2. Migrate the HE VM from one host to another

The HE VM migrated successfully from one host to the other.
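For reference, the migration result can also be checked from any HE-capable host by polling the HE cluster state. This is a minimal sketch assuming the hosted-engine CLI from ovirt-hosted-engine-ha; the layout of the --json output is an assumption based on typical versions.

import json
import subprocess

# Sketch: print which host currently reports the engine as up.
# Assumes `hosted-engine --vm-status --json` is available on this host.
out = subprocess.check_output(['hosted-engine', '--vm-status', '--json'])
status = json.loads(out)
for key, host in status.items():
    if isinstance(host, dict) and 'hostname' in host:
        print(host['hostname'], host.get('engine-status'))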

