
Bug 1098167

Summary:

HE VM migration via WEBUI not properly handled by HA agent in the hosts

Product:

Red Hat Enterprise Virtualization Manager

Reporter:

Nikolai Sednev <nsednev>

Component:

ovirt-hosted-engine-ha

Assignee:

Jiri Moskovcak <jmoskovc>

Status:

CLOSED WORKSFORME

QA Contact:

Nikolai Sednev <nsednev>

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

3.4.0

CC:

dfediuck, iheim, mavital, mkalinin, nsednev, pablo.iranzo, sherold

Target Milestone:

---

Keywords:

Triaged

Target Release:

3.5.0

Hardware:

x86_64

OS:

Linux

Whiteboard:

sla

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2014-11-17 17:01:58 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

SLA

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
HE engine.log&picture of expected warning to be received	none

Description Nikolai Sednev 2014-05-15 12:33:42 UTC

Created attachment 895885
HE engine.log&picture of expected warning to be received

Description of problem:
HEVM migration via the WEBUI shouldn't be supported, and an appropriate warning should pop up.

HEVM's migration should always be initiated by HA only!

Version-Release number of selected component (if applicable):
Hosts components:
libvirt-0.10.2-29.el6_5.7.x86_64
sanlock-2.8-1.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.9.x86_64
ovirt-hosted-engine-ha-1.1.2-3.el6ev.noarch
vdsm-4.14.7-2.el6ev.x86_64

HE components:
rhevm-3.4.0-0.20.el6ev.noarch
ovirt-host-deploy-1.2.1-1.el6ev.noarch


How reproducible:
100%

Steps to Reproduce:
1. Assemble an HE setup with two hosts running RHEL 6.5.
2. In the engine's WEBUI, initiate the HEVM's migration via Virtual Machines->HE->Migrate.
3. Receive an error from the engine: "VM HostedEngine is down. Exit message: internal error Failed to acquire lock: error -243."

Actual results:
The HEVM is migrated via the engine's WEBUI and an error message is received. The VM is up and running, but HA thinks that it is dead, because the migration was not initiated by HA and the VM no longer exists on the host from which it was migrated.

Expected results:
No error should appear in the engine's log, and migration should be prohibited with an appropriate message, as shown in the attached picture.

Additional info:
See the message within attached log:
2014-05-15 14:08:54,724 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-74) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM HostedEngine is down. Exit message: internal error Failed to acquire lock: error -243.
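
To locate this event in a larger engine.log, a small filter along the following lines may help. It is only a sketch: the regular expression is derived from the single audit-log line quoted above and is an assumption about the layout of other entries, and the names AUDIT_LINE and find_lock_errors are hypothetical helpers, not part of oVirt.

```python
import re

# Pattern modelled on the one audit-log line quoted in this bug; other
# engine.log entries may use a different layout (assumption).
AUDIT_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r".*?Message: (?P<message>.*)$"
)

LOCK_ERROR = "Failed to acquire lock"


def find_lock_errors(lines):
    """Yield (timestamp, message) for audit entries mentioning the lock failure."""
    for line in lines:
        match = AUDIT_LINE.match(line.strip())
        if match and LOCK_ERROR in match.group("message"):
            yield match.group("timestamp"), match.group("message")


# The log line from this report, rejoined into a single line.
sample = (
    "2014-05-15 14:08:54,724 INFO  "
    "[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] "
    "(DefaultQuartzScheduler_Worker-74) Correlation ID: null, Call Stack: null, "
    "Custom Event ID: -1, Message: VM HostedEngine is down. Exit message: "
    "internal error Failed to acquire lock: error -243."
)

lock_errors = list(find_lock_errors([sample]))
```

With a real log, the same function can be fed an open file object, since it only iterates over lines.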

Comment 1 Itamar Heim 2014-05-15 14:51:14 UTC

doron - please remind me - what is preventing live migration via the engine?

Comment 2 Doron Fediuck 2014-05-15 16:43:51 UTC

VM migration for hosted engine may impact the whole setup, so it's a sensitive process which needs to be initiated by the HA agents. The main motivations for
a migration are improving placement in case a host's state becomes degraded, and maintenance. In other cases, such as load balancing, we prefer all other VMs to
move around and the engine VM to keep running on the same host.

Currently the HA agents are capable of migrating the VMs in case of a failure.
We left a manual option to migrate the VM from the UI for maintenance purposes.
In light of this issue we should make sure this is being handled properly by
the HA agent.

I'd like to clarify that we should keep the manual migration option available
so the issue here is not about allowing migration from UI, but making sure
it is handled properly across the stack.

Comment 3 Nikolai Sednev 2014-05-18 08:04:46 UTC

(In reply to Doron Fediuck from comment #2)
> VM migration for hosted engine may impact the whole setup, so it's a
> sensitive process which needs to be initiated by the HA agents. The main
> motivations for
> a migration are improving placement in case a host's state becomes degraded,
> and maintenance. In other cases such as load balancing we prefer all other
> VMs to
> move around and the engine VM to keep running on the same host.
> 
> Currently the HA agents are capable of migrating the VMs in case of a
> failure.
> We left a manual option to migrate the VM from the UI for maintenance
> purposes.
> In light of this issue we should make sure this is being handled properly by
> the HA agent.
> 
> I'd like to clarify that we should keep the manual migration option available
> so the issue here is not about allowing migration from UI, but making sure
> it is handled properly across the stack.

1. So migration via the UI should be allowed if and only if one of the hosts is under maintenance.
2. The agents didn't handle migration via the UI properly; that's what I found while opening this bug.

Comment 4 Doron Fediuck 2014-05-20 11:46:59 UTC

(In reply to Nikolai Sednev from comment #3)
> (In reply to Doron Fediuck from comment #2)
 
> 1. So migration via the UI should be allowed if and only if one of the hosts
> is under maintenance.
> 2. The agents didn't handle migration via the UI properly; that's what I found
> while opening this bug.

As I explained, the issue here is not about allowing migration from UI, but making sure it is handled properly across the stack.
Such a migration can be initiated by the user regardless of the host status, as in some cases we may prefer to migrate this VM first and let all other VMs follow.

Comment 6 Marina Kalinin 2014-08-18 16:33:14 UTC

One more point: depending on the solution accepted here, if we do not want to allow live migration of the HEVM as is, maybe we can add a check before initiating the migration and notify the user with a message if the hosted engine is not in global maintenance?
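
The pre-migration check suggested here could look roughly like the sketch below. Everything in it is hypothetical: check_he_migration, MigrationDecision and their parameters are illustrative names, the thread does not say how the engine would learn the global maintenance state, and whether the check should block the migration or only warn is an open question (this sketch blocks).

```python
from dataclasses import dataclass


@dataclass
class MigrationDecision:
    allowed: bool
    message: str  # text to show the user; empty when nothing needs saying


def check_he_migration(vm_name, is_hosted_engine, global_maintenance):
    """Decide whether a UI-initiated migration should proceed.

    All inputs are hypothetical; obtaining them from a real setup is
    outside the scope of this sketch.
    """
    # Ordinary VMs are not affected by the check.
    if not is_hosted_engine:
        return MigrationDecision(True, "")
    # In global maintenance the HA agents are not managing the VM,
    # so a manual migration is let through.
    if global_maintenance:
        return MigrationDecision(True, "")
    # Hosted engine VM outside global maintenance: refuse and explain.
    return MigrationDecision(
        False,
        f"VM {vm_name} is the hosted engine and global maintenance is not "
        "enabled; migration was not started.",
    )


decision = check_he_migration("HostedEngine", True, False)
```

Returning a small decision object keeps the user-facing message next to the verdict, so a UI layer could display it without knowing the rule itself.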

Comment 7 Nikolai Sednev 2014-08-19 10:39:53 UTC

Currently, live migration of the HE works on 3.5 rc1 ( ovirt-engine-setup-3.5.0-0.0.master.20140804172041.git23b558e.el6.noarch ) using Migrate via the WEBUI, when migrating within the HE host cluster.

However, if you have 2 hosts for the HE and add further hosts that are not part of HA (adding these hosts via the engine's WEBUI), these hosts may be chosen as migration targets for the HE, and the migration then fails with the error below:
"Migration failed, No available host found (VM: HostedEngine, Source: rose05.qa.lab.tlv.redhat.com)."

IMHO, migration of the HE via the WEBUI has to inspect the relevant hosts first, and only then list the checked hosts that have HA running on them as available for HE migration.
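
The host filtering proposed here can be illustrated with a short sketch. The Host record, its ha_agent_running flag and the he_migration_targets function are hypothetical names invented for illustration; only rose05.qa.lab.tlv.redhat.com comes from this report, and the other host names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    ha_agent_running: bool  # hypothetical flag: host takes part in HE HA


def he_migration_targets(hosts, source_name):
    """Return names of hosts that could be offered as HE migration targets.

    A host qualifies only if it is not the current source and has the
    HA agent running (assumed criterion, taken from the comment above).
    """
    return [
        host.name
        for host in hosts
        if host.name != source_name and host.ha_agent_running
    ]


hosts = [
    Host("rose05.qa.lab.tlv.redhat.com", True),  # current source host
    Host("he-host-2.example.com", True),         # made-up HA host
    Host("plain-host-3.example.com", False),     # made-up host without HA
]

targets = he_migration_targets(hosts, "rose05.qa.lab.tlv.redhat.com")
```

With such a filter the host without HA would never be offered, instead of being selected and then failing with "No available host found".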

Comment 8 Marina Kalinin 2014-08-19 19:12:51 UTC

Nikolai, thank you for your explanation.
So, following up this case, I created 2 articles:
"How to perform live migration of Hosted Engine VM":
https://access.redhat.com/solutions/1168373

And:
"Receive error VM HostedEngine is down":
https://access.redhat.com/solutions/1168103

Please take a look and ack or comment.

Comment 9 Nikolai Sednev 2014-08-20 13:28:39 UTC

(In reply to Marina from comment #8)
> Nikolai, thank you for your explanation.
> So, following up this case, I created 2 articles:
> "How to perform live migration of Hosted Engine VM":
> https://access.redhat.com/solutions/1168373
> 
> And:
> "Receive error VM HostedEngine is down":
> https://access.redhat.com/solutions/1168103
> 
> Please take a look and ack or comment.

I must apologize, but I don't see anything in either link; both appear empty. Is that because I need some kind of subscription?

Comment 10 Jiri Moskovcak 2014-08-26 08:02:10 UTC

So to get this straight: the migration finishes fine and the VM with the engine is up and running, but the problem is that it issues the error message about acquiring the lock?

Comment 11 Nikolai Sednev 2014-08-26 11:36:37 UTC

(In reply to Jiri Moskovcak from comment #10)
> So to get this straight, the migration finishes fine, the vm with engine is
> up'n'running, but the problem is that it issues the error message about
> acquiring the lock?

The error and the HA logic have to be changed; please review the actual results at the top of the bug's description.

Comment 12 Jiri Moskovcak 2014-08-27 07:30:34 UTC

(In reply to Nikolai Sednev from comment #11)
> (In reply to Jiri Moskovcak from comment #10)
> > So to get this straight, the migration finishes fine, the vm with engine is
> > up'n'running, but the problem is that it issues the error message about
> > acquiring the lock?
> 
> The error and the HA logic have to be changed; please review the actual
> results at the top of the bug's description.

OK, so let me rephrase it. I read the bug description and I need to clear up some things. After the migration finishes, does the engine continue to work? And after some time (approx. 10 mins), did the agent notice that the engine is up?

Comment 13 Nikolai Sednev 2014-08-27 12:27:28 UTC

The engine continues to work as expected, but HA thinks that the HE is dead, as it can't track it any more once it has been migrated away from the host.

No, the host's HA does not become aware that the HE has been migrated, even after a considerable period of time.

Comment 14 Jiri Moskovcak 2014-09-11 12:31:09 UTC

I can't reproduce this with the following versions:

ovirt-hosted-engine-ha-1.1.5-1.el6ev.noarch
libvirt-0.10.2-29.el6_5.12.x86_64
vdsm-4.14.13-2.el6ev.x86_64
rhevm-3.4.2-1.1.el6ev.noarch

The migration just works fine; even the agents properly detect that the migration started and finished.

Comment 15 Doron Fediuck 2014-11-17 17:01:58 UTC

If relevant, re-open and provide a reproducer based on recent versions (beta5 or later).

Comment 16 Nikolai Sednev 2014-11-18 12:31:10 UTC

I verified this on Red Hat Enterprise Virtualization Manager Version: 3.5.0-0.20.el6ev, and migration via the WEBUI works for me.
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
libvirt-0.10.2-46.el6_6.1.x86_64
vdsm-4.16.7.4-1.el6ev.x86_64
sanlock-2.8-1.el6.x86_64