Bug 1098167
| Summary: | HE VM migration via WEBUI not properly handled by HA agent in the hosts | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Nikolai Sednev <nsednev> |
| Component: | ovirt-hosted-engine-ha | Assignee: | Jiri Moskovcak <jmoskovc> |
| Status: | CLOSED WORKSFORME | QA Contact: | Nikolai Sednev <nsednev> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.4.0 | CC: | dfediuck, iheim, mavital, mkalinin, nsednev, pablo.iranzo, sherold |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | 3.5.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | sla | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-11-17 17:01:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

doron - please remind me - what is preventing live migration via the engine?

VM migration for hosted engine may impact the whole setup, so it's a sensitive process which needs to be initiated by the HA agents. The main motivations for a migration are improving placement in case a host's state becomes degraded, and maintenance. In other cases, such as load balancing, we prefer all other VMs to move around and the engine VM to keep running on the same host.

Currently the HA agents are capable of migrating the VMs in case of a failure. We left a manual option to migrate the VM from the UI for maintenance purposes. In light of this issue we should make sure this is being handled properly by the HA agent.

I'd like to clarify that we should keep the manual migration option available, so the issue here is not about allowing migration from the UI, but making sure it is handled properly across the stack.

(In reply to Doron Fediuck from comment #2)
> VM migration for hosted engine may impact the whole setup, so it's a
> sensitive process which needs to be initiated by the HA agents. The main
> motivations for a migration are improving placement in case a host's state
> becomes degraded, and maintenance. In other cases, such as load balancing,
> we prefer all other VMs to move around and the engine VM to keep running
> on the same host.
>
> Currently the HA agents are capable of migrating the VMs in case of a
> failure.
> We left a manual option to migrate the VM from the UI for maintenance
> purposes.
> In light of this issue we should make sure this is being handled properly by
> the HA agent.
>
> I'd like to clarify that we should keep the manual migration option available,
> so the issue here is not about allowing migration from the UI, but making sure
> it is handled properly across the stack.

1. So migration via the UI should be allowed if and only if one of the hosts is under maintenance.
2. The agents didn't handle migration via the UI properly; that's what I found while opening this bug.

(In reply to Nikolai Sednev from comment #3)
> (In reply to Doron Fediuck from comment #2)
> 1. So migration via the UI should be allowed if and only if one of the hosts
> is under maintenance.
> 2. The agents didn't handle migration via the UI properly; that's what I found
> while opening this bug.

As I explained, the issue here is not about allowing migration from the UI, but making sure it is handled properly across the stack. Such a migration can be initiated by the user regardless of the host status, as in some cases we may prefer to migrate this VM first and let all other VMs follow.

One more point, depending on the solution accepted here: if we do not want to allow live migration of the HE VM as is, maybe we can add an additional check before initiating migration, and if the hosted engine is not in global maintenance, notify the user with a message?

Currently, live migration of the HE works on 3.5 rc1 (ovirt-engine-setup-3.5.0-0.0.master.20140804172041.git23b558e.el6.noarch) using Migrate via WEBUI, as long as it stays within the HE host cluster. Only if you have 2 hosts for HE and then add additional hosts that are not part of HA (adding these hosts via the engine's WEBUI) may these hosts be chosen as migration targets for the HE, and the migration then fails with the error described below:

"Migration failed, No available host found (VM: HostedEngine, Source: rose05.qa.lab.tlv.redhat.com)."

IMHO, migration of the HE via WEBUI has to inspect the relevant hosts first, and only then list the checked hosts that have HA running on them as available for HE migration.

Nikolai, thank you for your explanation.
So, following up this case, I created 2 articles:
"How to perform live migration of Hosted Engine VM": https://access.redhat.com/solutions/1168373
And:
"Receive error VM HostedEngine is down": https://access.redhat.com/solutions/1168103
Please take a look and ack or comment.

(In reply to Marina from comment #8)
> Nikolai, thank you for your explanation.
> So, following up this case, I created 2 articles:
> "How to perform live migration of Hosted Engine VM":
> https://access.redhat.com/solutions/1168373
>
> And:
> "Receive error VM HostedEngine is down":
> https://access.redhat.com/solutions/1168103
>
> Please take a look and ack or comment.

I must apologize, but I don't see anything in either link; nothing is inside. Is that because I need some kind of subscription?

So to get this straight, the migration finishes fine, the VM with the engine is up and running, but the problem is that it issues the error message about acquiring the lock?

(In reply to Jiri Moskovcak from comment #10)
> So to get this straight, the migration finishes fine, the VM with the engine is
> up and running, but the problem is that it issues the error message about
> acquiring the lock?

The error and the HA logic have to be changed; please review the actual results at the top of the bug's description.

(In reply to Nikolai Sednev from comment #11)
> (In reply to Jiri Moskovcak from comment #10)
> > So to get this straight, the migration finishes fine, the VM with the engine is
> > up and running, but the problem is that it issues the error message about
> > acquiring the lock?
>
> The error and the HA logic have to be changed; please review the actual results
> at the top of the bug's description.

OK, so let me rephrase it. I read the bug description and I need to clear up some things. After the migration finishes, does the engine continue to work? And after some time (approx. 10 mins), did the agent notice that the engine is up?

The engine continues to work as expected, but HA thinks that the HE is dead, because it can't track it any more once it has been migrated away from the host. No, the host's HA agent does not become aware that the HE has been migrated, even after a considerable period of time.

I can't reproduce this with the following versions:
ovirt-hosted-engine-ha-1.1.5-1.el6ev.noarch
libvirt-0.10.2-29.el6_5.12.x86_64
vdsm-4.14.13-2.el6ev.x86_64
rhevm-3.4.2-1.1.el6ev.noarch

The migration just works fine; the agents even properly detect that the migration started and finished.

If relevant, re-open and provide a reproducer based on recent versions (beta5 or later).

I verified this one on Red Hat Enterprise Virtualization Manager Version: 3.5.0-0.20.el6ev and it works for me via WEBUI OK.
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
libvirt-0.10.2-46.el6_6.1.x86_64
vdsm-4.16.7.4-1.el6ev.x86_64
sanlock-2.8-1.el6.x86_64

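The suggestion made earlier in this thread, that the WEBUI should inspect the hosts first and offer only those with HA running as HE migration targets, could be sketched roughly as below. This is a minimal illustration only: the `Host` record, its field names, and `eligible_he_targets` are hypothetical and do not correspond to any engine or ovirt-hosted-engine-ha API, and the two non-source host names are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    # Hypothetical host record; these fields are assumptions for illustration,
    # not part of the engine's data model.
    name: str
    ha_agent_running: bool  # an HA agent is active on this host


def eligible_he_targets(hosts: List[Host], source: str) -> List[Host]:
    """Return the hosts that could be offered as HE VM migration targets.

    A host qualifies only if it is not the current source host and it
    runs an HA agent.
    """
    return [h for h in hosts if h.name != source and h.ha_agent_running]


# Example: the source host, one HA host, and one host added without HA.
hosts = [
    Host("rose05.qa.lab.tlv.redhat.com", ha_agent_running=True),
    Host("ha-host-2.example.com", ha_agent_running=True),
    Host("plain-host-3.example.com", ha_agent_running=False),
]
targets = eligible_he_targets(hosts, source="rose05.qa.lab.tlv.redhat.com")
print([h.name for h in targets])
```

With a filter of this kind, a host added to the cluster without HA would never be proposed as a target, so the "No available host found" failure described above would not be triggered by choosing it.
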
Created attachment 895885 [details]
HE engine.log & picture of the expected warning

Description of problem:
HEVM migration via WEBUI shouldn't be supported, and an appropriate warning has to pop up. HEVM migration should always be initiated by HA only!

Version-Release number of selected component (if applicable):
Hosts components:
libvirt-0.10.2-29.el6_5.7.x86_64
sanlock-2.8-1.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.9.x86_64
ovirt-hosted-engine-ha-1.1.2-3.el6ev.noarch
vdsm-4.14.7-2.el6ev.x86_64

HE components:
rhevm-3.4.0-0.20.el6ev.noarch
ovirt-host-deploy-1.2.1-1.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Assemble an HE setup with two hosts running RHEL 6.5.
2. Via the engine's WEBUI, initiate the HEVM's migration via Virtual Machines->HE->Migrate.
3. Receive this error from the engine: "VM HostedEngine is down. Exit message: internal error Failed to acquire lock: error -243."

Actual results:
The HEVM is migrated via the engine's WEBUI and an error message is received. The VM is up and running, but HA thinks that it is dead, because it was not moved by HA and no longer exists on the host from which it was migrated.

Expected results:
No error should appear in the engine's log, and the migration should be prohibited with an appropriate message, like the one shown in the attached picture.

Additional info:
See the message within the attached log:
2014-05-15 14:08:54,724 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-74) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM HostedEngine is down. Exit message: internal error Failed to acquire lock: error -243.
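
For anyone trying to confirm the same symptom in their own engine.log, a small script along these lines may help pull out the relevant audit events. It is only a sketch: the regular expression is derived from the single log line quoted above, so other engine versions may format the line differently, and `find_lock_errors` is a name chosen here for illustration.

```python
import re

# Pattern inferred from the one audit-log line quoted in this report;
# treat the exact layout as an assumption.
LOCK_ERROR = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+.*"
    r"Message: VM (?P<vm>\S+) is down\. "
    r"Exit message: .*Failed to acquire lock: error (?P<code>-?\d+)"
)


def find_lock_errors(lines):
    """Yield (timestamp, vm_name, error_code) for each matching log line."""
    for line in lines:
        match = LOCK_ERROR.search(line)
        if match:
            yield (
                match.group("timestamp"),
                match.group("vm"),
                int(match.group("code")),
            )


# The log line from this report, used as sample input.
sample = (
    "2014-05-15 14:08:54,724 INFO  [org.ovirt.engine.core.dal.dbbroker."
    "auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-74) "
    "Correlation ID: null, Call Stack: null, Custom Event ID: -1, "
    "Message: VM HostedEngine is down. Exit message: internal error "
    "Failed to acquire lock: error -243."
)
results = list(find_lock_errors([sample]))
print(results)
```

To scan a real file, pass an open file object instead of the sample list, for example `find_lock_errors(open(path))`, where `path` points at your engine.log.
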