Bug 1349745 - EL6 to EL7 cluster upgrade is not possible with hosted engine in the same cluster
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-3.6.9
Assignee: Nobody
QA Contact: meital avital
URL:
Whiteboard: hosted-engine
Depends On:
Blocks:
 
Reported: 2016-06-24 07:08 UTC by Roman Mohr
Modified: 2020-07-09 17:12 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-20 08:52:24 UTC
oVirt Team: Docs
Target Upstream Version:
Embargoed:



Description Roman Mohr 2016-06-24 07:08:50 UTC
Description of problem:

When trying to upgrade the hosts in a cluster from el6 to el7 with the InClusterUpgrade scheduling policy, the policy cannot be activated if the hosted engine VM is in that cluster.

The policy is only allowed to be activated when all VMs in the cluster are migratable. The hosted engine VM, however, is managed exclusively by the host HA agents, which means the engine treats it as not migratable. This was not an issue for as long as the hosted engine was not a managed VM.

This is a possible blocker for the el6 to el7 upgrade flow.

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. Create a cluster with el6 hosts and hosted engine
2. Try to activate the InClusterUpgrade policy to start the upgrade flow
3. It fails because the hosted engine VM is not migratable

Actual results:
Cluster upgrade from el6 to el7 can't be started

Expected results:
Activation should be allowed. One option is to check whether the only non-migratable VM is the hosted engine VM, and proceed if all other VMs are migratable.


Additional info:

A user on the mailing list reported this: http://lists.ovirt.org/pipermail/users/2016-June/040582.html
I suggested the following workaround:

> You can create a temporary cluster, move one host and the hosted
> engine VM there, upgrade all hosts and then start the hosted-engine VM
> in the original cluster again.

> The detailed steps are:

> 1) Enter the global maintenance mode
> 2) Create a temporary cluster
> 3) Put one of the hosted engine hosts which does not currently host
> the engine into maintenance
> 4) Move this host to the temporary cluster
> 5) Stop the hosted-engine-vm with `hosted-engine --destroy-vm` (it
> should not come up again since you are in maintenance mode)
> 6) Start the hosted-engine-vm with `hosted-engine --start-vm` on the
> host in the temporary cluster
> 7) Now you can enable the InClusterUpgrade policy on your main cluster
> 8) Proceed with your main cluster as described in
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.6/html/Self-Hosted_Engine_Guide/Upgrading_the_Self-Hosted_Engine_from_6_to_7.html
> 9) When all hosts are upgraded and the InClusterUpgrade policy is disabled
> again, move the hosted-engine-vm back to the original cluster
> 10) Upgrade the last host
> 11) Migrate the last host back
> 12) Delete the temporary cluster
> 13) Deactivate maintenance mode
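The CLI portion of the quoted workaround can be sketched as a shell script. This is a sketch only: the Administration Portal steps appear as comments, and the flag names used here (`--set-maintenance`, `--vm-poweroff`, `--vm-start`) are the `hosted-engine` tool's spellings in later releases, while the quote above uses `--destroy-vm`/`--start-vm`; check `hosted-engine --help` on your version before running anything.

```shell
#!/bin/sh
# Sketch of the CLI part of the temporary-cluster workaround.
# DRY_RUN=1 (the default) only prints what would be executed,
# since these commands must run on real hosted-engine hosts.
LOG=""
run() {
    LOG="$LOG $*;"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1) Prevent the HA agents from restarting the engine VM on their own.
run hosted-engine --set-maintenance --mode=global

# 2-4) In the Administration Portal: create a temporary cluster, put a
#      HE host that is not currently hosting the engine into
#      maintenance, and move it to the temporary cluster.

# 5) Stop the hosted engine VM on its current host.
run hosted-engine --vm-poweroff

# 6) Start it again on the host moved to the temporary cluster.
run hosted-engine --vm-start

# 7-12) Enable InClusterUpgrade, upgrade the hosts, move the VM back,
#       then drop the temporary cluster as described above.

# 13) Leave global maintenance once everything is upgraded.
run hosted-engine --set-maintenance --mode=none
```

With `DRY_RUN=1` the script is safe to run anywhere and just echoes the planned commands.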

I see one possible problem with that: What happens to the engine when the imported hosted engine VM suddenly appears on another cluster? Will the cluster id in the db for the hosted engine VM be changed? Do we cover that? This can always happen when someone decides to move a hosted engine host to another cluster.

Comment 1 Roman Mohr 2016-06-27 06:11:14 UTC
Doron reminded me that Hosted Engine was explicitly out of scope for the el6 to el7 cluster upgrade policy. The documentation needs to be fixed.

@Roy @Doron I still think we should fix this: starting the hosted engine on a cluster different from the one where it was imported should update the hosted engine VM accordingly. It currently still thinks that it is running in the old cluster. Opinions?

Comment 2 Yaniv Lavi 2016-06-28 09:07:32 UTC
We need a process for this use case, please provide one to document.

Comment 3 Roman Mohr 2016-07-11 06:36:59 UTC
(In reply to Yaniv Dary from comment #2)
> We need a process for this use case, please provide one to document.

There is a small hidden bug: when the HE-VM is started on or migrated to a host which is not part of the cluster where the VM was initially imported, the VM still thinks it is part of the old cluster.

This has consequences for both upgrade processes:

1) InClusterUpgrade (critical for that one, but not officially part of the design scope):
 - No matter whether you move the HE-VM to another cluster by migration or by restart, you cannot start the InClusterUpgrade, because the cluster still thinks it contains a VM that is not automatically migratable (the HE-VM).

The only solution is to change the DB entry.

2) Normal upgrade (Move all hosts to a new cluster)

After all hosts and VMs are on the new cluster, the original cluster still thinks that one VM (the HE-VM) is running in it, so you cannot delete the old cluster after all hosts are migrated. That is probably not a blocker, but it can also only be solved with a DB update.
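The manual DB update mentioned above might look like the following sketch. All names here are assumptions: it presumes the 3.6 engine schema, where `vm_static.vds_group_id` holds a VM's cluster (renamed `cluster_id` in 4.x), that the cluster list lives in `vds_groups`, and that the VM is named `HostedEngine`. Always back up the engine database before any manual change.

```sql
-- Sketch only; assumed 3.6 schema, back up the engine DB first.
-- Find the id of the cluster the hosted engine VM actually runs in.
SELECT vds_group_id, name FROM vds_groups;

-- Point the hosted engine VM at that cluster.
UPDATE vm_static
   SET vds_group_id = '<target-cluster-uuid>'
 WHERE vm_name = 'HostedEngine';
```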

So we should do the following:
 1) Make sure that the VM and the cluster always agree on which cluster the HE-VM is running in
 2) Fix the documentation. It should describe the second scenario for the normal upgrade (instead of the InClusterUpgrade) as the default scenario.

@Yaniv should we create a second bug for one of those?

Comment 4 Roy Golan 2016-07-13 12:48:07 UTC
Bug 1351533 will solve the problem by letting the HE VM migrate out of its cluster and updating the cluster recorded for the HE VM properly. The host will then be able to go into maintenance and be removed, and the old cluster will be removable as well.

Comment 7 Lucy Bopf 2016-11-11 05:05:57 UTC
Clearing doc text and flag, as this issue appears to have been fixed by bug 1351533.

Comment 11 Doron Fediuck 2016-11-20 08:52:24 UTC
The code worked as expected: it detected that at least one hosted-engine host was still on el6, so it proposed to abort and upgrade the HE hosts first.

