Bug 1364557
Summary: | 3.6.8 engine forbids setting InClusterUpgrade policy on a 3.5 cluster with HE VM | |
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Jiri Belka <jbelka> |
Component: | ovirt-engine | Assignee: | Roman Mohr <rmohr> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Artyom <alukiano> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.6.8 | CC: | alukiano, bugs, dfediuck, eedri, gklein, lsurette, mkalinin, rbalakri, Rhev-m-bugs, rmohr, srevivo, stirabos, ykaul, ylavi |
Target Milestone: | ovirt-3.6.9 | Keywords: | TestOnly, Triaged, ZStream |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | sla | ||
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-09-27 12:21:46 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1356127 | ||
Bug Blocks: | 1370968 |
Description
Jiri Belka
2016-08-05 17:12:39 UTC
Hi,

(In reply to Jiri Belka from comment #0)
> Description of problem:
>
> I was updating a 3.5 SHE EL6 env and while following
> ovirt-engine/docs/manual/en_US/html/Self-Hosted_Engine_Guide/
> Upgrading_the_Self-Hosted_Engine_from_6_to_7.html I got stuck at:
>

The documentation is wrong. We should fix this, as I also suggested in Bug 1349745. Having the HE-VM in a cluster where you do an el6 to el7 upgrade is not supported. This was out of scope to reduce the complexity of the feature. You can create an extra cluster for it and move the HE-VM there.

Note that starting from 3.6.7 you can migrate the HE-VM through the UI, but because of a bug the old cluster still "thinks" that it is running on it. At least for 4.0.2 I see a fix on Bug 1351533; I don't know if it is already fixed for the latest 3.6 releases.

> ~~~
> In the Administration Portal, set the InClusterUpgrade scheduling policy on
> the cluster:
> Click the Clusters tab.
> Select the cluster and click Edit.
> Click the Scheduling Policy tab.
> From the Select Policy drop-down list, select InClusterUpgrade.
> Click OK.
> ~~~
>
> Even though I clicked on the HE VM and put it in Global Maintenance (double
> checked via hosted-engine --vm-status), I still could not define the
> InClusterUpgrade policy, as the engine was screaming that the HE VM does not
> allow migration.
>
> (Sorry, I did not have DEBUG level on engine.log)
>
> Anyway, I was brave enough and I did:
>
> engine=# UPDATE vm_static SET migration_support = 0 where vm_guid = '<uuid>';

Jiri, this change you made in the DB should be enough to make things work with the HE-VM in the same cluster. All we are trying to make sure with the initial checks is that a successful and problem-free migration for all hosts is as likely as possible. But again, we do not support or recommend that flow.
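In practical terms, the unsupported workaround discussed above boils down to two manual actions. A minimal sketch, assuming the default "engine" database with local postgres access on the engine machine, and keeping '<uuid>' as a placeholder for the HE VM's ID:

~~~
# On an HE host: put the deployment in global maintenance and double-check it.
hosted-engine --set-maintenance --mode=global
hosted-engine --vm-status        # output should indicate global maintenance

# On the engine machine: the DB change from the report above
# (unsupported; back up the engine database first).
su - postgres -c "psql engine -c \"UPDATE vm_static SET migration_support = 0 WHERE vm_guid = '<uuid>';\""
~~~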
> Having the HE-VM in a cluster where you do an el6 to el7 upgrade is not
> supported. This was out of scope to reduce the complexity of the feature.
> You can create an extra cluster for it and move the HE-VM there.

I'm not an HE expert, but I have doubts you can move the HE VM to a cluster which is not the SHE cluster, and IIUC you cannot add an additional host into the SHE env while it would be in a different cluster. But this should be clarified by HE QE.
(In reply to Jiri Belka from comment #4)
> > Having the HE-VM in a cluster where you do an el6 to el7 upgrade is not
> > supported. This was out of scope to reduce the complexity of the feature.
> > You can create an extra cluster for it and move the HE-VM there.
>
> I'm not an HE expert, but I have doubts you can move the HE VM to a cluster
> which is not the SHE cluster, and IIUC you cannot add an additional host into
> the SHE env while it would be in a different cluster. But this should be
> clarified by HE QE.

You can. The storage domain for the hosted engine is mounted directly (in parallel to the normal storage domains) by the HE agents and vdsm on the hosts; the engine just displays everything. HE agents do not care in which cluster they are.

If you are running an older engine version which does not come with the managed HE-VM feature (which allows migrations through the UI), you have to go through the following steps (a command sketch follows after the list):

1) Enter global maintenance mode.
2) Move one empty HE host to a different cluster.
3) Stop the HE-VM directly on the host it is running on via the hosted-engine tool.
4) ssh into the host in the different cluster.
5) Start the HE-VM with the hosted-engine tool there.
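A rough command sketch of that manual flow, assuming a two-host SHE setup where "host2" is a placeholder name for the empty HE host that was moved to the other cluster:

~~~
# 1) On any HE host: enter global maintenance so the HA agents leave the VM alone.
hosted-engine --set-maintenance --mode=global

# 2) Moving the empty HE host to the other cluster is done in the Administration Portal.

# 3) On the host currently running the HE-VM: locate it and shut it down cleanly.
hosted-engine --vm-status
hosted-engine --vm-shutdown

# 4) + 5) On the HE host now sitting in the other cluster: start the HE-VM there.
ssh root@host2
hosted-engine --vm-start

# Once the engine is reachable again, leave global maintenance.
hosted-engine --set-maintenance --mode=none
~~~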
Steps to test:

1. Deployed 3.6.9 HE on a pair of clean 3.6.9 el6 3.5 hosts over NFS.
2. Created a new 3.6.9 host cluster with one el6 3.5 host.
3. Added an additional hosted-engine host to the new 3.6.9 host cluster via the shell.
4. Migrated the HE-VM using the WebUI from the old 3.6.9 host cluster to the new 3.6.9 host cluster (cross-host-cluster migration).
5. Enable the InClusterUpgrade policy.
6. Upgrade a host to 3.5 el7 RHEV-H in the first cluster.
7. Upgrade all hosts to 3.6 el7 in the first cluster.
8. Migrate the HE back and upgrade the host in the second cluster to 3.6 el7.

Verified on rhevm-3.6.9.2-0.1.el6.noarch

1) Deploy HE 3.5 on two hosts (one RHEL 6.8 and one RHEV-H 20160707.3.el6ev)
Hosts:
vdsm-yajsonrpc-4.16.38-1.el6ev.noarch
vdsm-4.16.38-1.el6ev.x86_64
vdsm-cli-4.16.38-1.el6ev.noarch
vdsm-jsonrpc-4.16.38-1.el6ev.noarch
vdsm-hook-ethtool-options-4.16.38-1.el6ev.noarch
vdsm-xmlrpc-4.16.38-1.el6ev.noarch
vdsm-python-zombiereaper-4.16.38-1.el6ev.noarch
vdsm-python-4.16.38-1.el6ev.noarch
ovirt-hosted-engine-setup-1.2.6.1-1.el6ev.noarch
ovirt-hosted-engine-ha-1.2.10-1.el6ev.noarch

Engine:
rhevm-3.5.0-0.20.el6ev.noarch
##################################################

2) Upgrade the engine to 3.6
* Enable GlobalMaintenance
* Upgrade the engine
* Disable GlobalMaintenance

Engine:
rhevm-3.6.9.2-0.1.el6.noarch
##################################################

3) Enable InClusterUpgrade
* engine-config -s CheckMixedRhelVersions=false --cver=3.5 && service ovirt-engine restart
* change the cluster scheduler policy to InClusterUpgrade - PASS
##################################################

4) Upgrade the RHEL host to RHEL 7.2 with 3.5 packages
* put the host into maintenance and remove the host from the engine
* reprovision the host to RHEL 7.2
* add the 3.5 repositories
* re-deploy the host to HE (choose the same ID the host had before)
##################################################

5) Upgrade RHEV-H to RHEV-H 7.2 with 3.5 packages
* put the host into maintenance and remove the host from the engine
* reprovision the host to RHEV-H 7.2
* re-deploy the host to HE (choose the same ID the host had before)
##################################################

6) Upgrade packages to 3.6 on both hosts
* put the host into maintenance
* for the RHEL host, add the 3.6 repos and run yum update
* for RHEV-H, install the rhev package on the engine and upgrade the host via the engine

Hosts:
vdsm-4.17.35-1.el7ev.noarch
vdsm-python-4.17.35-1.el7ev.noarch
vdsm-jsonrpc-4.17.35-1.el7ev.noarch
vdsm-hook-ethtool-options-4.17.35-1.el7ev.noarch
vdsm-cli-4.17.35-1.el7ev.noarch
vdsm-yajsonrpc-4.17.35-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.17.35-1.el7ev.noarch
vdsm-infra-4.17.35-1.el7ev.noarch
vdsm-xmlrpc-4.17.35-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.8-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.3-1.el7ev.noarch
##################################################

7) Change the cluster scheduler policy to none and the compatibility version to 3.6
8) Change the data center compatibility version to 3.6
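Step 3 is the part this bug tracks. A minimal sketch of that step on the engine machine, collecting the commands quoted above (the scheduling policy itself is switched to InClusterUpgrade in the Administration Portal; the engine-config -g call is only an optional read-back):

~~~
# Allow mixed RHEL major versions in 3.5-compatibility clusters, then restart the engine.
engine-config -s CheckMixedRhelVersions=false --cver=3.5
service ovirt-engine restart

# Optional: confirm the stored value.
engine-config -g CheckMixedRhelVersions
~~~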
I'm a little bit surprised this bug is solved, as in another bug it seems mixing EL variants in one cluster is not supported anymore: https://bugzilla.redhat.com/show_bug.cgi?id=1349745

Assuming we are talking here about a 3.5 HE setup upgrade to 3.6, where the hosts are rhel6 (rhel and/or rhevh), the procedure below still does not look satisfying to me. See my comments below, please, and maybe tell me I am wrong.

(In reply to Artyom from comment #10)
> Verified on rhevm-3.6.9.2-0.1.el6.noarch
>
> 1) Deploy HE 3.5 on two hosts (one RHEL 6.8 and one RHEV-H 20160707.3.el6ev)
> Hosts:
> vdsm-yajsonrpc-4.16.38-1.el6ev.noarch
> vdsm-4.16.38-1.el6ev.x86_64
> vdsm-cli-4.16.38-1.el6ev.noarch
> vdsm-jsonrpc-4.16.38-1.el6ev.noarch
> vdsm-hook-ethtool-options-4.16.38-1.el6ev.noarch
> vdsm-xmlrpc-4.16.38-1.el6ev.noarch
> vdsm-python-zombiereaper-4.16.38-1.el6ev.noarch
> vdsm-python-4.16.38-1.el6ev.noarch
> ovirt-hosted-engine-setup-1.2.6.1-1.el6ev.noarch
> ovirt-hosted-engine-ha-1.2.10-1.el6ev.noarch
>
> Engine:
> rhevm-3.5.0-0.20.el6ev.noarch
> ##################################################
>
> 2) Upgrade the engine to 3.6

According to this bug, if there are el6 hosts in the setup, it will not happen: https://bugzilla.redhat.com/show_bug.cgi?id=1311027. But if you tested it, then probably it works.

> * Enable GlobalMaintenance
> * Upgrade the engine
> * Disable GlobalMaintenance
> Engine:
> rhevm-3.6.9.2-0.1.el6.noarch
> ##################################################
>
> 3) Enable InClusterUpgrade
> * engine-config -s CheckMixedRhelVersions=false --cver=3.5 && service
> ovirt-engine restart
> * change the cluster scheduler policy to InClusterUpgrade - PASS
> ##################################################
>
> 4) Upgrade the RHEL host to RHEL 7.2 with 3.5 packages
> * put the host into maintenance and remove the host from the engine
> * reprovision the host to RHEL 7.2
> * add the 3.5 repositories

There are no 3.5 repositories for vdsm. We are back to the same problem again: the channel is shared between the 3.5 and 3.6 el7 vdsm and ha packages. That's why we decided to introduce RHEV-H 3.5 el7 previously.

> * re-deploy the host to HE (choose the same ID the host had before)

P.S. Did this actually work? Did it let you reuse the host ID even though it is still part of the HE setup, when you run --vm-status?

> ##################################################
>
> 5) Upgrade RHEV-H to RHEV-H 7.2 with 3.5 packages
> * put the host into maintenance and remove the host from the engine
> * reprovision the host to RHEV-H 7.2
> * re-deploy the host to HE (choose the same ID the host had before)
> ##################################################
>
> 6) Upgrade packages to 3.6 on both hosts
> * put the host into maintenance
> * for the RHEL host, add the 3.6 repos and run yum update
> * for RHEV-H, install the rhev package on the engine and upgrade the host via
> the engine
> Hosts:
> vdsm-4.17.35-1.el7ev.noarch
> vdsm-python-4.17.35-1.el7ev.noarch
> vdsm-jsonrpc-4.17.35-1.el7ev.noarch
> vdsm-hook-ethtool-options-4.17.35-1.el7ev.noarch
> vdsm-cli-4.17.35-1.el7ev.noarch
> vdsm-yajsonrpc-4.17.35-1.el7ev.noarch
> vdsm-hook-vmfex-dev-4.17.35-1.el7ev.noarch
> vdsm-infra-4.17.35-1.el7ev.noarch
> vdsm-xmlrpc-4.17.35-1.el7ev.noarch
> ovirt-hosted-engine-ha-1.3.5.8-1.el7ev.noarch
> ovirt-hosted-engine-setup-1.3.7.3-1.el7ev.noarch
> ##################################################

Where are the steps to verify that HE storage was actually imported and upgraded to 3.6?

> 7) Change the cluster scheduler policy to none and the compatibility version to 3.6
> 8) Change the data center compatibility version to 3.6

I had a specific bug to verify: that an existing HE VM in the cluster does not prevent changing the cluster scheduling policy to the InClusterUpgrade policy.

1) Upgrade of the engine to 3.6 will work with the cluster compatibility version at 3.5.
2) I checked it via our QE repositories and not via the channels.
3) Yes, it gives you the possibility to reuse the same IDs when you redeploy a host. I remember we had some bug connected to the fact that you cannot use ID 1 when you redeploy the host (I just removed the check from the code locally to make it possible to reuse ID 1; I do not see the reason for the restriction).
4) And again, auto-import is not part of the verification, but auto-import succeeded (after the W/A!). I did not write this verification as part of the upgrade documentation, so you will need to adapt it to the channel case.

Regarding the W/A:
I believe the most problematic part of the upgrade flow that I described above is the fact that you can get into a situation where the host's HE sanlock ID differs from the sanlock ID that the engine gave to the host. In this case, auto-import will fail. For more details you can check the bug - https://bugzilla.redhat.com/show_bug.cgi?id=1322849
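As a hedged illustration of that sanlock ID mismatch, one way to compare the two sides before relying on auto-import, assuming local access to the engine database and the default hosted-engine config path (the file and table names below are the usual ones for 3.5/3.6 but may differ in other releases):

~~~
# On each HE host: the sanlock host ID the HE agent uses.
grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
hosted-engine --vm-status        # also lists each host with its ID

# On the engine machine: the IDs the engine assigned to the hosts
# (assumed to live in the vds_spm_id_map table).
su - postgres -c "psql engine -c 'SELECT vds_id, vds_spm_id FROM vds_spm_id_map;'"
~~~

If the IDs disagree for an HE host, auto-import will fail as described above.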
3.6.9 was released.

(In reply to Artyom from comment #13)
> I had a specific bug to verify: that an existing HE VM in the cluster does
> not prevent changing the cluster scheduling policy to the InClusterUpgrade
> policy.
>
> 1) Upgrade of the engine to 3.6 will work with the cluster compatibility
> version at 3.5.

What were the use cases you've checked?
Currently, if there is only one host in the cluster, it will not be possible to change the policy of the cluster, due to bz#1349745.
With 3.6.9 it fails.

> 2) I checked it via our QE repositories and not via the channels.
> 3) Yes, it gives you the possibility to reuse the same IDs when you
> redeploy a host.
> I remember we had some bug connected to the fact that you cannot use ID 1
> when you redeploy the host (I just removed the check from the code locally to
> make it possible to reuse ID 1; I do not see the reason for the restriction).
> 4) And again, auto-import is not part of the verification, but auto-import
> succeeded (after the W/A!).

What is the W/A?

> I did not write this verification as part of the upgrade documentation, so
> you will need to adapt it to the channel case.
>
> Regarding the W/A:
> I believe the most problematic part of the upgrade flow that I described
> above is the fact that you can get into a situation where the host's HE
> sanlock ID differs from the sanlock ID that the engine gave to the host.
> In this case, auto-import will fail. For more details you can check the
> bug - https://bugzilla.redhat.com/show_bug.cgi?id=1322849

(In reply to Marina from comment #16)
> (In reply to Artyom from comment #13)
> > I had a specific bug to verify: that an existing HE VM in the cluster does
> > not prevent changing the cluster scheduling policy to the InClusterUpgrade
> > policy.
> >
> > 1) Upgrade of the engine to 3.6 will work with the cluster compatibility
> > version at 3.5.
> What were the use cases you've checked?
> Currently, if there is only one host in the cluster, it will not be possible
> to change the policy of the cluster, due to bz#1349745.
> With 3.6.9 it fails.

I had two hosts in the cluster.

> > 2) I checked it via our QE repositories and not via the channels.
> > 3) Yes, it gives you the possibility to reuse the same IDs when you
> > redeploy a host.
> > I remember we had some bug connected to the fact that you cannot use ID 1
> > when you redeploy the host (I just removed the check from the code locally
> > to make it possible to reuse ID 1; I do not see the reason for the
> > restriction).
> > 4) And again, auto-import is not part of the verification, but auto-import
> > succeeded (after the W/A!).
> What is the W/A?
>
> > I did not write this verification as part of the upgrade documentation, so
> > you will need to adapt it to the channel case.

This one:

> > Regarding the W/A:
> > I believe the most problematic part of the upgrade flow that I described
> > above is the fact that you can get into a situation where the host's HE
> > sanlock ID differs from the sanlock ID that the engine gave to the host.
> > In this case, auto-import will fail. For more details you can check the
> > bug - https://bugzilla.redhat.com/show_bug.cgi?id=1322849