Bug 1364557
Summary: | 3.6.8 engine forbids setting InClusterUpgrade policy on a 3.5 cluster with HE VM | |
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Jiri Belka <jbelka> |
Component: | ovirt-engine | Assignee: | Roman Mohr <rmohr> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Artyom <alukiano> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.6.8 | CC: | alukiano, bugs, dfediuck, eedri, gklein, lsurette, mkalinin, rbalakri, Rhev-m-bugs, rmohr, srevivo, stirabos, ykaul, ylavi |
Target Milestone: | ovirt-3.6.9 | Keywords: | TestOnly, Triaged, ZStream |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | sla | ||
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-09-27 12:21:46 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1356127 | ||
Bug Blocks: | 1370968 |
Description
Jiri Belka
2016-08-05 17:12:39 UTC
Hi,

(In reply to Jiri Belka from comment #0)
> Description of problem:
>
> I was updating a 3.5 SHE EL6 env and while following
> ovirt-engine/docs/manual/en_US/html/Self-Hosted_Engine_Guide/
> Upgrading_the_Self-Hosted_Engine_from_6_to_7.html I got stuck at:
>

The documentation is wrong. We should fix this, as I also suggested in Bug 1349745. Having the HE-VM in a cluster where you do an el6 to el7 upgrade is not supported. This was out of scope to reduce the complexity of the feature. You can create an extra cluster for it and move the HE-VM there.

Note that starting from 3.6.7 you can migrate the HE-VM through the UI, but because of a bug the old cluster still "thinks" that it is running on it. At least for 4.0.2 I see a fix on Bug 1351533; I don't know if it is already fixed for the latest 3.6 releases.

> ~~~
> In the Administration Portal, set the InClusterUpgrade scheduling policy on
> the cluster:
> Click the Clusters tab.
> Select the cluster and click Edit.
> Click the Scheduling Policy tab.
> From the Select Policy drop-down list, select InClusterUpgrade.
> Click OK.
> ~~~
>
> Even though I clicked on the HE VM and put it in Global Maintenance (double
> checked via hosted-engine --vm-status), I still could not define the
> InClusterUpgrade policy, as the engine was screaming that the HE VM does not
> allow migration.
>
> (Sorry, I did not have DEBUG level on engine.log)
>
> Anyway, I was brave enough and I did:
>
> engine=# UPDATE vm_static SET migration_support = 0 where vm_guid = '<uuid>';

Jiri, this change you made in the DB should be enough to make things work with the HE-VM in the same cluster. All we are trying to make sure with the initial checks is that a successful and problem-free migration for all hosts is as likely as possible. But again, we do not support or recommend that flow.
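In practical terms, the unsupported workaround discussed above boils down to two manual actions. A minimal sketch, assuming the default "engine" database with local postgres access on the engine machine, and keeping '<uuid>' as a placeholder for the HE VM's ID:

~~~
# On an HE host: put the deployment in global maintenance and double-check it.
hosted-engine --set-maintenance --mode=global
hosted-engine --vm-status        # output should indicate global maintenance

# On the engine machine: the DB change from the report above
# (unsupported; back up the engine database first).
su - postgres -c "psql engine -c \"UPDATE vm_static SET migration_support = 0 WHERE vm_guid = '<uuid>';\""
~~~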
> Having the HE-VM in a cluster where you do an el6 to el7 upgrade is not
> supported. This was out of scope to reduce the complexity of the feature.
> You can create an extra cluster for it and move the HE-VM there.

I'm not an HE expert, but I have doubts you can move the HE VM to a cluster which is not the SHE cluster, and IIUC you cannot add an additional host into the SHE env while it would be in a different cluster. But this should be clarified by HE QE.
(In reply to Jiri Belka from comment #4)
> > Having the HE-VM in a cluster where you do an el6 to el7 upgrade is not
> > supported. This was out of scope to reduce the complexity of the feature.
> > You can create an extra cluster for it and move the HE-VM there.
>
> I'm not an HE expert, but I have doubts you can move the HE VM to a cluster
> which is not the SHE cluster, and IIUC you cannot add an additional host into
> the SHE env while it would be in a different cluster. But this should be
> clarified by HE QE.

You can. The storage domain for the hosted engine is mounted directly (in parallel to the normal storage domains) by the HE agents and vdsm on the hosts; the engine just displays everything. HE agents do not care in which cluster they are.

If you are running an older engine version which does not come with the managed HE-VM feature (which allows migrations through the UI), you have to go through the following steps (a command sketch follows after the list):

1) Enter global maintenance mode.
2) Move one empty HE host to a different cluster.
3) Stop the HE-VM directly on the host it is running on via the hosted-engine tool.
4) ssh into the host in the different cluster.
5) Start the HE-VM with the hosted-engine tool there.
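A rough command sketch of that manual flow, assuming a two-host SHE setup where "host2" is a placeholder name for the empty HE host that was moved to the other cluster:

~~~
# 1) On any HE host: enter global maintenance so the HA agents leave the VM alone.
hosted-engine --set-maintenance --mode=global

# 2) Moving the empty HE host to the other cluster is done in the Administration Portal.

# 3) On the host currently running the HE-VM: locate it and shut it down cleanly.
hosted-engine --vm-status
hosted-engine --vm-shutdown

# 4) + 5) On the HE host now sitting in the other cluster: start the HE-VM there.
ssh root@host2
hosted-engine --vm-start

# Once the engine is reachable again, leave global maintenance.
hosted-engine --set-maintenance --mode=none
~~~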
Steps to test:

1. Deployed 3.6.9 HE on a pair of clean 3.6.9 el6 3.5 hosts over NFS.
2. Created a new 3.6.9 host cluster with one el6 3.5 host.
3. Added an additional hosted-engine host to the new 3.6.9 host cluster via the shell.
4. Migrated the HE-VM using the WebUI from the old 3.6.9 host cluster to the new 3.6.9 host cluster (cross-host-cluster migration).
5. Enable the InClusterUpgrade policy.
6. Upgrade a host to 3.5 el7 RHEV-H in the first cluster.
7. Upgrade all hosts to 3.6 el7 in the first cluster.
8. Migrate the HE back and upgrade the host in the second cluster to 3.6 el7.

Verified on rhevm-3.6.9.2-0.1.el6.noarch

1) Deploy HE 3.5 on two hosts (one RHEL 6.8 and one RHEV-H 20160707.3.el6ev)
Hosts:
vdsm-yajsonrpc-4.16.38-1.el6ev.noarch
vdsm-4.16.38-1.el6ev.x86_64
vdsm-cli-4.16.38-1.el6ev.noarch
vdsm-jsonrpc-4.16.38-1.el6ev.noarch
vdsm-hook-ethtool-options-4.16.38-1.el6ev.noarch
vdsm-xmlrpc-4.16.38-1.el6ev.noarch
vdsm-python-zombiereaper-4.16.38-1.el6ev.noarch
vdsm-python-4.16.38-1.el6ev.noarch
ovirt-hosted-engine-setup-1.2.6.1-1.el6ev.noarch
ovirt-hosted-engine-ha-1.2.10-1.el6ev.noarch

Engine:
rhevm-3.5.0-0.20.el6ev.noarch
##################################################

2) Upgrade the engine to 3.6
* Enable GlobalMaintenance
* Upgrade the engine
* Disable GlobalMaintenance

Engine:
rhevm-3.6.9.2-0.1.el6.noarch
##################################################

3) Enable InClusterUpgrade
* engine-config -s CheckMixedRhelVersions=false --cver=3.5 && service ovirt-engine restart
* change the cluster scheduler policy to InClusterUpgrade - PASS
##################################################

4) Upgrade the RHEL host to RHEL 7.2 with 3.5 packages
* put the host into maintenance and remove the host from the engine
* reprovision the host to RHEL 7.2
* add the 3.5 repositories
* re-deploy the host to HE (choose the same ID the host had before)
##################################################

5) Upgrade RHEV-H to RHEV-H 7.2 with 3.5 packages
* put the host into maintenance and remove the host from the engine
* reprovision the host to RHEV-H 7.2
* re-deploy the host to HE (choose the same ID the host had before)
##################################################

6) Upgrade packages to 3.6 on both hosts
* put the host into maintenance
* for the RHEL host, add the 3.6 repos and run yum update
* for RHEV-H, install the rhev package on the engine and upgrade the host via the engine

Hosts:
vdsm-4.17.35-1.el7ev.noarch
vdsm-python-4.17.35-1.el7ev.noarch
vdsm-jsonrpc-4.17.35-1.el7ev.noarch
vdsm-hook-ethtool-options-4.17.35-1.el7ev.noarch
vdsm-cli-4.17.35-1.el7ev.noarch
vdsm-yajsonrpc-4.17.35-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.17.35-1.el7ev.noarch
vdsm-infra-4.17.35-1.el7ev.noarch
vdsm-xmlrpc-4.17.35-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.8-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.3-1.el7ev.noarch
##################################################

7) Change the cluster scheduler policy to none and the compatibility version to 3.6
8) Change the data center compatibility version to 3.6
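Step 3 is the part this bug tracks. A minimal sketch of that step on the engine machine, collecting the commands quoted above (the scheduling policy itself is switched to InClusterUpgrade in the Administration Portal; the engine-config -g call is only an optional read-back):

~~~
# Allow mixed RHEL major versions in 3.5-compatibility clusters, then restart the engine.
engine-config -s CheckMixedRhelVersions=false --cver=3.5
service ovirt-engine restart

# Optional: confirm the stored value.
engine-config -g CheckMixedRhelVersions
~~~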
I'm a little bit surprised this bug is solved, as in another bug it seems mixing EL variants in one cluster is not supported anymore: https://bugzilla.redhat.com/show_bug.cgi?id=1349745

Assuming we are talking here about a 3.5 HE setup upgrade to 3.6, where the hosts are rhel6 (rhel and/or rhevh), the procedure below still does not look satisfying to me. See my comments below, please, and maybe tell me I am wrong.

(In reply to Artyom from comment #10)
> Verified on rhevm-3.6.9.2-0.1.el6.noarch
>
> 1) Deploy HE 3.5 on two hosts (one RHEL 6.8 and one RHEV-H 20160707.3.el6ev)
> Hosts:
> vdsm-yajsonrpc-4.16.38-1.el6ev.noarch
> vdsm-4.16.38-1.el6ev.x86_64
> vdsm-cli-4.16.38-1.el6ev.noarch
> vdsm-jsonrpc-4.16.38-1.el6ev.noarch
> vdsm-hook-ethtool-options-4.16.38-1.el6ev.noarch
> vdsm-xmlrpc-4.16.38-1.el6ev.noarch
> vdsm-python-zombiereaper-4.16.38-1.el6ev.noarch
> vdsm-python-4.16.38-1.el6ev.noarch
> ovirt-hosted-engine-setup-1.2.6.1-1.el6ev.noarch
> ovirt-hosted-engine-ha-1.2.10-1.el6ev.noarch
>
> Engine:
> rhevm-3.5.0-0.20.el6ev.noarch
> ##################################################
>
> 2) Upgrade the engine to 3.6

According to this bug, if there are el6 hosts in the setup, it will not happen: https://bugzilla.redhat.com/show_bug.cgi?id=1311027. But if you tested it, then probably it works.

> * Enable GlobalMaintenance
> * Upgrade the engine
> * Disable GlobalMaintenance
> Engine:
> rhevm-3.6.9.2-0.1.el6.noarch
> ##################################################
>
> 3) Enable InClusterUpgrade
> * engine-config -s CheckMixedRhelVersions=false --cver=3.5 && service
> ovirt-engine restart
> * change the cluster scheduler policy to InClusterUpgrade - PASS
> ##################################################
>
> 4) Upgrade the RHEL host to RHEL 7.2 with 3.5 packages
> * put the host into maintenance and remove the host from the engine
> * reprovision the host to RHEL 7.2
> * add the 3.5 repositories

There are no 3.5 repositories for vdsm. We are back to the same problem again: the channel is shared between the 3.5 and 3.6 el7 vdsm and ha packages. That's why we decided to introduce RHEV-H 3.5 el7 previously.

> * re-deploy the host to HE (choose the same ID the host had before)

P.S. Did this actually work? Did it let you reuse the host ID even though it is still part of the HE setup, when you run --vm-status?

> ##################################################
>
> 5) Upgrade RHEV-H to RHEV-H 7.2 with 3.5 packages
> * put the host into maintenance and remove the host from the engine
> * reprovision the host to RHEV-H 7.2
> * re-deploy the host to HE (choose the same ID the host had before)
> ##################################################
>
> 6) Upgrade packages to 3.6 on both hosts
> * put the host into maintenance
> * for the RHEL host, add the 3.6 repos and run yum update
> * for RHEV-H, install the rhev package on the engine and upgrade the host via
> the engine
> Hosts:
> vdsm-4.17.35-1.el7ev.noarch
> vdsm-python-4.17.35-1.el7ev.noarch
> vdsm-jsonrpc-4.17.35-1.el7ev.noarch
> vdsm-hook-ethtool-options-4.17.35-1.el7ev.noarch
> vdsm-cli-4.17.35-1.el7ev.noarch
> vdsm-yajsonrpc-4.17.35-1.el7ev.noarch
> vdsm-hook-vmfex-dev-4.17.35-1.el7ev.noarch
> vdsm-infra-4.17.35-1.el7ev.noarch
> vdsm-xmlrpc-4.17.35-1.el7ev.noarch
> ovirt-hosted-engine-ha-1.3.5.8-1.el7ev.noarch
> ovirt-hosted-engine-setup-1.3.7.3-1.el7ev.noarch
> ##################################################

Where are the steps to verify that HE storage was actually imported and upgraded to 3.6?

> 7) Change the cluster scheduler policy to none and the compatibility version to 3.6
> 8) Change the data center compatibility version to 3.6

I had a specific bug to verify: that an existing HE VM in the cluster does not prevent changing the cluster scheduling policy to the InClusterUpgrade policy.

1) Upgrade of the engine to 3.6 will work with the cluster compatibility version at 3.5.
2) I checked it via our QE repositories and not via the channels.
3) Yes, it gives you the possibility to reuse the same IDs when you redeploy a host. I remember we had some bug connected to the fact that you cannot use ID 1 when you redeploy the host (I just removed the check from the code locally to make it possible to reuse ID 1; I do not see the reason for the restriction).
4) And again, auto-import is not part of the verification, but auto-import succeeded (after the W/A!). I did not write this verification as part of the upgrade documentation, so you will need to adapt it to the channel case.

Regarding the W/A:
I believe the most problematic part of the upgrade flow that I described above is the fact that you can get into a situation where the host's HE sanlock ID differs from the sanlock ID that the engine gave to the host. In this case, auto-import will fail. For more details you can check the bug - https://bugzilla.redhat.com/show_bug.cgi?id=1322849
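As a hedged illustration of that sanlock ID mismatch, one way to compare the two sides before relying on auto-import, assuming local access to the engine database and the default hosted-engine config path (the file and table names below are the usual ones for 3.5/3.6 but may differ in other releases):

~~~
# On each HE host: the sanlock host ID the HE agent uses.
grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
hosted-engine --vm-status        # also lists each host with its ID

# On the engine machine: the IDs the engine assigned to the hosts
# (assumed to live in the vds_spm_id_map table).
su - postgres -c "psql engine -c 'SELECT vds_id, vds_spm_id FROM vds_spm_id_map;'"
~~~

If the IDs disagree for an HE host, auto-import will fail as described above.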
3.6.9 was released.

(In reply to Artyom from comment #13)
> I had a specific bug to verify: that an existing HE VM in the cluster does
> not prevent changing the cluster scheduling policy to the InClusterUpgrade
> policy.
>
> 1) Upgrade of the engine to 3.6 will work with the cluster compatibility
> version at 3.5.

What were the use cases you've checked?
Currently, if there is only one host in the cluster, it will not be possible to change the policy of the cluster, due to bz#1349745.
With 3.6.9 it fails.

> 2) I checked it via our QE repositories and not via the channels.
> 3) Yes, it gives you the possibility to reuse the same IDs when you
> redeploy a host.
> I remember we had some bug connected to the fact that you cannot use ID 1
> when you redeploy the host (I just removed the check from the code locally to
> make it possible to reuse ID 1; I do not see the reason for the restriction).
> 4) And again, auto-import is not part of the verification, but auto-import
> succeeded (after the W/A!).

What is the W/A?

> I did not write this verification as part of the upgrade documentation, so
> you will need to adapt it to the channel case.
>
> Regarding the W/A:
> I believe the most problematic part of the upgrade flow that I described
> above is the fact that you can get into a situation where the host's HE
> sanlock ID differs from the sanlock ID that the engine gave to the host.
> In this case, auto-import will fail. For more details you can check the
> bug - https://bugzilla.redhat.com/show_bug.cgi?id=1322849

(In reply to Marina from comment #16)
> (In reply to Artyom from comment #13)
> > I had a specific bug to verify: that an existing HE VM in the cluster does
> > not prevent changing the cluster scheduling policy to the InClusterUpgrade
> > policy.
> >
> > 1) Upgrade of the engine to 3.6 will work with the cluster compatibility
> > version at 3.5.
> What were the use cases you've checked?
> Currently, if there is only one host in the cluster, it will not be possible
> to change the policy of the cluster, due to bz#1349745.
> With 3.6.9 it fails.

I had two hosts in the cluster.

> > 2) I checked it via our QE repositories and not via the channels.
> > 3) Yes, it gives you the possibility to reuse the same IDs when you
> > redeploy a host.
> > I remember we had some bug connected to the fact that you cannot use ID 1
> > when you redeploy the host (I just removed the check from the code locally
> > to make it possible to reuse ID 1; I do not see the reason for the
> > restriction).
> > 4) And again, auto-import is not part of the verification, but auto-import
> > succeeded (after the W/A!).
> What is the W/A?
>
> > I did not write this verification as part of the upgrade documentation, so
> > you will need to adapt it to the channel case.

This one:

> > Regarding the W/A:
> > I believe the most problematic part of the upgrade flow that I described
> > above is the fact that you can get into a situation where the host's HE
> > sanlock ID differs from the sanlock ID that the engine gave to the host.
> > In this case, auto-import will fail. For more details you can check the
> > bug - https://bugzilla.redhat.com/show_bug.cgi?id=1322849