Bug 1356127 - Can't upgrade to new cluster version when HE VM is running in it
Summary: Can't upgrade to new cluster version when HE VM is running in it
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.6.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-3.6.9
Target Release: 3.6.9
Assignee: Roy Golan
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On: 1351533 1375188 1375240 1375573
Blocks: 1364557
 
Reported: 2016-07-13 12:59 UTC by Roy Golan
Modified: 2019-04-28 13:29 UTC
CC: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1351533
Environment:
Last Closed: 2016-09-21 18:04:33 UTC
oVirt Team: SLA
Target Upstream Version:


Attachments


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2016:1929 normal SHIPPED_LIVE Moderate: Red Hat Virtualization Manager (RHV) bug fix 3.6.9 2016-09-21 21:57:10 UTC
oVirt gerrit 62425 ovirt-engine-3.6 MERGED core: Support migrating the HE VM out of cluster 2016-09-06 07:17:13 UTC
oVirt gerrit 63377 ovirt-engine-3.6.9 MERGED core: Support migrating the HE VM out of cluster 2016-09-06 09:20:44 UTC
oVirt gerrit 63378 ovirt-engine-3.6 MERGED core: fix wrong imports order 2016-09-06 07:53:49 UTC

Description Roy Golan 2016-07-13 12:59:50 UTC
+++ This bug was initially created as a clone of Bug #1351533 +++

Description of problem:
The cluster of the HE VM can't have its compatibility level updated from 3.6 to 4.0, because modifying the cluster of the HE VM is a blocked operation. So even though technically the HE VM could be started in, or migrated to, another cluster (right-click, Migrate, Advanced, choose your 4.0 cluster), the underlying ChangeVMCluster action is specifically blocked.

Version-Release number of selected component (if applicable):
all engine versions

How reproducible:
100%

Steps to Reproduce:
1. have a 3.6 cluster in which the new 4.0 HE VM is running
2. create a new 4.0 cluster, deploy an additional HE host there
3. right-click the HE VM, Migrate -> Advanced -> choose the new 4.0 cluster

Actual results:
The VM migrates, but its cluster will still show as the old one.
In the logs the action fails with:

```log
2016-06-28 21:55:20,734 WARN  [org.ovirt.engine.core.bll.ChangeVMClusterCommand] (ForkJoinPool-1-worker-0) [3ce2a3ad] Validation of action 'ChangeVMCluster' failed for user SYSTEM. Reasons: VAR__ACTION__UPDATE,VAR__TYPE__VM__CLUSTER,ACTION_TYPE_FAILED_CANNOT_RUN_ACTION_ON_NON_MANAGED_VM
```

Expected results:
Allow changing the cluster of the HE VM, so the cluster compatibility upgrade can complete.

Additional info:

Probably we should allow 'forcing' such operations via the REST API.
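
For illustration only, here is a minimal sketch of what such a forced cluster change could look like over the REST API. This is an assumption-laden sketch, not a documented procedure: the endpoint layout follows the 3.6-era /api root, and the URL, credentials, VM ID, and cluster name are all placeholders.

```python
# Hypothetical sketch only: what a "forced" cluster change for the HE VM
# could look like over the REST API. URL, credentials, the VM ID and the
# cluster name are placeholders, and the payload a given engine version
# accepts may differ.
import requests

ENGINE_API = "https://engine.example.com/api"       # placeholder engine URL
AUTH = ("admin@internal", "password")               # placeholder credentials
HE_VM_ID = "00000000-0000-0000-0000-000000000000"   # placeholder HE VM id

# Updating the VM's <cluster> element is what maps to the blocked
# ChangeVMCluster action in the engine backend.
BODY = "<vm><cluster><name>NewCluster40</name></cluster></vm>"

resp = requests.put(
    "%s/vms/%s" % (ENGINE_API, HE_VM_ID),
    data=BODY,
    headers={"Content-Type": "application/xml"},
    auth=AUTH,
    verify=False,  # lab setup only; verify certificates in production
)
print(resp.status_code)
print(resp.text)
```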

--- Additional comment from Roy Golan on 2016-07-03 04:32:51 EDT ---

To make this action safe, the fix will prevent the ChangeVmCluster action if any HE hosts are active on clusters other than the destination cluster.

All the HE hosts should be running in the destination cluster, to prevent the HA cluster from spanning across oVirt clusters.
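
As a rough sketch of that rule (not the actual ovirt-engine Java implementation; the names and data shapes below are invented for illustration), the validation boils down to:

```python
# Minimal sketch of the safety rule described above -- NOT the actual
# ovirt-engine (Java) code; names and data shapes are invented.
from collections import namedtuple

Host = namedtuple("Host", ["name", "cluster_id", "is_hosted_engine", "is_active"])

def can_change_he_vm_cluster(hosts, destination_cluster_id):
    """Allow ChangeVmCluster for the HE VM only if every active HE host
    already runs in the destination cluster, so the HA cluster never
    spans more than one oVirt cluster."""
    active_he = [h for h in hosts if h.is_hosted_engine and h.is_active]
    return all(h.cluster_id == destination_cluster_id for h in active_he)

# Example: an HE host still active in the old cluster blocks the change.
hosts = [
    Host("alma03", "cluster-3.6", True, True),
    Host("alma04", "cluster-4.0", True, True),
]
print(can_change_he_vm_cluster(hosts, "cluster-4.0"))  # False
```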

Comment 1 Yaniv Lavi 2016-08-28 18:02:14 UTC
What is the status of the backport? It's blocking the upgrade of the HE cluster.

Comment 2 Yaniv Lavi 2016-08-28 18:02:35 UTC
From el6 to el7.

Comment 3 Marina Kalinin 2016-09-08 00:55:03 UTC
This bug is quite confusing.
So, let's say we are talking about 3.5 and 3.6 here, right? Because it is for 3.6.z.
~~~
Steps to Reproduce:
1. have a 3.5 cluster in which the new 3.6 HE VM is running
2. create a new 3.6 cluster, deploy an additional HE host there
3. right-click the HE VM, Migrate -> Advanced -> choose the new 3.6 cluster
~~~
So, we do need to create a new cluster for this upgrade, as per step 1 here:
https://access.redhat.com/solutions/2351141

Not sure how this bug applies to this flow. I think Didi tested the flow in the KCS and it worked fine for him.
Or maybe I am missing something.

Comment 5 Yedidyah Bar David 2016-09-11 12:31:23 UTC
This bug refers to [1]. IIUC the original reason for it is the fix for bug 1336527, which probably still wasn't in place when I did my tests, which is why it worked for me then.

[1] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.6/html/Upgrade_Guide/chap-Post-Upgrade_Tasks.html

Comment 6 Marina Kalinin 2016-09-12 21:36:38 UTC
Didi, exactly my point.
As of 3.6.8, we rolled back the change introduced by bz#1336527, since it prevented our customers from performing rolling, online upgrades. Instead, a different fix was introduced in 3.6.8; it is described in bz#1356194.

So, except on 3.6.7, you can change the CL without shutting down the VMs.
That's why I think this bug may not be relevant, unless I misunderstand something.

So for a 3.5 to 3.6 HE upgrade, if it is from RHEL 6 to RHEL 7 hosts, you must create a new cluster anyway, because at that point you still cannot mix those in the same cluster.
For 3.5 environments with RHEL 7-only hosts, you should be able to change the CL without shutting down the VMs in the cluster, unless you are on 3.6.7. And if you are, upgrade to the latest available.

I think this should be closed as not a bug, IIUC.

Comment 7 Yedidyah Bar David 2016-09-13 05:50:01 UTC
(In reply to Marina from comment #6)
> Didi, exactly my point.
> As of 3.6.8, we rolled back the change introduced by bz#1336527, since it
> prevented our customers from performing rolling, online upgrades. Instead, a
> different fix was introduced in 3.6.8; it is described in bz#1356194.

Roy, can you please clarify?

> 
> So, except on 3.6.7, you can change the CL without shutting down the VMs.
> That's why I think this bug may not be relevant, unless I misunderstand
> something.
> 
> So for a 3.5 to 3.6 HE upgrade, if it is from RHEL 6 to RHEL 7 hosts, you
> must create a new cluster anyway, because at that point you still cannot mix
> those in the same cluster.
> For 3.5 environments with RHEL 7-only hosts, you should be able to change
> the CL without shutting down the VMs in the cluster, unless you are on
> 3.6.7. And if you are, upgrade to the latest available.
> 
> I think this should be closed as not a bug, IIUC.

Can't; by now the patches are already merged. If you think it's notabug, we need to revert them. I admit I did not fully follow all the changes etc.

Comment 8 Nikolai Sednev 2016-09-13 08:28:35 UTC
I've followed this scenario and it worked for me:
1. Environment running both engine and hosts on 3.6.9.
2. Upgrade one of the hosts to 4.0.x.
3. Migrate the HE VM to the 4.0 host.
4. Perform a backup on the engine and use the appliance upgrade procedure on the 4.0 host.
5. Restore the engine's environment from the backed-up data.
6. Migrate all VMs to the 4.0 host.
7. Upgrade the old host from 3.6.9 to 4.0.
8. Raise the host cluster compatibility level.

Installing a clean 4.0 engine on a 3.6.9 host is not the right thing to do.
Please consider closing this bug as not a bug.

Comment 11 Yedidyah Bar David 2016-09-13 12:57:44 UTC
(In reply to Nikolai Sednev from comment #9)
> 1.-I've deployed over NFS storage and using PXE, a new and clean Red Hat
> Virtualization Manager Version: 4.0.4.2-0.1.el7ev, on 3.6.9 el7.2 host.
> 2.-Got this:
> [ INFO  ] Still waiting for VDSM host to become operational...
>           The host alma03.qa.lab.tlv.redhat.com is in non-operational state.
>           Please try to activate it via the engine webadmin UI.
>           Retry checking host status or ignore this and continue (Retry,
> Ignore)[Retry]?
>           The host alma03.qa.lab.tlv.redhat.com is in non-operational state.
>           Please try to activate it via the engine webadmin UI.
>           Retry checking host status or ignore this and continue (Retry,
> Ignore)[Retry]? 
> 
> 3.-From the WEBUI I've tried to activate the host and got it in "Unassigned"
> status.
> 

What makes you think this is the current bug?

If it's a new bug, please open one. IIRC we already have a similar one, but it's better to open a new one and close it as a duplicate than to add unrelated stuff to the current bug.

Comment 13 Nikolai Sednev 2016-09-13 13:15:22 UTC
Cloning my latest comments to a new bug, further to comment #13:
https://bugzilla.redhat.com/show_bug.cgi?id=1375573

Comment 14 Nikolai Sednev 2016-09-13 13:24:08 UTC
Returning the need-info on Roy; I removed it unintentionally from https://bugzilla.redhat.com/show_bug.cgi?id=1356127#c7.

Please consider changing the bug status to ON_QA when there is anything for QA to do with it, as currently we can't verify it at all.

Comment 16 Roy Golan 2016-09-14 12:16:17 UTC
(In reply to Nikolai Sednev from comment #14)
> Returning the need-info on Roy; I removed it unintentionally from
> https://bugzilla.redhat.com/show_bug.cgi?id=1356127#c7.
> 
> Please consider changing the bug status to ON_QA when there is anything
> for QA to do with it, as currently we can't verify it at all.


The verification of this bug should be the ability to migrate the hosted engine VM to another cluster. It shouldn't be a 4.0 cluster (because the bug is against 3.6).
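
For what it's worth, a hedged sketch of how one could confirm, after such a migration, that the HE VM reports the destination cluster via the REST API; the endpoint layout follows the 3.6-era /api responses, and the URL, credentials, and VM name are placeholders:

```python
# Hypothetical verification helper: after migrating the HE VM to the
# other cluster, confirm the VM now reports that cluster. URL,
# credentials and the VM name are placeholders; the XML layout may
# differ on other engine versions.
import requests
import xml.etree.ElementTree as ET

ENGINE_API = "https://engine.example.com/api"   # placeholder engine URL
AUTH = ("admin@internal", "password")           # placeholder credentials

resp = requests.get(
    "%s/vms" % ENGINE_API,
    params={"search": "name=HostedEngine"},     # assumed HE VM name
    headers={"Accept": "application/xml"},
    auth=AUTH,
    verify=False,  # lab setup only
)
root = ET.fromstring(resp.content)
for vm in root.findall("vm"):
    cluster = vm.find("cluster")
    print(vm.find("name").text, "-> cluster id:", cluster.get("id"))
```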

Comment 19 Nikolai Sednev 2016-09-14 21:44:30 UTC
As agreed with Roy, I've used only these reproduction steps to verify the bug:
1. Deployed 3.6.9 HE on a pair of clean 3.6.9 el7.2 hosts over NFS.
2. Created a new 3.6.9 host cluster.
3. Added an additional hosted-engine host to the new 3.6.9 host cluster via the shell, as adding an additional host from the web UI is not available in 3.6.9 yet.
4. Migrated the HE VM using the web UI from the old 3.6.9 host cluster to the new 3.6.9 host cluster (cross-host-cluster migration).

The new reproduction scenario worked for me with these components on the hosts:
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64
rhev-release-3.6.9-2-001.noarch
sanlock-3.2.4-3.el7_2.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.3-1.el7ev.noarch
rhevm-appliance-20160620.0-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.5.x86_64
mom-0.5.6-1.el7ev.noarch
vdsm-4.17.35-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
rhevm-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.8-1.el7ev.noarch
Linux version 3.10.0-327.36.1.el7.x86_64 (mockbuild@x86-037.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Wed Aug 17 03:02:37 EDT 2016
Linux 3.10.0-327.36.1.el7.x86_64 #1 SMP Wed Aug 17 03:02:37 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

On engine:
ovirt-engine-extension-aaa-jdbc-1.0.7-2.el6ev.noarch                                                                                              
ovirt-setup-lib-1.0.1-1.el6ev.noarch                                                                                                              
ovirt-vmconsole-1.0.4-1.el6ev.noarch                                                                                                              
rhevm-setup-plugin-ovirt-engine-3.6.9.1-0.1.1.el6.noarch                                                                                          
ovirt-vmconsole-proxy-1.0.4-1.el6ev.noarch                                                                                                        
rhevm-setup-plugin-ovirt-engine-common-3.6.9.1-0.1.1.el6.noarch                                                                                   
ovirt-host-deploy-1.4.1-1.el6ev.noarch
ovirt-host-deploy-java-1.4.1-1.el6ev.noarch
rhevm-image-uploader-3.6.1-2.el6ev.noarch
rhevm-dependencies-3.6.1-1.el6ev.noarch
rhevm-reports-setup-3.6.5.1-1.el6ev.noarch
rhevm-3.6.9.1-0.1.1.el6.noarch
rhevm-spice-client-x64-cab-3.6-7.el6.noarch
rhevm-setup-plugins-3.6.5-1.el6ev.noarch
rhevm-setup-base-3.6.9.1-0.1.1.el6.noarch
rhevm-setup-3.6.9.1-0.1.1.el6.noarch
rhevm-backend-3.6.9.1-0.1.1.el6.noarch
rhevm-branding-rhev-3.6.0-10.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.9.1-0.1.1.el6.noarch
rhevm-dwh-setup-3.6.8-1.el6ev.noarch
rhevm-tools-backup-3.6.9.1-0.1.1.el6.noarch
rhevm-restapi-3.6.9.1-0.1.1.el6.noarch
rhevm-spice-client-x86-cab-3.6-7.el6.noarch
rhevm-guest-agent-common-1.0.11-6.el6ev.noarch
rhevm-sdk-python-3.6.9.1-1.el6ev.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.9.1-0.1.1.el6.noarch
rhevm-vmconsole-proxy-helper-3.6.9.1-0.1.1.el6.noarch
rhevm-dwh-3.6.8-1.el6ev.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-dbscripts-3.6.9.1-0.1.1.el6.noarch
rhevm-userportal-3.6.9.1-0.1.1.el6.noarch
rhevm-spice-client-x86-msi-3.6-7.el6.noarch
rhev-release-3.6.9-2-001.noarch
rhevm-lib-3.6.9.1-0.1.1.el6.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.9.1-0.1.1.el6.noarch
rhevm-cli-3.6.9.0-1.el6ev.noarch
rhevm-doc-3.6.9-1.el6eng.noarch
rhevm-websocket-proxy-3.6.9.1-0.1.1.el6.noarch
rhevm-extensions-api-impl-3.6.9.1-0.1.1.el6.noarch
rhevm-webadmin-portal-3.6.9.1-0.1.1.el6.noarch
rhevm-setup-plugin-websocket-proxy-3.6.9.1-0.1.1.el6.noarch
rhev-guest-tools-iso-3.6-6.el6ev.noarch
rhevm-reports-3.6.5.1-1.el6ev.noarch
rhevm-tools-3.6.9.1-0.1.1.el6.noarch
rhevm-spice-client-x64-msi-3.6-7.el6.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
Linux version 2.6.32-642.el6.x86_64 (mockbuild@x86-033.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Wed Apr 13 00:51:26 EDT 2016
Linux 2.6.32-642.el6.x86_64 #1 SMP Wed Apr 13 00:51:26 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.8 (Santiago)

Comment 21 errata-xmlrpc 2016-09-21 18:04:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-1929.html

