Bug 1356194 - During cluster level upgrade - warn and mark VMs as pending a configuration change when they are running
Summary: During cluster level upgrade - warn and mark VMs as pending a configuration change when they are running
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.6.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ovirt-3.6.8
Assignee: Marek Libra
QA Contact: sefi litmanovich
URL:
Whiteboard:
Depends On: 1348907
Blocks: 1356027 1357513
 
Reported: 2016-07-13 15:42 UTC by Marina Kalinin
Modified: 2020-02-14 17:50 UTC
CC: 26 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Previously, cluster compatibility version upgrades were blocked if there was a running virtual machine in the cluster. Now, the user is informed about running/suspended virtual machines in a cluster when changing the cluster version. All such virtual machines are marked with a Next Run Configuration symbol to denote the requirement for rebooting them as soon as possible after the cluster version upgrade.
Clone Of: 1348907
: 1357513 (view as bug list)
Environment:
Last Closed: 2016-07-27 14:14:56 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Attachments
engine log (269.51 KB, application/x-gzip)
2016-07-21 12:10 UTC, sefi litmanovich


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1356198 0 urgent CLOSED [Docs] Must specify that changing cluster compat mode level (CL) to 3.6 requires VMs shut down first 2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution) 2442801 0 None None None 2016-07-13 16:59:00 UTC
Red Hat Product Errata RHBA-2016:1507 0 normal SHIPPED_LIVE Red Hat Enterprise Virtualization Manager (rhevm) bug fix 3.6.8 2016-07-27 18:10:22 UTC
oVirt gerrit 59607 0 None MERGED webadmin: Warn for running VMs when cluster level change 2020-06-09 13:43:31 UTC
oVirt gerrit 59630 0 None MERGED webadmin: Warn for running VMs when cluster level change 2020-06-09 13:43:31 UTC
oVirt gerrit 60730 0 ovirt-engine-3.6 MERGED webadmin: Warn for running VMs when cluster level change 2020-06-09 13:43:31 UTC
oVirt gerrit 60936 0 ovirt-engine-3.6.8 MERGED webadmin: Warn for running VMs when cluster level change 2020-06-09 13:43:31 UTC

Internal Links: 1356198

Comment 2 Marina Kalinin 2016-07-13 15:56:13 UTC
The workaround would be to create a new cluster with 3.5 compatibility mode and live migrate the VMs that cannot be restarted yet to that cluster.

Or just remain in 3.5 compatibility mode until the fix for this bug is released.
The fix for this bug will allow changing the cluster compatibility mode with VMs running, and will add a note to each VM saying it must be restarted to be fully 3.6 compatible. Until then, the VM will keep running in 3.5 compatibility mode.

It is important to understand that even if the cluster compatibility mode is changed on the fly with running VMs, those VMs will not be able to use the new 3.6 features until they are restarted. This is due to the nature of the changes in 3.6, which require a VM restart.
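
As a rough aid, the VMs that would need a restart (the ones the fix will flag with the Next Run Configuration marker) can be listed by querying the engine for running or suspended VMs in the cluster. This is only a minimal sketch using the REST search endpoint; the engine URL, credentials and cluster name are placeholders, so adjust them for your environment:

# Hypothetical sketch: list VMs in a cluster that are up or suspended,
# i.e. the ones that will need a restart after the cluster compatibility change.
import requests

ENGINE = "https://rhevm.example.com/ovirt-engine/api"   # placeholder engine URL
AUTH = ("admin@internal", "password")                   # placeholder credentials

def vms_needing_restart(cluster_name):
    # The engine's search syntax accepts queries such as
    # "cluster = <name> and (status = up or status = suspended)".
    query = "cluster = %s and (status = up or status = suspended)" % cluster_name
    resp = requests.get(
        "%s/vms" % ENGINE,
        params={"search": query},
        headers={"Accept": "application/xml"},
        auth=AUTH,
        verify=False,  # lab setup only; verify certificates in production
    )
    resp.raise_for_status()
    return resp.text  # XML listing of the matching VMs

# Example: print(vms_needing_restart("Default"))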

Comment 4 Marina Kalinin 2016-07-13 16:41:57 UTC
Copying here additional questions from the field for Michal:

(In reply to Michal Skrivanek from comment #0)
>
> Until the VMs are restarted the VM's behavior is not going to be correct,

- What exactly does "behavior is not going to be correct" mean or imply?
- For how long can the restart of the VMs be deferred?

Comment 6 Carl Thompson 2016-07-13 19:02:07 UTC
(In reply to Marina from comment #2)
> The workaround would be to create a new cluster, with 3.5 compat mode and
> live migrate the VMs that cannot be restarted yet to that cluster. 

> ...

This does not work for the HE case as discussed in bug 1341023:

It does not work to cross-cluster migrate the HE VM. The VM migrates fine to a new host in a different cluster, but the HE VM is still associated with the previous cluster. In other words, the VM is in one cluster but ends up running on a host that is in a different cluster. That in itself might be considered a bug. Because the HE VM is still associated with the first cluster, oVirt won't let me update the Compatibility Version of the first cluster (even though the HE VM is actually running on a host in a different cluster)!

Thanks!

Comment 7 Michal Skrivanek 2016-07-14 09:53:06 UTC
(In reply to Marina from comment #4)
> Copying here additional questions from the field for Michal:
> 
> (In reply to Michal Skrivanek from comment #0)
> >
> > Until the VMs are restarted the VM's behavior is not going to be correct,
> 
> - What exactly does "behavior is not going to be correct" mean or imply?
> - For how long can the restart of the VMs be deferred?

It should not be supported at all. But if we have to have it, then so be it. Our messaging around it should encourage users to restart their VMs at the earliest opportunity.
When you change a cluster level without the corresponding change in the VM hardware, the features that depend on the changed hardware or behavior won't work correctly. Features are not designed, coded, or tested that way.

"Those features" is intentionally vague. I don't know exactly which they are, probably not too many; a prominent and easy-to-observe one is RAM hotplug, but tens of other features were introduced in 3.6, and without a detailed review of all of them we can't really say for sure.

Comment 11 Marina Kalinin 2016-07-14 14:55:31 UTC
Note for QE: based on this situation, we also need to test a 3.5 compatibility mode cluster in 3.6 and make sure basic features work, such as opening the console and performing live migration of VMs.

I think one version back would be sufficient.
For 3.6 we should check 3.5 compat mode.

For 4.0 we should check 3.6 compat mode.

Ideally, we would like to test all the combinations, but I believe one version back should be sufficient.

Comment 25 sefi litmanovich 2016-07-21 12:08:05 UTC
Hi,


Setup: RHEVM Version rhevm-3.6.8.1-0.1.el6.noarch

DC + cluster comp 3.5
2 RHEL 7.2 hosts (nested) - vdsm: 4.16.38-1.el7ev

		
Upgrade flow (see the REST sketch after this list for the maintenance/activate steps):
   1. Switch Host_1 to maintenance
   2. Upgrade Host_1 to 3.6 - vdsm: 4.17.33-1.el7ev
   3. Start host_1
   4. Switch Host_2 to maintenance
   5. Upgrade Host_2 to 3.6
   6. Start host_2
   7. Upgrade compatibility version for DC and Cluster to 3.6
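
A minimal sketch of the maintenance/activate steps above, driven through the REST API. The host id, engine URL and credentials are placeholders, and the actual package upgrade on the host (vdsm etc.) still happens out of band:

# Hypothetical sketch: switch a host to maintenance and activate it again
# around the upgrade, using the hosts/{id}/deactivate and /activate actions.
import requests

ENGINE = "https://rhevm.example.com/ovirt-engine/api"   # placeholder engine URL
AUTH = ("admin@internal", "password")                   # placeholder credentials

def host_action(host_id, action):
    # POST an empty <action/> body to hosts/{id}/{action},
    # where action is "deactivate" (maintenance) or "activate".
    resp = requests.post(
        "%s/hosts/%s/%s" % (ENGINE, host_id, action),
        data="<action/>",
        headers={"Content-Type": "application/xml"},
        auth=AUTH,
        verify=False,  # lab setup only; verify certificates in production
    )
    resp.raise_for_status()
    return resp.text

# Steps 1 and 4: host_action("<host-uuid>", "deactivate")
# Steps 3 and 6: host_action("<host-uuid>", "activate")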

Before upgrade:
1. 1 vm suspended
2. 1 vm up with snapshot (not live)
3. 1 vm up with snapshot (live with memory)
4. 1 template
5. 1 vm up with run_once with VNC/CIRRUS
6. 1 vm up with run_once with VNC/QXL
7. 1 vm up with SPICE/QXL run_once with VNC 

Upgrade Warning was issued as expected following this patch + vms are marked for reconfiguration on next run.

After upgrade:  
Cases (with 3.5 vms):			
1. Suspended VM: After upgrade check that vm is resumed properly: Pass
2. Snapshot: Check that a snapshot created in 3.5 restores in 3.6.
With memory: pass
Without memory: pass
Preview + commit: pass
Clone from snapshot: pass
Remove snapshot: Pass
3. Sanity: stop + start suspend + remove: Pass
4. Migration: Pass
5. Template: Create vm from template: pass; Create new template version from a 3.5 vm after upgrade: pass; Create a vm from the new template version: pass.
6. CPU hot-plug: pass
7. Memory hot-plug: Failed on all 3.5 vms (original and cloned) + on 3.5 vms that were restarted + on vms created from the 3.5 template after the upgrade.
The feature worked only on new vms created after the upgrade from a different template (or from scratch).
Failure of memory hotplug didn't produce any message in engine.log + on vm right after upgrade it also doesn't change the memory in UI general tab.
8. Consoles (case from Bug 1297404): all cases passed.

Will attach the engine log (note that this run started yesterday afternoon) although not very informative.

Please let me know if this covers what we want to test on 3.6 case and if we want to open a bug on memory hotplug.
Also if there's any other feature/flow you think that should be covered.

Comment 26 sefi litmanovich 2016-07-21 12:09:25 UTC
Moran, please see my previous comment.
10x.

Comment 27 sefi litmanovich 2016-07-21 12:10:49 UTC
Created attachment 1182450 [details]
engine log

Comment 30 Michal Skrivanek 2016-07-21 14:24:29 UTC
(In reply to sefi litmanovich from comment #25)

Hi Sefi, let me ask for a few more details

> Before upgrade:
> 1. 1 vm suspended
> 2. 1 vm up with snapshot (not live)
> 3. 1 vm up with snapshot (live with memory)
> 4. 1 template
> 5. 1 vm up with run_once with VNC/CIRRUS
> 6. 1 vm up with run_once with VNC/QXL
> 7. 1 vm up with SPICE/QXL run_once with VNC 
> 
> Upgrade Warning was issued as expected following this patch + vms are marked
> for reconfiguration on next run.
> 
> After upgrade:  
> Cases (with 3.5 vms):			
> 1. Suspended VM: After upgrade check that vm is resumed properly: Pass

It should have been restored as a 3.5 VM, and the trouble with hotplugging memory confirms it.

> 2. Snapshot: Check that a snapshot created in 3.5 restores in 3.6.
> With memory: pass
> Without memory: pass

You should notice a difference: a VM restored with memory will be a "3.5 VM", while one restored without memory will be a "3.6 VM".

> Preview + commit: pass
> Clone from snapshot: pass
> Remove snapshot: Pass
> 3. Sanity: stop + start suspend + remove: Pass
> 4. Migration: Pass
> 5. Template: Create vm from template: pass ; Create new template version
> from a 3.5 vm after upgrade: pass; Create a vm from the new template
> version: pass.
> 6. CPU hot-plug: pass
> 7. Memory hot-plug: Failed on all 3.5 vms (original and cloned) + on 3.5 vms
> that were restarted + on vms created from the 3.5 template after the upgrade.
> The feature worked only on new vms created after the upgrade from a
> different template (or from scratch).
> Failure of memory hotplug didn't produce any message in engine.log + on vm
> right after upgrade it also doesn't change the memory in UI general tab.	
> 8. Consoles (case from Bug 1297404): all cases passed.

Additionally, I would expect a specific graphics-focused test to actually fail in the 3.6->4.0 case, since a "3.6 VM" uses cirrus and a "4.0 VM" uses VGA with different video RAM sizes.


> Will attach the engine log (note that this run started yesterday afternoon)
> although not very informative.
> 
> Please let me know if this covers what we want to test on 3.6 case and if we
> want to open a bug on memory hotplug.

there is no bug, this is what fixing this bug means

> Also if there's any other feature/flow you think that should be covered.

Comment 31 sefi litmanovich 2016-07-21 15:51:11 UTC
(In reply to Michal Skrivanek from comment #30)
> (In reply to sefi litmanovich from comment #25)
> 
> Hi Sefi, let me ask for a few more details
> 
> > Before upgrade:
> > 1. 1 vm suspended
> > 2. 1 vm up with snapshot (not live)
> > 3. 1 vm up with snapshot (live with memory)
> > 4. 1 template
> > 5. 1 vm up with run_once with VNC/CIRRUS
> > 6. 1 vm up with run_once with VNC/QXL
> > 7. 1 vm up with SPICE/QXL run_once with VNC 
> > 
> > Upgrade Warning was issued as expected following this patch + vms are marked
> > for reconfiguration on next run.
> > 
> > After upgrade:  
> > Cases (with 3.5 vms):			
> > 1. Suspended VM: After upgrade check that vm is resumed properly: Pass
> 
> It should have been restored as a 3.5 VM, and the trouble with hotplugging
> memory confirms it.

Yes I think we got the expected behaviour in this case.

> > 2. Snapshot: Check that a snapshot created in 3.5 restores in 3.6.
> > With memory: pass
> > Without memory: pass
> 
> You should notice a difference: a VM restored with memory will be a "3.5 VM",
> while one restored without memory will be a "3.6 VM".

I don't recall any differences between them, but I can always reproduce and re-verify it.

> > Preview + commit: pass
> > Clone from snapshot: pass
> > Remove snapshot: Pass
> > 3. Sanity: stop + start suspend + remove: Pass
> > 4. Migration: Pass
> > 5. Template: Create vm from template: pass ; Create new template version
> > from a 3.5 vm after upgrade: pass; Create a vm from the new template
> > version: pass.
> > 6. CPU hot-plug: pass
> > 7. Memory hot-plug: Failed on all 3.5 vms (original and cloned) + on 3.5 vms
> > that were restarted + on vms created from the 3.5 template after the upgrade.
> > The feature worked only on new vms created after the upgrade from a
> > different template (or from scratch).
> > Failure of memory hotplug didn't produce any message in engine.log + on vm
> > right after upgrade it also doesn't change the memory in UI general tab.	
> > 8. Consoles (case from Bug 1297404): all cases passed.
> 
> Additionally, I would expect a specific graphics-focused test to actually
> fail in the 3.6->4.0 case, since a "3.6 VM" uses cirrus and a "4.0 VM" uses
> VGA with different video RAM sizes.

I did not test 3.6->4.0 this time; the scope of this bug is 3.5->3.6.

> > Will attach the engine log (note that this run started yesterday afternoon)
> > although not very informative.
> > 
> > Please let me know if this covers what we want to test on 3.6 case and if we
> > want to open a bug on memory hotplug.
> 
> there is no bug, this is what fixing this bug means

I'm not sure I get what you mean here. That this is not a bug, or that this bug should be fixed (in which case, shouldn't we move it back to ASSIGNED)?

> > Also if there's any other feature/flow you think that should be covered.

Comment 33 sefi litmanovich 2016-07-25 12:49:08 UTC
This bug is verified based on the test results reported in comment 25, as the functionality within the scope of this bug works fine.
As for the collateral issues, they will be opened as new bugs.

Comment 35 errata-xmlrpc 2016-07-27 14:14:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1507.html

