Bug 1275268

Summary: VM on cluster 3.6 that was upgraded from 3.5 will fail to start due to emulated machine
Product: [oVirt] ovirt-engine Reporter: Roy Golan <rgolan>
Component: BLL.Virt Assignee: Arik <ahadas>
Status: CLOSED CURRENTRELEASE QA Contact: Shira Maximov <mshira>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.6.0 CC: ahadas, bugs, gchakkar, mavital, michal.skrivanek, sbonazzo
Target Milestone: ovirt-4.0.0-alpha Flags: michal.skrivanek: ovirt-4.0.0?
rule-engine: planning_ack+
michal.skrivanek: devel_ack+
michal.skrivanek: testing_ack?
Target Release: 4.0.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt 4.0.0 alpha1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-06-13 11:24:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vdsm log with vm start failure - look for "-memory" none

Description Roy Golan 2015-10-26 12:24:08 UTC
Created attachment 1086497
vdsm log with vm start failure - look for "-memory"

Description of problem:

Upgrading a cluster from 3.5 to 3.6 does not change the cluster's emulated machine. This prevents VMs from starting, because we send the -memory slot|memdev flag to qemu to support memory hot-plug, and that feature isn't supported on old emulated machines.

We must reset the emulated machine on cluster version upgrade.

Workaround: Move all hosts to maintenance, right-click the cluster and choose "Reset emulated machine". Then activate a host; it should set the cluster emulated machine to pc-i440fx-rhel7.2.0, and VMs will start.


How reproducible:
100%

Steps to Reproduce:
1. Have a 3.5 cluster with an emulated machine set (i.e. at least one host was up).
2. Upgrade the cluster to 3.6.
3. Start a VM.

Actual results:

The VM fails to start with this error in vdsm.log:
2015-10-26T07:26:58.280860Z qemu-kvm: "-memory 'slots|maxmem'" is not supported by: rhel6.5.0
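
To check whether a given vdsm.log shows this failure, the error line above can be matched with a regular expression. This is only a sketch: the pattern is derived from the single log line quoted in this report, and the function name `unsupported_machines` is hypothetical, not part of vdsm or ovirt-engine.

```python
import re

# Pattern derived from the one error line quoted in this report; other
# qemu versions may word the message differently (assumption).
PATTERN = re.compile(
    r"""qemu-kvm: "-memory '[^']*'" is not supported by: (?P<machine>\S+)"""
)


def unsupported_machines(lines):
    """Return the set of machine types that rejected the -memory option."""
    found = set()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            found.add(match.group("machine"))
    return found


# The log line from this report, used as sample input.
sample = [
    "2015-10-26T07:26:58.280860Z qemu-kvm: "
    "\"-memory 'slots|maxmem'\" is not supported by: rhel6.5.0",
]

print(unsupported_machines(sample))
```

With a real log, the lines of the file can be passed in directly, for example `unsupported_machines(open(path))` where `path` points at the host's vdsm.log.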

Comment 1 Michal Skrivanek 2015-10-26 14:47:21 UTC
Since this is a basic upgrade flow, I'm proposing blocking 3.6 GA until this is fixed.

The workaround is to use the cluster emulated machine override or to disable memory hotplug via engine-config.

Comment 2 Arik 2015-10-26 21:15:39 UTC
The update cluster command wrongly assumes that if the cluster compatibility version matches the supported cluster levels of a host, then the host's emulated machine is supported as well. Therefore, we first check the compatibility version, and if it matches the cluster, we take the emulated machine flag of one of the active hosts (if there is an active host in the cluster). However, the host might have no valid emulated machine for the requested cluster compatibility version. In that case the cluster keeps the previous emulated machine flag.

The posted patch adds a validation that every active host in the cluster that is going to change has a valid emulated machine for the requested compatibility version. This makes sense because if such a host were down, it would not be possible to activate it. The validation prevents the cluster from ending up with a conflicting compatibility version and emulated machine.
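
A minimal sketch of that validation, written in Python for illustration and not taken from the actual patch. All names here (`Host`, `find_valid_machine`, `validate_cluster_upgrade`, `TARGET_LEVEL_MACHINES`) are hypothetical, and the machine lists are illustrative assumptions rather than the engine's real configuration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    active: bool
    # Emulated machines the host reports as supported (hypothetical field).
    emulated_machines: set = field(default_factory=set)


def find_valid_machine(host, cluster_machines):
    """Return the first machine of the target level the host supports, else None."""
    for machine in cluster_machines:  # assumed to be ordered by preference
        if machine in host.emulated_machines:
            return machine
    return None


def validate_cluster_upgrade(hosts, cluster_machines):
    """Return the names of active hosts with no valid emulated machine.

    An empty list means the compatibility version change may proceed.
    Inactive hosts are ignored; they are checked when they are activated.
    """
    return [
        host.name
        for host in hosts
        if host.active and find_valid_machine(host, cluster_machines) is None
    ]


# Illustrative data only: assumed machine list for the target cluster level.
TARGET_LEVEL_MACHINES = ["pc-i440fx-rhel7.2.0"]

hosts = [
    Host("host-a", True, {"pc-i440fx-rhel7.2.0", "rhel6.5.0"}),
    Host("host-b", True, {"rhel6.5.0"}),
    Host("host-c", False, {"rhel6.5.0"}),
]

blocking = validate_cluster_upgrade(hosts, TARGET_LEVEL_MACHINES)
print(blocking)
```

In this sketch the upgrade would be rejected because of host-b, which is active but supports none of the target level's machines, while the inactive host-c does not block the change.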

As a side note, the proposed workaround in comment 1 can be simplified: there is no need to put all the hosts into maintenance (for "Reset emulated machine"). It is enough to activate one host that has a valid emulated machine, which will fix the emulated machine of the cluster.

Comment 3 Michal Skrivanek 2015-10-27 15:50:13 UTC
After deeper investigation, this should not happen on a proper setup. The original report was made on a 7.1 host with 7.1 machine types in 3.6, a combination that was only used for development and early beta.
For early adopters who do have 7.1-based hosts we can just add release notes, but regular users updating from 3.5 (and the latest 3.6 RC) will not hit this.

Hence decreasing severity and postponing to 4.0, to save a few headaches the next time we bump up the machine type.

Comment 4 Shira Maximov 2016-06-06 15:35:43 UTC
Can't verify this bug because it was fixed in 4.0.
In 4.0 there are only the 3.6 and 4.0 compatibility versions, and they both have the same emulated machine (pc-i440fx-rhel7.2.0), so there is no way to verify this bug.

Roy/Arik - I think you should close this bug and open a new one for 3.6.

Comment 5 Roy Golan 2016-06-07 11:25:02 UTC
Arik/Michal, I think this patch needs to be cherry-picked to 3.6.

Comment 6 Michal Skrivanek 2016-06-13 11:21:16 UTC
We'll be fixing it only for 4.0 (meaning 4.1, as it's not relevant in 4.0 as per comment #4).
If someone requests a backport to 3.6 we would need to do comment #5, but I see that as unlikely with oVirt 4.0 GA just around the corner.

Comment 7 Shira Maximov 2016-06-13 11:24:32 UTC
Closing this bug based on comment #6 and comment #4.