Created attachment 1086497
vdsm log with the VM start failure (search for "-memory")
Description of problem:
Upgrading a cluster from 3.5 to 3.6 does not change the cluster's emulated machine. This prevents VMs from starting, because we send the -memory slots|maxmem options to qemu to support memory hot-plug, and that feature isn't supported by old emulated machines.
We must reset the emulated machine on cluster version upgrade.
Workaround: put all hosts into maintenance, right-click the cluster and choose "Reset emulated machine". Then activate a host; it should set the cluster emulated machine to pc-i440fx-rhel7.2.0, and VMs will start.
Steps to Reproduce:
1. Have a 3.5 cluster with the emulated machine set (i.e. at least one host was up).
2. Upgrade the cluster to 3.6.
3. Start a VM. It fails with the following error in vdsm.log:
2015-10-26T07:26:58.280860Z qemu-kvm: "-memory 'slots|maxmem'" is not supported by: rhel6.5.0
Since this is a basic upgrade flow, I'm proposing that this block the 3.6 GA until it is fixed.
Workaround: override the cluster's emulated machine, or disable memory hot-plug via engine-config.
There is a wrong assumption in the update cluster command: that if the cluster compatibility version matches the supported cluster levels of a host, then the host's emulated machine is supported as well. Therefore, we first check the compatibility version, and if it matches the cluster, we take the emulated machine flag of one of the active hosts (if there is an active host in the cluster). However, the host might have no valid emulated machine for the requested cluster compatibility version. In that case the cluster still keeps the previous emulated machine flag.
The posted patch adds a validation that every active host in the cluster being changed has a valid emulated machine for the new version. This makes sense, because if such a host were down, it would not be possible to activate it. The validation prevents us from ending up with a conflicting compatibility version and emulated machine at the same time.
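A minimal Python sketch of the validation described above, for illustration only. It assumes a hypothetical data model: the Host and Cluster classes, the EMULATED_MACHINES_BY_VERSION table (its machine lists are illustrative, not the engine's real configuration) and the functions pick_emulated_machine and validate_and_update are all made up here and are not the engine's actual code. Resetting the emulated machine when no host is active is also an assumption, modeled on the "Reset emulated machine" workaround.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical per-version emulated machines, in order of preference.
# Illustrative values only; not the engine's real configuration.
EMULATED_MACHINES_BY_VERSION: Dict[str, List[str]] = {
    "3.5": ["rhel6.5.0", "pc-i440fx-rhel7.0.0"],
    "3.6": ["pc-i440fx-rhel7.2.0"],
}


@dataclass
class Host:
    name: str
    active: bool
    supported_machines: List[str]


@dataclass
class Cluster:
    compatibility_version: str
    emulated_machine: Optional[str]
    hosts: List[Host] = field(default_factory=list)


def pick_emulated_machine(host: Host, version: str) -> Optional[str]:
    """Return the first machine for `version` that `host` supports, or None."""
    for machine in EMULATED_MACHINES_BY_VERSION.get(version, []):
        if machine in host.supported_machines:
            return machine
    return None


def validate_and_update(cluster: Cluster, new_version: str) -> List[str]:
    """Validate every active host, then update the cluster only if all pass.

    Returns the names of active hosts lacking a valid emulated machine for
    `new_version`; an empty list means the update was applied.
    """
    active = [h for h in cluster.hosts if h.active]
    failing = [h.name for h in active
               if pick_emulated_machine(h, new_version) is None]
    if failing:
        # Reject the whole update so the version and machine never conflict.
        return failing
    cluster.compatibility_version = new_version
    if active:
        # Take the emulated machine from an active host (all of them passed).
        cluster.emulated_machine = pick_emulated_machine(active[0], new_version)
    else:
        # Assumption: with no active host, clear the flag so the next host
        # to come up sets it, as "Reset emulated machine" does.
        cluster.emulated_machine = None
    return []
```

Rejecting the whole update when any active host fails, instead of changing only the version, is what keeps the cluster from holding a 3.6 compatibility version together with an old machine such as rhel6.5.0.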
As a side note, the workaround proposed in comment 1 can be simplified: there is no need to put all the hosts into maintenance (for "Reset emulated machine"). It is enough to activate one host that has a valid emulated machine, and that fixes the cluster's emulated machine.
After deeper investigation, this should not happen on a proper setup. The original report was made on a 7.1 host with 7.1 machine types in 3.6, a combination that was used only for development and the early beta.
For early adopters who do have 7.1-based hosts, we can just add a release note; regular users upgrading from 3.5 (or the latest 3.6 RC) will not hit this.
Hence decreasing the severity and postponing to 4.0, to spare us a few headaches the next time we bump the machine type.
This bug can't be verified because it was fixed in 4.0.
In 4.0 there are only the 3.6 and 4.0 compatibility versions, and both have the same emulated machine (pc-i440fx-rhel7.2.0), so there is no way to verify this bug.
Roy/Arik: I think you should close this bug and open a new one for 3.6.
Arik/Michal: I think this patch needs to be cherry-picked to 3.6.
We'll be fixing it only for 4.0 (meaning 4.1, as it's not relevant in 4.0 per comment #4).
If someone requests a backport to 3.6, we would need to do what comment #5 suggests, but I see that as unlikely with the oVirt 4.0 GA right around the corner.
Closing this bug based on comment #6 and comment #4.