Description of problem:
It's not possible to change the cluster CPU type. I have a 2-node cluster with EPYC processors. It was originally installed with 4.2, so the CPU type was set to Opteron G3 (no EPYC support back then). In engine 4.3, EPYC is available as a CPU type when I choose Compatibility Version 4.3. The big problem is that it doesn't allow the CPU upgrade because not all hosts are in maintenance: "Error while executing action: Cannot change Cluster CPU type unless all Hosts attached to this Cluster are in Maintenance". Putting all hosts into maintenance is impossible because the engine is hosted in the cluster. I tried global HA maintenance, but that didn't help.

Version-Release number of selected component (if applicable):
4.3

How reproducible:

Steps to Reproduce:
1. Install oVirt 4.2 on EPYC hardware with a self-hosted engine
2. Upgrade to 4.3
3. Try to change the CPU type from Opteron G3 to EPYC

Actual results:
The CPU type can't be changed because of this: "Error while executing action: Cannot change Cluster CPU type unless all Hosts attached to this Cluster are in Maintenance"

Expected results:
You can change the CPU type. As it stands, upgrading the hardware still locks you to the old CPU type.

Additional info:
A manual workaround procedure is:
* set HE global maintenance mode
* set one of the hosted-engine hosts into maintenance mode
* move it to a different temporary cluster
* shut down the engine VM
* manually restart the engine VM on the host in the temporary cluster by executing 'hosted-engine --vm-start' directly on that host
* connect to the engine again
* set all the hosts of the initial cluster into maintenance mode
* upgrade the cluster
* shut down the engine VM again
* manually restart the engine VM on one of the hosts of the initial cluster
* move the host that went into the temporary cluster back to its initial cluster

This could be a bit challenging on the user side. Let's see if we can automate it with ovirt-ansible-cluster-upgrade.
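The CLI portion of the workaround above can be sketched as a runbook. This is a hedged sketch, not a tested script: the `run` function is a dry-run wrapper (it only prints the command) so the sketch is safe to execute anywhere; drop the wrapper to perform the steps for real on the hosted-engine hosts.

```shell
# Dry-run wrapper: prints each command instead of executing it.
run() { echo "+ $*"; }

run hosted-engine --set-maintenance --mode=global   # step 1: HE global maintenance
run hosted-engine --vm-shutdown                     # step 4: stop the engine VM
run hosted-engine --vm-start                        # step 5: start it on the host in the temporary cluster
run hosted-engine --set-maintenance --mode=none     # finally: leave global maintenance
# Steps 2-3 and 6-11 (host maintenance, cluster moves, the cluster CPU-type
# change itself) are done in the engine webadmin UI, not on the CLI.
```

The non-CLI steps have to be interleaved with these commands in the order given in the list above.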
Why was it using Opteron G3? Was it auto-detected as that in 4.2? Weird...
It was auto-detected as such. That's why I was surprised when 4.2.7 started warning that the CPU type was going to be deprecated. Then I saw that the 4.3 release notes listed support for EPYC. Checking things now, it seems QEMU is the reason: KVM users have noticed an EPYC->Opteron_G3 switch when qemu is too old. Maybe it's a fallback in QEMU?
Did you upgrade the hosts first? What do "virsh -r capabilities" and "vdsm-client Host getCapabilities" return?
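For quickly eyeballing what the first query reports, the host CPU model can be pulled out of the capabilities XML. A minimal sketch, with a tiny sample document inlined so it runs anywhere; on a real host you would pipe `virsh -r capabilities` in instead:

```shell
# Sample host-capabilities XML (heavily abridged; a real document is much larger).
caps='<capabilities><host><cpu><arch>x86_64</arch><model>EPYC</model></cpu></host></capabilities>'

# Extract the first <model> element and strip the tags.
echo "$caps" | grep -o '<model>[^<]*</model>' | head -1 | sed 's/<[^>]*>//g'
# prints: EPYC
```

On an affected host this would print Opteron_G3 instead of EPYC, which is the symptom being chased here.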
Created attachment 1529079 [details] Virsh capabilities
Created attachment 1529080 [details] VDSM host capabilites
Attached the requested capability files from virsh and VDSM. I updated the engine first. I tried to update the nodes from there, but I had to do it from the CLI.
Thanks! That looks... weird. Is that before or after "rm /var/cache/libvirt/qemu/capabilities/*.xml" (as per bug 1674265)? If you haven't done that, could you give it a try and re-run both capability queries? Also, did you check for any microcode updates for your CPU?
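The cache-clearing step referenced from bug 1674265 can be sketched as below. A temp directory stands in for the real path (/var/cache/libvirt/qemu/capabilities/) so the sketch is safe to run anywhere; the file names are made-up stand-ins for the cached capability XMLs:

```shell
# Simulated capability cache; on a real host this is /var/cache/libvirt/qemu/capabilities/
cache=$(mktemp -d)
touch "$cache/caps-a.xml" "$cache/caps-b.xml"   # stand-ins for cached QEMU capability files

rm -f "$cache"/*.xml      # real host: rm /var/cache/libvirt/qemu/capabilities/*.xml
ls -A "$cache" | wc -l    # prints: 0
# afterwards on a real host: systemctl restart libvirtd
# (restarting libvirtd forces a fresh probe of QEMU's capabilities)
```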
I didn't clear any capabilities; this is all the result of a 4.2 install and upgrade to 4.3. I'll try clearing tomorrow. I didn't check for any microcode updates, but the BIOS should be the newest for ProLiant Gen10 servers.
OK, please do. Also try to remove that cache, reboot, and rerun both. Do you happen to have a non-upgraded server with the same hardware?
I managed to do the test today. I put the nodes into maintenance one after another, cleared the cache, and restarted libvirtd. I can upload the files, but they are identical to the ones I already uploaded from that node. Or not entirely: the vdsm version has differences in the gc_timer lines.
I just remembered that I did 'Refresh capabilities' from webadmin after the node updates. Does that do the same operation?
I forgot to mention that I don't have an extra server to test on.
Hm. It seems the fma4 flag, added in G4 and G5, was removed in EPYC, so EPYC processors on non-EPYC-enabled oVirt get detected as G3. That's a problem now that we removed G3. I wonder... it could be that adding G3 back is the easiest solution right now.
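The fma4 observation can be illustrated with a small flag check. A sketch with abridged, hypothetical flag strings (a real check would grep the flags line of /proc/cpuinfo on the host); the point is only that fma4 separates G4/G5-era CPUs from EPYC:

```shell
# Abridged, hypothetical 'flags' samples for illustration only:
g4_flags="fpu sse4_2 avx xop fma4"        # Opteron G4/G5-era CPUs carry fma4
epyc_flags="fpu sse4_2 avx avx2 sha_ni"   # EPYC dropped fma4

# Report whether fma4 appears as a whole word in a flags string.
check_fma4() { echo "$1" | grep -qw fma4 && echo present || echo absent; }

check_fma4 "$g4_flags"    # prints: present
check_fma4 "$epyc_flags"  # prints: absent
```

With fma4 absent, a detection scheme keyed on that flag cannot classify the CPU as G4/G5 and falls through to G3, matching the behaviour reported in this bug.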
Ugly... Even with a known workaround, adding G3 back upstream is probably the nicest suggestion. It means keeping support for a vulnerable CPU type side by side for an entire release, but at least it unblocks upgrades.
Moving to the Virt team for re-introducing G3.
Deployed 4.2 HE over NFS on 2 hosts and attached an NFS storage domain.

4.2 components on engine and hosts:
ovirt-engine-setup-4.2.8.7-0.1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.19-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.34-1.el7ev.noarch

Set the host cluster to Conroe and checked that it stayed Conroe after editing.

Upgraded the engine to the latest 4.3 bits:
ovirt-engine-setup-4.3.3.5-0.1.el7.noarch

After the engine was upgraded, I also upgraded both hosts to the latest 4.3 bits:
ovirt-hosted-engine-ha-2.3.1-1.el7ev.noarch
ovirt-hosted-engine-setup-2.3.7-1.el7ev.noarch

Then I bumped the host cluster's compatibility level up to 4.3, and Conroe automatically got changed to Nehalem. I approved the change and afterwards checked the CPU family; it was changed to the proper Intel SandyBridge IBRS SSBD family.

Moving to verified.