Bug 1672859
Summary: Cannot correctly upgrade a hosted-engine environment from 4.2 to 4.3 if the specific CPU type disappeared in 4.3

| Field | Value | Field | Value |
|---|---|---|---|
| Product | [oVirt] ovirt-engine | Reporter | Juhani Rautiainen <juhani.rautiainen> |
| Component | BLL.Virt | Assignee | Steven Rosenberg <srosenbe> |
| Status | CLOSED CURRENTRELEASE | QA Contact | Nikolai Sednev <nsednev> |
| Severity | high | Priority | high |
| Version | 4.3.0 | Keywords | Triaged |
| Target Milestone | ovirt-4.3.3-1 | Target Release | --- |
| Hardware | x86_64 | OS | Linux |
| Fixed In Version | ovirt-engine-4.3.3.5 | Doc Type | If docs needed, set a value |
| Story Points | --- | Type | Bug |
| Last Closed | 2019-04-29 13:57:43 UTC | oVirt Team | Virt |
| Regression | --- | Mount Type | --- |
| Documentation | --- | Category | --- |
| Cloudforms Team | --- | Clones | 1694787 |
| Bug Depends On | 1699913 | Bug Blocks | 1694787 |
| Flags | pm-rhel: ovirt-4.3+; mtessun: planning_ack+; rbarry: devel_ack+; mavital: testing_ack+ | | |
| CC | alexander, aperotti, bugs, dfediuck, fkust, juhani.rautiainen, lleistne, lsvaty, michal.skrivanek, mtessun, nsednev, rbarry, stirabos | | |

Description
Juhani Rautiainen
2019-02-06 04:10:08 UTC
A manual workaround procedure is:
* set hosted-engine (HE) global maintenance mode
* put one of the hosted-engine hosts into maintenance mode
* move it to a different, temporary cluster
* shut down the engine VM
* manually restart the engine VM on the host in the temporary cluster by executing 'hosted-engine --vm-start' directly on that host
* connect to the engine again
* put all the hosts of the initial cluster into maintenance mode
* upgrade the cluster
* shut down the engine VM again
* manually restart the engine VM on one of the hosts of the initial cluster
* move the host that was placed in the temporary cluster back to its initial cluster

However, this could be a bit challenging on the user side. Let's see if we can automate it with ovirt-ansible-cluster-upgrade.

Why was it using Opteron G3? Was it auto-detected as that in 4.2? Weird...

It was auto-detected as such. That's why I was surprised when 4.2.7 started to warn that the CPU type was going to be deprecated. Then I saw that the 4.3 release notes added support for EPYC. Checking now, it seems that QEMU is the reason: KVM users have noticed an EPYC -> Opteron_G3 switch when QEMU is too old. Maybe it's a fallback in QEMU?

Did you upgrade the hosts first? What do "virsh -r capabilities" and "vdsm-client Host getCapabilities" return?

Created attachment 1529079: Virsh capabilities
Created attachment 1529080: VDSM host capabilities
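When comparing capability dumps before and after clearing the libvirt cache, the relevant part of the `virsh -r capabilities` output is the `<cpu>` element under `<host>`, which carries the detected model and its feature flags. A minimal Python sketch that extracts both, assuming the standard libvirt capabilities XML layout. The sample XML and the `host_cpu_info` helper are hypothetical illustrations, not the attached output or any oVirt tool.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical, heavily trimmed sample in the shape of `virsh -r capabilities`.
# It is NOT the reporter's attachment; only the element layout matters here.
SAMPLE_CAPS = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>Opteron_G3</model>
      <vendor>AMD</vendor>
      <feature name='ibpb'/>
      <feature name='virt-ssbd'/>
    </cpu>
  </host>
</capabilities>
"""


def host_cpu_info(caps_xml: str) -> Tuple[Optional[str], List[str]]:
    """Return (detected model, feature names) from the <host><cpu> element."""
    root = ET.fromstring(caps_xml.strip())
    cpu = root.find("host/cpu")
    if cpu is None:
        # No host CPU section: nothing was detected.
        return None, []
    model = cpu.findtext("model")
    features = [f.get("name") for f in cpu.findall("feature")]
    return model, features


if __name__ == "__main__":
    # On a real host the XML could come from:
    #   subprocess.run(["virsh", "-r", "capabilities"],
    #                  capture_output=True, text=True).stdout
    model, features = host_cpu_info(SAMPLE_CAPS)
    print(model, features)
```

Running the same extraction on dumps taken before and after removing the cache files makes it easy to see whether the detected model actually changed.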
Attached the requested capabilities output from virsh and VDSM. I updated the engine first. I tried to update the nodes from there, but I had to do it from the CLI.

Thanks! That looks... weird. Is that before or after "rm /var/cache/libvirt/qemu/capabilities/*.xml" (as per bug 1674265)? If you haven't done that, could you give it a try and re-run both capability queries? Also, did you check for any microcode updates for your CPU?

I didn't clear any capabilities; this is all the result of a 4.2 install and an upgrade to 4.3. I'll try clearing tomorrow. I didn't check for microcode updates, but the BIOS should be the newest for ProLiant Gen10 servers.

OK, please do. Also try removing that cache, rebooting, and rerunning both queries.

Do you happen to have a non-upgraded server with the same hardware?

I managed to do the test today. I put the nodes into maintenance one after another, cleared the cache, and restarted libvirtd. I can upload the files, but they are identical to the ones from the node I already uploaded. Well, not entirely: the VDSM output differs in the gc_timer lines. I just remembered that I did run 'Refresh capabilities' from the webadmin after the node updates. Does it do the same operation?

Forgot to mention that I don't have an extra server to test on.

Hm. It seems the fma4 flag, added in G4 and G5, was removed in EPYC, so EPYC processors on an oVirt version without EPYC support get detected as G3. That becomes a problem once G3 is removed. I wonder... adding G3 back could be the easiest solution right now. Ugly...

Even with a known workaround, adding G3 back upstream is probably the nicest option. It means keeping support for a vulnerable CPU type side by side for an entire release, but at least it unblocks upgrades.

Moving to the Virt team for re-introducing G3.

Deployed 4.2 HE over NFS on 2 hosts and attached an NFS storage domain.
4.2 components on the engine and hosts:
* ovirt-engine-setup-4.2.8.7-0.1.el7ev.noarch
* ovirt-hosted-engine-ha-2.2.19-1.el7ev.noarch
* ovirt-hosted-engine-setup-2.2.34-1.el7ev.noarch

Set the host cluster CPU type to Conroe and checked that it stays Conroe after editing. Upgraded the engine to the latest 4.3 bits:
* ovirt-engine-setup-4.3.3.5-0.1.el7.noarch

After the engine was upgraded, I also upgraded both hosts to the latest 4.3 bits:
* ovirt-hosted-engine-ha-2.3.1-1.el7ev.noarch
* ovirt-hosted-engine-setup-2.3.7-1.el7ev.noarch

Then I raised the host cluster's compatibility level to 4.3, and Conroe was automatically changed to Nehalem. I approved the change, then checked the CPU family: it had changed to the proper Intel SandyBridge IBRS SSBD Family. Moving to VERIFIED.
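The root cause identified in the thread is flag-based model matching: Opteron G4 and G5 require the fma4 flag, which EPYC CPUs lack, so a version that does not know EPYC falls back to G3, and once G3 is removed nothing matches at all. The sketch below illustrates that reasoning with a simple "newest model whose required flags are all present on the host" rule. The flag sets, the `newest_matching_model` helper, and the selection rule itself are simplified assumptions for illustration, not libvirt's, QEMU's, or the engine's actual algorithm or CPU map.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# Hypothetical, heavily simplified required-flag sets, ordered oldest -> newest.
# They model only the difference discussed in the thread: G4/G5 require fma4,
# while EPYC CPUs do not provide it. Real CPU models list many more flags.
BASE = frozenset({"sse4a", "svm"})
AMD_MODELS: List[Tuple[str, FrozenSet[str]]] = [
    ("Opteron_G3", BASE),
    ("Opteron_G4", BASE | {"fma4", "avx"}),
    ("Opteron_G5", BASE | {"fma4", "avx", "fma", "f16c"}),
    ("EPYC", BASE | {"avx", "avx2", "fma", "f16c", "sha_ni"}),
]


def newest_matching_model(
    host_flags: Set[str], known_models: List[Tuple[str, FrozenSet[str]]]
) -> Optional[str]:
    """Return the newest known model whose required flags are all on the host."""
    best = None
    for name, required in known_models:  # iterate oldest -> newest
        if required <= host_flags:  # subset check: every required flag present
            best = name
    return best


# Hypothetical flag set for an EPYC host: note there is no fma4.
EPYC_HOST = {"sse4a", "svm", "avx", "avx2", "fma", "f16c", "sha_ni"}

print(newest_matching_model(EPYC_HOST, AMD_MODELS))      # -> EPYC (EPYC known)
print(newest_matching_model(EPYC_HOST, AMD_MODELS[:3]))  # -> Opteron_G3 (no EPYC)
print(newest_matching_model(EPYC_HOST, AMD_MODELS[1:3])) # -> None (G3 removed too)
```

The last call mirrors the upgrade blocker: with G3 gone and EPYC unknown, the host matches no AMD model, which is why re-introducing G3 for one release unblocks the upgrade.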