Description of problem: After upgrading RHVH 4.4.2 to 4.4.3, the host moves into Non-Operational status with the error message "Host moved to Non-Operational state as host does not meet the cluster's minimum CPU level. Missing CPU features : model_Cascadelake-Server". There are 12 hosts running RHVH 4.4.2 with CPU type "Cascadelake-Server-noTSX" and all are UP, but a host upgraded to RHVH 4.4.3 fails to activate. Version-Release number of selected component (if applicable): RHVH 4.4.4 Additional info: >> RHVH 4.4.2 : # virsh -r capabilities <capabilities> <host> <uuid>ce220ad6-044d-4ee2-95d0-397d25891689</uuid> <cpu> <arch>x86_64</arch> <model>Cascadelake-Server-noTSX</model> <vendor>Intel</vendor> <microcode version='83898371'/> <counter name='tsc' frequency='3092741000' scaling='yes'/> <topology sockets='1' dies='1' cores='18' threads='1'/> <feature name='ds'/> <feature name='acpi'/> <feature name='ss'/> <feature name='ht'/> <feature name='tm'/> <feature name='pbe'/> <feature name='dtes64'/> <feature name='ds_cpl'/> <feature name='vmx'/> <feature name='smx'/> <feature name='est'/> <feature name='tm2'/> <feature name='xtpr'/> <feature name='pdcm'/> <feature name='dca'/> <feature name='osxsave'/> <feature name='tsc_adjust'/> <feature name='cmt'/> <feature name='intel-pt'/> <feature name='pku'/> <feature name='ospke'/> <feature name='md-clear'/> <feature name='stibp'/> <feature name='arch-capabilities'/> <feature name='xsaves'/> <feature name='mbm_total'/> <feature name='mbm_local'/> <feature name='invtsc'/> <feature name='rdctl-no'/> <feature name='ibrs-all'/> <feature name='skip-l1dfl-vmentry'/> <feature name='mds-no'/> <feature name='tsx-ctrl'/> <pages unit='KiB' size='4'/> <pages unit='KiB' size='2048'/> <pages unit='KiB' size='1048576'/> </cpu> <power_management> <suspend_mem/> <suspend_disk/> <suspend_hybrid/> </power_management> <iommu support='no'/> <migration_features> <live/> <uri_transports> <uri_transport>tcp</uri_transport> 
<uri_transport>rdma</uri_transport> </uri_transports> </migration_features> <topology> <cells num='4'> <cell id='0'> <memory unit='KiB'>381641552</memory> <pages unit='KiB' size='4'>95410388</pages> <pages unit='KiB' size='2048'>0</pages> <pages unit='KiB' size='1048576'>0</pages> <distances> <sibling id='0' value='10'/> <sibling id='1' value='21'/> <sibling id='2' value='21'/> <sibling id='3' value='21'/> </distances> <cpus num='18'> <cpu id='0' socket_id='0' die_id='0' core_id='0' siblings='0'/> <cpu id='1' socket_id='0' die_id='0' core_id='1' siblings='1'/> <cpu id='2' socket_id='0' die_id='0' core_id='2' siblings='2'/> <cpu id='3' socket_id='0' die_id='0' core_id='3' siblings='3'/> <cpu id='4' socket_id='0' die_id='0' core_id='4' siblings='4'/> <cpu id='5' socket_id='0' die_id='0' core_id='8' siblings='5'/> <cpu id='6' socket_id='0' die_id='0' core_id='9' siblings='6'/> <cpu id='7' socket_id='0' die_id='0' core_id='10' siblings='7'/> <cpu id='8' socket_id='0' die_id='0' core_id='11' siblings='8'/> <cpu id='9' socket_id='0' die_id='0' core_id='16' siblings='9'/> <cpu id='10' socket_id='0' die_id='0' core_id='17' siblings='10'/> <cpu id='11' socket_id='0' die_id='0' core_id='18' siblings='11'/> <cpu id='12' socket_id='0' die_id='0' core_id='19' siblings='12'/> <cpu id='13' socket_id='0' die_id='0' core_id='20' siblings='13'/> <cpu id='14' socket_id='0' die_id='0' core_id='24' siblings='14'/> <cpu id='15' socket_id='0' die_id='0' core_id='25' siblings='15'/> <cpu id='16' socket_id='0' die_id='0' core_id='26' siblings='16'/> <cpu id='17' socket_id='0' die_id='0' core_id='27' siblings='17'/> </cpus> </cell> <cell id='1'> <memory unit='KiB'>383831188</memory> <pages unit='KiB' size='4'>95957797</pages> <pages unit='KiB' size='2048'>0</pages> <pages unit='KiB' size='1048576'>0</pages> <distances> <sibling id='0' value='21'/> <sibling id='1' value='10'/> <sibling id='2' value='21'/> <sibling id='3' value='21'/> </distances> <cpus num='18'> <cpu id='18' socket_id='1' 
die_id='0' core_id='0' siblings='18'/> <cpu id='19' socket_id='1' die_id='0' core_id='1' siblings='19'/> <cpu id='20' socket_id='1' die_id='0' core_id='2' siblings='20'/> <cpu id='21' socket_id='1' die_id='0' core_id='3' siblings='21'/> <cpu id='22' socket_id='1' die_id='0' core_id='4' siblings='22'/> <cpu id='23' socket_id='1' die_id='0' core_id='8' siblings='23'/> <cpu id='24' socket_id='1' die_id='0' core_id='9' siblings='24'/> <cpu id='25' socket_id='1' die_id='0' core_id='10' siblings='25'/> <cpu id='26' socket_id='1' die_id='0' core_id='11' siblings='26'/> <cpu id='27' socket_id='1' die_id='0' core_id='16' siblings='27'/> <cpu id='28' socket_id='1' die_id='0' core_id='17' siblings='28'/> <cpu id='29' socket_id='1' die_id='0' core_id='18' siblings='29'/> <cpu id='30' socket_id='1' die_id='0' core_id='19' siblings='30'/> <cpu id='31' socket_id='1' die_id='0' core_id='20' siblings='31'/> <cpu id='32' socket_id='1' die_id='0' core_id='24' siblings='32'/> <cpu id='33' socket_id='1' die_id='0' core_id='25' siblings='33'/> <cpu id='34' socket_id='1' die_id='0' core_id='26' siblings='34'/> <cpu id='35' socket_id='1' die_id='0' core_id='27' siblings='35'/> </cpus> </cell> <cell id='2'> <memory unit='KiB'>382232060</memory> <pages unit='KiB' size='4'>95558015</pages> <pages unit='KiB' size='2048'>0</pages> <pages unit='KiB' size='1048576'>0</pages> <distances> <sibling id='0' value='21'/> <sibling id='1' value='21'/> <sibling id='2' value='10'/> <sibling id='3' value='21'/> </distances> <cpus num='18'> <cpu id='36' socket_id='2' die_id='0' core_id='0' siblings='36'/> <cpu id='37' socket_id='2' die_id='0' core_id='1' siblings='37'/> <cpu id='38' socket_id='2' die_id='0' core_id='2' siblings='38'/> <cpu id='39' socket_id='2' die_id='0' core_id='3' siblings='39'/> <cpu id='40' socket_id='2' die_id='0' core_id='4' siblings='40'/> <cpu id='41' socket_id='2' die_id='0' core_id='8' siblings='41'/> <cpu id='42' socket_id='2' die_id='0' core_id='9' siblings='42'/> <cpu id='43' 
socket_id='2' die_id='0' core_id='10' siblings='43'/> <cpu id='44' socket_id='2' die_id='0' core_id='11' siblings='44'/> <cpu id='45' socket_id='2' die_id='0' core_id='16' siblings='45'/> <cpu id='46' socket_id='2' die_id='0' core_id='17' siblings='46'/> <cpu id='47' socket_id='2' die_id='0' core_id='18' siblings='47'/> <cpu id='48' socket_id='2' die_id='0' core_id='19' siblings='48'/> <cpu id='49' socket_id='2' die_id='0' core_id='20' siblings='49'/> <cpu id='50' socket_id='2' die_id='0' core_id='24' siblings='50'/> <cpu id='51' socket_id='2' die_id='0' core_id='25' siblings='51'/> <cpu id='52' socket_id='2' die_id='0' core_id='26' siblings='52'/> <cpu id='53' socket_id='2' die_id='0' core_id='27' siblings='53'/> </cpus> </cell> <cell id='3'> <memory unit='KiB'>383257236</memory> <pages unit='KiB' size='4'>95814309</pages> <pages unit='KiB' size='2048'>0</pages> <pages unit='KiB' size='1048576'>0</pages> <distances> <sibling id='0' value='21'/> <sibling id='1' value='21'/> <sibling id='2' value='21'/> <sibling id='3' value='10'/> </distances> <cpus num='18'> <cpu id='54' socket_id='3' die_id='0' core_id='0' siblings='54'/> <cpu id='55' socket_id='3' die_id='0' core_id='1' siblings='55'/> <cpu id='56' socket_id='3' die_id='0' core_id='2' siblings='56'/> <cpu id='57' socket_id='3' die_id='0' core_id='3' siblings='57'/> <cpu id='58' socket_id='3' die_id='0' core_id='4' siblings='58'/> <cpu id='59' socket_id='3' die_id='0' core_id='8' siblings='59'/> <cpu id='60' socket_id='3' die_id='0' core_id='9' siblings='60'/> <cpu id='61' socket_id='3' die_id='0' core_id='10' siblings='61'/> <cpu id='62' socket_id='3' die_id='0' core_id='11' siblings='62'/> <cpu id='63' socket_id='3' die_id='0' core_id='16' siblings='63'/> <cpu id='64' socket_id='3' die_id='0' core_id='17' siblings='64'/> <cpu id='65' socket_id='3' die_id='0' core_id='18' siblings='65'/> <cpu id='66' socket_id='3' die_id='0' core_id='19' siblings='66'/> <cpu id='67' socket_id='3' die_id='0' core_id='20' 
siblings='67'/> <cpu id='68' socket_id='3' die_id='0' core_id='24' siblings='68'/> <cpu id='69' socket_id='3' die_id='0' core_id='25' siblings='69'/> <cpu id='70' socket_id='3' die_id='0' core_id='26' siblings='70'/> <cpu id='71' socket_id='3' die_id='0' core_id='27' siblings='71'/> </cpus> </cell> </cells> </topology> <cache> <bank id='0' level='3' type='both' size='25344' unit='KiB' cpus='0-17'/> <bank id='1' level='3' type='both' size='25344' unit='KiB' cpus='18-35'/> <bank id='2' level='3' type='both' size='25344' unit='KiB' cpus='36-53'/> <bank id='3' level='3' type='both' size='25344' unit='KiB' cpus='54-71'/> </cache> <secmodel> <model>selinux</model> <doi>0</doi> <baselabel type='kvm'>system_u:system_r:svirt_t:s0</baselabel> <baselabel type='qemu'>system_u:system_r:svirt_tcg_t:s0</baselabel> </secmodel> <secmodel> <model>dac</model> <doi>0</doi> <baselabel type='kvm'>+107:+107</baselabel> <baselabel type='qemu'>+107:+107</baselabel> </secmodel> </host> <guest> <os_type>hvm</os_type> <arch name='i686'> <wordsize>32</wordsize> <emulator>/usr/libexec/qemu-kvm</emulator> <machine maxCpus='240'>pc-i440fx-rhel7.6.0</machine> <machine canonical='pc-i440fx-rhel7.6.0' maxCpus='240'>pc</machine> <machine maxCpus='240'>pc-i440fx-rhel7.0.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.5.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.3.0</machine> <machine maxCpus='512'>pc-q35-rhel8.3.0</machine> <machine canonical='pc-q35-rhel8.3.0' maxCpus='512'>q35</machine> <machine maxCpus='512'>pc-q35-rhel7.6.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.1.0</machine> <machine maxCpus='512'>pc-q35-rhel8.1.0</machine> <machine maxCpus='512'>pc-q35-rhel7.4.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.4.0</machine> <machine maxCpus='512'>pc-q35-rhel8.2.0</machine> <machine maxCpus='512'>pc-q35-rhel7.5.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.2.0</machine> <machine maxCpus='512'>pc-q35-rhel8.0.0</machine> <machine maxCpus='255'>pc-q35-rhel7.3.0</machine> 
<domain type='qemu'/> <domain type='kvm'/> </arch> <features> <pae/> <nonpae/> <acpi default='on' toggle='yes'/> <apic default='on' toggle='no'/> <cpuselection/> <deviceboot/> <disksnapshot default='on' toggle='no'/> </features> </guest> <guest> <os_type>hvm</os_type> <arch name='x86_64'> <wordsize>64</wordsize> <emulator>/usr/libexec/qemu-kvm</emulator> <machine maxCpus='240'>pc-i440fx-rhel7.6.0</machine> <machine canonical='pc-i440fx-rhel7.6.0' maxCpus='240'>pc</machine> <machine maxCpus='240'>pc-i440fx-rhel7.0.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.5.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.3.0</machine> <machine maxCpus='512'>pc-q35-rhel8.3.0</machine> <machine canonical='pc-q35-rhel8.3.0' maxCpus='512'>q35</machine> <machine maxCpus='512'>pc-q35-rhel7.6.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.1.0</machine> <machine maxCpus='512'>pc-q35-rhel8.1.0</machine> <machine maxCpus='512'>pc-q35-rhel7.4.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.4.0</machine> <machine maxCpus='512'>pc-q35-rhel8.2.0</machine> <machine maxCpus='512'>pc-q35-rhel7.5.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.2.0</machine> <machine maxCpus='512'>pc-q35-rhel8.0.0</machine> <machine maxCpus='255'>pc-q35-rhel7.3.0</machine> <domain type='qemu'/> <domain type='kvm'/> </arch> <features> <acpi default='on' toggle='yes'/> <apic default='on' toggle='no'/> <cpuselection/> <deviceboot/> <disksnapshot default='on' toggle='no'/> </features> </guest> </capabilities> >>> RHVH 4.4.3 : # virsh -r capabilities <capabilities> <host> <uuid>5c6d6826-e1df-4889-a123-8072b4d6e50e</uuid> <cpu> <arch>x86_64</arch> <model>Cascadelake-Server-noTSX</model> <vendor>Intel</vendor> <microcode version='83898371'/> <counter name='tsc' frequency='2494106000' scaling='yes'/> <topology sockets='1' dies='1' cores='20' threads='1'/> <feature name='ds'/> <feature name='acpi'/> <feature name='ss'/> <feature name='ht'/> <feature name='tm'/> <feature name='pbe'/> <feature 
name='dtes64'/> <feature name='ds_cpl'/> <feature name='vmx'/> <feature name='smx'/> <feature name='est'/> <feature name='tm2'/> <feature name='xtpr'/> <feature name='pdcm'/> <feature name='dca'/> <feature name='osxsave'/> <feature name='tsc_adjust'/> <feature name='cmt'/> <feature name='intel-pt'/> <feature name='pku'/> <feature name='ospke'/> <feature name='md-clear'/> <feature name='stibp'/> <feature name='arch-capabilities'/> <feature name='xsaves'/> <feature name='mbm_total'/> <feature name='mbm_local'/> <feature name='invtsc'/> <feature name='rdctl-no'/> <feature name='ibrs-all'/> <feature name='skip-l1dfl-vmentry'/> <feature name='mds-no'/> <feature name='tsx-ctrl'/> <pages unit='KiB' size='4'/> <pages unit='KiB' size='2048'/> <pages unit='KiB' size='1048576'/> </cpu> <power_management> <suspend_mem/> <suspend_disk/> <suspend_hybrid/> </power_management> <iommu support='no'/> <migration_features> <live/> <uri_transports> <uri_transport>tcp</uri_transport> <uri_transport>rdma</uri_transport> </uri_transports> </migration_features> <topology> <cells num='2'> <cell id='0'> <memory unit='KiB'>251195376</memory> <pages unit='KiB' size='4'>62798844</pages> <pages unit='KiB' size='2048'>0</pages> <pages unit='KiB' size='1048576'>0</pages> <distances> <sibling id='0' value='10'/> <sibling id='1' value='21'/> </distances> <cpus num='20'> <cpu id='0' socket_id='0' die_id='0' core_id='0' siblings='0'/> <cpu id='1' socket_id='0' die_id='0' core_id='1' siblings='1'/> <cpu id='2' socket_id='0' die_id='0' core_id='2' siblings='2'/> <cpu id='3' socket_id='0' die_id='0' core_id='3' siblings='3'/> <cpu id='4' socket_id='0' die_id='0' core_id='4' siblings='4'/> <cpu id='5' socket_id='0' die_id='0' core_id='8' siblings='5'/> <cpu id='6' socket_id='0' die_id='0' core_id='9' siblings='6'/> <cpu id='7' socket_id='0' die_id='0' core_id='10' siblings='7'/> <cpu id='8' socket_id='0' die_id='0' core_id='11' siblings='8'/> <cpu id='9' socket_id='0' die_id='0' core_id='12' 
siblings='9'/> <cpu id='10' socket_id='0' die_id='0' core_id='16' siblings='10'/> <cpu id='11' socket_id='0' die_id='0' core_id='17' siblings='11'/> <cpu id='12' socket_id='0' die_id='0' core_id='18' siblings='12'/> <cpu id='13' socket_id='0' die_id='0' core_id='19' siblings='13'/> <cpu id='14' socket_id='0' die_id='0' core_id='20' siblings='14'/> <cpu id='15' socket_id='0' die_id='0' core_id='24' siblings='15'/> <cpu id='16' socket_id='0' die_id='0' core_id='25' siblings='16'/> <cpu id='17' socket_id='0' die_id='0' core_id='26' siblings='17'/> <cpu id='18' socket_id='0' die_id='0' core_id='27' siblings='18'/> <cpu id='19' socket_id='0' die_id='0' core_id='28' siblings='19'/> </cpus> </cell> <cell id='1'> <memory unit='KiB'>252916580</memory> <pages unit='KiB' size='4'>63229145</pages> <pages unit='KiB' size='2048'>0</pages> <pages unit='KiB' size='1048576'>0</pages> <distances> <sibling id='0' value='21'/> <sibling id='1' value='10'/> </distances> <cpus num='20'> <cpu id='20' socket_id='1' die_id='0' core_id='0' siblings='20'/> <cpu id='21' socket_id='1' die_id='0' core_id='1' siblings='21'/> <cpu id='22' socket_id='1' die_id='0' core_id='2' siblings='22'/> <cpu id='23' socket_id='1' die_id='0' core_id='3' siblings='23'/> <cpu id='24' socket_id='1' die_id='0' core_id='4' siblings='24'/> <cpu id='25' socket_id='1' die_id='0' core_id='8' siblings='25'/> <cpu id='26' socket_id='1' die_id='0' core_id='9' siblings='26'/> <cpu id='27' socket_id='1' die_id='0' core_id='10' siblings='27'/> <cpu id='28' socket_id='1' die_id='0' core_id='11' siblings='28'/> <cpu id='29' socket_id='1' die_id='0' core_id='12' siblings='29'/> <cpu id='30' socket_id='1' die_id='0' core_id='16' siblings='30'/> <cpu id='31' socket_id='1' die_id='0' core_id='17' siblings='31'/> <cpu id='32' socket_id='1' die_id='0' core_id='18' siblings='32'/> <cpu id='33' socket_id='1' die_id='0' core_id='19' siblings='33'/> <cpu id='34' socket_id='1' die_id='0' core_id='20' siblings='34'/> <cpu id='35' 
socket_id='1' die_id='0' core_id='24' siblings='35'/> <cpu id='36' socket_id='1' die_id='0' core_id='25' siblings='36'/> <cpu id='37' socket_id='1' die_id='0' core_id='26' siblings='37'/> <cpu id='38' socket_id='1' die_id='0' core_id='27' siblings='38'/> <cpu id='39' socket_id='1' die_id='0' core_id='28' siblings='39'/> </cpus> </cell> </cells> </topology> <cache> <bank id='0' level='3' type='both' size='28160' unit='KiB' cpus='0-19'/> <bank id='1' level='3' type='both' size='28160' unit='KiB' cpus='20-39'/> </cache> <secmodel> <model>selinux</model> <doi>0</doi> <baselabel type='kvm'>system_u:system_r:svirt_t:s0</baselabel> <baselabel type='qemu'>system_u:system_r:svirt_tcg_t:s0</baselabel> </secmodel> <secmodel> <model>dac</model> <doi>0</doi> <baselabel type='kvm'>+107:+107</baselabel> <baselabel type='qemu'>+107:+107</baselabel> </secmodel> </host> <guest> <os_type>hvm</os_type> <arch name='i686'> <wordsize>32</wordsize> <emulator>/usr/libexec/qemu-kvm</emulator> <machine maxCpus='240'>pc-i440fx-rhel7.6.0</machine> <machine canonical='pc-i440fx-rhel7.6.0' maxCpus='240'>pc</machine> <machine maxCpus='240'>pc-i440fx-rhel7.0.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.5.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.3.0</machine> <machine maxCpus='512'>pc-q35-rhel8.3.0</machine> <machine canonical='pc-q35-rhel8.3.0' maxCpus='512'>q35</machine> <machine maxCpus='512'>pc-q35-rhel7.6.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.1.0</machine> <machine maxCpus='512'>pc-q35-rhel8.1.0</machine> <machine maxCpus='512'>pc-q35-rhel7.4.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.4.0</machine> <machine maxCpus='512'>pc-q35-rhel8.2.0</machine> <machine maxCpus='512'>pc-q35-rhel7.5.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.2.0</machine> <machine maxCpus='512'>pc-q35-rhel8.0.0</machine> <machine maxCpus='255'>pc-q35-rhel7.3.0</machine> <domain type='qemu'/> <domain type='kvm'/> </arch> <features> <pae/> <nonpae/> <acpi default='on' toggle='yes'/> 
<apic default='on' toggle='no'/> <cpuselection/> <deviceboot/> <disksnapshot default='on' toggle='no'/> </features> </guest> <guest> <os_type>hvm</os_type> <arch name='x86_64'> <wordsize>64</wordsize> <emulator>/usr/libexec/qemu-kvm</emulator> <machine maxCpus='240'>pc-i440fx-rhel7.6.0</machine> <machine canonical='pc-i440fx-rhel7.6.0' maxCpus='240'>pc</machine> <machine maxCpus='240'>pc-i440fx-rhel7.0.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.5.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.3.0</machine> <machine maxCpus='512'>pc-q35-rhel8.3.0</machine> <machine canonical='pc-q35-rhel8.3.0' maxCpus='512'>q35</machine> <machine maxCpus='512'>pc-q35-rhel7.6.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.1.0</machine> <machine maxCpus='512'>pc-q35-rhel8.1.0</machine> <machine maxCpus='512'>pc-q35-rhel7.4.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.4.0</machine> <machine maxCpus='512'>pc-q35-rhel8.2.0</machine> <machine maxCpus='512'>pc-q35-rhel7.5.0</machine> <machine maxCpus='240'>pc-i440fx-rhel7.2.0</machine> <machine maxCpus='512'>pc-q35-rhel8.0.0</machine> <machine maxCpus='255'>pc-q35-rhel7.3.0</machine> <domain type='qemu'/> <domain type='kvm'/> </arch> <features> <acpi default='on' toggle='yes'/> <apic default='on' toggle='no'/> <cpuselection/> <deviceboot/> <disksnapshot default='on' toggle='no'/> </features> </guest> </capabilities>
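When comparing two long `virsh -r capabilities` dumps like the ones above, it can help to reduce each to its host CPU model and feature list. The following is a minimal sketch, assuming the dump has been saved as a string; `SAMPLE` is an abbreviated, hypothetical excerpt in the same shape as the output above, and `host_cpu_summary` is a helper name invented for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical excerpt shaped like `virsh -r capabilities` output.
SAMPLE = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>Cascadelake-Server-noTSX</model>
      <vendor>Intel</vendor>
      <feature name='vmx'/>
      <feature name='md-clear'/>
      <feature name='mds-no'/>
      <feature name='tsx-ctrl'/>
    </cpu>
  </host>
</capabilities>
"""


def host_cpu_summary(capabilities_xml):
    """Return (model, sorted feature names) from the <host><cpu> element."""
    root = ET.fromstring(capabilities_xml.strip())
    cpu = root.find("./host/cpu")
    if cpu is None:
        raise ValueError("no <host><cpu> element found")
    model = cpu.findtext("model")
    features = sorted(f.get("name") for f in cpu.findall("feature"))
    return model, features


model, features = host_cpu_summary(SAMPLE)
print(model)
print(features)
```

Running the same helper on both dumps and diffing the results shows quickly whether the host CPU model or the feature list differs between the two hosts.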
That shouldn't normally happen; the cluster's settings should remain the same after the upgrade. Could you please shed more light on the upgrade process: did the compatibility level of the cluster change to '4.5'? Were all hosts perhaps put into maintenance and then activated after the upgrade? That could explain why the hosts are "matched" against the new CPU settings. Regardless of what led to this, changing the CPU type of the cluster to Secure Intel Cascadelake Server Family would likely fix it.
The Host was upgraded from 4.4.2 to 4.4.3 using "Installation -> Upgrade" in the RHV-M web UI. After the Host was upgraded and rebooted, the error message "Host X.X.X moved to Non-Operational state as host does not meet the cluster's minimum CPU level. Missing CPU features : model_Skylake-Server" appeared when the Host was activated. The Cluster settings did not change.
Thanks, can you please provide the output of: select name, cpu_name, cpu_flags, compatibility_version from cluster;
What has changed in 8.3 is the microcode for many CPUs, namely the removal of TSX from Cascadelake and Skylake, along with libvirt changes to accommodate that. As a result, model_Cascadelake-Server is no longer reported (since the CPU now doesn't have TSX). "Secure Intel Cascadelake Server" now requires model_Cascadelake-Server-noTSX to be reported. If it is not reported (note that you need to provide virsh domcapabilities, not capabilities, to be able to tell what is reported), then the cluster is no longer using a secure type and needs to be changed to the "regular"/insecure Cascadelake type, or the CPU microcode needs to be updated. The same applies to Skylake.
And vice versa: unfortunately, due to the limitations of CPU definitions, if the CPU has updated microcode and reports the -noTSX type, we can't use it with the "insecure" type definition. So you cannot use the "Cascadelake-Server" cluster type with Cascadelake-Server-noTSX CPUs.
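The mismatch described above can be illustrated as a simple set comparison between the flags a cluster requires and the flags a host reports. This is only a sketch of the idea, not the engine's actual matching code: `missing_cpu_flags` is a hypothetical helper, the cluster flag string is the `cpu_flags` value shown in the verification query output later in this report, and the host flag string is an assumed example for a host that reports only the -noTSX model.

```python
def missing_cpu_flags(cluster_flags, host_flags):
    """Return the cluster-required flags that the host does not report."""
    required = [f.strip() for f in cluster_flags.split(",") if f.strip()]
    reported = {f.strip() for f in host_flags.split(",") if f.strip()}
    return [f for f in required if f not in reported]


# cpu_flags stored for the cluster before its configuration is refreshed.
cluster = "vmx,mds-no,md-clear,model_Cascadelake-Server"
# Assumed flag string for an 8.3 host that only reports the -noTSX model.
host_8_3 = "vmx,mds-no,md-clear,model_Cascadelake-Server-noTSX"

print(missing_cpu_flags(cluster, host_8_3))
```

Under these assumptions the only missing entry is `model_Cascadelake-Server`, which matches the "Missing CPU features" text in the reported error.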
The information requested from me has been provided through support.
(In reply to Sigbjorn Customer from comment #14) > The information requested from me has been provided through support. Why was the needinfo flag from michal.skrivanek removed and the needinfo flag for me kept? Adding needinfo back for michal.skrivanek
We investigated the issue further and found that this can happen if the engine upgrade (from 4.4.2 to 4.4.3) was performed while some of the hosts in the cluster were not in the UP state. In that case the cluster keeps using the old configuration (not supported in RHEL 8.3 [1]).

If that happens, there is a workaround:
1. Make sure all hosts in the cluster are UP (either by removing those that are not or by moving them to a different cluster).
2. Either restart the engine OR put one of the active hosts into maintenance and activate it again.
This should cause the cluster CPU configuration to be updated, and you can then move the 8.3 hosts back to the cluster.

However, there is a different bug that makes it impossible to migrate currently running VMs from 8.2 hosts to 8.3 hosts unless they are restarted [2].

To prevent this situation in the future, we'll change the updating of the cluster CPU configuration so that it is updated even if some hosts are not UP. Furthermore, we should check during the Upgrade host action that the cluster CPU configuration is up to date before continuing with the upgrade.

1: https://www.phoronix.com/scan.php?page=news_item&px=Red-Hat-RHEL-8.3
2: https://bugzilla.redhat.com/show_bug.cgi?id=1907973
Restarting ovirt-engine did not change the cluster CPU configuration, even when all hosts in the cluster were UP. I have tested this several times. Manually setting the cluster CPU type to "Secure Intel Skylake Server Family" allows me to activate the 8.3 host in the existing cluster. (ALL hosts have CPU type "Secure Intel Cascadelake Server Family" listed in RHV Manager.) VMs *with a pending* next_run_config because of the CPU configuration change to the cluster cannot be migrated to the 8.3 host, as per your description above. VMs that were rebooted and *no longer* have a next_run_config because of the CPU configuration change to the cluster can be migrated between 8.2 and 8.3 hosts, regardless of whether the VM was initially started on an 8.2 host or an 8.3 host. None of these upgrade paths allows us to upgrade hypervisors without downtime for the running VMs.
Verified with:
ovirt-engine-4.4.5.7-0.1.el8ev.noarch
ovirt-engine-4.4.2.6-0.2.el8ev.noarch
Red Hat Virtualization Host 4.4.2 (el8.2)(rhvh-4.4.2.1-0.20200929.0)

Host cpu info (only has Cascadelake machines)
[root@janus04 ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities

Steps and results:

Part 1, test Secure Intel Cascadelake Server Family cpu type

1. Create 4.4.2 engine
2. Create two 4.4.2 RHVH hosts
[root@janus04 ~]# virsh domcapabilities
<cpu>
<mode name='host-passthrough' supported='yes'/>
<mode name='host-model' supported='yes'>
<model fallback='forbid'>Cascadelake-Server</model>
<vendor>Intel</vendor>
3. Create a cluster cluster_secure with Secure Intel Cascadelake Server Family cpu type, check cluster cpu configuration:
- cluster cpu type configuration on UI is "Cascadelake-Server,+md-clear,+mds-no,-hle,-rtm,+tsx-ctrl,+arch-capabilities"
- cluster cpu info in DB:
engine=# select name, cpu_name, cpu_flags, compatibility_version from cluster;
      name      |                cpu_name                |                  cpu_flags                   | compatibility_version
----------------+----------------------------------------+----------------------------------------------+-----------------------
 cluster_secure | Secure Intel Cascadelake Server Family | vmx,mds-no,md-clear,model_Cascadelake-Server | 4.4
4. Add the two 4.4.2 RHVH hosts to cluster_secure
5. Put one host into nonresponsive status (stop vdsmd on host)
6. Upgrade engine to 4.4.5, check cluster info after upgrade:
- cluster cpu type configuration on UI is still "Cascadelake-Server,+md-clear,+mds-no,-hle,-rtm,+tsx-ctrl,+arch-capabilities"
- cluster cpu info in DB:
engine=# select name, cpu_name, cpu_flags, compatibility_version from cluster;
      name      |                cpu_name                |                  cpu_flags                   | compatibility_version
----------------+----------------------------------------+----------------------------------------------+-----------------------
 cluster_secure | Secure Intel Cascadelake Server Family | vmx,mds-no,md-clear,model_Cascadelake-Server | 4.4
- on cluster list page, there is a warning "The cluster CPU configuration is outdated, VMs are running in a degraded mode."
- on cluster details page, there is a warning "The cluster CPU configuration is outdated, VMs are running in a degraded mode with 'Cascadelake-Server,+md-clear,+mds-no,-hle,-rtm,+tsx-ctrl,+arch-capabilities' configuration. Please make sure all hosts in the cluster are in the Up state and do support the new configuration 'Cascadelake-Server-noTSX'. Then either restart the engine or bring one of the hosts into the Maintenance and Up again."
7. Try to upgrade the up 4.4.2 host
Upgrade is prevented, there is a warning "Cannot upgrade Host. The cluster CPU configuration is outdated, Please make sure all hosts in the cluster are in the Up state and do support the new configuration. Then either restart the engine or bring one of the hosts into the Maintenance and Up again."
8. Bring the nonresponsive 4.4.2 host up
- warnings on cluster list and cluster details page disappear.
- cluster cpu type configuration on UI changes to Cascadelake-Server-noTSX
- cluster cpu info in DB:
engine=# select name, cpu_name, cpu_flags, compatibility_version from cluster;
      name      |                cpu_name                |             cpu_flags              | compatibility_version
----------------+----------------------------------------+------------------------------------+-----------------------
 cluster_secure | Secure Intel Cascadelake Server Family | vmx,model_Cascadelake-Server-noTSX | 4.4
9. Try to upgrade 4.4.2 host again
Upgrade is allowed, no "Cannot upgrade host.." warning.

Part 2, test Intel Cascadelake Server Family cpu type

1. Create 4.4.2 engine
2. Create two 4.4.2 RHVH hosts
[root@janus04 ~]# virsh domcapabilities
<cpu>
<mode name='host-passthrough' supported='yes'/>
<mode name='host-model' supported='yes'>
<model fallback='forbid'>Cascadelake-Server</model>
<vendor>Intel</vendor>
3. Create a cluster cluster_insecure with Intel Cascadelake Server Family cpu type, check cluster cpu configuration:
- cluster cpu type configuration on UI is "Cascadelake-Server,-hle,-rtm,+arch-capabilities"
- cluster cpu info in DB:
engine=# select name, cpu_name, cpu_flags, compatibility_version from cluster;
       name       |            cpu_name             |          cpu_flags           | compatibility_version
------------------+---------------------------------+------------------------------+-----------------------
 cluster_insecure | Intel Cascadelake Server Family | vmx,model_Cascadelake-Server | 4.4
4. Add the two 4.4.2 RHVH hosts to cluster_insecure
5. Put one host into nonresponsive status (stop vdsmd on host)
6. Upgrade engine to 4.4.5, check cluster info after upgrade:
- cluster cpu type configuration on UI is still "Cascadelake-Server,-hle,-rtm,+arch-capabilities"
- cluster cpu info in DB:
engine=# select name, cpu_name, cpu_flags, compatibility_version from cluster;
       name       |            cpu_name             |          cpu_flags           | compatibility_version
------------------+---------------------------------+------------------------------+-----------------------
 cluster_insecure | Intel Cascadelake Server Family | vmx,model_Cascadelake-Server | 4.4
- on cluster list page, there is a warning "The cluster CPU configuration is outdated, VMs are running in a degraded mode."
- on cluster details page, there is a warning "The cluster CPU configuration is outdated, VMs are running in a degraded mode with 'Cascadelake-Server,-hle,-rtm,+arch-capabilities' configuration. Please make sure all hosts in the cluster are in the Up state and do support the new configuration 'Cascadelake-Server,-hle,-rtm'. Then either restart the engine or bring one of the hosts into the Maintenance and Up again."
7. Try to upgrade the up 4.4.2 host
- there is a warning "The cluster CPU type definition requires the TSX cpu feature that is disabled on hosts 8.3 and above. It is recommended to change the cluster CPU type to secure variant before continuing with the host upgrade."
- upgrade is allowed.
8. Bring the nonresponsive 4.4.2 host up
- warnings on cluster list and cluster details page disappear.
- cluster cpu type configuration on UI changes to "Cascadelake-Server,-hle,-rtm"
- cluster cpu info in DB:
engine=# select name, cpu_name, cpu_flags, compatibility_version from cluster;
       name       |            cpu_name             |          cpu_flags           | compatibility_version
------------------+---------------------------------+------------------------------+-----------------------
 cluster_insecure | Intel Cascadelake Server Family | vmx,model_Cascadelake-Server | 4.4
9. Try to upgrade 4.4.2 host
- has the same warning as in step 7.
- upgrade is allowed.

According to the above testing, on a 4.4.5 engine, host upgrade is prevented when the cluster cpu type is set to Secure Intel Cascadelake Server Family but the cluster has no "noTSX" configuration. Host upgrade is allowed with a warning when the cluster cpu type is set to Intel Cascadelake Server Family. Warnings on the cluster list page and cluster details page work as expected.
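The outcomes observed in the two test parts can be summarised as a small decision table. The sketch below only models the results recorded above; it is not taken from ovirt-engine, and the string checks used here (a `Secure ` prefix on `cpu_name`, a `-noTSX` suffix in `cpu_flags`) as well as the names `UpgradeDecision` and `upgrade_decision` are assumptions made purely for illustration.

```python
from enum import Enum


class UpgradeDecision(Enum):
    BLOCKED = "blocked"
    ALLOWED_WITH_WARNING = "allowed with warning"
    ALLOWED = "allowed"


def upgrade_decision(cpu_name, cpu_flags):
    """Model the host-upgrade outcome seen in the verification steps."""
    flags = set(cpu_flags.split(","))
    is_secure = cpu_name.startswith("Secure ")  # assumed naming convention
    has_notsx = any(f.endswith("-noTSX") for f in flags)
    if is_secure:
        # Part 1: blocked while the stored flags are outdated,
        # allowed once the cluster reports the -noTSX model.
        return UpgradeDecision.ALLOWED if has_notsx else UpgradeDecision.BLOCKED
    # Part 2: the non-secure type was always allowed, with a TSX warning.
    return UpgradeDecision.ALLOWED_WITH_WARNING


print(upgrade_decision("Secure Intel Cascadelake Server Family",
                       "vmx,mds-no,md-clear,model_Cascadelake-Server"))
print(upgrade_decision("Secure Intel Cascadelake Server Family",
                       "vmx,model_Cascadelake-Server-noTSX"))
print(upgrade_decision("Intel Cascadelake Server Family",
                       "vmx,model_Cascadelake-Server"))
```

The three calls correspond to Part 1 step 7, Part 1 step 9, and Part 2 steps 7 and 9 respectively.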
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) 4.4.z [ovirt-4.4.5] security, bug fix, enhancement), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:1169