Bug 1830872 - In UI HE reports for pending CPU type changes after restart, but they never happen.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 4.4.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-4.4.1
Target Release: 4.4.1.5
Assignee: Lucia Jelinkova
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Duplicates: 1833604 1856684
Depends On:
Blocks:
Reported: 2020-05-04 07:40 UTC by Nikolai Sednev
Modified: 2020-07-21 13:45 UTC (History)

Fixed In Version: ovirt-engine-4.4.1.5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-08 08:26:47 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.4+
mtessun: planning_ack+
ahadas: devel_ack+
mavital: testing_ack+


Attachments
UI engine Pending Virtual Machine changes (4.50 KB, image/png), 2020-05-04 07:40 UTC, Nikolai Sednev
UI engine Pending Virtual Machine changes (67.16 KB, image/png), 2020-05-04 07:41 UTC, Nikolai Sednev
in 4.3 there is no triangle symbol in UI (62.56 KB, image/png), 2020-05-17 10:54 UTC, Nikolai Sednev
In 4.3 HE configured as SandyBridge IBRS (110.52 KB, image/png), 2020-05-17 10:55 UTC, Nikolai Sednev
Screenshot from 2020-05-18 14-15-36.png (90.76 KB, image/png), 2020-05-18 11:24 UTC, Nikolai Sednev
"!" symbol screenshot (109.67 KB, image/png), 2020-05-19 11:59 UTC, Nikolai Sednev
Snapshots screenshot (70.21 KB, image/png), 2020-05-19 12:03 UTC, Nikolai Sednev
HE general tab (108.10 KB, image/png), 2020-05-19 13:25 UTC, Nikolai Sednev
Host cluster general tab (72.92 KB, image/png), 2020-05-19 13:26 UTC, Nikolai Sednev
Video of CPU type change error for host cluster (585.97 KB, application/x-matroska), 2020-05-19 13:29 UTC, Nikolai Sednev
engine log for CLUSTER_CANNOT_UPDATE_VM_COMPATIBILITY_VERSION (1.40 MB, text/plain), 2020-05-20 10:02 UTC, Nikolai Sednev


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 109222 0 master MERGED webadmin: Update error message 2020-09-09 07:42:03 UTC
oVirt gerrit 109224 0 master MERGED webadmin: Change Guest CPU value for HE 2020-09-09 07:42:04 UTC

Description Nikolai Sednev 2020-05-04 07:40:03 UTC
Created attachment 1684708 [details]
UI engine Pending Virtual Machine changes

Description of problem:
In the UI, the HE VM shows pending CPU type changes that should apply after a restart, but they never take effect.
My host's CPU type is:
alma03 ~]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               62
Model name:          Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80GHz
Stepping:            4
CPU MHz:             1800.158
CPU max MHz:         1800.0000
CPU min MHz:         1200.0000
BogoMIPS:            3599.95
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            10240K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts md_clear flush_l1d

The engine reports its CPU type as:
nsednev-he-1 ~]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               58
Model name:          Intel Xeon E3-12xx v2 (Ivy Bridge)
Stepping:            9
CPU MHz:             1799.998
BogoMIPS:            3599.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
L3 cache:            16384K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti fsgsbase smep erms xsaveopt arat
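
To compare the two lscpu outputs above, a minimal Python sketch (not part of the original report) can diff the "Flags:" lines and list what the host exposes but the engine VM does not. The sample strings are abbreviated from the outputs above; the helper name parse_flags is an illustrative assumption.

```python
def parse_flags(lscpu_output):
    """Return the set of CPU flags from the 'Flags:' line of lscpu output."""
    for line in lscpu_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Flags":
            return set(value.split())
    return set()  # no Flags line found


# Abbreviated samples; paste the full lscpu output in practice.
host = "Model: 62\nFlags: fpu vmx avx ssbd ibrs ibpb md_clear"
guest = "Model: 58\nFlags: fpu avx hypervisor"

# Flags present on the host but not visible inside the engine VM.
missing_in_guest = sorted(parse_flags(host) - parse_flags(guest))
print(missing_in_guest)
```

With the full outputs, the difference includes the mitigation-related flags (ssbd, ibrs, ibpb, md_clear) that the host reports and the guest does not.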

In the engine log I see the following:
[root@nsednev-he-1 ~]# less /var/log/ovirt-engine/engine.log | grep cpu
  <vcpu placement='static'>4</vcpu>
  <cpu mode='custom' match='exact' check='full'>
  </cpu>
  <vcpu placement='static' current='4'>64</vcpu>
  <cpu mode='custom' match='exact' check='full'>
      <cell id='0' cpus='0-63' memory='16777216' unit='KiB'/>
  </cpu>
  <vcpu placement='static' current='4'>64</vcpu>
  <cpu mode='custom' match='exact' check='full'>
      <cell id='0' cpus='0-63' memory='16777216' unit='KiB'/>
  </cpu>
  <vcpu placement='static' current='4'>64</vcpu>
  <cpu mode='custom' match='exact' check='full'>
      <cell id='0' cpus='0-63' memory='16777216' unit='KiB'/>
  </cpu>
  <vcpu placement='static' current='4'>64</vcpu>
  <cpu mode='custom' match='exact' check='full'>
      <cell id='0' cpus='0-63' memory='16777216' unit='KiB'/>
  </cpu>
  <vcpu placement='static' current='4'>64</vcpu>
  <cpu mode='custom' match='exact' check='full'>
      <cell id='0' cpus='0-63' memory='16777216' unit='KiB'/>
  </cpu>
[root@nsednev-he-1 ~]# less /var/log/ovirt-engine/engine.log | grep cluster
2020-04-20 16:35:47,749+03 INFO  [org.ovirt.engine.core.bll.ServiceLoader] (ServerService Thread Pool -- 53) [] Start org.ovirt.engine.core.bll.network.cluster.ExternalNetworkSyncService@4fbb71f9 
2020-04-20 16:38:08,062+03 INFO  [org.ovirt.engine.core.bll.ServiceLoader] (ServerService Thread Pool -- 50) [] Start org.ovirt.engine.core.bll.network.cluster.ExternalNetworkSyncService@3e6940d0 
2020-04-20 16:43:53,620+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1) [5b4c3961] EVENT_ID: ANSIBLE_RUNNER_EVENT_NOTIFICATION(559), Installing Host alma04.qa.lab.tlv.redhat.com. Apply cluster specific firewalld rules.
2020-04-20 16:44:37,389+03 INFO  [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [5b4c3961] Skipping Reconfigure gluster since cluster does not support gluster
2020-04-20 16:44:43,099+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [248abd65] START, HostSetupNetworksVDSCommand(HostName = alma04.qa.lab.tlv.redhat.com, HostSetupNetworksVdsCommandParameters:{hostId='30a14650-348c-475b-b4fa-767b9f46e27f', vds='Host[alma04.qa.lab.tlv.redhat.com,30a14650-348c-475b-b4fa-767b9f46e27f]', rollbackOnFailure='true', commitOnSuccess='true', connectivityTimeout='120', networks='[HostNetwork:{defaultRoute='true', bonding='false', networkName='ovirtmgmt', vdsmName='ovirtmgmt', nicName='enp3s0f0', vlan='null', vmNetwork='true', stp='false', properties='null', ipv4BootProtocol='DHCP', ipv4Address='null', ipv4Netmask='null', ipv4Gateway='null', ipv6BootProtocol='POLY_DHCP_AUTOCONF', ipv6Address='null', ipv6Prefix='null', ipv6Gateway='null', nameServers='null'}]', removedNetworks='[]', bonds='[]', removedBonds='[]', clusterSwitchType='LEGACY', managementNetworkChanged='true'}), log id: 252ee336
2020-04-20 16:44:58,368+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-19) [3b91cf90] EVENT_ID: SYSTEM_UPDATE_CLUSTER(835), Host cluster Default was updated by system
2020-04-20 17:08:42,971+03 INFO  [org.ovirt.engine.core.bll.ServiceLoader] (ServerService Thread Pool -- 43) [2f41a8f] Start org.ovirt.engine.core.bll.network.cluster.ExternalNetworkSyncService@723e5c93 
    <ovirt-vm:clusterVersion>4.4</ovirt-vm:clusterVersion>
2020-04-20 18:25:28,027+03 INFO  [org.ovirt.engine.core.bll.ServiceLoader] (ServerService Thread Pool -- 59) [23b5ca4e] Start org.ovirt.engine.core.bll.network.cluster.ExternalNetworkSyncService@24ff6b9f 
    <ovirt-vm:clusterVersion>4.4</ovirt-vm:clusterVersion>
2020-04-21 12:09:34,661+03 INFO  [org.ovirt.engine.core.bll.ServiceLoader] (ServerService Thread Pool -- 61) [7049c104] Start org.ovirt.engine.core.bll.network.cluster.ExternalNetworkSyncService@41490020 
2020-04-21 12:09:42,595+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-63) [33297c5a] EVENT_ID: SYSTEM_UPDATE_CLUSTER(835), Host cluster Default was updated by system
    <ovirt-vm:clusterVersion>4.4</ovirt-vm:clusterVersion>
2020-04-21 16:06:34,559+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1881) [57c13259] EVENT_ID: ANSIBLE_RUNNER_EVENT_NOTIFICATION(559), Installing Host alma03.qa.lab.tlv.redhat.com. Apply cluster specific firewalld rules.
2020-04-21 16:07:32,930+03 INFO  [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1881) [57c13259] Skipping Reconfigure gluster since cluster does not support gluster
2020-04-21 16:07:36,889+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1881) [1474dcb7] START, HostSetupNetworksVDSCommand(HostName = alma03.qa.lab.tlv.redhat.com, HostSetupNetworksVdsCommandParameters:{hostId='e816393b-1af0-48e0-8940-cbc120466cd4', vds='Host[alma03.qa.lab.tlv.redhat.com,e816393b-1af0-48e0-8940-cbc120466cd4]', rollbackOnFailure='true', commitOnSuccess='true', connectivityTimeout='120', networks='[HostNetwork:{defaultRoute='true', bonding='false', networkName='ovirtmgmt', vdsmName='ovirtmgmt', nicName='enp5s0f0', vlan='null', vmNetwork='true', stp='false', properties='null', ipv4BootProtocol='DHCP', ipv4Address='null', ipv4Netmask='null', ipv4Gateway='null', ipv6BootProtocol='POLY_DHCP_AUTOCONF', ipv6Address='null', ipv6Prefix='null', ipv6Gateway='null', nameServers='null'}]', removedNetworks='[]', bonds='[]', removedBonds='[]', clusterSwitchType='LEGACY', managementNetworkChanged='true'}), log id: 1920861
2020-04-21 16:07:47,692+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-19) [437234ef] EVENT_ID: SYSTEM_UPDATE_CLUSTER(835), Host cluster Default was updated by system
    <ovirt-vm:clusterVersion>4.4</ovirt-vm:clusterVersion>
2020-04-21 16:38:52,418+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-30) [3db64824] EVENT_ID: SYSTEM_UPDATE_CLUSTER(835), Host cluster Default was updated by system
    <ovirt-vm:clusterVersion>4.4</ovirt-vm:clusterVersion>

It looks weird that it reports "Host cluster Default was updated by system" several times and that it shows "clusterSwitchType='LEGACY'".

The host cluster is reported in the engine's UI as IvyBridge Family.

Version-Release number of selected component (if applicable):
Software Version:4.4.0-0.33.master.el8ev
ovirt-hosted-engine-ha-2.4.2-1.el8ev.noarch
ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch


How reproducible:
100%

Steps to Reproduce:
1. Deploy HE on a clean host.
2. Check the engine VM symbol.

Actual results:
Engine reports that 
"Server with newer configuration for next run 
Hosted Engine VM
Pending Virtual Machine changes:
*Cluster CPU Type"

Expected results:
There should be no message after deployment is complete.

Additional info:
I tried restarting the engine, but it did not resolve the warning shown in the attachment.

Comment 1 Nikolai Sednev 2020-05-04 07:41:13 UTC
Created attachment 1684709 [details]
UI engine Pending Virtual Machine changes

Comment 2 Nikolai Sednev 2020-05-04 07:56:57 UTC
Why is the engine's VM recognized as Ivy Bridge while the host's CPU is an E5-2603 v2, which is a 32 nm Sandy Bridge EP part?
Ivy Bridge is the "third generation" of Intel Core processors, a 22 nm die shrink (a "tick" in Intel's tick–tock model).

Comment 3 Sandro Bonazzola 2020-05-04 08:09:26 UTC
(In reply to Nikolai Sednev from comment #2)
> Why engine's VM being recognized as Ivy Bridge, while host's CPU is E5-2603
> v2, which are 32 nm Lithography Sandy Bridge EP?
> Ivy Bridge is the "third generation" of the Intel Core processors and it
> uses a 22 nanometer die shrink known as the tick–tock model.

No idea, that's why I think this is a virt bug

Comment 4 Ryan Barry 2020-05-05 01:14:42 UTC
Nikolai, was the cluster CPU type set to Ivy by hand, as part of detection during HE install (prompted data), or autodetection during install?

Comment 5 Nikolai Sednev 2020-05-05 07:12:12 UTC
(In reply to Ryan Barry from comment #4)
> Nikolai, was the cluster CPU type set to Ivy by hand, as part of detection
> during HE install (prompted data), or autodetection during install?

Ivy was set automatically by the system, as part of the detection during HE install, after the deployment was complete.
I did not change anything manually.

Comment 6 Sandro Bonazzola 2020-05-14 07:26:01 UTC
*** Bug 1833604 has been marked as a duplicate of this bug. ***

Comment 7 Michael Burman 2020-05-14 14:14:55 UTC
Hi

This bug is quite bad and is blocking automation.
We have applied a temporary workaround, but it is not good for the long run.
The HE sets the default cluster's CPU type to Ivy Bridge instead of Nehalem.
The HE must set the CPU type to Nehalem by default, as this is the lowest CPU type. Setting Ivy Bridge is wrong.

Comment 10 Ryan Barry 2020-05-14 14:16:52 UTC
(In reply to Michael Burman from comment #7)
> Hi
> 
> This bug is quite bad and blocking automation. 
> We have did a temporary WA, but it's not good for the long run.
> The HE set the default's CPU type cluster with Ivi instead of Nehalem. 
> The HE must set the cpu type the nehalem as default, as htis is the most
> lowest cput type. Setting Ivy is wrong.

HE should use CPU capabilities detection like all other new host installs, not default to a minimum model.

Steven, can you please take a look at this?

Comment 13 Michael Burman 2020-05-14 14:19:10 UTC
This totally breaks our HE deploy automation, and now we need to set Ivy Bridge explicitly; without it the deploy fails with:

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Update of cluster compatibility version failed because there are VMs/Templates [HostedEngine] with incorrect configuration. To fix the issue, please go to each of them, edit, change the Custom Compatibility Version of the VM/Template to the cluster level you want to update the cluster to and press OK. If the save does not pass, fix the dialog validation. After successful cluster update, you can revert your Custom Compatibility Version change.]". HTTP response code is 500.

Comment 24 Michal Skrivanek 2020-05-15 08:53:56 UTC
looks like you're talking about two issues here
- UI issue when cluster cpu changes. Not a blocker, HE has its own config
- HE being set to IvyBridge instead of Nehalem - that is a known behavior by design. Not really a bug


keeping open for the first one, lowering sev. Also I don't believe this is a regression, it always worked like this.

Comment 25 Nikolai Sednev 2020-05-17 10:53:13 UTC
(In reply to Michal Skrivanek from comment #24)
> looks like you're talking about two issues here
> - UI issue when cluster cpu changes. Not a blocker, HE has its own config
> - HE being set to IvyBridge instead of Nehalem - that is a known behavior by
> design. Not really a bug
> 
> 
> keeping open for the first one, lowering sev. Also I don't believe this is a
> regression, it always worked like this.

I'm talking about a single issue here, and it is a regression. HE deployments never had this issue before; now you can't even bump up the host cluster's version.
I have never seen the HE VM get Ivy Bridge before; it was SandyBridge IBRS in 4.3, and the engine never had that symbol in the UI.
Please see the attachments from 4.3 (Software Version: 4.3.10.1-0.1.master.el7).
What do you mean by "by design" here?

Comment 26 Nikolai Sednev 2020-05-17 10:54:12 UTC
Created attachment 1689368 [details]
in 4.3 there is no triangle symbol in UI

Comment 27 Nikolai Sednev 2020-05-17 10:55:57 UTC
Created attachment 1689369 [details]
In 4.3 HE configured as SandyBridge IBRS

Comment 28 Steven Rosenberg 2020-05-18 08:49:31 UTC
The Ivy Bridge CPU type was added in 4.4; it was not included in 4.3, which is why 4.4 detects it and 4.3 does not.

It does look like we removed the nx flag check from the configuration for Secure Intel CPU types, and in my testing the CPU flags for an Ivy Bridge CPU did not include the vmx flag [1]

The question is why was the nx flag removed [2]?

[1] lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              1
On-line CPU(s) list: 0
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               58
Model name:          Intel Xeon E3-12xx v2 (Ivy Bridge)
Stepping:            9
CPU MHz:             2099.998
BogoMIPS:            4199.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
L3 cache:            16384K
NUMA node0 CPU(s):   0
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti fsgsbase smep erms xsaveopt arat

[2] https://gerrit.ovirt.org/#/c/101913/10/packaging/dbscripts/upgrade/pre_upgrade/0000_config.sql

Comment 29 Lucia Jelinkova 2020-05-18 10:51:44 UTC
The template for the 4.4 configuration was the 4.3 configuration taken from fn_db_update_config_value (not fn_db_add_config_value). That configuration does not contain the 'nx' flag either. It was introduced like that a year ago, so I do not think it is the root cause.

https://github.com/oVirt/ovirt-engine/commit/9ef6091d589bb5daf45fadfcc84dac26f81a0b3f#diff-56a49a6a2c97b5a4f9b061cd704cec32

To gain more information, could you please provide the following information?

From the UI:
- screenshot of the popup message that appears when you roll over the "!" sign on VM?
- screenshot of the Host detail -> General tab -> info icon popup for CPU Type field?
- are there any warnings for the host on Host -> General tab? 
- are there any warnings on the cluster list table for the cluster?

From DB if possible:
- for VM: cpu_name from vm_dynamic table
- for Host: cpu_flags from vds_dynamic table
- for Cluster: cpu_name, cpu_flags, cpu_verb from cluster table
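
For the DB values listed above, a minimal Python sketch that builds the corresponding queries could look like this. The table and column names come from the request; the database name "engine", the use of psql, and the helper name psql_command are assumptions, and the queries return all rows because no filter columns are named in this thread.

```python
import shlex

# Tables and columns as named in the request above.
QUERIES = {
    "vm": "SELECT cpu_name FROM vm_dynamic;",
    "host": "SELECT cpu_flags FROM vds_dynamic;",
    "cluster": "SELECT cpu_name, cpu_flags, cpu_verb FROM cluster;",
}


def psql_command(query, database="engine"):
    """Build a psql argument list; the database name is an assumption."""
    return ["psql", "-d", database, "-c", query]


# Print shell-ready commands to run on the engine machine.
for label, query in QUERIES.items():
    print(label + ":", shlex.join(psql_command(query)))
```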

Thank you

Comment 30 Nikolai Sednev 2020-05-18 11:24:10 UTC
(In reply to Lucia Jelinkova from comment #29)
> The template for the 4.4 configuration was the 4.3 configuration taken from
> fn_db_update_config_value (not fn_db_add_config_value). That configuration
> does not contain 'nx' flag either. It was introduced like that a year ago so
> I do not think it is a root cause.
> 
> https://github.com/oVirt/ovirt-engine/commit/
> 9ef6091d589bb5daf45fadfcc84dac26f81a0b3f#diff-
> 56a49a6a2c97b5a4f9b061cd704cec32
> 
> To gain more information, could you please provide the following information?
> 
> From the UI:
> - screenshot of the popup message that appears when you roll over the "!"

Look at "UI engine Pending Virtual Machine changes".

> sign on VM?
> - screenshot of the Host detail -> General tab -> info icon popup for CPU
> Type field?
> - are there any warnings for the host on Host -> General tab? 
> - are there any warnings on the cluster list table for the cluster?
> 
> From DB if possible:
> - for VM: cpu_name from vm_dynamic table
> - for Host: cpu_flags from vds_dynamic table
> - for Cluster: cpu_name, cpu_flags, cpu_verb from cluster table
> 
> Thank you

It's not only that the CPU type is incorrect; the OS type is also wrong for the VM: I'm getting RHEL7.x instead of RHEL8.x via the UI. Please see the attached Screenshot from 2020-05-18 14-15-36.png.
My connectivity is currently really slow; I'll add further data later.

Comment 31 Nikolai Sednev 2020-05-18 11:24:32 UTC
Created attachment 1689570 [details]
Screenshot from 2020-05-18 14-15-36.png

Comment 32 Nikolai Sednev 2020-05-18 11:40:31 UTC
Comment on attachment 1689570 [details]
Screenshot from 2020-05-18 14-15-36.png

This was taken from the wrong version: it should be from 4.4 but was taken from 4.3. The OS type issue is covered in a separate bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1821314

Comment 33 Lucia Jelinkova 2020-05-18 12:18:37 UTC
Thanks, I am interested in the message that appears when you roll over the "!" sign in the first column of the VM (not the next-run sign).

Could you also please check if there are any snapshots on the Snapshots tab?

Comment 34 Michal Skrivanek 2020-05-18 13:56:03 UTC
Nikolai, again, please stop mixing several things up, it makes the bug difficult to follow

- "OS type is wrong", you perhaps refer to bug 1821314? Sounds unrelated to anything mentioned in this bug so far
- IvyBridge was introduced by bug 1674420.
- Steven, "nx" is irrelevant

since your machine is apparently ivybridge (E5-2603 v2 - https://ark.intel.com/content/www/us/en/ark/products/76157/intel-xeon-processor-e5-2603-v2-10m-cache-1-80-ghz.html) the hosted engine VM and its cluster are set to ivybridge. That's the design.

Comment 35 Nikolai Sednev 2020-05-18 14:35:25 UTC
(In reply to Michal Skrivanek from comment #34)
> Nikolai, again, please stop mixing several things up, it makes the bug
> difficult to follow
> 
> - "OS type is wrong", you perhaps refer to bug 1821314? Sounds unrelated to
> anything mentioned in this bug so far
> - IvyBridge was introduced by bug 1674420.
> - Steven, "nx" is irrelevant
> 
> since your machine is apparently ivybridge (E5-2603 v2 -
> https://ark.intel.com/content/www/us/en/ark/products/76157/intel-xeon-
> processor-e5-2603-v2-10m-cache-1-80-ghz.html
> ) the hosted engine VM and its cluster is set to ivybridge. That's the
> design.

Sorry for the confusion; I wanted to mention it here, and I noted in comment #32 that it is covered by a different bug.

It seems the CPU type is detected correctly according to Intel's specification at https://ark.intel.com/content/www/us/en/ark/products/codename/68926/ivy-bridge-ep.html. In 4.3, however, Ivy Bridge was missing according to comment #28, so there it was detected as SandyBridge IBRS, as mentioned in comment #27.
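
As a quick cross-check of the microarchitecture question, a small Python sketch can map the "CPU family" and "Model" fields of lscpu to a code name. The table lists only a few family 6 model numbers relevant to this thread, and the function name microarchitecture is an illustrative assumption.

```python
# Intel family 6 model numbers as shown (in decimal) by lscpu's "Model:" field.
INTEL_FAMILY_6_MODELS = {
    42: "Sandy Bridge",
    44: "Westmere-EP",
    45: "Sandy Bridge-EP",
    58: "Ivy Bridge",
    62: "Ivy Bridge-EP",
}


def microarchitecture(lscpu_output):
    """Return the code name for the CPU in lscpu output, or 'unknown'."""
    fields = {}
    for line in lscpu_output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    if fields.get("CPU family") != "6":
        return "unknown"
    try:
        model = int(fields.get("Model", ""))
    except ValueError:
        return "unknown"
    return INTEL_FAMILY_6_MODELS.get(model, "unknown")


host = "CPU family: 6\nModel: 62\nModel name: Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80GHz"
guest = "CPU family: 6\nModel: 58\nModel name: Intel Xeon E3-12xx v2 (Ivy Bridge)"
print(microarchitecture(host), "/", microarchitecture(guest))
```

For the host in the description (family 6, model 62) this yields Ivy Bridge-EP, which agrees with the Intel ARK page linked above.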

Why are we still getting the "triangle on HE-VM" issue in 4.4, and why can't we change the CPU family for the host cluster? What introduced this issue, if not the new CPU family type, which did not exist in 4.3 and is now in 4.4?

Comment 36 Lucia Jelinkova 2020-05-19 08:21:19 UTC
The CPU flags handling has been changed a lot when introducing the Secure CPU concept in 4.4

https://www.ovirt.org/develop/release-management/features/virt/secure-cpus.html

As for the UI issue (the triangle), I can look into that, but I need the information I asked for yesterday.

What do you mean you cannot change CPU Family?

Comment 37 Nikolai Sednev 2020-05-19 08:25:52 UTC
(In reply to Lucia Jelinkova from comment #36)
> The CPU flags handling has been changed a lot when introducing the Secure
> CPU concept in 4.4
> 
> https://www.ovirt.org/develop/release-management/features/virt/secure-cpus.
> html
> 
> As for the UI issue (the triangle), I can look into that but I need the
> information I've asked for yesterday.

I will provide you with the environment later today if you wish.

> 
> What do you mean you cannot change CPU Family?

You can't change the CPU family type for the host cluster manually via the UI. I'll supply more data later today.

Comment 38 Nikolai Sednev 2020-05-19 11:59:51 UTC
Created attachment 1689859 [details]
"!" symbol screenshot

Comment 39 Nikolai Sednev 2020-05-19 12:02:45 UTC
(In reply to Lucia Jelinkova from comment #33)
> Thanks, I am interested at the message that appears when you roll over the
> "!" sign in the first column of the vm (not the next run). 
> 
> Could you also please check if there are any snapshots on the Snapshots tab?

Please see the attachment.
Snapshots are empty; it's the HE VM, so there should be no snapshots. See also the attached snapshots screenshot.

Comment 40 Nikolai Sednev 2020-05-19 12:03:24 UTC
Created attachment 1689860 [details]
Snapshots screenshot

Comment 41 Lucia Jelinkova 2020-05-19 12:36:25 UTC
Thank you, both screenshots helped to narrow down the issue to the following code:

https://github.com/oVirt/ovirt-engine/blob/48cfe5bad08be755058dc17c3bc455da23dab033/frontend/webadmin/modules/webadmin/src/main/java/org/ovirt/engine/ui/webadmin/widget/table/column/VmTypeColumn.java#L94

The cluster CPU verb is not the same as the VM CPU name (which is actually not name but also verb). 
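
As a rough illustration of the comparison described above (a hypothetical restatement in Python, not the actual VmTypeColumn code), the warning appears whenever the two strings differ. The function name, the handling of a missing VM value, and the sample strings are all assumptions made for the sketch.

```python
def has_pending_cpu_change(cluster_cpu_verb, vm_cpu_name):
    """Return True when the UI would flag a pending 'Cluster CPU Type' change.

    Hypothetical restatement: the cluster's CPU verb is compared with the
    value stored as the VM's CPU name (which is itself a verb).
    """
    if vm_cpu_name is None:
        # Assumption: nothing to compare when the VM reports no CPU value.
        return False
    return cluster_cpu_verb != vm_cpu_name


# Made-up values: any difference between the strings triggers the warning,
# and a restart cannot clear it if the VM keeps getting the other value.
print(has_pending_cpu_change("ExampleModel,+example-flag", "ExampleModel"))
print(has_pending_cpu_change("ExampleModel", "ExampleModel"))
```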

Could you please make 2 more screenshots - VM General tab (while VM is running) + Cluster General tab? Thank you.

Comment 42 Nikolai Sednev 2020-05-19 13:24:28 UTC
(In reply to Lucia Jelinkova from comment #41)
> Thank you, both screenshots helped to narrow down the issue to the following
> code:
> 
> https://github.com/oVirt/ovirt-engine/blob/
> 48cfe5bad08be755058dc17c3bc455da23dab033/frontend/webadmin/modules/webadmin/
> src/main/java/org/ovirt/engine/ui/webadmin/widget/table/column/VmTypeColumn.
> java#L94
> 
> The cluster CPU verb is not the same as the VM CPU name (which is actually
> not name but also verb). 
> 
> Could you please make 2 more screenshots - VM General tab (while VM is
> running) + Cluster General tab? Thanks you.

It's taken from a different setup with a pair of Intel(R) Xeon(R) CPU E5645 @ 2.40GHz, which are products formerly known as Westmere EP: https://ark.intel.com/content/www/us/en/ark/products/48768/intel-xeon-processor-e5645-12m-cache-2-40-ghz-5-86-gt-s-intel-qpi.html

Please see the attached screenshots.

Comment 43 Nikolai Sednev 2020-05-19 13:25:20 UTC
Created attachment 1689885 [details]
HE general tab

Comment 44 Nikolai Sednev 2020-05-19 13:26:46 UTC
Created attachment 1689887 [details]
Host cluster general tab

Comment 45 Nikolai Sednev 2020-05-19 13:29:35 UTC
Created attachment 1689889 [details]
Video of CPU type change error for host cluster

Comment 46 Nikolai Sednev 2020-05-19 13:30:56 UTC
> What do you mean you cannot change CPU Family?

Please see attached video and following error:
Error while executing action: Update of cluster compatibility version failed because there are VMs/Templates [HostedEngine] with incorrect configuration. To fix the issue, please go to each of them, edit, change the Custom Compatibility Version of the VM/Template to the cluster level you want to update the cluster to and press OK. If the save does not pass, fix the dialog validation. After successful cluster update, you can revert your Custom Compatibility Version change.

Comment 47 Lucia Jelinkova 2020-05-20 09:35:41 UTC
The error message is misleading; we should rephrase it. It assumes that update VM is called only when changing the compatibility version, but it is also called when changing the CPU name (as in this case). Regardless of the message, something goes wrong there. Could you please attach or check the engine.log for the time when you change the CPU type? There should be an audit log entry with the key CLUSTER_CANNOT_UPDATE_VM_COMPATIBILITY_VERSION.

Comment 48 Nikolai Sednev 2020-05-20 10:01:16 UTC
(In reply to Lucia Jelinkova from comment #47)
> The error message is misleading, we should rephrase it. It assumes the
> update VM is called only when changing compatibility version but it is
> called also when changing the CPU name (like in this case). Regardless the
> message, something goes wrong there. Could you please attach / check the
> engine.log for the time you're changing the CPU type? There should be an
> audit log with the key CLUSTER_CANNOT_UPDATE_VM_COMPATIBILITY_VERSION there.

nsednev-he-4 ~]# less /var/log/ovirt-engine/engine.log | grep CLUSTER_CANNOT_UPDATE_VM_COMPATIBILITY_VERSION
2020-05-19 16:28:10,532+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [2b148cb] EVENT_ID: CLUSTER_CANNOT_UPDATE_VM_COMPATIBILITY_VERSION(12,005), Cannot update compatibility version of Vm/Template: [HostedEngine], Message: [No Message]
2020-05-19 16:28:36,920+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [7986512a] EVENT_ID: CLUSTER_CANNOT_UPDATE_VM_COMPATIBILITY_VERSION(12,005), Cannot update compatibility version of Vm/Template: [HostedEngine], Message: [No Message]

Comment 49 Nikolai Sednev 2020-05-20 10:02:25 UTC
Created attachment 1690155 [details]
engine log for CLUSTER_CANNOT_UPDATE_VM_COMPATIBILITY_VERSION

Comment 50 Nikolai Sednev 2020-05-20 10:08:50 UTC
2020-05-19 16:28:10,321+03 INFO  [org.ovirt.engine.core.bll.UpdateClusterCommand] (default task-5) [97556da5-2fb0-4732-98c1-dbc6f164029f] Running command: UpdateClusterCommand internal: false. Entities affected :  ID: b5b207da-99bc-11ea-9052-00163e7bb860 Type: ClusterAction group EDIT_CLUSTER_CONFIGURATION with role type ADMIN
2020-05-19 16:28:10,522+03 WARN  [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-5) [2b148cb] Validation of action 'UpdateVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__UPDATE,VAR__TYPE__VM,VM_CANNOT_UPDATE_HOSTED_ENGINE_FIELD
2020-05-19 16:28:10,532+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [2b148cb] EVENT_ID: CLUSTER_CANNOT_UPDATE_VM_COMPATIBILITY_VERSION(12,005), Cannot update compatibility version of Vm/Template: [HostedEngine], Message: [No Message]
2020-05-19 16:28:10,559+03 INFO  [org.ovirt.engine.core.bll.CommandCompensator] (default task-5) [2b148cb] Command [id=83ba1add-b955-4230-916e-d79a2a0a64ca]: Compensating UPDATED_ONLY_ENTITY of org.ovirt.engine.core.common.businessentities.Cluster; snapshot: Cluster [Default].
2020-05-19 16:28:10,567+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [2b148cb] EVENT_ID: USER_UPDATE_CLUSTER_FAILED(812), Failed to update Host cluster (User: admin@internal-authz)
2020-05-19 16:28:10,568+03 INFO  [org.ovirt.engine.core.bll.UpdateClusterCommand] (default task-5) [2b148cb] Lock freed to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='[20fafcd0-1baa-45b7-807b-3e09fa0b560d=VM]'}'
2020-05-19 16:28:36,780+03 INFO  [org.ovirt.engine.core.bll.UpdateClusterCommand] (default task-4) [c4f16b54-4152-4e61-a418-50cf4398cf1d] Lock Acquired to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='[20fafcd0-1baa-45b7-807b-3e09fa0b560d=VM]'}'
2020-05-19 16:28:36,825+03 INFO  [org.ovirt.engine.core.bll.UpdateClusterCommand] (default task-4) [c4f16b54-4152-4e61-a418-50cf4398cf1d] Running command: UpdateClusterCommand internal: false. Entities affected :  ID: b5b207da-99bc-11ea-9052-00163e7bb860 Type: ClusterAction group EDIT_CLUSTER_CONFIGURATION with role type ADMIN
2020-05-19 16:28:36,914+03 WARN  [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-4) [7986512a] Validation of action 'UpdateVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__UPDATE,VAR__TYPE__VM,VM_CANNOT_UPDATE_HOSTED_ENGINE_FIELD
2020-05-19 16:28:36,920+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [7986512a] EVENT_ID: CLUSTER_CANNOT_UPDATE_VM_COMPATIBILITY_VERSION(12,005), Cannot update compatibility version of Vm/Template: [HostedEngine], Message: [No Message]
2020-05-19 16:28:36,925+03 INFO  [org.ovirt.engine.core.bll.CommandCompensator] (default task-4) [7986512a] Command [id=1a3b002c-2e97-4082-9fd2-b8b36b43f5f5]: Compensating UPDATED_ONLY_ENTITY of org.ovirt.engine.core.common.businessentities.Cluster; snapshot: Cluster [Default].
2020-05-19 16:28:36,933+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [7986512a] EVENT_ID: USER_UPDATE_CLUSTER_FAILED(812), Failed to update Host cluster (User: admin@internal-authz)
2020-05-19 16:28:36,933+03 INFO  [org.ovirt.engine.core.bll.UpdateClusterCommand] (default task-4) [7986512a] Lock freed to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='[20fafcd0-1baa-45b7-807b-3e09fa0b560d=VM]'}'

Comment 51 Lucia Jelinkova 2020-05-20 13:43:54 UTC
It looks like this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1828944

I wonder whether the fix is not in your build yet or whether it still does not work.

As for the UI issue, is it possible to get the following values from the DB?

- cpu_name from vm_dynamic table
- cpu_name, cpu_flags, cpu_verb from cluster table
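
A minimal sketch of the queries this request amounts to. The table and column names are the ones listed above. The helper name build_queries is hypothetical, and no WHERE clause is added because the thread does not say how the rows should be filtered.

```python
# Tables and columns exactly as requested in the comment above.
REQUESTED_COLUMNS = {
    "vm_dynamic": ["cpu_name"],
    "cluster": ["cpu_name", "cpu_flags", "cpu_verb"],
}


def build_queries(columns_by_table):
    """Return one plain SELECT statement per table (hypothetical helper)."""
    return [
        f"SELECT {', '.join(columns)} FROM {table};"
        for table, columns in columns_by_table.items()
    ]


if __name__ == "__main__":
    # Print the statements so they can be pasted into a psql session
    # connected to the engine database.
    for query in build_queries(REQUESTED_COLUMNS):
        print(query)
```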

Comment 53 Lucia Jelinkova 2020-05-22 14:26:49 UTC
I'll try to sum up what has been discussed so far:

- setting of the CPU type as Ivy Bridge - this was not a bug, but a feature of 4.4
- cannot change the cluster CPU type - that should be fixed - https://bugzilla.redhat.com/show_bug.cgi?id=1828944
- UI engine Pending Virtual Machine changes - the code expected that all running VMs have the cpu_name in the vm_dynamic table, but that is not the case for HE. It seems that we cannot detect the actual CPU of the running HE, so we should probably disable the warning for the HE
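
A rough sketch of the check described in the last item, assuming the warning is derived by comparing the CPU name recorded for the running VM with the cluster's CPU name. The class, field, and function names are hypothetical and are not taken from the ovirt-engine code; the actual fix may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vm:
    # Hypothetical stand-in for the engine's VM entity.
    name: str
    running: bool
    is_hosted_engine: bool
    # Stand-in for cpu_name in vm_dynamic; None when it was never recorded.
    runtime_cpu_name: Optional[str]


def has_pending_cpu_change(vm: Vm, cluster_cpu_name: str) -> bool:
    """Return True if the UI should flag a pending CPU type change."""
    if not vm.running:
        # Nothing is pending for a VM that is not running.
        return False
    if vm.is_hosted_engine or vm.runtime_cpu_name is None:
        # The actual CPU of the running HE cannot be detected,
        # so the warning is suppressed instead of being shown forever.
        return False
    return vm.runtime_cpu_name != cluster_cpu_name
```

With this guard, a hosted engine VM without a recorded cpu_name no longer reports a pending change that a restart can never clear, while regular VMs are still compared against the cluster.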

Is there anything else that I've missed and breaks your automation?

Comment 54 Nikolai Sednev 2020-05-24 09:03:20 UTC
Please provide your input on comment #53.

Comment 55 Michael Burman 2020-05-24 09:10:39 UTC
(In reply to Lucia Jelinkova from comment #53)
> I'll try to sum up what has been discussed so far:
> 
> - setting of the CPU type as Ivy Bridge - this was not a bug, but a feature
> of 4.4
> - cannot change the cluster CPU type - that should be fixed -
> https://bugzilla.redhat.com/show_bug.cgi?id=1828944
> - UI engine Pending Virtual Machine changes - the code expected that all
> running VMs have the cpu_name in the vm_dynamic table but that is not the
> case for HE. It seems that we cannot detect the actual cpu of the running HE
> so should probably disable the warning for the HE
> 

> Is there anything else that I've missed and breaks your automation?

I think that we should be OK now. tnx

Comment 56 Nikolai Sednev 2020-07-06 13:11:02 UTC
Works for me on latest Software Version:4.4.1.7-0.3.el8ev.
ovirt-hosted-engine-ha-2.4.4-1.el8ev.noarch
ovirt-hosted-engine-setup-2.4.5-1.el8ev.noarch
Linux 4.18.0-193.12.1.el8_2.x86_64 #1 SMP Thu Jul 2 15:48:14 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.2 (Ootpa)

Reported issue no longer exists.

Comment 57 Sandro Bonazzola 2020-07-08 08:26:47 UTC
This bugzilla is included in oVirt 4.4.1 release, published on July 8th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 58 Arik 2020-07-21 13:45:12 UTC
*** Bug 1856684 has been marked as a duplicate of this bug. ***

