Bug 1585986 - [HE] When lowering the cluster compatibility, we need to force update the HE storage OVF store to ensure it can start up (migration will not work).
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 2.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.4.0
Assignee: Andrej Krejcir
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On: 1691562
Blocks:
 
Reported: 2018-06-05 08:54 UTC by Israel Pinto
Modified: 2020-08-04 13:16 UTC
CC List: 20 users

Fixed In Version: rhv-4.4.0-29
Doc Type: Bug Fix
Doc Text:
Previously, if you lowered the cluster compatibility version, the change did not propagate to the self-hosted engine virtual machine. As a result, the self-hosted engine virtual machine was not compatible with the new cluster version; you could not start or migrate it to another host in the cluster. The current release fixes this issue: The lower cluster compatibility version propagates to the self-hosted engine virtual machine; you can start and migrate it.
Clone Of:
Environment:
Last Closed: 2020-08-04 13:16:11 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Attachments
- engine log (413.43 KB, application/x-xz), 2018-06-05 08:57 UTC, Israel Pinto
- source host log (680.45 KB, application/x-xz), 2018-06-05 09:01 UTC, Israel Pinto
- vdsm log HE VM first started (576.53 KB, application/x-xz), 2018-06-07 12:59 UTC, Israel Pinto
- logs_11_6_2018 (494.39 KB, application/x-xz), 2018-06-11 12:58 UTC, Israel Pinto
- trigger OVF_STORE update (106.51 KB, image/png), 2019-03-21 22:27 UTC, Simone Tiraboschi


Links
- Red Hat Product Errata RHSA-2020:3247, last updated 2020-08-04 13:16:36 UTC
- oVirt gerrit 93157 (ABANDONED): bll: Refresh hosted engine OVF when cluster is updated, last updated 2021-02-18 16:50:11 UTC

Description Israel Pinto 2018-06-05 08:54:07 UTC
Description of problem:
Failed to migrate the HE VM from a RHEL host to a RHVH host.

Version-Release number of selected component (if applicable):
Engine:4.2.4.1-0.1.el7
Hosts:
1. RHEL - 7.5 - 8.el7
Kernel Version:3.10.0 - 862.3.2.el7.x86_64
KVM Version:2.10.0 - 21.el7_5.3
LIBVIRT Version:libvirt-3.9.0-14.el7_5.5
VDSM Version:vdsm-4.20.29-1.el7ev

2. RHV-H
OS Version:RHEL - 7.5 - 3.1.el7
OS Description:Red Hat Virtualization Host 4.2.3 (el7.5)
Kernel Version:3.10.0 - 862.3.2.el7.x86_64
KVM Version:2.10.0 - 21.el7_5.3
LIBVIRT Version:libvirt-3.9.0-14.el7_5.5
VDSM Version:vdsm-4.20.27.2-1.el7ev


Steps to Reproduce:
Migrate HE VM from RHEL to RHVH

vdsm log:
2018-06-05 05:50:46,803+0300 ERROR (migsrc/1f41e617) [virt.vm] (vmId='1f41e617-5e95-4086-aa86-d93205bf482e') operation failed: guest CPU doesn't match specification: missing features: spec-ctrl (migration:290)
2018-06-05 05:50:46,916+0300 INFO  (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:573)
2018-06-05 05:50:46,920+0300 DEBUG (jsonrpc/3) [storage.TaskManager.Task] (Task='47035432-9b48-4896-90f5-76b123d9fe17') moving from state finished -> state preparing (task:602)
2018-06-05 05:50:46,920+0300 INFO  (jsonrpc/3) [vdsm.api] START repoStats(domains=['a2a6f15e-b73b-4f3d-81bb-e5ccb5a5376b']) from=::1,52926, task_id=47035432-9b48-4896-90f5-76b123d9fe17 (api:46)
2018-06-05 05:50:46,920+0300 INFO  (jsonrpc/3) [vdsm.api] FINISH repoStats return={'a2a6f15e-b73b-4f3d-81bb-e5ccb5a5376b': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000288341', 'lastCheck': '0.5', 'valid': True}} from=::1,52926, task_id=47035432-9b48-4896-90f5-76b123d9fe17 (api:52)
2018-06-05 05:50:46,920+0300 DEBUG (jsonrpc/3) [storage.TaskManager.Task] (Task='47035432-9b48-4896-90f5-76b123d9fe17') finished: {'a2a6f15e-b73b-4f3d-81bb-e5ccb5a5376b': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000288341', 'lastCheck': '0.5', 'valid': True}} (task:1201)
2018-06-05 05:50:46,920+0300 DEBUG (jsonrpc/3) [storage.TaskManager.Task] (Task='47035432-9b48-4896-90f5-76b123d9fe17') moving from state finished -> state finished (task:602)
2018-06-05 05:50:46,920+0300 DEBUG (jsonrpc/3) [storage.ResourceManager.Owner] Owner.releaseAll requests {} resources {} (resourceManager:910)
2018-06-05 05:50:46,921+0300 DEBUG (jsonrpc/3) [storage.ResourceManager.Owner] Owner.cancelAll requests {} (resourceManager:947)
2018-06-05 05:50:46,921+0300 DEBUG (jsonrpc/3) [storage.TaskManager.Task] (Task='47035432-9b48-4896-90f5-76b123d9fe17') ref 0 aborting False (task:1002)
2018-06-05 05:50:46,921+0300 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Host.getStorageRepoStats succeeded in 0.00 seconds (__init__:573)
2018-06-05 05:50:46,930+0300 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:573)
2018-06-05 05:50:47,893+0300 ERROR (migsrc/1f41e617) [virt.vm] (vmId='1f41e617-5e95-4086-aa86-d93205bf482e') Failed to migrate (migration:455)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 437, in _regular_run
    self._startUnderlyingMigration(time.time())
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 511, in _startUnderlyingMigration
    self._perform_with_downtime_thread(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 580, in _perform_with_downtime_thread
    self._perform_migration(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 529, in _perform_migration
    self._migration_flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1746, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: operation failed: guest CPU doesn't match specification: missing features: spec-ctrl

engine log (correlation-id: vms_syncAction_3d82143b-da9f-4d63):
2018-06-05 05:46:19,570+03 INFO  [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-19) [vms_syncAction_3d82143b-da9f-4d63] Lock Acquired to object 'EngineLock:{exclusiveLocks='[1f41e617-5e95-4086-aa86-d93205bf482e=VM]', sharedLocks=''}'
2018-06-05 05:46:19,982+03 INFO  [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-19) [vms_syncAction_3d82143b-da9f-4d63] Running command: MigrateVmToServerCommand internal: false. Entities affected :  ID: 1f41e617-5e95-4086-aa86-d93205bf482e Type: VMAction group MIGRATE_VM with role type USER
2018-06-05 05:46:20,084+03 INFO  [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (default task-19) [vms_syncAction_3d82143b-da9f-4d63] START, MigrateVDSCommand( MigrateVDSCommandParameters:{hostId='75a05865-47f1-4b6f-bcca-350e3369931e', vmId='1f41e617-5e95-4086-aa86-d93205bf482e', srcHost='lynx16.lab.eng.tlv2.redhat.com', dstVdsId='896e262b-9f5b-411e-8320-10e22689101e', dstHost='lynx17.lab.eng.tlv2.redhat.com:54321', migrationMethod='ONLINE', tunnelMigration='false', migrationDowntime='0', autoConverge='false', migrateCompressed='false', consoleAddress='null', maxBandwidth='5000', enableGuestEvents='false', maxIncomingMigrations='2', maxOutgoingMigrations='2', convergenceSchedule='null', dstQemu='10.46.16.32'}), log id: 5e1758bf
2018-06-05 05:46:20,087+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (default task-19) [vms_syncAction_3d82143b-da9f-4d63] START, MigrateBrokerVDSCommand(HostName = host_mixed_2, MigrateVDSCommandParameters:{hostId='75a05865-47f1-4b6f-bcca-350e3369931e', vmId='1f41e617-5e95-4086-aa86-d93205bf482e', srcHost='lynx16.lab.eng.tlv2.redhat.com', dstVdsId='896e262b-9f5b-411e-8320-10e22689101e', dstHost='lynx17.lab.eng.tlv2.redhat.com:54321', migrationMethod='ONLINE', tunnelMigration='false', migrationDowntime='0', autoConverge='false', migrateCompressed='false', consoleAddress='null', maxBandwidth='5000', enableGuestEvents='false', maxIncomingMigrations='2', maxOutgoingMigrations='2', convergenceSchedule='null', dstQemu='10.46.16.32'}), log id: 24987c0b
2018-06-05 05:46:21,094+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (default task-19) [vms_syncAction_3d82143b-da9f-4d63] FINISH, MigrateBrokerVDSCommand, log id: 24987c0b
2018-06-05 05:46:21,099+03 INFO  [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (default task-19) [vms_syncAction_3d82143b-da9f-4d63] FINISH, MigrateVDSCommand, return: MigratingFrom, log id: 5e1758bf
2018-06-05 05:46:21,114+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-19) [vms_syncAction_3d82143b-da9f-4d63] EVENT_ID: VM_MIGRATION_START(62), Migration started (VM: HostedEngine, Source: host_mixed_2, Destination: host_mixed_3, User: admin@internal-authz). 
2018-06-05 05:46:23,058+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-8) [] VM '1f41e617-5e95-4086-aa86-d93205bf482e' was reported as Down on VDS '896e262b-9f5b-411e-8320-10e22689101e'(host_mixed_3)
2018-06-05 05:46:23,060+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-8) [] START, DestroyVDSCommand(HostName = host_mixed_3, DestroyVmVDSCommandParameters:{hostId='896e262b-9f5b-411e-8320-10e22689101e', vmId='1f41e617-5e95-4086-aa86-d93205bf482e', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 46797286
2018-06-05 05:46:24,108+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-8) [] Failed to destroy VM '1f41e617-5e95-4086-aa86-d93205bf482e' because VM does not exist, ignoring
2018-06-05 05:46:24,108+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-8) [] FINISH, DestroyVDSCommand, log id: 46797286
2018-06-05 05:46:24,108+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-8) [] VM '1f41e617-5e95-4086-aa86-d93205bf482e'(HostedEngine) was unexpectedly detected as 'Down' on VDS '896e262b-9f5b-411e-8320-10e22689101e'(host_mixed_3) (expected on '75a05865-47f1-4b6f-bcca-350e3369931e')
2018-06-05 05:46:24,108+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-8) [] Migration of VM 'HostedEngine' to host 'host_mixed_3' failed: VM destroyed during the startup.
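
One quick way to confirm this kind of mismatch from the destination host is to ask libvirt to compare a CPU definition against the host CPU. A minimal sketch, assuming libvirt's virsh cpu-compare is available (the CPU XML below is illustrative, not taken from these logs):

# Describe the CPU model the VM is effectively running with:
cat > /tmp/vm-cpu.xml <<'EOF'
<cpu match="exact">
  <model>IvyBridge-IBRS</model>
  <vendor>Intel</vendor>
</cpu>
EOF
# Reports whether the host CPU is identical to, a superset of,
# or incompatible with the described CPU:
virsh cpu-compare /tmp/vm-cpu.xml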


Additional info:
Host info, CPU and capabilities (source RHEL host):
# cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80GHz
stepping	: 4
microcode	: 0x42c
cpu MHz		: 1800.219
cache size	: 10240 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt ibpb ibrs stibp dtherm arat pln pts spec_ctrl intel_stibp
bogomips	: 3599.99
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

# virsh -r capabilities
<capabilities>

  <host>
    <uuid>c322929e-3a01-4af0-aba3-45e4af67c073</uuid>
    <cpu>
      <arch>x86_64</arch>
      <model>IvyBridge-IBRS</model>
      <vendor>Intel</vendor>
      <microcode version='1068'/>
      <topology sockets='1' cores='4' threads='1'/>
      <feature name='ds'/>
      <feature name='acpi'/>
      <feature name='ss'/>
      <feature name='ht'/>
      <feature name='tm'/>
      <feature name='pbe'/>
      <feature name='dtes64'/>
      <feature name='monitor'/>
      <feature name='ds_cpl'/>
      <feature name='vmx'/>
      <feature name='smx'/>
      <feature name='est'/>
      <feature name='tm2'/>
      <feature name='xtpr'/>
      <feature name='pdcm'/>
      <feature name='pcid'/>
      <feature name='dca'/>
      <feature name='osxsave'/>
      <feature name='arat'/>
      <feature name='stibp'/>
      <feature name='xsaveopt'/>
      <feature name='pdpe1gb'/>
      <feature name='invtsc'/>
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='1048576'/>
    </cpu>
    <power_management>
      <suspend_mem/>
      <suspend_disk/>
      <suspend_hybrid/>
    </power_management>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>tcp</uri_transport>
        <uri_transport>rdma</uri_transport>
      </uri_transports>
    </migration_features>
    <topology>
      <cells num='2'>
        <cell id='0'>
          <memory unit='KiB'>33503860</memory>
          <pages unit='KiB' size='4'>8375965</pages>
          <pages unit='KiB' size='1048576'>4</pages>
          <distances>
            <sibling id='0' value='10'/>
            <sibling id='1' value='21'/>
          </distances>
          <cpus num='4'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
            <cpu id='1' socket_id='0' core_id='1' siblings='1'/>
            <cpu id='2' socket_id='0' core_id='2' siblings='2'/>
            <cpu id='3' socket_id='0' core_id='3' siblings='3'/>
          </cpus>
        </cell>
        <cell id='1'>
          <pages unit='KiB' size='4'>0</pages>
          <distances>
            <sibling id='0' value='21'/>
            <sibling id='1' value='10'/>
          </distances>
          <cpus num='4'>
            <cpu id='4' socket_id='1' core_id='0' siblings='4'/>
            <cpu id='5' socket_id='1' core_id='1' siblings='5'/>
            <cpu id='6' socket_id='1' core_id='2' siblings='6'/>
            <cpu id='7' socket_id='1' core_id='3' siblings='7'/>
          </cpus>
        </cell>
      </cells>
    </topology>
    <cache>
      <bank id='0' level='3' type='both' size='10' unit='MiB' cpus='0-3'/>
      <bank id='1' level='3' type='both' size='10' unit='MiB' cpus='4-7'/>
    </cache>
    <secmodel>
      <model>selinux</model>
      <doi>0</doi>
      <baselabel type='kvm'>system_u:system_r:svirt_t:s0</baselabel>
      <baselabel type='qemu'>system_u:system_r:svirt_tcg_t:s0</baselabel>
    </secmodel>
    <secmodel>
      <model>dac</model>
      <doi>0</doi>
      <baselabel type='kvm'>+107:+107</baselabel>
      <baselabel type='qemu'>+107:+107</baselabel>
    </secmodel>
  </host>

  <guest>
    <os_type>hvm</os_type>
    <arch name='i686'>
      <wordsize>32</wordsize>
      <emulator>/usr/libexec/qemu-kvm</emulator>
      <machine maxCpus='240'>pc-i440fx-rhel7.5.0</machine>
      <machine canonical='pc-i440fx-rhel7.5.0' maxCpus='240'>pc</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.0.0</machine>
      <machine maxCpus='240'>rhel6.3.0</machine>
      <machine maxCpus='240'>rhel6.4.0</machine>
      <machine maxCpus='240'>rhel6.0.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.1.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.2.0</machine>
      <machine maxCpus='255'>pc-q35-rhel7.3.0</machine>
      <machine maxCpus='240'>rhel6.5.0</machine>
      <machine maxCpus='384'>pc-q35-rhel7.4.0</machine>
      <machine maxCpus='240'>rhel6.6.0</machine>
      <machine maxCpus='240'>rhel6.1.0</machine>
      <machine maxCpus='240'>rhel6.2.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.3.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.4.0</machine>
      <machine maxCpus='384'>pc-q35-rhel7.5.0</machine>
      <machine canonical='pc-q35-rhel7.5.0' maxCpus='384'>q35</machine>
      <domain type='qemu'/>
      <domain type='kvm'>
        <emulator>/usr/libexec/qemu-kvm</emulator>
      </domain>
    </arch>
    <features>
      <cpuselection/>
      <deviceboot/>
      <disksnapshot default='on' toggle='no'/>
      <acpi default='on' toggle='yes'/>
      <apic default='on' toggle='no'/>
      <pae/>
      <nonpae/>
    </features>
  </guest>

  <guest>
    <os_type>hvm</os_type>
    <arch name='x86_64'>
      <wordsize>64</wordsize>
      <emulator>/usr/libexec/qemu-kvm</emulator>
      <machine maxCpus='240'>pc-i440fx-rhel7.5.0</machine>
      <machine canonical='pc-i440fx-rhel7.5.0' maxCpus='240'>pc</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.0.0</machine>
      <machine maxCpus='240'>rhel6.3.0</machine>
      <machine maxCpus='240'>rhel6.4.0</machine>
      <machine maxCpus='240'>rhel6.0.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.1.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.2.0</machine>
      <machine maxCpus='255'>pc-q35-rhel7.3.0</machine>
      <machine maxCpus='240'>rhel6.5.0</machine>
      <machine maxCpus='384'>pc-q35-rhel7.4.0</machine>
      <machine maxCpus='240'>rhel6.6.0</machine>
      <machine maxCpus='240'>rhel6.1.0</machine>
      <machine maxCpus='240'>rhel6.2.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.3.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.4.0</machine>
      <machine maxCpus='384'>pc-q35-rhel7.5.0</machine>
      <machine canonical='pc-q35-rhel7.5.0' maxCpus='384'>q35</machine>
      <domain type='qemu'/>
      <domain type='kvm'>
        <emulator>/usr/libexec/qemu-kvm</emulator>
      </domain>
    </arch>
    <features>
      <cpuselection/>
      <deviceboot/>
      <disksnapshot default='on' toggle='no'/>
      <acpi default='on' toggle='yes'/>
      <apic default='on' toggle='no'/>
    </features>
  </guest>

</capabilities>


Host info, CPU and capabilities (destination RHVH host):

# cat /proc/cpuinfo | more
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80GHz
stepping	: 4
microcode	: 0x428
cpu MHz		: 1800.000
cache size	: 10240 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts
bogomips	: 3600.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

# virsh -r capabilities
<capabilities>

  <host>
    <uuid>705895e1-e886-4358-9c25-d84c1bfc9f47</uuid>
    <cpu>
      <arch>x86_64</arch>
      <model>IvyBridge</model>
      <vendor>Intel</vendor>
      <microcode version='1064'/>
      <topology sockets='1' cores='4' threads='1'/>
      <feature name='ds'/>
      <feature name='acpi'/>
      <feature name='ss'/>
      <feature name='ht'/>
      <feature name='tm'/>
      <feature name='pbe'/>
      <feature name='dtes64'/>
      <feature name='monitor'/>
      <feature name='ds_cpl'/>
      <feature name='vmx'/>
      <feature name='smx'/>
      <feature name='est'/>
      <feature name='tm2'/>
      <feature name='xtpr'/>
      <feature name='pdcm'/>
      <feature name='pcid'/>
      <feature name='dca'/>
      <feature name='osxsave'/>
      <feature name='arat'/>
      <feature name='xsaveopt'/>
      <feature name='pdpe1gb'/>
      <feature name='invtsc'/>
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='2048'/>
      <pages unit='KiB' size='1048576'/>
    </cpu>
    <power_management>
      <suspend_mem/>
      <suspend_disk/>
      <suspend_hybrid/>
    </power_management>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>tcp</uri_transport>
        <uri_transport>rdma</uri_transport>
      </uri_transports>
    </migration_features>
    <topology>
      <cells num='2'>
        <cell id='0'>
          <memory unit='KiB'>33503860</memory>
          <pages unit='KiB' size='4'>8375965</pages>
          <pages unit='KiB' size='2048'>0</pages>
          <pages unit='KiB' size='1048576'>4</pages>
          <distances>
            <sibling id='0' value='10'/>
            <sibling id='1' value='21'/>
          </distances>
          <cpus num='4'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
            <cpu id='1' socket_id='0' core_id='1' siblings='1'/>
            <cpu id='2' socket_id='0' core_id='2' siblings='2'/>
            <cpu id='3' socket_id='0' core_id='3' siblings='3'/>
          </cpus>
        </cell>
        <cell id='1'>
          <pages unit='KiB' size='4'>0</pages>
          <distances>
            <sibling id='0' value='21'/>
            <sibling id='1' value='10'/>
          </distances>
          <cpus num='4'>
            <cpu id='4' socket_id='1' core_id='0' siblings='4'/>
            <cpu id='5' socket_id='1' core_id='1' siblings='5'/>
            <cpu id='6' socket_id='1' core_id='2' siblings='6'/>
            <cpu id='7' socket_id='1' core_id='3' siblings='7'/>
          </cpus>
        </cell>
      </cells>
    </topology>
    <cache>
      <bank id='0' level='3' type='both' size='10' unit='MiB' cpus='0-3'/>
      <bank id='1' level='3' type='both' size='10' unit='MiB' cpus='4-7'/>
    </cache>
    <secmodel>
      <model>selinux</model>
      <doi>0</doi>
      <baselabel type='kvm'>system_u:system_r:svirt_t:s0</baselabel>
      <baselabel type='qemu'>system_u:system_r:svirt_tcg_t:s0</baselabel>
    </secmodel>
    <secmodel>
      <model>dac</model>
      <doi>0</doi>
      <baselabel type='kvm'>+107:+107</baselabel>
      <baselabel type='qemu'>+107:+107</baselabel>
    </secmodel>
  </host>

  <guest>
    <os_type>hvm</os_type>
    <arch name='i686'>
      <wordsize>32</wordsize>
      <emulator>/usr/libexec/qemu-kvm</emulator>
      <machine maxCpus='240'>pc-i440fx-rhel7.5.0</machine>
      <machine canonical='pc-i440fx-rhel7.5.0' maxCpus='240'>pc</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.0.0</machine>
      <machine maxCpus='240'>rhel6.3.0</machine>
      <machine maxCpus='240'>rhel6.4.0</machine>
      <machine maxCpus='240'>rhel6.0.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.1.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.2.0</machine>
      <machine maxCpus='255'>pc-q35-rhel7.3.0</machine>
      <machine maxCpus='240'>rhel6.5.0</machine>
      <machine maxCpus='384'>pc-q35-rhel7.4.0</machine>
      <machine maxCpus='240'>rhel6.6.0</machine>
      <machine maxCpus='240'>rhel6.1.0</machine>
      <machine maxCpus='240'>rhel6.2.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.3.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.4.0</machine>
      <machine maxCpus='384'>pc-q35-rhel7.5.0</machine>
      <machine canonical='pc-q35-rhel7.5.0' maxCpus='384'>q35</machine>
      <domain type='qemu'/>
      <domain type='kvm'>
        <emulator>/usr/libexec/qemu-kvm</emulator>
      </domain>
    </arch>
    <features>
      <cpuselection/>
      <deviceboot/>
      <disksnapshot default='on' toggle='no'/>
      <acpi default='on' toggle='yes'/>
      <apic default='on' toggle='no'/>
      <pae/>
      <nonpae/>
    </features>
  </guest>

  <guest>
    <os_type>hvm</os_type>
    <arch name='x86_64'>
      <wordsize>64</wordsize>
      <emulator>/usr/libexec/qemu-kvm</emulator>
      <machine maxCpus='240'>pc-i440fx-rhel7.5.0</machine>
      <machine canonical='pc-i440fx-rhel7.5.0' maxCpus='240'>pc</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.0.0</machine>
      <machine maxCpus='240'>rhel6.3.0</machine>
      <machine maxCpus='240'>rhel6.4.0</machine>
      <machine maxCpus='240'>rhel6.0.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.1.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.2.0</machine>
      <machine maxCpus='255'>pc-q35-rhel7.3.0</machine>
      <machine maxCpus='240'>rhel6.5.0</machine>
      <machine maxCpus='384'>pc-q35-rhel7.4.0</machine>
      <machine maxCpus='240'>rhel6.6.0</machine>
      <machine maxCpus='240'>rhel6.1.0</machine>
      <machine maxCpus='240'>rhel6.2.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.3.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.4.0</machine>
      <machine maxCpus='384'>pc-q35-rhel7.5.0</machine>
      <machine canonical='pc-q35-rhel7.5.0' maxCpus='384'>q35</machine>
      <domain type='qemu'/>
      <domain type='kvm'>
        <emulator>/usr/libexec/qemu-kvm</emulator>
      </domain>
    </arch>
    <features>
      <cpuselection/>
      <deviceboot/>
      <disksnapshot default='on' toggle='no'/>
      <acpi default='on' toggle='yes'/>
      <apic default='on' toggle='no'/>
    </features>
  </guest>

</capabilities>

Comment 1 Israel Pinto 2018-06-05 08:57:39 UTC
Created attachment 1447761 [details]
engine log

Comment 2 Israel Pinto 2018-06-05 09:01:01 UTC
Created attachment 1447762 [details]
source host log

Comment 4 Yaniv Kaul 2018-06-05 13:12:12 UTC
Israel, has the destination CPU been patched with IBRS support?

Comment 5 Israel Pinto 2018-06-05 13:28:12 UTC
See the output of 'virsh -r capabilities' for both hosts:
The source host has IBRS: <model>IvyBridge-IBRS</model>
The destination host (the RHVH) is without IBRS: <model>IvyBridge</model>

Comment 6 Israel Pinto 2018-06-05 13:30:30 UTC
(In reply to Israel Pinto from comment #5)
> See the output of 'virsh -r capabilities' for both hosts:
> The source host has IBRS: <model>IvyBridge-IBRS</model>
> The destination host (the RHVH) is without IBRS: <model>IvyBridge</model>

The cluster is 'Intel Westmere Family'.

Comment 7 Yaniv Kaul 2018-06-05 13:33:13 UTC
(In reply to Israel Pinto from comment #5)
> See the output of 'virsh -r capabilities' for both hosts:
> The source host has IBRS: <model>IvyBridge-IBRS</model>
> The destination host (the RHVH) is without IBRS: <model>IvyBridge</model>

Yes, that was clear from the CPU flags:
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt ibpb ibrs stibp dtherm arat pln pts spec_ctrl intel_stibp

vs.:
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts

(see the last flags).

But the issue is with the HE VM: how is it configured? Can you check the shared configuration?

Comment 8 Israel Pinto 2018-06-05 13:42:34 UTC
From the guest I see that 'spec_ctrl' is set and the model matches the host,
but in the UI the guest CPU type is: Intel Westmere Family.
(Is that also a bug?)
[root@hosted-engine-02 ~]# cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel Xeon E312xx (Sandy Bridge)
stepping	: 1
microcode	: 0x1
cpu MHz		: 1799.999
cache size	: 16384 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm xsaveopt ibpb ibrs arat spec_ctrl
bogomips	: 3599.99
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

Comment 9 Israel Pinto 2018-06-05 14:05:30 UTC
[root@lynx14 ~]# cat /run/ovirt-hosted-engine-ha/vm.conf
# Editing the hosted engine VM is only possible via the manager UI\API
# This file was generated at Tue Jun  5 17:04:36 2018

cpuType=Westmere
emulatedMachine=pc-i440fx-rhel7.5.0
vmId=1f41e617-5e95-4086-aa86-d93205bf482e
smp=4
memSize=16384
maxVCpus=64
spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
xmlBase64=PD94bWwgdmVyc2lvbj0nMS4wJyBlbmNvZGluZz0nVVRGLTgnPz4KPGRvbWFpbiB4bWxuczpvdmlydC10dW5lPSJodHRwOi8vb3ZpcnQub3JnL3ZtL3R1bmUvMS4wIiB4bWxuczpvdmlydC12bT0iaHR0cDovL292aXJ0Lm9yZy92bS8xLjAiIHR5cGU9Imt2bSI+PG5hbWU+SG9zdGVkRW5naW5lPC9uYW1lPjx1dWlkPjFmNDFlNjE3LTVlOTUtNDA4Ni1hYTg2LWQ5MzIwNWJmNDgyZTwvdXVpZD48bWVtb3J5PjE2Nzc3MjE2PC9tZW1vcnk+PGN1cnJlbnRNZW1vcnk+MTY3NzcyMTY8L2N1cnJlbnRNZW1vcnk+PG1heE1lbW9yeSBzbG90cz0iMTYiPjY3MTA4ODY0PC9tYXhNZW1vcnk+PHZjcHUgY3VycmVudD0iNCI+NjQ8L3ZjcHU+PHN5c2luZm8gdHlwZT0ic21iaW9zIj48c3lzdGVtPjxlbnRyeSBuYW1lPSJtYW51ZmFjdHVyZXIiPm9WaXJ0PC9lbnRyeT48ZW50cnkgbmFtZT0icHJvZHVjdCI+T1MtTkFNRTo8L2VudHJ5PjxlbnRyeSBuYW1lPSJ2ZXJzaW9uIj5PUy1WRVJTSU9OOjwvZW50cnk+PGVudHJ5IG5hbWU9InNlcmlhbCI+SE9TVC1TRVJJQUw6PC9lbnRyeT48ZW50cnkgbmFtZT0idXVpZCI+MWY0MWU2MTctNWU5NS00MDg2LWFhODYtZDkzMjA1YmY0ODJlPC9lbnRyeT48L3N5c3RlbT48L3N5c2luZm8+PGNsb2NrIG9mZnNldD0idmFyaWFibGUiIGFkanVzdG1lbnQ9IjAiPjx0aW1lciBuYW1lPSJydGMiIHRpY2twb2xpY3k9ImNhdGNodXAiLz48dGltZXIgbmFtZT0icGl0IiB0aWNrcG9saWN5PSJkZWxheSIvPjx0aW1lciBuYW1lPSJocGV0IiBwcmVzZW50PSJubyIvPjwvY2xvY2s+PGZlYXR1cmVzPjxhY3BpLz48dm1jb3JlaW5mby8+PC9mZWF0dXJlcz48Y3B1IG1hdGNoPSJleGFjdCI+PG1vZGVsPldlc3RtZXJlPC9tb2RlbD48dG9wb2xvZ3kgY29yZXM9IjQiIHRocmVhZHM9IjEiIHNvY2tldHM9IjE2Ii8+PG51bWE+PGNlbGwgaWQ9IjAiIGNwdXM9IjAsMSwyLDMiIG1lbW9yeT0iMTY3NzcyMTYiLz48L251bWE+PC9jcHU+PGNwdXR1bmUvPjxkZXZpY2VzPjxpbnB1dCB0eXBlPSJtb3VzZSIgYnVzPSJwczIiLz48Y2hhbm5lbCB0eXBlPSJ1bml4Ij48dGFyZ2V0IHR5cGU9InZpcnRpbyIgbmFtZT0ib3ZpcnQtZ3Vlc3QtYWdlbnQuMCIvPjxzb3VyY2UgbW9kZT0iYmluZCIgcGF0aD0iL3Zhci9saWIvbGlidmlydC9xZW11L2NoYW5uZWxzLzFmNDFlNjE3LTVlOTUtNDA4Ni1hYTg2LWQ5MzIwNWJmNDgyZS5vdmlydC1ndWVzdC1hZ2VudC4wIi8+PC9jaGFubmVsPjxjaGFubmVsIHR5cGU9InVuaXgiPjx0YXJnZXQgdHlwZT0idmlydGlvIiBuYW1lPSJvcmcucWVtdS5ndWVzdF9hZ2VudC4wIi8+PHNvdXJjZSBtb2RlPSJiaW5kIiBwYXRoPSIvdmFyL2xpYi9saWJ2aXJ0L3FlbXUvY2hhbm5lbHMvMWY0MWU2MTctNWU5NS00MDg2LWFhODYtZDkzMjA1YmY0ODJlLm9yZy5xZW11Lmd1ZXN0X2FnZW50LjAiLz48L2NoYW5uZWw+PGNvbnRyb2xsZXIgdHlwZT0ic2NzaSIgbW9kZWw9InZpcnRpby1zY3NpIiBpbmRleD0iMCI+PGFsaWFzIG5hbWU9InVhLTAzYmM3ZWRiLTJiYTQtNDhhYy04YzVkLWU4Njk5NGRiZWI2NyIvPjxhZGRyZXNzIGJ1cz0iMHgwMCIgZG9tYWluPSIweDAwMDAiIGZ1bmN0aW9uPSIweDAiIHNsb3Q9IjB4MDUiIHR5cGU9InBjaSIvPjwvY29udHJvbGxlcj48Z3JhcGhpY3MgdHlwZT0ic3BpY2UiIHBvcnQ9Ii0xIiBhdXRvcG9ydD0ieWVzIiBwYXNzd2Q9IioqKioqIiBwYXNzd2RWYWxpZFRvPSIxOTcwLTAxLTAxVDAwOjAwOjAxIiB0bHNQb3J0PSItMSI+PGNoYW5uZWwgbmFtZT0ibWFpbiIgbW9kZT0ic2VjdXJlIi8+PGNoYW5uZWwgbmFtZT0iaW5wdXRzIiBtb2RlPSJzZWN1cmUiLz48Y2hhbm5lbCBuYW1lPSJjdXJzb3IiIG1vZGU9InNlY3VyZSIvPjxjaGFubmVsIG5hbWU9InBsYXliYWNrIiBtb2RlPSJzZWN1cmUiLz48Y2hhbm5lbCBuYW1lPSJyZWNvcmQiIG1vZGU9InNlY3VyZSIvPjxjaGFubmVsIG5hbWU9ImRpc3BsYXkiIG1vZGU9InNlY3VyZSIvPjxjaGFubmVsIG5hbWU9InNtYXJ0Y2FyZCIgbW9kZT0ic2VjdXJlIi8+PGNoYW5uZWwgbmFtZT0idXNicmVkaXIiIG1vZGU9InNlY3VyZSIvPjxsaXN0ZW4gdHlwZT0ibmV0d29yayIgbmV0d29yaz0idmRzbS1vdmlydG1nbXQiLz48L2dyYXBoaWNzPjxjb250cm9sbGVyIHR5cGU9ImlkZSIgaW5kZXg9IjAiPjxhZGRyZXNzIGJ1cz0iMHgwMCIgZG9tYWluPSIweDAwMDAiIGZ1bmN0aW9uPSIweDEiIHNsb3Q9IjB4MDEiIHR5cGU9InBjaSIvPjwvY29udHJvbGxlcj48Y29udHJvbGxlciB0eXBlPSJ2aXJ0aW8tc2VyaWFsIiBpbmRleD0iMCIgcG9ydHM9IjE2Ij48YWxpYXMgbmFtZT0idWEtN2I1NTg0NjMtZTAyZC00Y2JlLWI5NGUtYzM4ZWMzZGJjYTI0Ii8+PGFkZHJlc3MgYnVzPSIweDAwIiBkb21haW49IjB4MDAwMCIgZnVuY3Rpb249IjB4MCIgc2xvdD0iMHgwNiIgdHlwZT0icGNpIi8+PC9jb250cm9sbGVyPjxncmFwaGljcyB0eXBlPSJ2bmMiIHBvcnQ9Ii0xIiBhdXRvcG9ydD0ieWVzIiBwYXNzd2Q9IioqKioqIiBwYXNzd2RWYWxpZFRvPSIxOTcwLTAxLTAxVDAwOjAwOjAxIiBrZXltYXA9ImVuLXVzIj48bGlzdGVuIHR5cGU9Im5ldHdvcmsiIG5ldHdvcms9InZkc20tb3ZpcnRtZ210Ii8+PC9ncmFwaGljcz48cm5nIG1vZGVsPSJ2aXJ0aW8iPjxiYWNrZW5kIG1vZGVsPSJyYW5kb20iPi9kZXYvdXJhbmRvbTwvY
mFja2VuZD48YWxpYXMgbmFtZT0idWEtODIyNTgwOGItZjQxNy00YWEyLWIwNjAtYzkyZWJhOGZiMDc4Ii8+PC9ybmc+PHNvdW5kIG1vZGVsPSJpY2g2Ij48YWxpYXMgbmFtZT0idWEtODJjMzlkOTEtNmM0YS00NTUxLWFmMmMtODk2ODg2M2M1MmM0Ii8+PGFkZHJlc3MgYnVzPSIweDAwIiBkb21haW49IjB4MDAwMCIgZnVuY3Rpb249IjB4MCIgc2xvdD0iMHgwNCIgdHlwZT0icGNpIi8+PC9zb3VuZD48Y29udHJvbGxlciB0eXBlPSJ1c2IiIG1vZGVsPSJwaWl4My11aGNpIiBpbmRleD0iMCI+PGFkZHJlc3MgYnVzPSIweDAwIiBkb21haW49IjB4MDAwMCIgZnVuY3Rpb249IjB4MiIgc2xvdD0iMHgwMSIgdHlwZT0icGNpIi8+PC9jb250cm9sbGVyPjx2aWRlbz48bW9kZWwgdHlwZT0icXhsIiB2cmFtPSIzMjc2OCIgaGVhZHM9IjEiIHJhbT0iNjU1MzYiIHZnYW1lbT0iMTYzODQiLz48YWxpYXMgbmFtZT0idWEtYTkzYzNkMjYtYzMwZi00YzA4LWJjNDMtYjAzODdiZGZmNDgyIi8+PGFkZHJlc3MgYnVzPSIweDAwIiBkb21haW49IjB4MDAwMCIgZnVuY3Rpb249IjB4MCIgc2xvdD0iMHgwMiIgdHlwZT0icGNpIi8+PC92aWRlbz48bWVtYmFsbG9vbiBtb2RlbD0idmlydGlvIj48c3RhdHMgcGVyaW9kPSI1Ii8+PGFsaWFzIG5hbWU9InVhLWRmZjdkYjcyLWJjNGEtNGM2Mi1hZGY2LTQwZDUxOTgwNWEzMCIvPjxhZGRyZXNzIGJ1cz0iMHgwMCIgZG9tYWluPSIweDAwMDAiIGZ1bmN0aW9uPSIweDAiIHNsb3Q9IjB4MDgiIHR5cGU9InBjaSIvPjwvbWVtYmFsbG9vbj48Y2hhbm5lbCB0eXBlPSJzcGljZXZtYyI+PHRhcmdldCB0eXBlPSJ2aXJ0aW8iIG5hbWU9ImNvbS5yZWRoYXQuc3BpY2UuMCIvPjwvY2hhbm5lbD48aW50ZXJmYWNlIHR5cGU9ImJyaWRnZSI+PG1vZGVsIHR5cGU9InZpcnRpbyIvPjxsaW5rIHN0YXRlPSJ1cCIvPjxzb3VyY2UgYnJpZGdlPSJvdmlydG1nbXQiLz48YWxpYXMgbmFtZT0idWEtNWQzOGNkYzEtOTAyMy00ODA4LTg1MjktZTk1ZTY5MmZlYjViIi8+PGFkZHJlc3MgYnVzPSIweDAwIiBkb21haW49IjB4MDAwMCIgZnVuY3Rpb249IjB4MCIgc2xvdD0iMHgwMyIgdHlwZT0icGNpIi8+PG1hYyBhZGRyZXNzPSIwMDoxNjozZTo3YjplMDowNyIvPjxmaWx0ZXJyZWYgZmlsdGVyPSJ2ZHNtLW5vLW1hYy1zcG9vZmluZyIvPjxiYW5kd2lkdGgvPjwvaW50ZXJmYWNlPjxkaXNrIHR5cGU9ImZpbGUiIGRldmljZT0iY2Ryb20iIHNuYXBzaG90PSJubyI+PGRyaXZlciBuYW1lPSJxZW11IiB0eXBlPSJyYXciIGVycm9yX3BvbGljeT0icmVwb3J0Ii8+PHNvdXJjZSBmaWxlPSIiIHN0YXJ0dXBQb2xpY3k9Im9wdGlvbmFsIi8+PHRhcmdldCBkZXY9ImhkYyIgYnVzPSJpZGUiLz48cmVhZG9ubHkvPjxhbGlhcyBuYW1lPSJ1YS1jY2NiODIyNS1mZWZlLTQzMGEtODhlMS0wNjMyODgwZWY5ZmQiLz48YWRkcmVzcyBidXM9IjEiIGNvbnRyb2xsZXI9IjAiIHVuaXQ9IjAiIHR5cGU9ImRyaXZlIiB0YXJnZXQ9IjAiLz48L2Rpc2s+PGRpc2sgc25hcHNob3Q9Im5vIiB0eXBlPSJmaWxlIiBkZXZpY2U9ImRpc2siPjx0YXJnZXQgZGV2PSJ2ZGEiIGJ1cz0idmlydGlvIi8+PHNvdXJjZSBmaWxlPSIvcmhldi9kYXRhLWNlbnRlci8wMDAwMDAwMC0wMDAwLTAwMDAtMDAwMC0wMDAwMDAwMDAwMDAvYTJhNmYxNWUtYjczYi00ZjNkLTgxYmItZTVjY2I1YTUzNzZiL2ltYWdlcy84NGFmYjBlNS1iNGExLTQ5MjYtODI3NC0yNmJmYmE4ZjUwNmYvNThmNTQ5ZDAtNTQ2Yy00Mzg5LWEwODUtYTcyODc1NTBiZDcxIi8+PGRyaXZlciBuYW1lPSJxZW11IiBpbz0idGhyZWFkcyIgdHlwZT0icmF3IiBlcnJvcl9wb2xpY3k9InN0b3AiIGNhY2hlPSJub25lIi8+PGFsaWFzIG5hbWU9InVhLTg0YWZiMGU1LWI0YTEtNDkyNi04Mjc0LTI2YmZiYThmNTA2ZiIvPjxhZGRyZXNzIGJ1cz0iMHgwMCIgZG9tYWluPSIweDAwMDAiIGZ1bmN0aW9uPSIweDAiIHNsb3Q9IjB4MDciIHR5cGU9InBjaSIvPjxzZXJpYWw+ODRhZmIwZTUtYjRhMS00OTI2LTgyNzQtMjZiZmJhOGY1MDZmPC9zZXJpYWw+PC9kaXNrPjxsZWFzZT48a2V5PjU4ZjU0OWQwLTU0NmMtNDM4OS1hMDg1LWE3Mjg3NTUwYmQ3MTwva2V5Pjxsb2Nrc3BhY2U+YTJhNmYxNWUtYjczYi00ZjNkLTgxYmItZTVjY2I1YTUzNzZiPC9sb2Nrc3BhY2U+PHRhcmdldCBvZmZzZXQ9IkxFQVNFLU9GRlNFVDo1OGY1NDlkMC01NDZjLTQzODktYTA4NS1hNzI4NzU1MGJkNzE6YTJhNmYxNWUtYjczYi00ZjNkLTgxYmItZTVjY2I1YTUzNzZiIiBwYXRoPSJMRUFTRS1QQVRIOjU4ZjU0OWQwLTU0NmMtNDM4OS1hMDg1LWE3Mjg3NTUwYmQ3MTphMmE2ZjE1ZS1iNzNiLTRmM2QtODFiYi1lNWNjYjVhNTM3NmIiLz48L2xlYXNlPjwvZGV2aWNlcz48cG0+PHN1c3BlbmQtdG8tZGlzayBlbmFibGVkPSJubyIvPjxzdXNwZW5kLXRvLW1lbSBlbmFibGVkPSJubyIvPjwvcG0+PG9zPjx0eXBlIGFyY2g9Ing4Nl82NCIgbWFjaGluZT0icGMtaTQ0MGZ4LXJoZWw3LjUuMCI+aHZtPC90eXBlPjxzbWJpb3MgbW9kZT0ic3lzaW5mbyIvPjwvb3M+PG1ldGFkYXRhPjxvdmlydC10dW5lOnFvcy8+PG92aXJ0LXZtOnZtPjxtaW5HdWFyYW50ZWVkTWVtb3J5TWIgdHlwZT0iaW50Ij4xMDI0PC9taW5HdWFyYW50ZWVkTWVtb3J5TWI+PGNsdXN0ZXJWZXJzaW9uPjQuMjwvY2x1c3RlclZlcnNpb24+PG92aXJ0LXZtOmN1c3Rv
bS8+PG92aXJ0LXZtOmRldmljZSBtYWNfYWRkcmVzcz0iMDA6MTY6M2U6N2I6ZTA6MDciPjxvdmlydC12bTpjdXN0b20vPjwvb3ZpcnQtdm06ZGV2aWNlPjxvdmlydC12bTpkZXZpY2UgZGV2dHlwZT0iZGlzayIgbmFtZT0idmRhIj48b3ZpcnQtdm06cG9vbElEPjAwMDAwMDAwLTAwMDAtMDAwMC0wMDAwLTAwMDAwMDAwMDAwMDwvb3ZpcnQtdm06cG9vbElEPjxvdmlydC12bTp2b2x1bWVJRD41OGY1NDlkMC01NDZjLTQzODktYTA4NS1hNzI4NzU1MGJkNzE8L292aXJ0LXZtOnZvbHVtZUlEPjxvdmlydC12bTpzaGFyZWQ+ZXhjbHVzaXZlPC9vdmlydC12bTpzaGFyZWQ+PG92aXJ0LXZtOmltYWdlSUQ+ODRhZmIwZTUtYjRhMS00OTI2LTgyNzQtMjZiZmJhOGY1MDZmPC9vdmlydC12bTppbWFnZUlEPjxvdmlydC12bTpkb21haW5JRD5hMmE2ZjE1ZS1iNzNiLTRmM2QtODFiYi1lNWNjYjVhNTM3NmI8L292aXJ0LXZtOmRvbWFpbklEPjwvb3ZpcnQtdm06ZGV2aWNlPjxsYXVuY2hQYXVzZWQ+ZmFsc2U8L2xhdW5jaFBhdXNlZD48cmVzdW1lQmVoYXZpb3I+YXV0b19yZXN1bWU8L3Jlc3VtZUJlaGF2aW9yPjwvb3ZpcnQtdm06dm0+PC9tZXRhZGF0YT48L2RvbWFpbj4=
vmName=HostedEngine
display=qxl
devices={index:0,iface:virtio,format:raw,bootOrder:1,address:{type:pci,slot:0x07,bus:0x00,domain:0x0000,function:0x0},volumeID:58f549d0-546c-4389-a085-a7287550bd71,imageID:84afb0e5-b4a1-4926-8274-26bfba8f506f,readonly:false,domainID:a2a6f15e-b73b-4f3d-81bb-e5ccb5a5376b,deviceId:84afb0e5-b4a1-4926-8274-26bfba8f506f,poolID:00000000-0000-0000-0000-000000000000,device:disk,shared:exclusive,propagateErrors:off,type:disk}
devices={nicModel:pv,macAddr:00:16:3e:7b:e0:07,linkActive:true,network:ovirtmgmt,deviceId:5d38cdc1-9023-4808-8529-e95e692feb5b,address:{type:pci,slot:0x03,bus:0x00,domain:0x0000,function:0x0},device:bridge,type:interface}
devices={alias:video0,specParams:{vram:32768,vgamem:16384,heads:1,ram:65536},deviceId:a93c3d26-c30f-4c08-bc43-b0387bdff482,address:{type:pci,slot:0x02,bus:0x00,domain:0x0000,function:0x0},device:qxl,type:video}
devices={device:spice,type:graphics,deviceId:2705e98a-7610-4069-aae7-6a88ef47738b,address:None}
devices={device:vnc,type:graphics,deviceId:7eb9ded3-473f-488e-a676-73c17ec62f57,address:None}
devices={index:2,iface:ide,shared:false,readonly:true,deviceId:8c3179ac-b322-4f5c-9449-c52e3665e0ae,address:{controller:0,target:0,unit:0,bus:1,type:drive},device:cdrom,path:,type:disk}
devices={device:ide,specParams:{index:0},type:controller,deviceId:279a65b6-3489-44c0-abc8-111828062e34,address:{type:pci,slot:0x01,bus:0x00,domain:0x0000,function:0x1}}
devices={alias:ua-8225808b-f417-4aa2-b060-c92eba8fb078,specParams:{source:urandom},deviceId:8225808b-f417-4aa2-b060-c92eba8fb078,address:{type:pci,slot:0x09,bus:0x00,domain:0x0000,function:0x0},device:virtio,model:virtio,type:rng}
devices={device:usb,specParams:{index:0,model:piix3-uhci},type:controller,deviceId:91f25294-76be-4082-b544-58be835d9638,address:{type:pci,slot:0x01,bus:0x00,domain:0x0000,function:0x2}}
devices={device:scsi,model:virtio-scsi,type:controller,deviceId:03bc7edb-2ba4-48ac-8c5d-e86994dbeb67,address:{type:pci,slot:0x05,bus:0x00,domain:0x0000,function:0x0}}
devices={device:virtio-serial,type:controller,deviceId:7b558463-e02d-4cbe-b94e-c38ec3dbca24,address:{type:pci,slot:0x06,bus:0x00,domain:0x0000,function:0x0}}
devices={device:console,type:console}

Comment 10 Yaniv Kaul 2018-06-05 14:46:20 UTC
Simone, how do we launch the HE? Where did it get the extra feature?
(BTW, we should probably close it as a documentation item, but I'd like to understand the details first)

Comment 11 Simone Tiraboschi 2018-06-05 15:14:55 UTC
With up-to-date 4.2 rpms on the host, we launch it directly with the libvirt XML generated by the engine and saved by the engine in the OVF_STORE volumes.
That XML is base64-encoded and saved in vm.conf in the xmlBase64 field.
4.1 hosts instead consume vm.conf itself, ignoring the xmlBase64 field.

So in the XML from the xmlBase64 field in https://bugzilla.redhat.com/show_bug.cgi?id=1585986#c9 I read:
   <cpu match="exact">
      <model>Westmere</model>
      <topology cores="4" threads="1" sockets="16" />
      <numa>
         <cell id="0" cpus="0,1,2,3" memory="16777216" />
      </numa>
   </cpu>
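
To double-check what the engine actually stored, one can decode the xmlBase64 field straight from vm.conf. A minimal sketch, assuming xmllint is installed (any XML pretty-printer works):

# Extract and decode the engine-generated domain XML, then show its <cpu> element:
grep '^xmlBase64=' /run/ovirt-hosted-engine-ha/vm.conf \
  | cut -d= -f2- | base64 -d \
  | xmllint --xpath '/domain/cpu' -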

Comment 12 Sandro Bonazzola 2018-06-06 08:06:30 UTC
Moving back to the virt team, since on hosted engine we don't touch such settings.
Are the two hosts in the same cluster?

Comment 13 Israel Pinto 2018-06-06 08:11:38 UTC
1. Both hosts are in the same cluster.
2. The cluster CPU family type is older than the hosts':
   hosts are IvyBridge, the cluster is Westmere.

Comment 14 Michal Skrivanek 2018-06-06 18:05:24 UTC
The logs only contain the HostedEngine VM as an incoming migration, and it already arrives with SandyBridge. Please attach logs with the HE parameters from when it originally starts.

Comment 15 Israel Pinto 2018-06-07 12:55:45 UTC
The first record I see of the HE VM starting is on lynx14.
The agent and broker on this host start after it:
MainThread::INFO::2018-06-04 22:40:26,567::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineUp (score: 3400)

In vm.conf.fallback you can see that the CPU type is SandyBridge-IBRS:
-rw-r--r--. 1 vdsm kvm 1391 Jun  5 17:03 /run/ovirt-hosted-engine-ha/vm.conf.fallback

vmId=1f41e617-5e95-4086-aa86-d93205bf482e
memSize=16384
display=vnc
devices={index:2,iface:ide,address:{ controller:0, target:0,unit:0, bus:1, type:drive},specParams:{},readonly:true,deviceId:dff26b50-1111-4c60-bde8-7d017e60d99c,path:,device:cdrom,shared:false,type:disk}
devices={index:0,iface:virtio,format:raw,poolID:00000000-0000-0000-0000-000000000000,volumeID:58f549d0-546c-4389-a085-a7287550bd71,imageID:84afb0e5-b4a1-4926-8274-26bfba8f506f,specParams:{},readonly:false,domainID:a2a6f15e-b73b-4f3d-81bb-e5ccb5a5376b,optional:false,deviceId:58f549d0-546c-4389-a085-a7287550bd71,address:{bus:0x00, slot:0x06, domain:0x0000, type:pci, function:0x0},device:disk,shared:exclusive,propagateErrors:off,type:disk,bootOrder:1}
devices={device:scsi,model:virtio-scsi,type:controller}
devices={nicModel:pv,macAddr:00:16:3e:7b:e0:07,linkActive:true,network:ovirtmgmt,specParams:{},deviceId:bf42fa4a-13eb-4ec1-b864-1e4f792b8693,address:{bus:0x00, slot:0x03, domain:0x0000, type:pci, function:0x0},device:bridge,type:interface}
devices={device:console,type:console}
devices={device:vga,alias:video0,type:video}
devices={device:vnc,type:graphics}
vmName=HostedEngine
spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
smp=4
maxVCpus=4
cpuType=SandyBridge-IBRS
emulatedMachine=rhel6.5.0
devices={device:virtio,specParams:{source:urandom},model:virtio,type:rng}
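
A quick way to contrast the CPU type in the active and fallback configurations on the host (a sketch using the paths shown above):

# Both files live under /run/ovirt-hosted-engine-ha/ on the host:
grep -H '^cpuType=' /run/ovirt-hosted-engine-ha/vm.conf /run/ovirt-hosted-engine-ha/vm.conf.fallback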

vdsm log:
2018-06-05 18:01:11,895+0300 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:573)
2018-06-05 18:01:11,899+0300 INFO  (jsonrpc/0) [api.virt] START getStats() from=::1,41404, vmId=1f41e617-5e95-4086-aa86-d93205bf482e (api:46)
2018-06-05 18:01:11,901+0300 INFO  (jsonrpc/0) [api.virt] FINISH getStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': [{'vcpuCount': '4', 'memUsage': '25', 'acpiEnable': 'true', 'displayInfo': [{'tlsPort': '5901', 'ipAddress': '10.46.16.29', 'type': 'spice', 'port': '5900'}, {'tlsPort': '-1', 'ipAddress': '10.46.16.29', 'type': 'vnc', 'port': '5902'}], 'guestFQDN': u'hosted-engine-02.lab.eng.tlv2.redhat.com', 'vmId': '1f41e617-5e95-4086-aa86-d93205bf482e', 'session': 'Unknown', 'vmType': 'kvm', 'timeOffset': '-1', 'balloonInfo': {'balloon_max': '16777216', 'balloon_min': '1048576', 'balloon_target': '16777216', 'balloon_cur': '16777216'}, 'disksUsage': [{u'path': u'/', u'total': '7638876160', u'fs': u'xfs', u'used': '3544621056'}, {u'path': u'/boot', u'total': '1063256064', u'fs': u'xfs', u'used': '178348032'}, {u'path': u'/home', u'total': '1063256064', u'fs': u'xfs', u'used': '33783808'}, {u'path': u'/var', u'total': '21464350720', u'fs': u'xfs', u'used': '414855168'}, {u'path': u'/var/log', u'total': '10726932480', u'fs': u'xfs', u'used': '66744320'}, {u'path': u'/var/log/audit', u'total': '1063256064', u'fs': u'xfs', u'used': '34189312'}], 'network': {'vnet0': {'macAddr': '00:16:3e:7b:e0:07', 'rxDropped': '0', 'tx': '246041090', 'rxErrors': '0', 'txDropped': '0', 'rx': '125043285', 'txErrors': '0', 'state': 'unknown', 'sampleTime': 4367026.95, 'speed': '1000', 'name': 'vnet0'}}, 'vmJobs': {}, 'cpuUser': '5.40', 'elapsedTime': '72790', 'memoryStats': {'swap_out': 0, 'majflt': 0, 'minflt': 186, 'mem_cached': '583088', 'mem_free': '11905848', 'mem_buffers': '0', 'swap_in': 0, 'pageflt': 186, 'mem_total': '16258532', 'mem_unused': '11905848'}, 'cpuSys': '1.20', 'appsList': (u'kernel-3.10.0-862.3.2.el7', u'ovirt-guest-agent-common-1.0.14-3.el7ev', u'cloud-init-0.7.9-24.el7'), 'guestOs': u'3.10.0-862.3.2.el7.x86_64', 'vmName': 'HostedEngine', 'status': 'Up', 'clientIp': '', 'hash': '-4550103827432548493', 'guestCPUCount': 4, 'cpuUsage': '1619630000000', 'vcpuPeriod': 100000L, 'guestTimezone': {u'zone': u'Asia/Jerusalem', u'offset': 120}, 'vcpuQuota': '-1', 'statusTime': '4367026950', 'kvmEnable': 'true', 'disks': {'vda': {'readLatency': '0', 'flushLatency': '84263', 'readRate': '0.0', 'writeRate': '39526.4', 'writtenBytes': '1435334656', 'truesize': '4398194688', 'apparentsize': '62277025792', 'readOps': '7162', 'writeLatency': '601236', 'imageID': '84afb0e5-b4a1-4926-8274-26bfba8f506f', 'readBytes': '175622656', 'writeOps': '119159'}, 'hdc': {'readLatency': '0', 'flushLatency': '0', 'readRate': '0.0', 'writeRate': '0.0', 'writtenBytes': '0', 'truesize': '0', 'apparentsize': '0', 'readOps': '0', 'writeLatency': '0', 'readBytes': '0', 'writeOps': '0'}}, 'monitorResponse': '0', 'guestOsInfo': {u'kernel': u'3.10.0-862.3.2.el7.x86_64', u'arch': u'x86_64', u'version': u'7.5', u'distribution': u'Red Hat Enterprise Linux Server', u'type': u'linux', u'codename': u'Maipo'}, 'username': u'None', 'guestName': u'hosted-engine-02.lab.eng.tlv2.redhat.com', 'lastLogin': 1528208020.854908, 'guestIPs': u'10.46.16.197', 'guestContainers': [], 'netIfaces': [{u'inet6': [], u'hw': u'00:16:3e:7b:e0:07', u'inet': [u'10.46.16.197'], u'name': u'eth0'}]}]} from=::1,41404, vmId=1f41e617-5e95-4086-aa86-d93205bf482e (api:52)

Comment 16 Israel Pinto 2018-06-07 12:59:47 UTC
Created attachment 1448712 [details]
vdsm log HE VM first started

Comment 17 Michal Skrivanek 2018-06-07 13:49:01 UTC
(In reply to Israel Pinto from comment #16)
> Created attachment 1448712 [details]
> vdsm log HE VM first started

I do not see any VM start in that attached log file. Please double-check.

Comment 19 Israel Pinto 2018-06-07 13:58:42 UTC
If I want to reproduce it, do I need to start the HE VM first on the host with IBRS, and then migrate it to a host without IBRS?

Comment 20 Michal Skrivanek 2018-06-07 14:01:38 UTC
(In reply to Israel Pinto from comment #19)
> If I want to reproduce it, do I need to start the HE VM first on the host
> with IBRS, and then migrate it to a host without IBRS?

I do not know, you opened the bug :) From your description it indeed sounds like a way to reproduce it.

Comment 21 Michal Skrivanek 2018-06-07 14:04:38 UTC
Israel, 
in the original report you said the type is Westmere, and Simone pointed out that the configuration looks like that. But in comment #15 you said the configuration contains SandyBridge-IBRS (which corresponds to the actual type the VM is running with). So if you started with that type and later changed the configuration without restarting the VM, it is still running with the original CPU, which is not going to migrate to a "lower" host.

Comment 22 Israel Pinto 2018-06-07 14:35:52 UTC
(In reply to Michal Skrivanek from comment #21)
> Israel,
> in the original report you said the type is Westmere, and Simone pointed
> out that the configuration looks like that. But in comment #15 you said the
> configuration contains SandyBridge-IBRS (which corresponds to the actual
> type the VM is running with). So if you started with that type and later
> changed the configuration without restarting the VM, it is still running
> with the original CPU, which is not going to migrate to a "lower" host.

Yes: the cluster is set to Westmere, two hosts (the RHEL hosts) are set to SandyBridge-IBRS, and one host is IvyBridge.
See virsh -r capabilities in
https://bugzilla.redhat.com/show_bug.cgi?id=1585986#c0

The HE VM was migrated from the SandyBridge-IBRS host to the IvyBridge host, and the migration failed.
I understand why, given the misconfiguration of the hosts and cluster.
But is this really a problem we need to handle, or does the error occur simply because the setting is not right?
If the latter, it just needs to be documented (if it is not already).

Comment 23 Michal Skrivanek 2018-06-08 14:05:42 UTC
It is indeed a problem; I just do not understand how you got into that state. Please describe exactly how to reproduce this.

Comment 24 Nikolai Sednev 2018-06-11 09:22:32 UTC
Deployed SHE over NFS on the IBRS-capable host rose07, and this is what I see in vm.conf on the host:
rose07 ~]# cat /run/ovirt-hosted-engine-ha/vm.conf |grep spec
cpuType=SandyBridge,+spec-ctrl
devices={alias:video0,specParams:{vram:32768,vgamem:16384,heads:1,ram:65536},deviceId:20d306f4-c9f3-42f3-b5f6-5da1e3cca6e2,address:None,device:qxl,type:video}
devices={device:usb,specParams:{index:0,model:piix3-uhci},type:controller,deviceId:aba6e8f6-a48a-4054-9c2b-f45d973ef243,address:None}
devices={alias:None,specParams:{source:urandom},deviceId:8a6dc0d9-17c0-41f0-b9d3-b1a98d8b618d,address:None,device:virtio,model:virtio,type:rng}

rose07 ~]# virsh -r capabilities | head
<capabilities>

  <host>
    <uuid>a8821478-30a0-4281-8552-7d98a34b74bf</uuid>
    <cpu>
      <arch>x86_64</arch>
      <model>IvyBridge-IBRS</model>
      <vendor>Intel</vendor>
      <microcode version='31'/>
      <topology sockets='1' cores='4' threads='2'/>

rose07 ~]# cat /sys/kernel/debug/x86/ibrs_enabled
0
[root@rose07 ~]# cat /sys/kernel/debug/x86/pti_enabled
1
[root@rose07 ~]# cat /sys/kernel/debug/x86/ibpb_enabled
1


On SHE-VM on rose07:
he-1 ~]# cat /sys/kernel/debug/x86/ibrs_enabled
0
[root@nsednev-he-1 ~]# cat /sys/kernel/debug/x86/pti_enabled
1
[root@nsednev-he-1 ~]# cat /sys/kernel/debug/x86/ibpb_enabled
1


This is puma18, a host without IBRS capability, and this is what I see in the vm.conf of the SHE VM that was deployed on it:

puma18 ~]# cat /sys/kernel/debug/x86/ibrs_enabled
0
[root@puma18 ~]# cat /sys/kernel/debug/x86/pti_enabled
1
[root@puma18 ~]# cat /sys/kernel/debug/x86/ibpb_enabled
0

puma18 ~]# cat /run/ovirt-hosted-engine-ha/vm.conf | grep spec
devices={alias:video0,specParams:{vram:32768,vgamem:16384,heads:1,ram:65536},deviceId:bac7b5d8-d0f0-4c54-8f3f-b8cd03beb72c,address:{type:pci,slot:0x02,bus:0x00,domain:0x0000,function:0x0},device:qxl,type:video}
devices={device:ide,specParams:{index:0},type:controller,deviceId:4a0c0494-7957-4e7b-b30b-e50d0a3ebae3,address:{type:pci,slot:0x01,bus:0x00,domain:0x0000,function:0x1}}
devices={alias:ua-43a67516-6c0a-422e-a677-d0700038bcc0,specParams:{source:urandom},deviceId:43a67516-6c0a-422e-a677-d0700038bcc0,address:{type:pci,slot:0x09,bus:0x00,domain:0x0000,function:0x0},device:virtio,model:virtio,type:rng}
devices={device:usb,specParams:{index:0,model:piix3-uhci},type:controller,deviceId:d8b569fd-54a4-4333-8510-11970a053700,address:{type:pci,slot:0x01,bus:0x00,domain:0x0000,function:0x2}}

You can clearly see that for the SHE VM running on rose07 (the IBRS-capable host), the +spec-ctrl capability flag is present in the SHE VM's configuration.

Comment 25 Michal Skrivanek 2018-06-11 09:32:48 UTC
Nikolai, what is the question?

What's the cluster's CPU setting?
If there is a mismatch between HE configuration and the Cluster setting then this needs to be solved at the HE VM configuration level, Simone. 
First I'd like a confirmation that it's indeed the case, and how to get to that state.

Comment 26 Nikolai Sednev 2018-06-11 10:36:54 UTC
(In reply to Michal Skrivanek from comment #25)
> Nikolai, what is the question?
> 
> What's the cluster's CPU setting?
> If there is a mismatch between HE configuration and the Cluster setting then
> this needs to be solved at the HE VM configuration level, Simone. 
> First I'd like a confirmation that it's indeed the case, and how to get to
> that state.

The CPU type in the UI is "Intel SandyBridge IBRS Family" and the CPU architecture is x86_64, if I'm thinking correctly about the expected answer to your question.

I've set the needinfo to ask Francesco to take a look at what was possibly the root cause of the failed migration.

Comment 27 Nikolai Sednev 2018-06-11 11:12:20 UTC
Tested on these components:
ovirt-engine-4.2.4.1-0.1.el7.noarch
ovirt-hosted-engine-ha-2.2.13-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.22-1.el7ev.noarch
rhvm-appliance-4.2-20180601.0.el7.noarch
Linux 3.10.0-862.6.1.el7.x86_64 #1 SMP Mon Jun 4 15:33:25 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Comment 28 Michal Skrivanek 2018-06-11 12:00:36 UTC
How did you deploy it on puma18 with a SandyBridge-IBRS CPU when that host is not IBRS-capable? How is that host operational in a SandyBridge-IBRS cluster?

Comment 29 Simone Tiraboschi 2018-06-11 12:09:23 UTC
(In reply to Michal Skrivanek from comment #25)
> What's the cluster's CPU setting?
> If there is a mismatch between HE configuration and the Cluster setting then
> this needs to be solved at the HE VM configuration level, Simone. 

Hosted-engine-setup isn't going to explicitly set cluster CPU type or VM CPU type. It simply adds the first host letting the engine implicitly choose the rest from there.

Comment 30 Israel Pinto 2018-06-11 12:51:05 UTC
I reproduced it:
1. Deploy HE on an XXX-IBRS host (in our case Intel SandyBridge IBRS).
   The VM config file has the spec-ctrl CPU flag:
   cat /run/ovirt-hosted-engine-ha/vm.conf |grep spec
   cpuType=SandyBridge,+spec-ctrl

   After deployment the cluster is "XXX IBRS Family", in our case:
   Intel SandyBridge IBRS Family

2. Update the cluster CPU type to a non-IBRS one (in our case Intel Conroe Family).
3. Deploy a host which is not XXX-IBRS.
4. Migrate the VM from the XXX-IBRS host to the non-IBRS host.

Migration failed:
vdsm source host log:
2018-06-11 15:41:57,634+0300 ERROR (migsrc/64875e38) [virt.vm] (vmId='64875e38-0adb-4f43-a2d2-82e0d7372efe') operation failed: guest CPU doesn't match specification: missing features: xsave,avx,spec-ctrl,xsaveopt (migration:290)
2018-06-11 15:41:58,551+0300 ERROR (migsrc/64875e38) [virt.vm] (vmId='64875e38-0adb-4f43-a2d2-82e0d7372efe') Failed to migrate (migration:455)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 437, in _regular_run
    self._startUnderlyingMigration(time.time())
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 509, in _startUnderlyingMigration
    self._perform_with_conv_schedule(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 587, in _perform_with_conv_schedule
    self._perform_migration(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 529, in _perform_migration
    self._migration_flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1746, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: operation failed: guest CPU doesn't match specification: missing features: xsave,avx,spec-ctrl,xsaveopt

Comment 31 Israel Pinto 2018-06-11 12:58:51 UTC
Created attachment 1450033 [details]
logs_11_6_2018

Comment 32 Michal Skrivanek 2018-06-11 14:56:28 UTC
(In reply to Simone Tiraboschi from comment #29)
> (In reply to Michal Skrivanek from comment #25)
> > What's the cluster's CPU setting?
> > If there is a mismatch between HE configuration and the Cluster setting then
> > this needs to be solved at the HE VM configuration level, Simone. 
> 
> Hosted-engine-setup isn't going to explicitly set cluster CPU type or VM CPU
> type. It simply adds the first host letting the engine implicitly choose the
> rest from there.

It either needs to follow the desired cluster CPU picked at deployment time, or the cluster created during the deployment needs to match the first host. IIUC that works fine and the cluster is created with a correct CPU type.
Now if that cluster configuration is later changed to a "lower" CPU, then the HE VM needs to be reconfigured accordingly. This is HE-specific logic (there's a different code path for the HE VM that skips the cluster update checks), and a solution suitable for the HE VM needs to be implemented. Deferring to the Integration team as to how.

Comment 33 Simone Tiraboschi 2018-06-11 15:16:09 UTC
(In reply to Michal Skrivanek from comment #32)
> It either need to follow desired Cluster CPU picked at deployment time or
> the Cluster created during the deployment need to match the first host. IIUC
> that works fine and the cluster is created with a correct CPU type.
> Now if that Cluster configuration is later changed to a "lower" CPU then the
> HE VM need to be reconfigured accordingly. This is a HE specific logic -
> there's a different code path for HE VM skipping the Cluster update checks -
> and a solution suitable for HE VM need to be implemented. Deferring to
> Integration team as to how.

We can implement all the upfront checks we want, but they are not going to solve this bug or prevent it: if the user can lower the cluster specs on the engine side with no checks or impact on the running hosted-engine VM, this will still happen regardless of any initial check.

Comment 34 Red Hat Bugzilla Rules Engine 2018-06-18 07:40:32 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 36 Israel Pinto 2018-06-21 13:02:35 UTC
Some updates: I tried the following to see if it solves the CPU flag problem.
1. After adding the new host with the lower CPU type, wait 1 hour for the HE VM to be updated with the new CPU flags.
Results: checking the CPU flags on the VM after 1 hour shows the same flags, no change.
2. Stop the HE VM and run it on the host with the lower CPU type.
Results:
The VM failed to run; from the vdsm log, a flags issue:
2018-06-21 15:51:27,582+0300 ERROR (vm/8e2ce9a7) [virt.vm] (vmId='8e2ce9a7-d1d7-477b-8b91-f6e6ee48dc62') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2876, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: the CPU is incompatible with host CPU: Host CPU does not provide required features: xsave, avx, spec-ctrl
2018-06-21 15:51:27,583+0300 INFO  (vm/8e2ce9a7) [virt.vm] (vmId='8e2ce9a7-d1d7-477b-8b91-f6e6ee48dc62') Changed state to Down: the CPU is incompatible with host CPU: Host CPU does not provide required features: xsave, avx, spec-ctrl (code=1) (vm:1683)
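
A quick way to confirm which of the required features the destination host actually exposes (a sketch; vdsm-client ships with vdsm 4.20+, and the key names are those of the getCapabilities output):

  # on the destination host
  vdsm-client Host getCapabilities | grep -E '"cpuModel"|"cpuFlags"'
  # or straight from the kernel; qemu's spec-ctrl feature roughly corresponds to the host ibrs/ibpb flags
  grep -o -E 'xsaveopt|xsave|avx|ibrs|ibpb' /proc/cpuinfo | sort -u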

Comment 37 Martin Sivák 2018-07-19 09:04:03 UTC
I think the issue is simply that the generation id of the VM is not bumped and so the OVF is not recomputed.
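
One way to check this hypothesis on a live engine is to compare the VM's generation number with the one recorded for its OVF. A sketch (table and column names as in the engine schema of this era; verify against your version):

  sudo -u postgres scl enable rh-postgresql95 -- psql -d engine -c \
    "SELECT s.vm_name, s.db_generation, g.ovf_generation
       FROM vm_static s JOIN vm_ovf_generations g ON s.vm_guid = g.vm_guid
      WHERE s.origin = 6"

If db_generation equals ovf_generation, the updater considers the stored OVF current and does not rewrite it.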

Comment 38 Michal Skrivanek 2018-07-19 13:12:13 UTC
I do not see an issue with the 1h refresh; you do not move your critical component to different CPU models very often. I'd be fine with closing as WONTFIX, but if you have a patch already, fine as well...

Comment 39 Sandro Bonazzola 2018-09-21 07:08:48 UTC
Not identified as blocker for 4.2.7, moving to 4.2.8

Comment 49 Simone Tiraboschi 2019-03-21 13:37:01 UTC
Bhushan, can you please execute
  sudo -u postgres scl enable rh-postgresql95 -- psql -d engine -c "SELECT vm_name, cluster_name, cluster_cpu_name, cpu_name, custom_cpu_name FROM vms WHERE origin=6"
on the engine VM and share its output?

Comment 53 Simone Tiraboschi 2019-03-21 22:18:55 UTC
Bhushan,
I can only suggest trying to edit the hosted-engine VM description, or changing the number of cores or the memory amount, just to force a quick regeneration of the OVF store disks.

And run on the engine VM:
grep "UploadStreamCommand.*3ee26ff5-afb1-412a-89ef-a289ac109fed" /var/log/ovirt-engine/engine.log

until you see something new.
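
If editing through the UI is inconvenient, the same bump can be scripted against the REST API (a sketch; the engine FQDN, password and HE VM id are placeholders):

  curl -k -u admin@internal:PASSWORD -X PUT \
    -H 'Content-Type: application/xml' \
    -d '<vm><description>force OVF regeneration</description></vm>' \
    'https://engine.example.com/ovirt-engine/api/vms/<HE_VM_ID>'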

Comment 54 Simone Tiraboschi 2019-03-21 22:27:04 UTC
Another option is to try manually forcing an OVF_STORE update from the storage domains tab: see the attached screenshot.

Comment 55 Simone Tiraboschi 2019-03-21 22:27:55 UTC
Created attachment 1546714 [details]
trigger OVF_STORE update

Comment 56 Simone Tiraboschi 2019-03-21 22:44:18 UTC
I opened a more specific bug here: https://bugzilla.redhat.com/1691562

Comment 57 Ryan Barry 2019-03-21 23:31:21 UTC
I'm not sure that the other bug is valid. The updates are continuing to run. Here's the latest successful one for the hosted engine; there are 5 days of updates before this (with a regeneration at 13:35 and one at 14:35), and days after it with no update for the Hosted Engine. Other VMs are updated as normal, though.

2019-03-06 14:35:25,923+01 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [11b12d15] START, SetVolumeDescriptionVDSCommand( SetVolumeDescriptionVDSCommandParameters:{storagePoolId='b74b6e90-2fa7-11e9-90a8-00163e1f0044', ignoreFailoverLimit='false', storageDomainId='3ee26ff5-afb1-412a-89ef-a289ac109fed', imageGroupId='556be753-69bd-413c-87e2-e170d3fca9db', imageId='5cd0085a-094d-4806-a251-1af759241b4d'}), log id: 7669d4bc
2019-03-06 14:35:25,923+01 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [11b12d15] -- executeIrsBrokerCommand: calling 'setVolumeDescription', parameters:
2019-03-06 14:35:25,923+01 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [11b12d15] ++ spUUID=b74b6e90-2fa7-11e9-90a8-00163e1f0044
2019-03-06 14:35:25,923+01 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [11b12d15] ++ sdUUID=3ee26ff5-afb1-412a-89ef-a289ac109fed
2019-03-06 14:35:25,923+01 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [11b12d15] ++ imageGroupGUID=556be753-69bd-413c-87e2-e170d3fca9db
2019-03-06 14:35:25,923+01 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [11b12d15] ++ volUUID=5cd0085a-094d-4806-a251-1af759241b4d
2019-03-06 14:35:25,923+01 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [11b12d15] ++ description={"Updated":true,"Size":30720,"Last Updated":"Wed Mar 06 14:35:25 CET 2019","Storage Domains":[{"uuid":"3ee26ff5-afb1-412a-89ef-a289ac109fed"}],"Disk Description":"OVF_STORE"}
2019-03-06 14:35:25,971+01 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [11b12d15] FINISH, SetVolumeDescriptionVDSCommand, log id: 7669d4bc
2019-03-06 14:35:25,979+01 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [11b12d15] Lock freed to object 'EngineLock:{exclusiveLocks='[3ee26ff5-afb1-412a-89ef-a289ac109fed=STORAGE]', sharedLocks='[b74b6e90-2fa7-11e9-90a8-00163e1f0044=OVF_UPDATE]'}'
2019-03-06 14:35:27,075+01 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engineScheduled-Thread-76) [] Polling and updating Async Tasks: 4 tasks, 4 tasks to poll now
2019-03-06 14:35:27,081+01 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-76) [] SPMAsyncTask::PollTask: Polling task '0e05

However, from an overview of the code, I'm not sure we actually update the OVF on a timer every hour. I could be wrong here.

Instead, it appears to be updated as part of UpdateVmCommand, and we don't use a USER_UPDATE_VM command for HostedEngine. The cluster level change to SandyBridge is also before the log starts...

It is possible that HostedEngine is an exception to these updates, since it's intended to be explicitly managed, and not to violate normal validations.


Bhushan, have you tried comment #53?

Andrej, please keep me honest here.

Comment 58 Simone Tiraboschi 2019-03-22 08:20:04 UTC
(In reply to Ryan Barry from comment #57)
> It is possible that HostedEngine is an exception to these updates, since
> it's intended to be explicitly managed, and not to violate normal
> validations. 

Please take into account that, although we don't recommend it, nothing really prevents the user from creating regular VMs on the hosted-engine storage domain, so it should absolutely behave as a regular storage domain from this point of view.

Comment 59 Andrej Krejcir 2019-03-22 12:17:15 UTC
The OVFs in hosted engine storage domain are updated every hour, same as any other domain. The OVF update is also triggered whenever the HE VM is updated by UpdateVmCommand.

Looking at the engine logs, the OVF update is triggering every hour:

cat engine.log | grep -i 'Successfully updated VM OVFs in Data Center'

2019-03-21 06:52:57,263+01 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-46) [422a9d00] Successfully updated VM OVFs in Data Center 'Default'
2019-03-21 07:52:57,278+01 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-19) [2eb148ea] Successfully updated VM OVFs in Data Center 'Default'
2019-03-21 08:52:57,294+01 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [d63f4e6] Successfully updated VM OVFs in Data Center 'Default'
2019-03-21 09:52:57,310+01 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-12) [58e79de8] Successfully updated VM OVFs in Data Center 'Default'
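
(For reference, the hourly cadence is driven by an engine-config value; key name as in the 4.2/4.3 engine, check with 'engine-config -l' on your version:

  engine-config -g OvfUpdateIntervalInMinutes
  # to shorten the cycle while debugging, e.g. to 10 minutes:
  engine-config -s OvfUpdateIntervalInMinutes=10 && systemctl restart ovirt-engine
)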


In the DB dump, the HE OVF also contains the incorrect CPU name: 
<CustomCpuName>Skylake-Server,+spec-ctrl,+ssbd</CustomCpuName>

The problem seems to be that updating the cluster CPU did not increase the VM generation number, so a new OVF is not generated. 

As a workaround, the HE VM generation number can be increased by editing the HE VM (for example, its description), as mentioned in comment #53.

Comment 61 Michal Skrivanek 2019-03-27 08:51:39 UTC
(In reply to Andrej Krejcir from comment #59)

> The problem seems to be that updating the cluster CPU did not increase the
> VM generation number, so a new OVF is not generated. 

Yes, because HE is explicitly skipped during UpdateClusterCommand right here: https://github.com/oVirt/ovirt-engine/blob/6bae27af75ddf187d1b3306cffb847244327c452/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/UpdateClusterCommand.java#L157
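
On a given deployment this is visible in the logs: a cluster edit logs an UpdateClusterCommand, and one can check whether anything was done for the HE VM around the same time (log location as on a standard engine install; the HE VM id is a placeholder):

  grep UpdateClusterCommand /var/log/ovirt-engine/engine.log
  grep '<HE_VM_ID>' /var/log/ovirt-engine/engine.log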

Comment 64 Sandro Bonazzola 2019-07-11 07:02:26 UTC
Re-targeting to 4.3.6 not being identified as blocker for 4.3.5.

Comment 65 Sandro Bonazzola 2019-08-01 14:57:40 UTC
Moving to 4.4 since it depends on bug #1691562, which is targeted to 4.4.

Comment 66 Daniel Gur 2019-08-28 13:14:53 UTC
sync2jira

Comment 67 Daniel Gur 2019-08-28 13:19:56 UTC
sync2jira

Comment 70 Michal Skrivanek 2020-01-07 12:40:01 UTC
Comment #59 is still relevant, I believe. Without that, the HE VM is *not* updated when editing a cluster.

Comment 71 Sandro Bonazzola 2020-01-08 07:54:58 UTC
So 2 things to work on:
- Can we get UpdateVmCommand triggered on the VMs running in a cluster when the cluster is updated? This will ensure VMs have the correct values after the cluster update.
- It seems we have a bit of confusion about the next step; there's also https://bugzilla.redhat.com/show_bug.cgi?id=1691562 which at this point seems a duplicate.

Comment 72 Ryan Barry 2020-01-08 13:25:14 UTC
Not quite a duplicate; both must be done, but it's already properly in 'Depends On'.

Comment 73 Andrej Krejcir 2020-01-30 08:59:18 UTC
This bug will be fixed by patches in Bug 1691562.

However, it is no longer possible to decrease the compatibility version of existing Data Centers through the UI. This was changed in Bug 1753628.

So on new deployments, the API has to be used to decrease the version of the DC where the HE VM is running.
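
A sketch of that API call (the data center id, engine FQDN and password are placeholders; the version element is the one exposed under /ovirt-engine/api/datacenters):

  curl -k -u admin@internal:PASSWORD -X PUT \
    -H 'Content-Type: application/xml' \
    -d '<data_center><version><major>4</major><minor>3</minor></version></data_center>' \
    'https://engine.example.com/ovirt-engine/api/datacenters/<DC_ID>'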

Comment 74 Sandro Bonazzola 2020-03-16 13:22:33 UTC
This bug is in MODIFIED and targeting 4.4.2. Can we re-target to 4.4.0 and move it to QE?

Comment 75 Nikolai Sednev 2020-06-23 11:09:46 UTC
Deployed the latest HE on an IBRS host and attached a regular non-IBRS ha-host to the environment.
Attached 2 additional IBRS ha-hosts.
Tested with Software Version: 4.4.1.2-0.10.el8ev
Tried to migrate from an IBRS ha-host to the non-IBRS ha-host ocelot01.qa.lab.tlv.redhat.com; the engine disconnected me during the migration, and then the VM failed to start on ocelot01.

--== Host ocelot01.qa.lab.tlv.redhat.com (id: 2) status ==--

Host ID                            : 2
Host timestamp                     : 5166
Score                              : 3400
Engine status                      : {"vm": "up", "health": "bad", "detail": "Up", "reason": "failed liveliness check"}
Hostname                           : ocelot01.qa.lab.tlv.redhat.com
Local maintenance                  : False
stopped                            : False
crc32                              : f4602df7
conf_on_shared_storage             : True
local_conf_timestamp               : 5166
Status up-to-date                  : True
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=5166 (Tue Jun 23 13:52:22 2020)
        host-id=2
        score=3400
        vm_conf_refresh_time=5166 (Tue Jun 23 13:52:22 2020)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineStarting
        stopped=False


Some CPU details about the environment:

Engine CPU appeared as follows:
nsednev-he-1 ~]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 58
model name      : Intel Xeon E3-12xx v2 (Ivy Bridge)
stepping        : 9
microcode       : 0x1
cpu MHz         : 3292.522
cache size      : 16384 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti fsgsbase smep erms xsaveopt arat
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips        : 6585.04
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

IBRS ha-host's CPUs appeared as follows:
1.
alma07 ~]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 58
model name      : Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz
stepping        : 9
microcode       : 0x21
cpu MHz         : 1760.423
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts md_clear flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit srbds
bogomips        : 6585.56
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

2.
alma04 ~]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80GHz
stepping        : 4
microcode       : 0x42e
cpu MHz         : 1799.888
cache size      : 10240 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts md_clear flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips        : 3599.79
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

3.
alma03 ~]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80GHz
stepping        : 4
microcode       : 0x42e
cpu MHz         : 1799.983
cache size      : 10240 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts md_clear flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips        : 3599.64
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

The non-IBRS host was:
ocelot01 ~]#  virsh -r capabilities | head
<capabilities>

  <host>
    <uuid>e602cd31-f7b3-4843-b13e-c5553cce84b2</uuid>
    <cpu>
      <arch>x86_64</arch>
      <model>Skylake-Server-IBRS</model>
      <vendor>Intel</vendor>
      <microcode version='33581318'/>
      <counter name='tsc' frequency='2099999000' scaling='yes'/>
ocelot01 ~]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
stepping        : 4
microcode       : 0x2006906
cpu MHz         : 1394.002
cache size      : 22528 KB
physical id     : 0
siblings        : 32
core id         : 0
cpu cores       : 16
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips        : 4200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

Neither the engine nor the ocelot ha-host had IBRS flags on their CPUs.


From the ha-broker log on ocelot I saw:
MainThread::ERROR::2020-06-23 13:00:21,457::hosted_engine::564::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngin
e::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2020-06-23 13:00:21,460::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceba
ck (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 85, in start_monitor
    response = self._proxy.start_monitor(type, options)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request
    verbose=self.__verbose
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1166, in single_request
    http_conn = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1309, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1036, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.6/http/client.py", line 974, in send
    self.connect()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 74, in connect
    self.sock.connect(base64.b16decode(self.host))
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 437, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 561, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 91, in start_monitor
    ).format(t=type, o=options, e=e)
ovirt_hosted_engine_ha.lib.exceptions.RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'addr': '10.35.95.254', 'network_test': 'dns', 'tcp_t_address': '', 'tcp_t_port': ''}]
MainThread::ERROR::2020-06-23 13:00:21,460::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-06-23 13:00:21,460::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down


Services were up and running:
ocelot01 ~]# systemctl status ovirt-ha-agent -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-06-23 13:00:43 IDT; 55min ago
 Main PID: 12886 (ovirt-ha-agent)
    Tasks: 2 (limit: 788464)
   Memory: 50.3M
   CGroup: /system.slice/ovirt-ha-agent.service
           └─12886 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent

Jun 23 13:00:43 ocelot01.qa.lab.tlv.redhat.com systemd[1]: Started oVirt Hosted Engine High Availability Monitoring A>

ocelot01 ~]# systemctl status ovirt-ha-broker 
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-06-23 13:00:21 IDT; 56min ago
 Main PID: 12597 (ovirt-ha-broker)
    Tasks: 14 (limit: 788464)
   Memory: 66.7M
   CGroup: /system.slice/ovirt-ha-broker.service
           ├─12597 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
           ├─29511 /bin/sh /usr/sbin/hosted-engine --check-liveliness
           └─29512 /usr/bin/python3 -m ovirt_hosted_engine_setup.check_liveliness

Jun 23 13:00:21 ocelot01.qa.lab.tlv.redhat.com systemd[1]: Started oVirt Hosted Engine High Availability Communicatio>
Jun 23 13:00:54 ocelot01.qa.lab.tlv.redhat.com ovirt-ha-broker[12597]: ovirt-ha-broker mgmt_bridge.MgmtBridge ERROR F>
Jun 23 13:00:59 ocelot01.qa.lab.tlv.redhat.com ovirt-ha-broker[12597]: ovirt-ha-broker mgmt_bridge.MgmtBridge ERROR F>
Jun 23 13:01:46 ocelot01.qa.lab.tlv.redhat.com ovirt-ha-broker[12597]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.>
                                                                       Traceback (most recent call last):
                                                                         File "/usr/lib/python3.6/site-packages/ovirt>
                                                                           timeout=float(cfg["smtp-timeout"]))
                                                                         File "/usr/lib64/python3.6/smtplib.py", line>
                                                                           (code, msg) = self.connect(host, port)
                                                                         File "/usr/lib64/python3.6/smtplib.py", line>
                                                                           self.sock = self._get_socket(host, port, s>
                                                                         File "/usr/lib64/python3.6/smtplib.py", line>
                                                                           self.source_address)
                                                                         File "/usr/lib64/python3.6/socket.py", line >
                                                                           raise err
                                                                         File "/usr/lib64/python3.6/socket.py", line >
                                                                           sock.connect(sa)
                                                                       ConnectionRefusedError: [Errno 111] Connection>
Jun 23 13:01:56 ocelot01.qa.lab.tlv.redhat.com ovirt-ha-broker[12597]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.>
                                                                       Traceback (most recent call last):
                                                                         File "/usr/lib/python3.6/site-packages/ovirt>


Network configurations were as follows on source and destination hosts:
Source alma07.qa.lab.tlv.redhat.com 
ovirtmgmt: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.35.92.7  netmask 255.255.252.0  broadcast 10.35.95.255
        inet6 2620:52:0:235c:92e2:baff:fe7d:3638  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::92e2:baff:fe7d:3638  prefixlen 64  scopeid 0x20<link>
        ether 90:e2:ba:7d:36:38  txqueuelen 1000  (Ethernet)
        RX packets 6449423  bytes 51361455338 (47.8 GiB)
        RX errors 0  dropped 79  overruns 0  frame 0
        TX packets 4972215  bytes 70838937200 (65.9 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Destination ocelot01.qa.lab.tlv.redhat.com
ovirtmgmt: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.35.30.1  netmask 255.255.255.0  broadcast 10.35.30.255
        inet6 2620:52:0:231e:ae1f:6bff:fe57:ae82  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::ae1f:6bff:fe57:ae82  prefixlen 64  scopeid 0x20<link>
        ether ac:1f:6b:57:ae:82  txqueuelen 1000  (Ethernet)
        RX packets 968471  bytes 10410120624 (9.6 GiB)
        RX errors 0  dropped 5  overruns 0  frame 0
        TX packets 841870  bytes 91202373 (86.9 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

The migration failed not because of the IBRS flag, but because the source and destination ha-hosts are on different network subnets (inet 10.35.92.7 netmask 255.255.252.0 broadcast 10.35.95.255 on the source and inet 10.35.30.1 netmask 255.255.255.0 broadcast 10.35.30.255 on the destination).

After the unsuccessful migration attempt, the engine was automatically started on alma04, the ha-host with the best available score.

Moving this bug to VERIFIED, given that the engine no longer has the IBRS flag on its CPU and does not inherit the IBRS flag from the IBRS-capable ha-host on which it was deployed.

Please feel free to reopen if it still doesn't work for you.

Tested on:
ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch
ovirt-hosted-engine-ha-2.4.3-1.el8ev.noarch
Red Hat Enterprise Linux release 8.2 (Ootpa)
Linux 4.18.0-193.10.1.el8_2.x86_64 #1 SMP Fri Jun 19 15:31:45 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
rhvm-appliance.x86_64 2:4.4-20200604.0.el8ev @rhv-4.4.1

Comment 84 errata-xmlrpc 2020-08-04 13:16:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: RHV Manager (ovirt-engine) 4.4 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3247

