Bug 1779078

Summary: RHVH 4.4: Failed to run VM on 4.3/4.4 engine (Exit message: the CPU is incompatible with host CPU: Host CPU does not provide required features: hle, rtm)
Product: Red Hat Enterprise Linux Advanced Virtualization
Reporter: cshao <cshao>
Component: qemu-kvm
qemu-kvm sub component: General
Assignee: Eduardo Habkost <ehabkost>
QA Contact: Yumei Huang <yuhuang>
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
CC: areis, chayang, cshao, ddepaula, ehabkost, jdenemar, jinzhao, jsuchane, juzhang, knoel, lsvaty, mavital, michal.skrivanek, mrezanin, mtessun, nanliu, ncredi, nlevy, peyu, qiyuan, sbonazzo, sgott, shlei, toneata, virt-maint, weiwang, yaniwang, ycui, yturgema, yuhuang
Version: 8.1
Keywords: Regression, TestBlocker, ZStream
Target Milestone: rc
Target Release: 8.2
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e
Doc Type: If docs needed, set a value
Clones: 1787291, 1788122
Last Closed: 2020-05-05 09:52:05 UTC
Type: Bug
Bug Blocks: 1787291, 1788122
Attachments:
Description     Flags
all log info    none
cpuinfo         none

Description cshao 2019-12-03 09:02:40 UTC
Created attachment 1641619
all log info

Description of problem:
RHVH 4.4: Failed to run VM on 4.3 engine (Exit message: the CPU is incompatible with host CPU: Host CPU does not provide required features: hle, rtm)

Version-Release number of selected component (if applicable):
redhat-virtualization-host-4.4.0-20191201.0.el8_1
imgbased-1.2.6-0.1.el8ev.noarch
vdsm-4.40.0-154.git4e13ea9.el8ev.x86_64
ovirt-engine-4.3.7.2-0.1.el7.noarch


How reproducible:
100%

Steps to Reproduce:
1. Install RHVH-4.4-20191201.7-RHVH-x86_64-dvd1.iso via Anaconda GUI.
2. Register RHVH to the 4.3 engine.
3. Create a VM and run it.

Actual results:
The VM fails to run on the 4.3 engine:

VM is down with error. Exit message: the CPU is incompatible with host CPU: Host CPU does not provide required features: hle, rtm.

Expected results:
The VM runs successfully on the 4.3 engine.

Additional info:

Comment 1 cshao 2019-12-04 07:05:25 UTC
Also reproducible on the 4.4 engine:
ovirt-engine-4.4.0-0.6.master.el7.noarch

Comment 12 cshao 2019-12-05 01:30:39 UTC
Created attachment 1642259
cpuinfo

Comment 22 Michal Skrivanek 2019-12-05 09:25:18 UTC
Thank you. It's clearer now.
Apparently domcaps says that Haswell (the model with TSX) is a supported guest CPU. Jiri, any idea why? Host caps say it's noTSX and the CPU flags do not include hle or rtm, so why does libvirt report "Haswell" as a valid type?

Comment 25 Michal Skrivanek 2019-12-05 09:26:58 UTC
@cshao, as a workaround you can easily select Haswell-noTSX as a Cluster CPU type and VMs should work just fine. It's just the autodetection that selects a non-working model by default.
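A minimal sketch of the manual check behind this workaround, assuming a Haswell-class host: read the `flags` line of /proc/cpuinfo and fall back to Haswell-noTSX when hle or rtm is missing. `parse_cpu_flags` and `pick_haswell_variant` are hypothetical helpers for illustration only; they are not part of oVirt, VDSM, or libvirt, and this is not how the engine's autodetection is implemented.

```python
from __future__ import annotations

# TSX feature flags that the plain "Haswell" model requires (per the error above).
TSX_FLAGS = {"hle", "rtm"}


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found


def pick_haswell_variant(flags: set[str]) -> str:
    """Hypothetical helper: suggest 'Haswell' only if every TSX flag is present."""
    return "Haswell" if TSX_FLAGS <= flags else "Haswell-noTSX"
```

On a host whose flags lack hle and rtm, as described in comment 22, `pick_haswell_variant(parse_cpu_flags(open("/proc/cpuinfo").read()))` would return "Haswell-noTSX", matching the cluster CPU type suggested above.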

Comment 28 cshao 2019-12-05 10:34:28 UTC
(In reply to Michal Skrivanek from comment #25)
> @cshao, as a workaround you can easily select Haswell-noTSX as a Cluster CPU
> type and VMs should work just fine. It's just the autodetection that selects
> a non-working model by default.

You are right.
The VM runs successfully after selecting Haswell-noTSX as the Cluster CPU type.

Thanks.

Comment 31 Jiri Denemark 2019-12-05 10:39:30 UTC
Sigh, this mess is a result of versioned CPU models introduced in QEMU 4.1.0
without proper support for introspection (see bug 1697663, mainly in comments
4, 5, and 6) and thus no support for this in libvirt. Unfortunately, it seems
the needed machine type parameter for query-cpu-definitions QMP command is not
present even in QEMU v4.2.0-rc4.

Anyway, libvirt probes for supported CPU models and their usability on the
current host by probing QEMU with machine type "none". In this case, QEMU
reports "Haswell" CPU model is in fact an alias of "Haswell-v4"
("Haswell-noTSX-IBRS" using the original naming) and thus it is marked as
runnable as it does not require the TSX features. See below for the QMP log.

If I try to probe QEMU with machine type pc-i440fx-rhel7.6.0, which is the
default machine used when starting a domain, I get a completely different
results (see the second QMP log). The "Haswell" CPU model is not runnable
anymore because of missing "hle" and "rtm" features. Also I don't see any
"alias-of" fields there (currently unused by libvirt, though).

I don't know what the right fix should be here, but until bug 1697663 is fully
implemented and libvirt starts using it, the CPU model introspection is
completely broken.


# /usr/libexec/qemu-kvm -machine none -nodefaults -nographic -qmp unix:/tmp/ble,server &
# socat STDIO UNIX-CONNECT:/tmp/ble
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 4}, "package": "qemu-kvm-4.1.0-14.module+el8.1.0+4548+ed1300f4"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities"}
{"return": {}}
{"execute": "query-cpu-definitions"}
{
    "return": [
        ...
        {
            "name": "Haswell-v4",
            "typename": "Haswell-v4-x86_64-cpu",
            "unavailable-features": [
            ],
            "static": false,
            "migration-safe": true
        },
        {
            "name": "Haswell-v3",
            "typename": "Haswell-v3-x86_64-cpu",
            "unavailable-features": [
                "hle",
                "rtm"
            ],
            "static": false,
            "migration-safe": true
        },
        {
            "name": "Haswell-v2",
            "typename": "Haswell-v2-x86_64-cpu",
            "unavailable-features": [
            ],
            "static": false,
            "migration-safe": true
        },
        {
            "name": "Haswell-v1",
            "typename": "Haswell-v1-x86_64-cpu",
            "unavailable-features": [
                "hle",
                "rtm"
            ],
            "static": false,
            "migration-safe": true
        },
        {
            "name": "Haswell-noTSX-IBRS",
            "typename": "Haswell-noTSX-IBRS-x86_64-cpu",
            "unavailable-features": [
            ],
            "alias-of": "Haswell-v4",
            "static": false,
            "migration-safe": true
        },
        {
            "name": "Haswell-noTSX",
            "typename": "Haswell-noTSX-x86_64-cpu",
            "unavailable-features": [
            ],
            "alias-of": "Haswell-v2",
            "static": false,
            "migration-safe": true
        },
        {
            "name": "Haswell-IBRS",
            "typename": "Haswell-IBRS-x86_64-cpu",
            "unavailable-features": [
                "hle",
                "rtm"
            ],
            "alias-of": "Haswell-v3",
            "static": false,
            "migration-safe": true
        },
        {
            "name": "Haswell",
            "typename": "Haswell-x86_64-cpu",
            "unavailable-features": [
            ],
            "alias-of": "Haswell-v4",
            "static": false,
            "migration-safe": true
        },
        ...
    ]
}
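To make the alias behaviour in the "-machine none" reply above concrete, here is a small sketch that follows the `alias-of` links in a trimmed, hand-copied version of that reply and checks whether an entry's `unavailable-features` list is empty. Treating an empty list as "runnable" follows the explanation in this comment; `resolve` and `runnable` are hypothetical helpers, not libvirt code.

```python
# Trimmed copy of the "-machine none" query-cpu-definitions reply above;
# only "name", "unavailable-features" and "alias-of" are kept.
REPLY_MACHINE_NONE = {"return": [
    {"name": "Haswell-v4", "unavailable-features": []},
    {"name": "Haswell-v3", "unavailable-features": ["hle", "rtm"]},
    {"name": "Haswell-v2", "unavailable-features": []},
    {"name": "Haswell-v1", "unavailable-features": ["hle", "rtm"]},
    {"name": "Haswell-noTSX-IBRS", "unavailable-features": [], "alias-of": "Haswell-v4"},
    {"name": "Haswell-noTSX", "unavailable-features": [], "alias-of": "Haswell-v2"},
    {"name": "Haswell-IBRS", "unavailable-features": ["hle", "rtm"], "alias-of": "Haswell-v3"},
    {"name": "Haswell", "unavailable-features": [], "alias-of": "Haswell-v4"},
]}


def index_models(reply):
    """Map model name -> definition from a query-cpu-definitions reply."""
    return {d["name"]: d for d in reply["return"]}


def resolve(models, name):
    """Follow 'alias-of' links until reaching a model that is not an alias."""
    seen = set()
    while "alias-of" in models[name]:
        if name in seen:
            raise ValueError(f"alias loop at {name}")
        seen.add(name)
        name = models[name]["alias-of"]
    return name


def runnable(models, name):
    """Runnable here means QEMU reported no unavailable features for the entry."""
    return not models[name]["unavailable-features"]


models = index_models(REPLY_MACHINE_NONE)
for name in ("Haswell", "Haswell-noTSX", "Haswell-IBRS"):
    print(f"{name:14} -> {resolve(models, name):11} runnable={runnable(models, name)}")
```

With this data, "Haswell" resolves to "Haswell-v4" and is runnable, which is the misleading answer obtained from the machine type "none" probe.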


# /usr/libexec/qemu-kvm -machine pc-i440fx-rhel7.6.0 -nodefaults -nographic -qmp unix:/tmp/ble,server &
# socat STDIO UNIX-CONNECT:/tmp/ble
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 4}, "package": "qemu-kvm-4.1.0-14.module+el8.1.0+4548+ed1300f4"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities"}
{"return": {}}
{"execute": "query-cpu-definitions"}
{
    "return": [
        ...
        {
            "name": "Haswell-v4",
            "typename": "Haswell-v4-x86_64-cpu",
            "unavailable-features": [
            ],
            "static": false,
            "migration-safe": true
        },
        {
            "name": "Haswell-v3",
            "typename": "Haswell-v3-x86_64-cpu",
            "unavailable-features": [
                "hle",
                "rtm"
            ],
            "static": false,
            "migration-safe": true
        },
        {
            "name": "Haswell-v2",
            "typename": "Haswell-v2-x86_64-cpu",
            "unavailable-features": [
            ],
            "static": false,
            "migration-safe": true
        },
        {
            "name": "Haswell-v1",
            "typename": "Haswell-v1-x86_64-cpu",
            "unavailable-features": [
                "hle",
                "rtm"
            ],
            "static": false,
            "migration-safe": true
        },
        {
            "name": "Haswell-noTSX-IBRS",
            "typename": "Haswell-noTSX-IBRS-x86_64-cpu",
            "unavailable-features": [
            ],
            "static": false,
            "migration-safe": true
        },
        {
            "name": "Haswell-noTSX",
            "typename": "Haswell-noTSX-x86_64-cpu",
            "unavailable-features": [
            ],
            "static": false,
            "migration-safe": true
        },
        {
            "name": "Haswell-IBRS",
            "typename": "Haswell-IBRS-x86_64-cpu",
            "unavailable-features": [
                "hle",
                "rtm"
            ],
            "static": false,
            "migration-safe": true
        },
        {
            "name": "Haswell",
            "typename": "Haswell-x86_64-cpu",
            "unavailable-features": [
                "hle",
                "rtm"
            ],
            "static": false,
            "migration-safe": true
        },
        ...
    ]
}
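Comparing the two replies entry by entry shows which models change status between machine types. The sketch below uses model/feature data transcribed by hand from the two logs (Haswell family only) and computes the models that are runnable in exactly one of the probes; the function names are illustrative only.

```python
from __future__ import annotations

# model name -> "unavailable-features", from the "-machine none" reply.
PROBE_MACHINE_NONE = {
    "Haswell-v4": [], "Haswell-v3": ["hle", "rtm"],
    "Haswell-v2": [], "Haswell-v1": ["hle", "rtm"],
    "Haswell-noTSX-IBRS": [], "Haswell-noTSX": [],
    "Haswell-IBRS": ["hle", "rtm"], "Haswell": [],
}

# model name -> "unavailable-features", from the pc-i440fx-rhel7.6.0 reply.
PROBE_PC_I440FX = {
    "Haswell-v4": [], "Haswell-v3": ["hle", "rtm"],
    "Haswell-v2": [], "Haswell-v1": ["hle", "rtm"],
    "Haswell-noTSX-IBRS": [], "Haswell-noTSX": [],
    "Haswell-IBRS": ["hle", "rtm"], "Haswell": ["hle", "rtm"],
}


def runnable_models(probe: dict[str, list[str]]) -> set[str]:
    """Models with no unavailable features, i.e. runnable according to QEMU."""
    return {name for name, missing in probe.items() if not missing}


def usability_changes(a: dict[str, list[str]], b: dict[str, list[str]]) -> list[str]:
    """Models that are runnable in exactly one of the two probes."""
    return sorted(runnable_models(a) ^ runnable_models(b))


print(usability_changes(PROBE_MACHINE_NONE, PROBE_PC_I440FX))
```

Only "Haswell" differs, which is why probing with machine type "none" is not representative of the machine type actually used to start the domain.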

Comment 37 Eduardo Habkost 2019-12-05 22:36:20 UTC
Fix submitted upstream:
https://lore.kernel.org/qemu-devel/20191205223339.764534-1-ehabkost@redhat.com/

Comment 44 Michal Skrivanek 2019-12-06 11:15:07 UTC
      <model usable='yes'>Haswell-noTSX-IBRS</model>
      <model usable='yes'>Haswell-noTSX</model>
      <model usable='no'>Haswell-IBRS</model>
      <model usable='no'>Haswell</model>

Seems to work well. Thanks, Eduardo!

Comment 45 Danilo de Paula 2019-12-11 14:44:50 UTC
QA_ACK, please?

Comment 64 Ademar Reis 2020-02-05 23:09:26 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 69 Yumei Huang 2020-02-26 04:28:07 UTC
Reproduce:
qemu-kvm-4.2.0-9.module+el8.2.0+5699+b5331ee5
libvirt-client-6.0.0-7.module+el8.2.0+5869+c23fe68b.x86_64
kernel-4.18.0-179.el8.x86_64

host: dell-per730-28.lab.eng.pek2.redhat.com (Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz)

Haswell is reported as usable in `virsh domcapabilities` even though the host doesn't support hle and rtm.

# virsh domcapabilities
  <cpu>
    <mode name='host-passthrough' supported='yes'/>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Haswell-noTSX-IBRS</model>
      <vendor>Intel</vendor>
      ...
    </mode>
    <mode name='custom' supported='yes'>
      ...
      <model usable='yes'>Haswell-noTSX-IBRS</model>
      <model usable='yes'>Haswell-noTSX</model>
      <model usable='no'>Haswell-IBRS</model>
      <model usable='yes'>Haswell</model>             ---------> Haswell is usable
      ...
    </mode>
  </cpu>

# /usr/libexec/qemu-kvm -cpu Haswell
qemu-kvm: warning: host doesn't support requested feature: CPUID.07H:EBX.hle [bit 4]
qemu-kvm: warning: host doesn't support requested feature: CPUID.07H:EBX.rtm [bit 11]
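Both warnings name CPUID leaf 07H, register EBX, bit 4 (HLE) and bit 11 (RTM). A minimal sketch that extracts those feature/bit pairs from qemu-kvm warning text and decodes the same bits from a raw EBX value. Where the EBX value comes from (for example a CPUID dump tool) is left as an assumption, and `missing_features` and `tsx_bits` are hypothetical helpers.

```python
from __future__ import annotations

import re

# Bit positions in CPUID leaf 07H (sub-leaf 0), register EBX, as quoted in
# the qemu-kvm warnings above.
EBX_BITS = {"hle": 4, "rtm": 11}

WARNING_RE = re.compile(
    r"host doesn't support requested feature: CPUID\.07H:EBX\.(\w+) \[bit (\d+)\]"
)


def missing_features(warnings: str) -> dict[str, int]:
    """Extract {feature: bit} pairs from qemu-kvm 'host doesn't support' warnings."""
    return {m.group(1): int(m.group(2)) for m in WARNING_RE.finditer(warnings)}


def tsx_bits(ebx: int) -> dict[str, bool]:
    """Report whether the HLE and RTM bits are set in a CPUID.07H:EBX value."""
    return {name: bool((ebx >> bit) & 1) for name, bit in EBX_BITS.items()}
```

Feeding the two warning lines above to `missing_features` yields the hle/rtm bit positions, and a host EBX value with both bits clear is what produces those warnings.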



Verify:
qemu-kvm-4.2.0-12.module+el8.2.0+5858+afd073bc
libvirt-client-6.0.0-7.module+el8.2.0+5869+c23fe68b.x86_64
kernel-4.18.0-179.el8.x86_64

On the same host, Haswell is no longer reported as usable.

# virsh domcapabilities | grep -A80 '<cpu'
  <cpu>
    <mode name='host-passthrough' supported='yes'/>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Haswell-noTSX-IBRS</model>
      <vendor>Intel</vendor>
      ...
    </mode>
    <mode name='custom' supported='yes'>
      ...
      <model usable='yes'>Haswell-noTSX-IBRS</model>
      <model usable='yes'>Haswell-noTSX</model>
      <model usable='no'>Haswell-IBRS</model>
      <model usable='no'>Haswell</model>      ----------> Haswell is not usable.
      ...
    </mode>
  </cpu>
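The check done by eye in the reproduce and verify steps can be scripted. A sketch, assuming the `<cpu>` element has already been extracted from `virsh domcapabilities` output into a string; the fragment below is a cleaned-up copy of the verify output with the `...` lines and arrow annotations removed so that it parses. The set of TSX-dependent models is taken from the pc-i440fx-rhel7.6.0 QMP log in comment 31, where these models list hle and rtm as unavailable; the helper names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

DOMCAPS_CPU = """
<cpu>
  <mode name='host-passthrough' supported='yes'/>
  <mode name='host-model' supported='yes'>
    <model fallback='forbid'>Haswell-noTSX-IBRS</model>
    <vendor>Intel</vendor>
  </mode>
  <mode name='custom' supported='yes'>
    <model usable='yes'>Haswell-noTSX-IBRS</model>
    <model usable='yes'>Haswell-noTSX</model>
    <model usable='no'>Haswell-IBRS</model>
    <model usable='no'>Haswell</model>
  </mode>
</cpu>
"""

# Models reported with hle/rtm unavailable in the pc-i440fx-rhel7.6.0 probe.
TSX_MODELS = {"Haswell", "Haswell-IBRS", "Haswell-v1", "Haswell-v3"}


def custom_model_usability(cpu_xml: str) -> dict[str, bool]:
    """Map each model under <mode name='custom'> to its usable='yes'/'no' flag."""
    root = ET.fromstring(cpu_xml.strip())
    custom = root.find("mode[@name='custom']")
    if custom is None:
        return {}
    return {m.text: m.get("usable") == "yes" for m in custom.findall("model")}


def tsx_models_wrongly_usable(usability: dict[str, bool], host_has_tsx: bool) -> list[str]:
    """TSX-dependent models still reported usable on a host without TSX."""
    if host_has_tsx:
        return []
    return sorted(name for name, ok in usability.items() if ok and name in TSX_MODELS)
```

On the fixed build the list of wrongly usable models is empty; on the reproduce build, where `Haswell` was `usable='yes'`, it would contain "Haswell".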

Comment 71 errata-xmlrpc 2020-05-05 09:52:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017