Bug 1523414 - [POWER guests] Verify compatible CPU & hypervisor capabilities across migration
Summary: [POWER guests] Verify compatible CPU & hypervisor capabilities across migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.5
Hardware: ppc64le
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 7.5
Assignee: David Gibson
QA Contact: xianwang
URL:
Whiteboard:
Duplicates: 1526266
Depends On:
Blocks: 1399177 1476742 1527213 1396114 1517546 1525303 1525599 1526266 1532050
 
Reported: 2017-12-07 22:58 UTC by David Gibson
Modified: 2018-04-11 00:54 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.10.0-18.el7
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-11 00:52:14 UTC




Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2018:1104 | None | None | None | 2018-04-11 00:54:28 UTC
IBM Linux Technology Center | 162805 | None | None | None | 2019-03-18 05:41:18 UTC

Description David Gibson 2017-12-07 22:58:35 UTC
Description of problem:

At the moment there are some capabilities that qemu conditionally presents to guests based on host CPU and/or hypervisor capabilities.  That's bad, but it is there for historical reasons.  It means that if two hosts have different capabilities, migration between them can break.

So far we've more or less gotten away with this because the capabilities have always been consistent for all supported hosts.  However, POWER9 has limitations in Transactional Memory (TM) which mean it's not quite compatible with POWER8 (even in allegedly POWER8-compatible mode).  This means we effectively have a different CPU capability depending on the host, so we now need to care about verifying this.

Bug 1517546 is trying to remove the incompatibility.  But it's not clear that we'll be able to do that in time (it can't be tested until DD2.2 chips become available).  So we need the qemu side checking as a back-up plan.  It also fixes some similar potential problems.
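To make the problem concrete, here is a minimal Python sketch of the difference between a capability that silently follows the host and one that is requested explicitly and checked.  It is a toy model, not qemu code: the `Host` class and both functions are hypothetical names, and treating POWER9 as simply "no HTM" is a simplifying assumption based on the TM limitation described above.

```python
from dataclasses import dataclass


# Toy model only; none of these names exist in qemu.
@dataclass(frozen=True)
class Host:
    name: str
    supports_htm: bool  # assumption: HTM reduced to a single yes/no flag


def guest_caps_old(host):
    """Old behaviour: the guest-visible capability silently follows the host."""
    return {"htm": host.supports_htm}


def guest_caps_checked(requested, host):
    """Checked behaviour: the requested capability is supplied or startup fails."""
    if requested["htm"] and not host.supports_htm:
        raise RuntimeError(f"{host.name}: host cannot supply HTM")
    return dict(requested)


p8 = Host("POWER8", supports_htm=True)
p9 = Host("POWER9", supports_htm=False)

# Same guest configuration, different guest-visible capabilities per host.
print(guest_caps_old(p8), guest_caps_old(p9))
```

In the checked variant the guest sees the same capability set on every host where it starts at all, which is the property migration needs.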

Comment 2 David Gibson 2017-12-13 03:45:44 UTC
I have posted a first draft upstream, and I'm working on a second spin.

Comment 3 Qunfang Zhang 2017-12-21 02:00:05 UTC
Hi, David

Any suggestion on how to verify this? 

Thanks,
Qunfang

Comment 4 David Gibson 2017-12-21 02:45:52 UTC
Qunfang,

Here are some checks that should verify this.

A. It should no longer be possible to start a POWER8 compatibility mode guest on POWER9 with the latest machine type

   1. Start guest with "-M pseries-rhel7.4.0,max-cpu-compat=power8" on a POWER9 host

    before fix: guest will start, but HTM will be disabled unexpectedly
    after fix: guest will fail to start, because POWER9 can't supply HTM

B. A migration from a POWER8 guest with pseries-rhel7.4.0 or earlier machine type to a POWER9 host should fail (leaving the source guest running)

  1. Start a guest with "-M pseries-rhel7.4.0,max-cpu-compat=power8" on a POWER8 host
  2. Migrate the guest to a POWER9 host (same parameters)

    before fix: guest will appear to migrate ok, but HTM will no longer work properly which could cause problems later
    after fix: migration will fail, leaving guest intact on the source
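The expected outcome of check B can be sketched as a small Python model.  This is only an illustration of the intended behaviour, not the actual qemu migration code: the `migrate` function, the dictionary layout and the capability name `htm` are all assumptions made for the example.

```python
def migrate(guest_caps, dest_supported):
    """Return (where the guest ends up running, reason).

    guest_caps:     capabilities the guest was started with on the source
    dest_supported: capabilities the destination host can supply
    Both are hypothetical dicts mapping a capability name to a bool.
    """
    missing = sorted(
        cap for cap, enabled in guest_caps.items()
        if enabled and not dest_supported.get(cap, False)
    )
    if missing:
        # Migration is refused, so the guest keeps running on the source.
        return ("source", "destination lacks: " + ", ".join(missing))
    return ("destination", "ok")


# Guest started on POWER8 with HTM enabled, destination without HTM.
print(migrate({"htm": True}, {"htm": False}))
# Guest started with HTM disabled migrates regardless of the destination.
print(migrate({"htm": False}, {"htm": False}))
```

The important design point is that a mismatch is reported before the guest is handed over, so a failed migration leaves the source guest intact.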

Comment 5 Qunfang Zhang 2017-12-21 05:37:08 UTC
Thanks David.  And after some irc confirmation, we'll:

(1) Test P8<->P9 migration with the latest rhel7.5.0 machine type. This should succeed since the 7.5 machine type disables HTM on both P8 and P9.

(2) Test rhel7.4.0 and earlier machine types according to comment 4; migration from P8 to P9 should fail.
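The test plan above can be written down as a small expectation table.  The Python below is just a sketch for keeping track of expected results: the dictionary and function names are hypothetical, and listing pseries-rhel7.3.0 and pseries-rhel7.2.0 under "rhel7.4.0 and earlier" is an assumption.

```python
# HTM default per machine type as described in this bug.
# Assumption: "rhel7.4.0 and earlier" includes the 7.3.0 and 7.2.0 types.
HTM_ON_BY_DEFAULT = {
    "pseries-rhel7.5.0": False,
    "pseries-rhel7.4.0": True,
    "pseries-rhel7.3.0": True,
    "pseries-rhel7.2.0": True,
}


def expected_p8_to_p9(machine_type):
    """Expected result of migrating a POWER8-compat guest from P8 to P9."""
    return "fail" if HTM_ON_BY_DEFAULT[machine_type] else "succeed"


for machine_type in sorted(HTM_ON_BY_DEFAULT, reverse=True):
    print(f"{machine_type}: P8 -> P9 migration should "
          f"{expected_p8_to_p9(machine_type)}")
```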

Comment 8 David Gibson 2018-01-18 00:11:52 UTC
Pull request sent upstream, waiting for merge.

Comment 9 David Gibson 2018-01-19 01:23:19 UTC
Draft backport made, brewing at:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=15049440

Comment 10 David Gibson 2018-01-19 02:28:23 UTC
Had to fix a few problems with backport, successful brew at:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=15049550

Comment 12 Miroslav Rezanina 2018-01-23 13:00:05 UTC
Fix included in qemu-kvm-rhev-2.10.0-18.el7

Comment 14 David Gibson 2018-01-28 22:47:55 UTC
*** Bug 1526266 has been marked as a duplicate of this bug. ***

Comment 15 xianwang 2018-01-30 07:23:58 UTC
Bug verification:
version:
4.14.0-33.el7a.ppc64le
qemu-kvm-rhev-2.10.0-18.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch
# ppc64_cpu --smt=off
# echo N > /sys/module/kvm_hv/parameters/indep_threads_mode

machine type:rhel7.5.0
# /usr/libexec/qemu-kvm -M pseries-rhel7.5.0,max-cpu-compat=power8
VNC server running on ::1:5900

machine type:rhel7.4.0
# /usr/libexec/qemu-kvm -M pseries-rhel7.4.0,max-cpu-compat=power8
VNC server running on ::1:5900
qemu-kvm: KVM implementation does not support Transactional Memory, try cap-htm=off

machine type:rhel7.3.0
# /usr/libexec/qemu-kvm -M pseries-rhel7.3.0,max-cpu-compat=power8
VNC server running on ::1:5900
qemu-kvm: KVM implementation does not support Transactional Memory, try cap-htm=off

machine type:rhel7.2.0
# /usr/libexec/qemu-kvm -M pseries-rhel7.2.0,max-cpu-compat=power8
VNC server running on ::1:5900
qemu-kvm: KVM implementation does not support Transactional Memory, try cap-htm=off

So, this bug is fixed.
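For repeated runs, the check above could be automated by scanning the captured qemu-kvm output for the HTM error.  The following Python is a minimal sketch: the helper name `rejected_for_htm` is hypothetical, and the sample strings are copied from the output shown in this comment rather than produced by running qemu.

```python
# Error text as printed by qemu-kvm in the verification above.
HTM_ERROR = "KVM implementation does not support Transactional Memory"


def rejected_for_htm(output):
    """True if any line of the captured output carries the HTM error."""
    return any(HTM_ERROR in line for line in output.splitlines())


# Sample outputs copied from the runs above (not generated here).
out_rhel75 = "VNC server running on ::1:5900\n"
out_rhel74 = (
    "VNC server running on ::1:5900\n"
    "qemu-kvm: KVM implementation does not support Transactional Memory, "
    "try cap-htm=off\n"
)

print(rejected_for_htm(out_rhel75))
print(rejected_for_htm(out_rhel74))
```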

Comment 17 errata-xmlrpc 2018-04-11 00:52:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104

