Bug 1641702
| Summary: | check tsc scaling feature of destination host on migration | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Marcelo Tosatti <mtosatti> | |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> | |
| Status: | CLOSED ERRATA | QA Contact: | jiyan <jiyan> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | high | |||
| Version: | 7.6 | CC: | ehabkost, jdenemar, jsuchane, kchamart, mtessun, sfroemer, xuzhang, yalzhang, yama | |
| Target Milestone: | rc | Keywords: | Upstream | |
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | libvirt-4.5.0-21.el7 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1648273 (view as bug list) | Environment: | ||
| Last Closed: | 2019-08-06 13:14:02 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1600168, 1648273 | |||
Description
Marcelo Tosatti
2018-10-22 14:40:09 UTC
Patches sent upstream for review: https://www.redhat.com/archives/libvir-list/2019-May/msg00912.html
The patches are pushed upstream now:
commit dd3fc650de8ef8b05b491c9f362b660e07a857fd
Refs: v5.4.0-33-gdd3fc650de
Author: Jiri Denemark <jdenemar>
AuthorDate: Mon Jun 3 13:13:38 2019 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Mon Jun 3 18:07:16 2019 +0200
qemu: Make virQEMUCapsProbeHostCPUForEmulator more generic
The function is renamed as virQEMUCapsProbeHostCPU and it does not get
the list of allowed CPU models from qemuCaps anymore. This
responsibility is moved to the caller. The result is just a very thin
wrapper around virCPUGetHost mostly required for mocking in tests.
The generic function is used in place of a direct call to virCPUGetHost
in virQEMUCapsInitHostCPUModel to make sure tests don't accidentally
probe host CPU.
Signed-off-by: Jiri Denemark <jdenemar>
commit 02c1d3a6e1d24a777254f4dceeaf54942db7f871
Refs: v5.4.0-34-g02c1d3a6e1
Author: Jiri Denemark <jdenemar>
AuthorDate: Mon Jun 3 13:15:19 2019 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Mon Jun 3 18:07:16 2019 +0200
qemuargv2xmltest: Use mocked virQEMUCapsProbeHostCPU
The qemuTestParseCapabilitiesArch call would eventually lead to the host
CPU being probed via virCPUGetHost. Let's divert this to a mocked
version already used by the qemuxml2argvtest.
Signed-off-by: Jiri Denemark <jdenemar>
commit f0f6faba63becfab38c928905ac6ed79f9a318b8
Refs: v5.4.0-35-gf0f6faba63
Author: Jiri Denemark <jdenemar>
AuthorDate: Thu May 30 16:34:59 2019 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Mon Jun 3 18:07:16 2019 +0200
util: Add virHostCPUGetTscInfo
On a KVM x86_64 host which supports invariant TSC this function can be
used to detect the TSC frequency and the availability of TSC scaling.
The magic MSR numbers required to check if VMX scaling is supported on
the host are documented in Volume 3 of the Intel® 64 and IA-32
Architectures Software Developer’s Manual.
Signed-off-by: Jiri Denemark <jdenemar>
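For readers without the manual at hand, the VMX side of that check can be sketched roughly as follows. This is an illustrative Python sketch, not libvirt's C code: the constant and function names are hypothetical, and it assumes the capability is read from the IA32_VMX_PROCBASED_CTLS2 MSR (index 0x48B), whose upper 32 bits report which secondary controls may be set to 1, with "use TSC scaling" being control bit 25.

```python
# Illustrative sketch only; names are hypothetical, not libvirt identifiers.
IA32_VMX_PROCBASED_CTLS2 = 0x48B  # MSR index (assumption taken from the Intel SDM)
USE_TSC_SCALING_BIT = 25          # secondary control "use TSC scaling" (assumption)


def vmx_tsc_scaling_allowed(msr_value: int) -> bool:
    """Return True if the allowed-1 half of the MSR value permits TSC scaling."""
    # Bits 63:32 of the capability MSR list the controls that may be set to 1.
    allowed_one = (msr_value >> 32) & 0xFFFFFFFF
    return bool(allowed_one & (1 << USE_TSC_SCALING_BIT))
```

How the raw MSR value is obtained (for example through KVM) is left out of the sketch.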
commit c277b9ad5c740bb4c4b915754ae74621f93f9d37
Refs: v5.4.0-36-gc277b9ad5c
Author: Jiri Denemark <jdenemar>
AuthorDate: Thu May 30 21:47:49 2019 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Mon Jun 3 18:07:16 2019 +0200
conf: Report TSC frequency in host CPU capabilities
This patch adds a new
<counter name='tsc' frequency='N' scaling='on|off'/>
element into the host CPU capabilities XML.
Signed-off-by: Jiri Denemark <jdenemar>
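A client that wants to consume the new element could read it roughly like this. This is a minimal Python sketch, not part of libvirt; the function name is hypothetical. The commit message writes the attribute values as on|off while the verification output in this report shows yes/no, so the sketch accepts both spellings and treats a missing attribute as "unknown".

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def parse_host_tsc(capabilities_xml: str) -> Optional[Tuple[int, Optional[bool]]]:
    """Return (frequency_hz, scaling) from host capabilities XML, or None.

    scaling is True/False when reported and None when the attribute is absent.
    """
    root = ET.fromstring(capabilities_xml)
    counter = root.find("./host/cpu/counter[@name='tsc']")
    if counter is None or counter.get("frequency") is None:
        return None  # host TSC data was not reported
    scaling_attr = counter.get("scaling")
    scaling = None if scaling_attr is None else scaling_attr in ("yes", "on")
    return int(counter.get("frequency")), scaling


# Trimmed sample in the shape printed by "virsh capabilities".
SAMPLE = """<capabilities><host><cpu>
  <arch>x86_64</arch>
  <counter name='tsc' frequency='2095078000' scaling='yes'/>
</cpu></host></capabilities>"""

print(parse_host_tsc(SAMPLE))
```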
commit 32f577ab10aefda6c4666abd07814c5c39f57788
Refs: v5.4.0-37-g32f577ab10
Author: Jiri Denemark <jdenemar>
AuthorDate: Tue Apr 16 13:24:45 2019 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Mon Jun 3 18:07:16 2019 +0200
cpu_x86: Fix placement of *CheckFeature functions
Commit 0a97486e09 moved them outside #ifdef, but after virCPUx86GetHost,
which will start calling them in the following patch.
Signed-off-by: Jiri Denemark <jdenemar>
commit ceb04d15e671b4fea1d674ee43c91410da9fe57d
Refs: v5.4.0-38-gceb04d15e6
Author: Jiri Denemark <jdenemar>
AuthorDate: Thu May 30 21:47:38 2019 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Mon Jun 3 18:07:16 2019 +0200
cpu_x86: Probe TSC frequency and scaling support
When the host CPU supports invariant TSC the host CPU definition created
by virCPUx86GetHost will contain (unless probing fails for some reason)
additional TSC-related data.
Signed-off-by: Jiri Denemark <jdenemar>
commit 7da62c91f043209e3d40c2dc7655c5e35a4309bf
Refs: v5.4.0-39-g7da62c91f0
Author: Jiri Denemark <jdenemar>
AuthorDate: Fri May 31 00:03:59 2019 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Mon Jun 3 18:07:16 2019 +0200
qemu: Check TSC frequency before starting QEMU
When migrating a domain with invtsc CPU feature enabled, the TSC
frequency of the destination host must match the frequency used when the
domain was started on the source host or the destination host has to
support TSC scaling.
If the frequencies do not match and the destination host does not
support TSC scaling, QEMU will fail to set the right TSC frequency when
starting vCPUs on the destination and thus migration will fail. However,
this is quite late since both hosts might have spent significant time
transferring memory and perhaps even storage data.
By adding the check to libvirt we can let migration fail before any data
starts to be sent over. If for some reason libvirt is unable to detect
the host's TSC frequency or scaling support, we'll just let QEMU try and
the migration will either succeed or fail later.
Luckily, we mandate TSC frequency to be explicitly set in the domain XML
to even allow migration of domains with invtsc. We can just check
whether the requested frequency is compatible with the current host
before starting QEMU.
https://bugzilla.redhat.com/show_bug.cgi?id=1641702
Signed-off-by: Jiri Denemark <jdenemar>
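The decision described in this commit message can be summarized in a few lines. The following is a hedged Python sketch of that logic, not the actual C implementation; the names TscCheck and check_tsc are hypothetical. It assumes an exact frequency comparison (the verification in this report shows a request of 2397223111 Hz being rejected on a 2397223000 Hz host without scaling) and assumes that a domain without an explicit TSC frequency needs no check.

```python
from enum import Enum
from typing import Optional


class TscCheck(Enum):
    OK = "ok"                  # safe to start QEMU
    DEFER_TO_QEMU = "defer"    # host data unknown, let QEMU try
    REJECT = "reject"          # fail early, before any data is transferred


def check_tsc(requested_hz: Optional[int],
              host_hz: Optional[int],
              host_scaling: Optional[bool]) -> TscCheck:
    """Decide whether a requested TSC frequency is compatible with a host."""
    if requested_hz is None:
        return TscCheck.OK             # assumption: nothing requested, nothing to check
    if host_hz is None:
        return TscCheck.DEFER_TO_QEMU  # host frequency could not be detected
    if requested_hz == host_hz or host_scaling is True:
        return TscCheck.OK             # frequencies match, or the host can scale
    if host_scaling is None:
        return TscCheck.DEFER_TO_QEMU  # scaling support could not be detected
    return TscCheck.REJECT             # mismatch and no scaling support
```

With the values from the scaling=no host used in the verification, check_tsc(1000000000, 2397223000, False) yields TscCheck.REJECT, while a request equal to the host frequency yields TscCheck.OK.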
Hi Jiri, it seems that 'virsh hypervisor-cpu-baseline' and 'virsh cpu-baseline' cannot compute a CPU baseline from the output of 'virsh capabilities' because of the "TSC frequency".
Version:
libvirt-4.5.0-20.el7.x86_64
kernel-3.10.0-1053.el7.x86_64
qemu-kvm-rhev-2.12.0-31.el7.x86_64
Steps:
1. Obtain the output of 'virsh capabilities'
# virsh capabilities >> new
# virsh capabilities
<capabilities>
<host>
<uuid>30333735-3938-4e43-4732-323053424b53</uuid>
<cpu>
<arch>x86_64</arch>
<model>Opteron_G3</model>
<vendor>AMD</vendor>
<microcode version='16777433'/>
<counter name='tsc' frequency='2000038000'/>
2. Use 'virsh hypervisor-cpu-baseline'/'virsh cpu-baseline' to get the CPU baseline based on the new file
# virsh hypervisor-cpu-baseline 66
error: unsupported configuration: Invalid TSC frequency
# virsh cpu-baseline 66
error: unsupported configuration: Invalid TSC frequency
Actual result:
"virsh hypervisor-cpu-baseline"/"virsh cpu-baseline" failed because of TSC frequency.
Additional info:
Both commands accept the output of "virsh capabilities" as the input file for computing a CPU baseline.
hypervisor-cpu-baseline FILE [virttype] [emulator] [arch] [machine] [--features] [--migratable]
Compute a baseline CPU which will be compatible with all CPUs defined in an XML file and with the CPU the hypervisor is able to provide on the host. (This is different from
cpu-baseline which does not consider any hypervisor abilities when computing the baseline CPU.)
The XML FILE may contain either host or guest CPU definitions describing the host CPU model. The host CPU definition is the <cpu> element and its contents as printed by
capabilities command. The guest CPU definition may be created from the host CPU model found in domain capabilities XML (printed by domcapabilities command). In addition to the
<cpu> elements, this command accepts full capabilities XMLs, or domain capabilities XMLs containing the CPU definitions. For best results, use only the CPU definitions from domain
capabilities.
cpu-baseline FILE [--features] [--migratable]
Compute baseline CPU which will be supported by all host CPUs given in <file>. (See hypervisor-cpu-baseline command to get a CPU which can be provided by a specific hypervisor.)
The list of host CPUs is built by extracting all <cpu> elements from the <file>. Thus, the <file> can contain either a set of <cpu> elements separated by new lines or even a set of
complete <capabilities> elements printed by capabilities command. If --features is specified, then the resulting XML description will explicitly include all features that make up
the CPU, without this option features that are part of the CPU model will not be listed in the XML description. If --migratable is specified, features that block migration will
not be included in the resulting CPU.
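To illustrate what "extracting all <cpu> elements" means for a file built by appending several "virsh capabilities" outputs, here is a small Python sketch. It is not how virsh implements the extraction; the function name is hypothetical, and the sketch assumes the file contains no XML declarations and only bare <cpu> elements or complete <capabilities> documents.

```python
import xml.etree.ElementTree as ET
from typing import List


def collect_host_cpus(text: str) -> List[ET.Element]:
    """Collect host <cpu> definitions from concatenated XML documents."""
    # Several concatenated documents have several roots, so wrap them in one.
    wrapper = ET.fromstring("<wrapper>" + text + "</wrapper>")
    cpus = []
    for doc in wrapper:
        if doc.tag == "cpu":
            cpus.append(doc)            # bare host CPU definition
        elif doc.tag == "capabilities":
            cpu = doc.find("./host/cpu")
            if cpu is not None:
                cpus.append(cpu)        # host CPU inside full capabilities XML
    return cpus


two_hosts = (
    "<capabilities><host><cpu><model>Skylake-Server-IBRS</model></cpu></host></capabilities>\n"
    "<capabilities><host><cpu><model>Penryn</model></cpu></host></capabilities>\n"
)
print([cpu.findtext("model") for cpu in collect_host_cpus(two_hosts)])
```

Selecting only host/cpu inside a capabilities document is a deliberate choice in this sketch, so that per-CPU entries elsewhere in the document are not mistaken for CPU definitions.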
Oops, this is a bug in the code which parses CPU definition from capabilities XML. I just sent the fix upstream for review: https://www.redhat.com/archives/libvir-list/2019-June/msg00152.html
Fixed upstream by
commit 4d21d4acf2eac961b8c25f1ec49a9c25f3951fdb
Refs: v5.4.0-51-g4d21d4acf2
Author: Jiri Denemark <jdenemar>
AuthorDate: Thu Jun 6 09:29:38 2019 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Thu Jun 6 09:40:40 2019 +0200
cpu_conf: Fix XPath for parsing TSC frequency
Due to this bug the following command would fail on any host where TSC
frequency can be probed:
$ virsh capabilities | virsh cpu-baseline /dev/stdin
error: unsupported configuration: Invalid TSC frequency
https://bugzilla.redhat.com/show_bug.cgi?id=1641702
Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Ján Tomko <jtomko>
Verified this bug in libvirt-4.5.0-22.el7.x86_64
Version:
libvirt-4.5.0-22.el7.x86_64
kernel-3.10.0-1053.el7.x86_64
qemu-kvm-rhev-2.12.0-32.el7.x86_64
Steps:
Scenario-1: Check the output of "virsh capabilities" and compare/baseline cpu
S1. Check the output of "virsh capabilities" when scaling=yes and compare/baseline cpu
# virsh capabilities
<counter name='tsc' frequency='2095078000' scaling='yes'/>
# virsh capabilities > cap.xml
# virsh hypervisor-cpu-baseline cap.xml
# virsh hypervisor-cpu-compare cap.xml
# virsh cpu-compare cap.xml
# virsh cpu-baseline cap.xml
==> No abnormal errors from the commands above
S2. Check the output of "virsh capabilities" when scaling=no and compare/baseline cpu
# virsh capabilities
<counter name='tsc' frequency='2397223000' scaling='no'/>
# virsh capabilities >> cap.xml
# virsh hypervisor-cpu-baseline cap.xml
# virsh hypervisor-cpu-compare cap.xml
# virsh cpu-compare cap.xml
# virsh cpu-baseline cap.xml
==> No abnormal errors from the commands above
Scenario-2: Configure tsc related XML for VM with different value of frequency
S1. Configure a value lower than the frequency in the output of "virsh capabilities" when scaling=yes
# virsh capabilities |more
<counter name='tsc' frequency='2095078000' scaling='yes'/>
# virsh domstate vm
shut off
# virsh dumpxml vm --inactive |grep "<clock" -A10
<clock offset='utc'>
...
<timer name='tsc' frequency='1000000000'/>
</clock>
# virsh start vm
Domain vm started
# ps -ef |grep vm
qemu 70339 1 99 22:32 ? 00:02:15 /usr/libexec/qemu-kvm -name guest=vm
...
-cpu Skylake-Server-IBRS,ds=off,acpi=off,ss=on,ht=off,tm=off,pbe=off,dtes64=off,ds_cpl=off,vmx=off,smx=off,est=off,tm2=off,xtpr=off,pdcm=off,dca=off,osxsave=off,tsc_adjust=on,clflushopt=on,intel-pt=off,pku=on,ospke=off,md-clear=on,stibp=on,ssbd=on,xsaves=off,invtsc=on,hypervisor=on,tsc-frequency=1000000000
S2. Configure a value higher than the frequency in the output of "virsh capabilities" when scaling=yes
# virsh capabilities |more
<counter name='tsc' frequency='2095078000' scaling='yes'/>
# virsh domstate vm
shut off
# virsh dumpxml vm --inactive |grep "<clock" -A10
<clock offset='utc'>
...
<timer name='tsc' frequency='3000000000'/>
</clock>
# virsh start vm
Domain vm started
# ps -ef |grep vm
qemu 70339 1 99 22:32 ? 00:02:15 /usr/libexec/qemu-kvm -name guest=vm
...
-cpu Skylake-Server-IBRS,ds=off,acpi=off,ss=on,ht=off,tm=off,pbe=off,dtes64=off,ds_cpl=off,vmx=off,smx=off,est=off,tm2=off,xtpr=off,pdcm=off,dca=off,osxsave=off,tsc_adjust=on,clflushopt=on,intel-pt=off,pku=on,ospke=off,md-clear=on,stibp=on,ssbd=on,xsaves=off,invtsc=on,hypervisor=on,tsc-frequency=3000000000
S3. Configure a value equal to the frequency in the output of "virsh capabilities" when scaling=yes
# virsh capabilities |more
<counter name='tsc' frequency='2095078000' scaling='yes'/>
# virsh domstate q35771
shut off
# virsh dumpxml q35771 --inactive
<clock offset='utc'>
...
<timer name='tsc' frequency='2095078000'/>
</clock>
# virsh start q35771
Domain q35771 started
# ps -ef |grep q35771
-cpu Skylake-Server-IBRS,ds=on,acpi=on,ss=on,ht=on,tm=on,pbe=on,dtes64=on,monitor=on,ds_cpl=on,vmx=on,smx=on,est=on,tm2=on,xtpr=on,pdcm=on,dca=on,osxsave=on,tsc_adjust=on,clflushopt=on,intel-pt=on,pku=on,ospke=on,md-clear=on,stibp=on,ssbd=on,xsaves=on, ** invtsc=on,tsc-frequency=2095078000 **
S4. Configure a value lower or higher than the frequency in the output of "virsh capabilities" when scaling=no
# virsh capabilities
<counter name='tsc' frequency='2397223000' scaling='no'/>
# virsh domstate vmq35_771
shut off
# virsh dumpxml vmq35_771 --inactive |grep "<clock" -A10
<clock offset='utc'>
...
<timer name='tsc' frequency='1000000000'/>
</clock>
# virsh start vmq35_771
error: Failed to start domain vmq35_771
error: unsupported configuration: Requested TSC frequency 1000000000 Hz does not match host (2397223000 Hz) and TSC scaling is not supported by the host CPU
# virsh dumpxml vmq35_771 --inactive |grep "<clock" -A10
<clock offset='utc'>
...
<timer name='tsc' frequency='2397223111'/>
</clock>
# virsh start vmq35_771
error: Failed to start domain vmq35_771
error: unsupported configuration: Requested TSC frequency 2397223111 Hz does not match host (2397223000 Hz) and TSC scaling is not supported by the host CPU
S5. Configure a value equal to the frequency in the output of "virsh capabilities" when scaling=no
# virsh capabilities
<counter name='tsc' frequency='2397223000' scaling='no'/>
# virsh domstate vmq35_771
shut off
# virsh dumpxml vmq35_771 --inactive |grep "<clock" -A5
<clock offset='utc'>
...
<timer name='tsc' frequency='2397223000'/>
</clock>
# virsh start vmq35_771
Domain vmq35_771 started
# ps -ef |grep vmq35_771
qemu 8781 1 99 22:51 ? 00:00:10 /usr/libexec/qemu-kvm -name guest=vmq35_771
...
-cpu Penryn,vme=on,ss=on,x2apic=on,tsc-deadline=on,xsave=on,hypervisor=on,arat=on,tsc_adjust=on,tsc-frequency=2397223000
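The value being varied in Scenario-2 is the frequency attribute of the tsc timer in the domain's <clock> element. A tool that wants to read the requested frequency from "virsh dumpxml" output could do so roughly as below. This is a Python sketch with a hypothetical function name, and the sample XML is trimmed to the parts shown in this report.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def requested_tsc_frequency(domain_xml: str) -> Optional[int]:
    """Return the TSC frequency (Hz) requested in domain XML, or None."""
    root = ET.fromstring(domain_xml)
    timer = root.find("./clock/timer[@name='tsc']")
    if timer is None or timer.get("frequency") is None:
        return None  # no explicit TSC frequency configured
    return int(timer.get("frequency"))


DOMAIN = """<domain type='kvm'>
  <name>vm</name>
  <clock offset='utc'>
    <timer name='tsc' frequency='1000000000'/>
  </clock>
</domain>"""

print(requested_tsc_frequency(DOMAIN))
```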
Scenario-3: Migrate VM in RHEL-7.7 host to RHEL-7.7 host with scaling=yes/no (the frequency in src and dst host is different.)
S1: Migrate VM in RHEL-7.7 host to RHEL-7.7 host with scaling=yes (the frequency in src and dst host is different.)
1. Start the VM in src host and migrate the vm to dst host
# virsh capabilities |grep counter
<counter name='tsc' frequency='2095078000' scaling='yes'/>
# virsh domstate vm
shut off
# virsh dumpxml vm --inactive |grep "<clock" -A5
<clock offset='utc'>
...
<timer name='tsc' frequency='2095078000'/>
</clock>
# virsh start vm
Domain vm started
# virsh migrate vm qemu+ssh://dsthost/system --live --postcopy --postcopy-after-precopy --p2p --verbose --copy-storage-all
Migration: [100 %]
2. Check the vm status in dst host
# virsh capabilities |grep counter
<counter name='tsc' frequency='1696014000' scaling='yes'/>
# virsh domstate vm
running
# virsh dumpxml vm |grep "<clock" -A5
<clock offset='utc'>
...
<timer name='tsc' frequency='2095078000'/>
</clock>
# ps -ef |grep vm
qemu 3770 1 99 23:03 ? 00:25:21 /usr/libexec/qemu-kvm -name guest=vm ... -cpu Skylake-Server-IBRS,ds=off,acpi=off,ss=on,ht=off,tm=off,pbe=off,dtes64=off,ds_cpl=off,vmx=off,smx=off,est=off,tm2=off,xtpr=off,pdcm=off,dca=off,osxsave=off,tsc_adjust=on,clflushopt=on,intel-pt=off,pku=on,ospke=off,md-clear=on,stibp=on,ssbd=on,xsaves=off,invtsc=on,hypervisor=on,tsc-frequency=2095078000
S2: Migrate VM in RHEL-7.7 host to RHEL-7.7 host with scaling=no (the frequency in src and dst host is different.)
1. Start the VM in src host and migrate the vm to dst host
(src host info) # virsh capabilities |grep counter
<counter name='tsc' frequency='2095078000' scaling='yes'/>
(dst host info) # virsh capabilities |grep "<counter"
<counter name='tsc' frequency='2397223000' scaling='no'/>
# virsh domstate vm
shut off
# virsh dumpxml vm --inactive |grep "<clock" -A5
<clock offset='utc'>
...
<timer name='tsc' frequency='2095078000'/>
</clock>
# virsh start vm
Domain vm started
# virsh migrate vm qemu+ssh://hp-dl380g9-02.lab.eng.pek2.redhat.com/system --live --postcopy --postcopy-after-precopy --p2p --verbose --copy-storage-all
error: unsupported configuration: Requested TSC frequency 2095078000 Hz does not match host (2397223000 Hz) and TSC scaling is not supported by the host CPU
In Scenario-3, the VM is configured with the CPU feature "invtsc".
Add another scenario: Migrate a VM with the CPU feature "invtsc" (which is supported in RHEL-7.4) and a tsc timer from RHEL-7.6.z to RHEL-7.7 (with scaling=yes and scaling=no). The result is the same as in https://bugzilla.redhat.com/show_bug.cgi?id=1641702#c14.
So all the results are as expected; moving this bug to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2019:2294