Bug 1641702
Summary: | check tsc scaling feature of destination host on migration | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Marcelo Tosatti <mtosatti> | |
Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> | |
Status: | CLOSED ERRATA | QA Contact: | jiyan <jiyan> | |
Severity: | medium | Docs Contact: | ||
Priority: | high | |||
Version: | 7.6 | CC: | ehabkost, jdenemar, jsuchane, kchamart, mtessun, sfroemer, xuzhang, yalzhang, yama | |
Target Milestone: | rc | Keywords: | Upstream | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | libvirt-4.5.0-21.el7 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1648273 | Environment: | ||
Last Closed: | 2019-08-06 13:14:02 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1600168, 1648273 |
Description
Marcelo Tosatti
2018-10-22 14:40:09 UTC
Patches sent upstream for review:
https://www.redhat.com/archives/libvir-list/2019-May/msg00912.html

The patches are pushed upstream now:

commit dd3fc650de8ef8b05b491c9f362b660e07a857fd
Refs: v5.4.0-33-gdd3fc650de
Author:     Jiri Denemark <jdenemar>
AuthorDate: Mon Jun 3 13:13:38 2019 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Mon Jun 3 18:07:16 2019 +0200

    qemu: Make virQEMUCapsProbeHostCPUForEmulator more generic

    The function is renamed as virQEMUCapsProbeHostCPU and it does not get
    the list of allowed CPU models from qemuCaps anymore. This
    responsibility is moved to the caller. The result is just a very thin
    wrapper around virCPUGetHost mostly required for mocking in tests.

    The generic function is used in place of a direct call to
    virCPUGetHost in virQEMUCapsInitHostCPUModel to make sure tests don't
    accidentally probe host CPU.

    Signed-off-by: Jiri Denemark <jdenemar>

commit 02c1d3a6e1d24a777254f4dceeaf54942db7f871
Refs: v5.4.0-34-g02c1d3a6e1
Author:     Jiri Denemark <jdenemar>
AuthorDate: Mon Jun 3 13:15:19 2019 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Mon Jun 3 18:07:16 2019 +0200

    qemuargv2xmltest: Use mocked virQEMUCapsProbeHostCPU

    The qemuTestParseCapabilitiesArch call would eventually lead to the
    host CPU being probed via virCPUGetHost. Let's divert this to a mocked
    version already used by the qemuxml2argvtest.

    Signed-off-by: Jiri Denemark <jdenemar>

commit f0f6faba63becfab38c928905ac6ed79f9a318b8
Refs: v5.4.0-35-gf0f6faba63
Author:     Jiri Denemark <jdenemar>
AuthorDate: Thu May 30 16:34:59 2019 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Mon Jun 3 18:07:16 2019 +0200

    util: Add virHostCPUGetTscInfo

    On a KVM x86_64 host which supports invariant TSC this function can be
    used to detect the TSC frequency and the availability of TSC scaling.

    The magic MSR numbers required to check if VMX scaling is supported on
    the host are documented in Volume 3 of the Intel® 64 and IA-32
    Architectures Software Developer’s Manual.

    Signed-off-by: Jiri Denemark <jdenemar>

commit c277b9ad5c740bb4c4b915754ae74621f93f9d37
Refs: v5.4.0-36-gc277b9ad5c
Author:     Jiri Denemark <jdenemar>
AuthorDate: Thu May 30 21:47:49 2019 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Mon Jun 3 18:07:16 2019 +0200

    conf: Report TSC frequency in host CPU capabilities

    This patch adds a new

        <counter name='tsc' frequency='N' scaling='on|off'/>

    element into the host CPU capabilities XML.

    Signed-off-by: Jiri Denemark <jdenemar>

commit 32f577ab10aefda6c4666abd07814c5c39f57788
Refs: v5.4.0-37-g32f577ab10
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Apr 16 13:24:45 2019 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Mon Jun 3 18:07:16 2019 +0200

    cpu_x86: Fix placement of *CheckFeature functions

    Commit 0a97486e09 moved them outside #ifdef, but after
    virCPUx86GetHost, which will start calling them in the following
    patch.

    Signed-off-by: Jiri Denemark <jdenemar>

commit ceb04d15e671b4fea1d674ee43c91410da9fe57d
Refs: v5.4.0-38-gceb04d15e6
Author:     Jiri Denemark <jdenemar>
AuthorDate: Thu May 30 21:47:38 2019 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Mon Jun 3 18:07:16 2019 +0200

    cpu_x86: Probe TSC frequency and scaling support

    When the host CPU supports invariant TSC the host CPU definition
    created by virCPUx86GetHost will contain (unless probing fails for
    some reason) additional TSC related data.

    Signed-off-by: Jiri Denemark <jdenemar>

commit 7da62c91f043209e3d40c2dc7655c5e35a4309bf
Refs: v5.4.0-39-g7da62c91f0
Author:     Jiri Denemark <jdenemar>
AuthorDate: Fri May 31 00:03:59 2019 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Mon Jun 3 18:07:16 2019 +0200

    qemu: Check TSC frequency before starting QEMU

    When migrating a domain with invtsc CPU feature enabled, the TSC
    frequency of the destination host must match the frequency used when
    the domain was started on the source host or the destination host has
    to support TSC scaling.

    If the frequencies do not match and the destination host does not
    support TSC scaling, QEMU will fail to set the right TSC frequency
    when starting vCPUs on the destination and thus migration will fail.
    However, this is quite late since both hosts might have spent
    significant time transferring memory and perhaps even storage data.

    By adding the check to libvirt we can let migration fail before any
    data starts to be sent over. If for some reason libvirt is unable to
    detect the host's TSC frequency or scaling support, we'll just let
    QEMU try and the migration will either succeed or fail later.

    Luckily, we mandate TSC frequency to be explicitly set in the domain
    XML to even allow migration of domains with invtsc. We can just check
    whether the requested frequency is compatible with the current host
    before starting QEMU.

    https://bugzilla.redhat.com/show_bug.cgi?id=1641702

    Signed-off-by: Jiri Denemark <jdenemar>

Hi Jiri,

It seems that virsh hypervisor-cpu-baseline / virsh cpu-baseline cannot compute the CPU baseline from the output of 'virsh capabilities' because of "TSC frequency".

Version:
libvirt-4.5.0-20.el7.x86_64
kernel-3.10.0-1053.el7.x86_64
qemu-kvm-rhev-2.12.0-31.el7.x86_64

Steps:
1. Obtain the output of 'virsh capabilities'
# virsh capabilities >> new
# virsh capabilities
<capabilities>
  <host>
    <uuid>30333735-3938-4e43-4732-323053424b53</uuid>
    <cpu>
      <arch>x86_64</arch>
      <model>Opteron_G3</model>
      <vendor>AMD</vendor>
      <microcode version='16777433'/>
      <counter name='tsc' frequency='2000038000'/>

2. Use 'virsh hypervisor-cpu-baseline' / 'virsh cpu-baseline' to get the CPU baseline based on new file
# virsh hypervisor-cpu-baseline 66
error: unsupported configuration: Invalid TSC frequency

# virsh cpu-baseline 66
error: unsupported configuration: Invalid TSC frequency

Actual result:
"virsh hypervisor-cpu-baseline" / "virsh cpu-baseline" failed because of TSC frequency.

Additional info:
Both commands can accept the output of "virsh capabilities" as the input file for computing the CPU baseline.
hypervisor-cpu-baseline FILE [virttype] [emulator] [arch] [machine] [--features] [--migratable]
    Compute a baseline CPU which will be compatible with all CPUs defined in an XML file and with the CPU the hypervisor is able to provide on the host. (This is different from cpu-baseline which does not consider any hypervisor abilities when computing the baseline CPU.)

    The XML FILE may contain either host or guest CPU definitions describing the host CPU model. The host CPU definition is the <cpu> element and its contents as printed by capabilities command. The guest CPU definition may be created from the host CPU model found in domain capabilities XML (printed by domcapabilities command). In addition to the <cpu> elements, this command accepts full capabilities XMLs, or domain capabilities XMLs containing the CPU definitions. For best results, use only the CPU definitions from domain capabilities.

cpu-baseline FILE [--features] [--migratable]
    Compute baseline CPU which will be supported by all host CPUs given in <file>. (See hypervisor-cpu-baseline command to get a CPU which can be provided by a specific hypervisor.) The list of host CPUs is built by extracting all <cpu> elements from the <file>. Thus, the <file> can contain either a set of <cpu> elements separated by new lines or even a set of complete <capabilities> elements printed by capabilities command. If --features is specified, then the resulting XML description will explicitly include all features that make up the CPU, without this option features that are part of the CPU model will not be listed in the XML description. If --migratable is specified, features that block migration will not be included in the resulting CPU.

Oops, this is a bug in the code which parses CPU definition from capabilities XML. I just sent the fix upstream for review:
https://www.redhat.com/archives/libvir-list/2019-June/msg00152.html

Fixed upstream by

commit 4d21d4acf2eac961b8c25f1ec49a9c25f3951fdb
Refs: v5.4.0-51-g4d21d4acf2
Author:     Jiri Denemark <jdenemar>
AuthorDate: Thu Jun 6 09:29:38 2019 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Jun 6 09:40:40 2019 +0200

    cpu_conf: Fix XPath for parsing TSC frequency

    Due to this bug the following command would fail on any host where
    TSC frequency can be probed:

        $ virsh capabilities | virsh cpu-baseline /dev/stdin
        error: unsupported configuration: Invalid TSC frequency

    https://bugzilla.redhat.com/show_bug.cgi?id=1641702

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Ján Tomko <jtomko>

Verified this bug in libvirt-4.5.0-22.el7.x86_64

Version:
libvirt-4.5.0-22.el7.x86_64
kernel-3.10.0-1053.el7.x86_64
qemu-kvm-rhev-2.12.0-32.el7.x86_64

Steps:
Scenario-1: Check the output of "virsh capabilities" and compare/baseline cpu

S1. Check the output of "virsh capabilities" when scaling=yes and compare/baseline cpu
# virsh capabilities
<counter name='tsc' frequency='2095078000' scaling='yes'/>

# virsh capabilities > cap.xml
# virsh hypervisor-cpu-baseline cap.xml
# virsh hypervisor-cpu-compare cap.xml
# virsh cpu-compare cap.xml
# virsh cpu-baseline cap.xml
==> No abnormal errors from the commands above

S2. Check the output of "virsh capabilities" when scaling=no and compare/baseline cpu
# virsh capabilities
<counter name='tsc' frequency='2397223000' scaling='no'/>

# virsh capabilities >> cap.xml
# virsh hypervisor-cpu-baseline cap.xml
# virsh hypervisor-cpu-compare cap.xml
# virsh cpu-compare cap.xml
# virsh cpu-baseline cap.xml
==> No abnormal errors from the commands above

Scenario-2: Configure tsc related XML for VM with different value of frequency

S1. Configure a value less than the frequency in the output of "virsh capabilities" when scaling=yes
# virsh capabilities |more
<counter name='tsc' frequency='2095078000' scaling='yes'/>

# virsh domstate vm
shut off

# virsh dumpxml vm --inactive |grep "<clock" -A10
<clock offset='utc'>
  ...
  <timer name='tsc' frequency='1000000000'/>
</clock>

# virsh start vm
Domain vm started

# ps -ef |grep vm
qemu 70339 1 99 22:32 ? 00:02:15 /usr/libexec/qemu-kvm -name guest=vm ... -cpu Skylake-Server-IBRS,ds=off,acpi=off,ss=on,ht=off,tm=off,pbe=off,dtes64=off,ds_cpl=off,vmx=off,smx=off,est=off,tm2=off,xtpr=off,pdcm=off,dca=off,osxsave=off,tsc_adjust=on,clflushopt=on,intel-pt=off,pku=on,ospke=off,md-clear=on,stibp=on,ssbd=on,xsaves=off,invtsc=on,hypervisor=on,tsc-frequency=1000000000

S2. Configure a value more than the frequency in the output of "virsh capabilities" when scaling=yes
# virsh capabilities |more
<counter name='tsc' frequency='2095078000' scaling='yes'/>

# virsh domstate vm
shut off

# virsh dumpxml vm --inactive |grep "<clock" -A10
<clock offset='utc'>
  ...
  <timer name='tsc' frequency='3000000000'/>
</clock>

# virsh start vm
Domain vm started

# ps -ef |grep vm
qemu 70339 1 99 22:32 ? 00:02:15 /usr/libexec/qemu-kvm -name guest=vm ... -cpu Skylake-Server-IBRS,ds=off,acpi=off,ss=on,ht=off,tm=off,pbe=off,dtes64=off,ds_cpl=off,vmx=off,smx=off,est=off,tm2=off,xtpr=off,pdcm=off,dca=off,osxsave=off,tsc_adjust=on,clflushopt=on,intel-pt=off,pku=on,ospke=off,md-clear=on,stibp=on,ssbd=on,xsaves=off,invtsc=on,hypervisor=on,tsc-frequency=3000000000

S3. Configure a value equal to the frequency in the output of "virsh capabilities" when scaling=yes
# virsh capabilities |more
<counter name='tsc' frequency='2095078000' scaling='yes'/>

# virsh domstate q35771
shut off

# virsh dumpxml q35771 --inactive
<clock offset='utc'>
  ...
  <timer name='tsc' frequency='2095078000'/>
</clock>

# virsh start q35771
Domain q35771 started

# ps -ef |grep q35771
-cpu Skylake-Server-IBRS,ds=on,acpi=on,ss=on,ht=on,tm=on,pbe=on,dtes64=on,monitor=on,ds_cpl=on,vmx=on,smx=on,est=on,tm2=on,xtpr=on,pdcm=on,dca=on,osxsave=on,tsc_adjust=on,clflushopt=on,intel-pt=on,pku=on,ospke=on,md-clear=on,stibp=on,ssbd=on,xsaves=on, ** invtsc=on,tsc-frequency=2095078000 **

S4. Configure a value less than or more than the frequency in the output of "virsh capabilities" when scaling=no
# virsh capabilities
<counter name='tsc' frequency='2397223000' scaling='no'/>

# virsh domstate vmq35_771
shut off

# virsh dumpxml vmq35_771 --inactive |grep "<clock" -A10
<clock offset='utc'>
  ...
  <timer name='tsc' frequency='1000000000'/>
</clock>

# virsh start vmq35_771
error: Failed to start domain vmq35_771
error: unsupported configuration: Requested TSC frequency 1000000000 Hz does not match host (2397223000 Hz) and TSC scaling is not supported by the host CPU

# virsh dumpxml vmq35_771 --inactive |grep "<clock" -A10
<clock offset='utc'>
  ...
  <timer name='tsc' frequency='2397223111'/>
</clock>

# virsh start vmq35_771
error: Failed to start domain vmq35_771
error: unsupported configuration: Requested TSC frequency 2397223111 Hz does not match host (2397223000 Hz) and TSC scaling is not supported by the host CPU

S5. Configure a value equal to the frequency in the output of "virsh capabilities" when scaling=no
# virsh capabilities
<counter name='tsc' frequency='2397223000' scaling='no'/>

# virsh domstate vmq35_771
shut off

# virsh dumpxml vmq35_771 --inactive |grep "<clock" -A5
<clock offset='utc'>
  ...
  <timer name='tsc' frequency='2397223000'/>
</clock>

# virsh start vmq35_771
Domain vmq35_771 started

# ps -ef |grep vmq35_771
qemu 8781 1 99 22:51 ? 00:00:10 /usr/libexec/qemu-kvm -name guest=vmq35_771 ... -cpu Penryn,vme=on,ss=on,x2apic=on,tsc-deadline=on,xsave=on,hypervisor=on,arat=on,tsc_adjust=on,tsc-frequency=2397223000

Scenario-3: Migrate VM from RHEL-7.7 host to RHEL-7.7 host with scaling=yes/no (the frequencies of the src and dst hosts are different.)

S1: Migrate VM from RHEL-7.7 host to RHEL-7.7 host with scaling=yes (the frequencies of the src and dst hosts are different.)
1. Start the VM on the src host and migrate the VM to the dst host
# virsh capabilities |grep counter
<counter name='tsc' frequency='2095078000' scaling='yes'/>

# virsh domstate vm
shut off

# virsh dumpxml vm --inactive |grep "<clock" -A5
<clock offset='utc'>
  ...
  <timer name='tsc' frequency='2095078000'/>
</clock>

# virsh start vm
Domain vm started

# virsh migrate vm qemu+ssh://dsthost/system --live --postcopy --postcopy-after-precopy --p2p --verbose --copy-storage-all
Migration: [100 %]

2. Check the VM status on the dst host
# virsh capabilities |grep counter
<counter name='tsc' frequency='1696014000' scaling='yes'/>

# virsh domstate vm
running

# virsh dumpxml vm |grep "<clock" -A5
<clock offset='utc'>
  ...
  <timer name='tsc' frequency='2095078000'/>
</clock>

# ps -ef |grep vm
qemu 3770 1 99 23:03 ? 00:25:21 /usr/libexec/qemu-kvm -name guest=vm ... -cpu Skylake-Server-IBRS,ds=off,acpi=off,ss=on,ht=off,tm=off,pbe=off,dtes64=off,ds_cpl=off,vmx=off,smx=off,est=off,tm2=off,xtpr=off,pdcm=off,dca=off,osxsave=off,tsc_adjust=on,clflushopt=on,intel-pt=off,pku=on,ospke=off,md-clear=on,stibp=on,ssbd=on,xsaves=off,invtsc=on,hypervisor=on,tsc-frequency=2095078000

S2: Migrate VM from RHEL-7.7 host to RHEL-7.7 host with scaling=no (the frequencies of the src and dst hosts are different.)
1. Start the VM on the src host and migrate the VM to the dst host
(src host info)
# virsh capabilities |grep counter
<counter name='tsc' frequency='2095078000' scaling='yes'/>

(dst host info)
# virsh capabilities |grep "<counter"
<counter name='tsc' frequency='2397223000' scaling='no'/>

# virsh domstate vm
shut off

# virsh dumpxml vm --inactive |grep "<clock" -A5
<clock offset='utc'>
  ...
  <timer name='tsc' frequency='2095078000'/>
</clock>

# virsh start vm
Domain vm started

# virsh migrate vm qemu+ssh://hp-dl380g9-02.lab.eng.pek2.redhat.com/system --live --postcopy --postcopy-after-precopy --p2p --verbose --copy-storage-all
error: unsupported configuration: Requested TSC frequency 2095078000 Hz does not match host (2397223000 Hz) and TSC scaling is not supported by the host CPU

In Scenario-3, the VM is configured with the CPU feature "invtsc".

Added another scenario: migrate a VM with the CPU feature "invtsc" (which is supported in RHEL-7.4) and a tsc timer from RHEL-7.6.z to RHEL-7.7 (with scaling=yes and scaling=no). The result is the same as in https://bugzilla.redhat.com/show_bug.cgi?id=1641702#c14.

All the results are as expected, so this bug is moved to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2294
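For readers who want to pre-check a host pair by hand, the decision exercised in the scenarios above can be sketched roughly as follows. This is only an illustration in Python, not libvirt's implementation: the function names parse_host_tsc and check_tsc are hypothetical, the exact-match comparison is inferred from the S4 result in this report (2397223111 Hz was rejected against 2397223000 Hz), and the 'yes'/'no' values follow the capabilities output shown in the verification steps.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# (frequency in Hz, scaling support: True/False, or None when not reported)
HostTsc = Tuple[int, Optional[bool]]


def parse_host_tsc(capabilities_xml: str) -> Optional[HostTsc]:
    """Extract the <counter name='tsc'> data from 'virsh capabilities' output.

    Hypothetical helper: returns None when the host reports no TSC counter.
    """
    root = ET.fromstring(capabilities_xml)
    counter = root.find("./host/cpu/counter[@name='tsc']")
    if counter is None or counter.get("frequency") is None:
        return None
    scaling_attr = counter.get("scaling")
    scaling = None if scaling_attr is None else (scaling_attr == "yes")
    return int(counter.get("frequency")), scaling


def check_tsc(requested_hz: int, host: Optional[HostTsc]) -> str:
    """Classify a requested guest TSC frequency against a host.

    Returns "ok", "mismatch", or "unknown". "unknown" mirrors the commit
    message: if detection fails, QEMU is simply left to try.
    """
    if host is None:
        return "unknown"
    host_hz, scaling = host
    if requested_hz == host_hz or scaling is True:
        return "ok"
    if scaling is None:
        return "unknown"
    return "mismatch"


if __name__ == "__main__":
    # Destination host values taken from Scenario-3 S2 in this report.
    dst_caps = (
        "<capabilities><host><cpu>"
        "<counter name='tsc' frequency='2397223000' scaling='no'/>"
        "</cpu></host></capabilities>"
    )
    print(check_tsc(2095078000, parse_host_tsc(dst_caps)))
```

Run against the destination host's capabilities with the frequency from the domain's <timer name='tsc'> element, a "mismatch" result corresponds to the "Requested TSC frequency ... does not match host" error seen in Scenario-3 S2.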