Bug 1839095
| Summary: | VM fails to migrate between identical hosts not supporting TSC scaling | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Jiri Denemark <jdenemar> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Lili Zhu <lizhu> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 8.2 | CC: | bugs, ddepaula, fjin, jdenemar, jen, jsuchane, lsvaty, mavital, michal.skrivanek, mtosatti, mzamazal, pagranat, tbaransk, virt-maint, yalzhang, ymankad |
| Target Milestone: | rc | Keywords: | Automation |
| Target Release: | 8.2 | Flags: | pm-rhel: mirror+ |
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-6.10.0-1.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 1821199 | Environment: | |
| Last Closed: | 2021-05-25 06:42:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | 6.10.0 |
| Embargoed: | |||
| Bug Depends On: | 1872366, 1882793 | ||
| Bug Blocks: | 1821199 | ||
Description
Jiri Denemark
2020-05-22 13:26:43 UTC
Copying my comment from the original bug (https://bugzilla.redhat.com/show_bug.cgi?id=1821199#c28):
Marcelo, it seems the behavior does not match how I would understand the code in QEMU and the kernel (in comment 26). On a host without TSC scaling QEMU fails to set the frequency unless it is exactly the same as reported by the kernel:
From virsh capabilities:
<counter name='tsc' frequency='2903993000' scaling='no'/>
TSC requested 1 kHz below the host:
$ /usr/bin/qemu-system-x86_64 -machine pc,accel=kvm -cpu host,invtsc=on,tsc-frequency=2903992000
qemu-system-x86_64: warning: TSC frequency mismatch between VM (2903992 kHz) and host (2903993 kHz), and TSC scaling unavailable
qemu-system-x86_64: kvm_init_vcpu failed: Operation not supported
TSC requested 1 kHz above the host:
$ /usr/bin/qemu-system-x86_64 -machine pc,accel=kvm -cpu host,invtsc=on,tsc-frequency=2903994000
qemu-system-x86_64: warning: TSC frequency mismatch between VM (2903994 kHz) and host (2903993 kHz), and TSC scaling unavailable
qemu-system-x86_64: kvm_init_vcpu failed: Operation not supported
Exact TSC:
$ /usr/bin/qemu-system-x86_64 -machine pc,accel=kvm -cpu host,invtsc=on,tsc-frequency=2903993000
# QEMU runs happily here
The TSC tolerance is the default value (which corresponds to +/- 726 kHz interval around the host frequency on this particular host):
# cat /sys/module/kvm/parameters/tsc_tolerance_ppm
250
I tried this on several hosts without TSC scaling and the behavior is the same everywhere. Did I misunderstand anything? Or am I just doing it all wrong?
Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2020-November/msg00519.html
This is fixed upstream by
commit d8e5b4560006590668d4669f54a46b08ec14c1a2
Refs: v6.9.0-204-gd8e5b45600
Author: Jiri Denemark <jdenemar>
AuthorDate: Mon May 25 11:35:12 2020 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Thu Nov 12 17:29:16 2020 +0100
qemu: Do not require TSC frequency to strictly match host
Some CPUs provide a way to read exact TSC frequency, while measuring it
is required on other CPUs. However, measuring is never exact and the
result may slightly differ across reboots. For this reason both Linux
kernel and QEMU recently started allowing for guests TSC frequency to
fall into +/- 250 ppm tolerance interval around the host TSC frequency.
Let's do the same to avoid unnecessary failures (esp. during migration)
in case the host frequency does not exactly match the frequency
configured in a domain XML.
https://bugzilla.redhat.com/show_bug.cgi?id=1839095
Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Daniel Henrique Barboza <danielhb413>
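The +/- 250 ppm interval mentioned in the commit message can be worked out with a few lines of Python. This is only an illustrative sketch, not the libvirt code: the helper name `tsc_tolerance_interval` is hypothetical, and integer arithmetic in Hz is an assumption. The host frequency is the one from the description above.

```python
from typing import Tuple

# 250 ppm is the value named in the commit message and shown by
# /sys/module/kvm/parameters/tsc_tolerance_ppm in the description.
TSC_TOLERANCE_PPM = 250


def tsc_tolerance_interval(host_hz: int, ppm: int = TSC_TOLERANCE_PPM) -> Tuple[int, int]:
    """Return (min_hz, max_hz) around host_hz.

    Hypothetical helper; assumes integer arithmetic in Hz.
    """
    tolerance_hz = host_hz * ppm // 1_000_000
    return host_hz - tolerance_hz, host_hz + tolerance_hz


host_hz = 2903993000  # from 'virsh capabilities' in the description
low, high = tsc_tolerance_interval(host_hz)
print(low, high)                # 2903267002 2904718998
print((high - low) // 2)        # 725998 Hz, i.e. roughly 726 kHz
```

Both 2903992000 Hz and 2903994000 Hz from the description lie well inside this interval, which is why the QEMU failures reported there were unexpected.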
Testing this feature with:
libvirt-client-6.10.0-1.module+el8.4.0+8898+a84e86e1.x86_64
qemu-kvm-5.2.0-2.module+el8.4.0+9186+ec44380f.x86_64
1. check host capabilities
# virsh capabilities |grep tsc
<counter name='tsc' frequency='2399996000' scaling='no'/>
<feature name='tsc_adjust'/>
<feature name='invtsc'/>
2. prepare a guest with the following xml definition
...
<cpu mode='host-model' check='partial'>
<feature policy='require' name='invtsc'/>
</cpu>
...
<clock offset='utc'>
<timer name='tsc' frequency='2399396001'/>
</clock>
3. start the guest
# virsh start rhel8.4
error: Failed to start domain rhel8.4
error: unsupported configuration: Requested TSC frequency 2399396001 Hz is outside tolerance range ([2399396001, 2400595999] Hz) around host frequency 2399996000 Hz and TSC scaling is not supported by the host CPU
4. prepare a guest with the tsc frequency 2400595999Hz
# virsh start rhel8.4
error: Failed to start domain rhel8.4
error: unsupported configuration: Requested TSC frequency 2400595999 Hz is outside tolerance range ([2399396001, 2400595999] Hz) around host frequency 2399996000 Hz and TSC scaling is not supported by the host CPU
Hi, Jiri
I am confused about the frequency boundaries. AFAIK, square brackets mean the end point is included.
Please help to check.
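The two failures above are consistent with a comparison that excludes the end points of the interval. A minimal sketch of the difference, assuming the interval is the host frequency +/- 250 ppm computed with integer arithmetic in Hz (all function names here are hypothetical and not taken from libvirt):

```python
def tolerance_interval(host_hz, ppm=250):
    """Return (min_hz, max_hz); assumes integer arithmetic in Hz."""
    tolerance_hz = host_hz * ppm // 1_000_000
    return host_hz - tolerance_hz, host_hz + tolerance_hz


def in_open_interval(freq_hz, host_hz):
    """End points excluded: rejects a frequency equal to min or max."""
    low, high = tolerance_interval(host_hz)
    return low < freq_hz < high


def in_closed_interval(freq_hz, host_hz):
    """End points included, as the [min, max] notation suggests."""
    low, high = tolerance_interval(host_hz)
    return low <= freq_hz <= high


HOST_HZ = 2399996000  # host frequency from step 1
print(tolerance_interval(HOST_HZ))  # (2399396001, 2400595999)
for freq_hz in (2399396001, 2400595999):
    # Each end point fails the open check and passes the closed one.
    print(freq_hz, in_open_interval(freq_hz, HOST_HZ), in_closed_interval(freq_hz, HOST_HZ))
```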
Also tested the frequencies between 2399396002 and 23993960999
# virsh dumpxml rhel8.4 |grep tsc
<feature policy='require' name='invtsc'/>
<timer name='tsc' frequency='2399396999'/>
# virsh start rhel8.4
error: Failed to start domain rhel8.4
error: internal error: qemu unexpectedly closed the monitor: 2020-12-21T12:04:25.865140Z qemu-kvm: warning: TSC frequency mismatch between VM (2399396 kHz) and host (2399996 kHz), and TSC scaling unavailable
2020-12-21T12:04:25.865236Z qemu-kvm: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Operation not supported
The issue does not occur near the upper boundary.
(In reply to Lili Zhu from comment #10)
> I am confused about the frequency boundaries. AFAIK, square brackets mean
> the end point is included.
Oops, yes, the kernel code fails if freq < min or freq > max, while I didn't include the boundaries when converting the code to succeed when within bounds. I'll fix this.
(In reply to Lili Zhu from comment #11)
> Also tested the frequencies between 2399396002 and 23993960999
> # virsh dumpxml rhel8.4 |grep tsc
> <feature policy='require' name='invtsc'/>
> <timer name='tsc' frequency='2399396999'/>
>
> # virsh start rhel8.4
> error: Failed to start domain rhel8.4
> error: internal error: qemu unexpectedly closed the monitor:
> 2020-12-21T12:04:25.865140Z qemu-kvm: warning: TSC frequency mismatch
> between VM (2399396 kHz) and host (2399996 kHz), and TSC scaling unavailable
> 2020-12-21T12:04:25.865236Z qemu-kvm: kvm_init_vcpu: kvm_arch_init_vcpu
> failed (0): Operation not supported
>
> Not hit the issue for the case near the upper boundary.
Unfortunately, asking for a TSC frequency within the tolerance range around the host frequency may not be enough to make it work. And there's no way to ask the kernel (other than trying to set the frequency) or even get a sensible error from the kernel about why it does not work. So QEMU just asks the kernel to set the guest frequency and reports the generic error when this fails. The important thing here is that it's not libvirt which is refusing to start the domain because of too strict requirements. Thus libvirt checks the interval to report a reasonable error when the TSC frequency is significantly off and leaves the rest for QEMU/KVM to deal with.
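Note that the QEMU warning quoted above reports frequencies in kHz, while the domain XML specifies them in Hz. A small sketch of that unit conversion, assuming truncating integer division (this matches the 2399396 kHz printed in the warning, but the rounding QEMU actually uses is an assumption, and the helper name is hypothetical). It only relates the two sets of numbers and does not explain why the kernel rejects the value.

```python
def hz_to_khz(freq_hz: int) -> int:
    """Convert Hz to kHz, dropping the sub-kHz remainder (assumed behavior)."""
    return freq_hz // 1000


print(hz_to_khz(2399396999))  # guest frequency from the domain XML -> 2399396
print(hz_to_khz(2399996000))  # host frequency from virsh capabilities -> 2399996
```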
(In reply to Lili Zhu from comment #10)
> error: unsupported configuration: Requested TSC frequency 2399396001 Hz is
> outside tolerance range ([2399396001, 2400595999] Hz) around host frequency
> 2399996000 Hz and TSC scaling is not supported by the host CPU
>
> error: unsupported configuration: Requested TSC frequency 2400595999 Hz is
> outside tolerance range ([2399396001, 2400595999] Hz) around host frequency
> 2399996000 Hz and TSC scaling is not supported by the host CPU
This is now fixed upstream by
commit f7c40b5c716fea5d2a4179569146307ebebc76ba
Refs: v6.10.0-309-gf7c40b5c71
Author: Jiri Denemark <jdenemar>
AuthorDate: Tue Jan 5 23:53:25 2021 +0100
Commit: Jiri Denemark <jdenemar>
CommitDate: Wed Jan 6 11:24:37 2021 +0100
qemu: The TSC tolerance interval should be closed
The kernel refuses to set guest TSC frequency less than a minimum
frequency or greater than maximum frequency (both computed based on the
host TSC frequency). When writing the libvirt code with a reversed logic
(return success when the requested frequency falls within the tolerance
interval) I forgot to include the boundaries.
Fixes: d8e5b4560006590668d4669f54a46b08ec14c1a2
https://bugzilla.redhat.com/show_bug.cgi?id=1839095
Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Peter Krempa <pkrempa>
1. prepare 2 hosts that have the cpu feature "invtsc" and do not support TSC scaling.
The TSC frequencies of the 2 hosts are slightly different.
host A:
# virsh capabilities |grep counter
<counter name='tsc' frequency='2399996000' scaling='no'/>
host B:
# virsh capabilities |grep counter
<counter name='tsc' frequency='2399997000' scaling='no'/>
2. prepare a guest with the following xml definition on host A
# virsh dumpxml rhel8.4
...
<cpu mode='host-model' check='partial'>
<feature policy='require' name='invtsc'/>
</cpu>
<clock offset='utc'>
<timer name='tsc' frequency='2399996000'/>
</clock>
...
3. start the guest, then migrate the guest to host B
# virsh migrate rhel8.4 qemu+ssh://dell-per730-36.lab.eng.pek2.redhat.com/system --live --p2p --undefinesource --persistent --verbose
Migration: [100 %]
4. check the guest state
# virsh list --all
Id Name State
-------------------------
2 rhel8.4 running
5. check the guest xml
# virsh dumpxml rhel8.4
....
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>Haswell-noTSX-IBRS</model>
<vendor>Intel</vendor>
<feature policy='require' name='vme'/>
<feature policy='require' name='ss'/>
<feature policy='require' name='vmx'/>
<feature policy='require' name='pdcm'/>
<feature policy='require' name='f16c'/>
<feature policy='require' name='rdrand'/>
<feature policy='require' name='hypervisor'/>
<feature policy='require' name='arat'/>
<feature policy='require' name='tsc_adjust'/>
<feature policy='require' name='umip'/>
<feature policy='require' name='md-clear'/>
<feature policy='require' name='stibp'/>
<feature policy='require' name='arch-capabilities'/>
<feature policy='require' name='ssbd'/>
<feature policy='require' name='xsaveopt'/>
<feature policy='require' name='pdpe1gb'/>
<feature policy='require' name='abm'/>
<feature policy='require' name='ibpb'/>
<feature policy='require' name='amd-stibp'/>
<feature policy='require' name='amd-ssbd'/>
<feature policy='require' name='skip-l1dfl-vmentry'/>
<feature policy='require' name='pschange-mc-no'/>
<feature policy='require' name='invtsc'/>
</cpu>
<clock offset='utc'>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='pit' tickpolicy='delay'/>
<timer name='hpet' present='no'/>
<timer name='tsc' frequency='2399996000'/>
</clock>
...
The migration succeeds.
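The outcome of the steps above can be cross-checked against the tolerance interval. The following is a rough sketch, not libvirt's implementation: it parses the `<counter>` element reported by host B and the guest's `<clock>` element with Python's standard `xml.etree.ElementTree`, then applies a closed +/- 250 ppm check. The helper names are hypothetical, and treating `scaling='yes'` as "any frequency is acceptable" is an assumption.

```python
import xml.etree.ElementTree as ET


def host_tsc(counter_xml):
    """Return (frequency_hz, scaling_supported) from a <counter name='tsc'/> element."""
    counter = ET.fromstring(counter_xml)
    return int(counter.get("frequency")), counter.get("scaling") == "yes"


def guest_tsc_hz(clock_xml):
    """Return the frequency of the 'tsc' timer inside a <clock> element."""
    timer = ET.fromstring(clock_xml).find("timer[@name='tsc']")
    return int(timer.get("frequency"))


def tsc_compatible(guest_hz, host_hz, scaling, ppm=250):
    """Closed-interval check; assumes TSC scaling makes any frequency acceptable."""
    if scaling:
        return True
    tolerance_hz = host_hz * ppm // 1_000_000
    return host_hz - tolerance_hz <= guest_hz <= host_hz + tolerance_hz


# Values copied from the verification steps above.
HOST_B_COUNTER = "<counter name='tsc' frequency='2399997000' scaling='no'/>"
GUEST_CLOCK = """<clock offset='utc'>
  <timer name='tsc' frequency='2399996000'/>
</clock>"""

host_hz, scaling = host_tsc(HOST_B_COUNTER)
print(tsc_compatible(guest_tsc_hz(GUEST_CLOCK), host_hz, scaling))  # True
```

The guest frequency differs from host B by only 1000 Hz, far less than the 599999 Hz tolerance, so the check passes even though the two values are not identical.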
As the guest migration between two identical hosts without TSC scaling support and with slightly different TSC frequencies succeeds now, mark the bug as verified. Will track the issue in Comment 10 in another bug.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:2098