Bug 1839095 - VM fails to migrate between identical hosts not supporting TSC scaling
Summary: VM fails to migrate between identical hosts not supporting TSC scaling
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 8.2
Assignee: Jiri Denemark
QA Contact: Lili Zhu
URL:
Whiteboard:
Depends On: 1872366 1882793
Blocks: 1821199
Reported: 2020-05-22 13:26 UTC by Jiri Denemark
Modified: 2021-05-25 06:42 UTC (History)

Fixed In Version: libvirt-6.10.0-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1821199
Environment:
Last Closed: 2021-05-25 06:42:15 UTC
Type: Bug
Target Upstream Version: 6.10.0
Embargoed:



Description Jiri Denemark 2020-05-22 13:26:43 UTC
+++ This bug was initially created as a clone of Bug #1821199 +++

Trying to migrate a domain between two identical hosts with slightly different
TSC frequency fails with:

    unsupported configuration: Requested TSC frequency 2133408000 Hz does not
    match host (2133406000 Hz) and TSC scaling is not supported by the host CPU

Apparently it is possible to get the exact frequency on modern CPUs, but it is
just an output of some calibration code, which means the frequency may differ
slightly even on identical hosts.

Version-Release number of selected component (if applicable):

libvirt-6.0.0-16.el8

How reproducible: 100%

Steps to Reproduce:

Find two identical hosts without TSC scaling support and slightly different
TSC frequency. Both can be checked in virsh capabilities:

virsh -r capabilities | grep "counter name='tsc'"
      <counter name='tsc' frequency='2300026000' scaling='no'/>

Start a domain with

    <cpu mode='host-passthrough' check='none'>
      <feature policy='require' name='invtsc'/>
    </cpu>
    <clock offset='utc'>
      <timer name='tsc' frequency='$HOST_TSC_FREQUENCY'/>
    </clock>

and try to migrate it to the other host.

It is possible to reproduce this issue even with a single host and not
involving migration. Just try to start a domain configured as shown above, but
use TSC frequency which slightly differs from the host (e.g.,
$HOST_TSC_FREQUENCY - 10000) in the <timer> element. The domain will fail to
start with the error from the bug description.
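
The single-host reproducer above can be scripted. The following is a minimal sketch, not part of libvirt: it reads the host TSC counter from capabilities XML and prints a <timer> element whose frequency is 10000 Hz below the host. The helper names and the SAMPLE wrapper elements around <counter> are assumptions made for illustration; only the <counter> line itself is taken from the virsh output shown above.

```python
import xml.etree.ElementTree as ET


def tsc_counter(capabilities_xml):
    """Return (frequency_hz, scaling_supported) for the host TSC counter."""
    root = ET.fromstring(capabilities_xml)
    # Search the whole tree so the exact nesting of <counter> does not matter.
    for counter in root.iter("counter"):
        if counter.get("name") == "tsc":
            return int(counter.get("frequency")), counter.get("scaling") == "yes"
    raise ValueError("no TSC counter in capabilities XML")


def timer_element(frequency_hz):
    """Build the <timer> element used in the domain XML reproducer."""
    return f"<timer name='tsc' frequency='{frequency_hz}'/>"


# Hypothetical, trimmed-down stand-in for `virsh -r capabilities` output.
SAMPLE = """<capabilities><host><cpu>
  <counter name='tsc' frequency='2300026000' scaling='no'/>
</cpu></host></capabilities>"""

host_hz, scaling = tsc_counter(SAMPLE)
# A frequency slightly below the host, as suggested in the steps above.
print(timer_element(host_hz - 10000))
```

On a real host, the output of `virsh -r capabilities` would be passed in place of SAMPLE.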

--- Additional comment from Milan Zamazal on 2020-04-09 14:37:06 UTC ---

On one host:
kernel 4.18.0-147.0.3.el8_1 -> 4.18.0-147.8.1.el8_1
systemd 239-18.el8_1.4 -> 239-29.el8
qemu 4.2.0-16 -> 4.2.0-17
libvirt 6.0.0-15 -> 6.0.0-16

On the other host:
kernel 4.18.0-147.0.3.el8_1 -> 4.18.0-147.8.1.el8_1
systemd 239-18.el8_1.4 -> 239-28.el8
qemu 4.1.0-23 -> 4.2.0-17
libvirt 5.6.0-10 -> 6.0.0-16

After inspecting my environment and old logs I can see the problem in my
environment is that on one of the hosts the reported TSC frequency has changed
from 2133408000 Hz to 2133406000 Hz (and there is no TSC frequency scaling now
or before). I tried to reboot the host and now the reported TSC frequency is
2133407000 Hz. So the reported frequency is nondeterministic and slightly
variable, which causes the migration problem.

--- Additional comment from Milan Zamazal on 2020-04-15 17:10:33 UTC ---

Looking into linux/arch/x86/kernel/tsc.c and dmesg on my host, the TSC
frequency value comes from calibration and the kernel tries to keep it within
certain bounds of accuracy. It's possible to get an exact value from hardware
info on modern Intel CPUs, but in other cases it's only a measurement. If my
observations are correct, we can't expect to have exactly the same TSC
frequency values in many cases, even on the same machine across reboots (as my
host demonstrates).

Now the question is how to deal with that fact in migrations. Do I assume
correctly that libvirt would reject the VM on the destination if the TSC
frequency specified in the domain XML wasn't the same as on the host? And can
slightly different TSC frequencies cause any harm to HP VMs? One very rude and
ugly way to deal with that would be to tolerate slight differences in Engine
(assuming it's OK for HP VMs) and replace the TSC frequency in libvirt hook.
Another way would be to restrict migrations to hardware providing exact TSC
frequency info, which is perhaps too restrictive and quite confusing. Maybe
libvirt could provide some assistance, which might be the best solution.

What do you all think?

--- Additional comment from Jiri Denemark on 2020-04-16 08:48:03 UTC ---

Yes, libvirt compares the TSC frequency in domain XML with the frequency
probed from the host and refuses to start the domain if they don't exactly
match. Unfortunately, changing the TSC frequency during migration is forbidden
too, libvirt explicitly checks the frequencies match in both the original and
updated domain definition (either supplied by a parameter to the migration API
or via a pre-migration hook). We need to check whether the strict match is
really necessary by trying to create a domain with a TSC frequency which
slightly differs from the frequency of a host which does not support scaling.

--- Additional comment from Jiri Denemark on 2020-04-29 15:06:22 UTC ---

Marcelo, how do you think we should handle migration between two identical
hosts which do not support TSC scaling when the TSC frequency probed by the
kernel differs a bit between them? Currently libvirt refuses to migrate a domain
between these two hosts because of TSC frequency mismatch.

--- Additional comment from Marcelo Tosatti on 2020-05-19 20:13:41 UTC ---

(In reply to Milan Zamazal from comment #12)
> After inspecting my environment and old logs I can see the problem in my
> environment is that on one of the hosts the reported TSC frequency has
> changed from 2133408000 Hz to 2133406000 Hz (and there is no TSC frequency
> scaling now or before). I tried to reboot the host and now the reported TSC
> frequency is 2133407000 Hz. So the reported frequency is nondeterministic
> and slightly variable, which causes the migration problem.

Jiri,

KVM_SET_TSC_KHZ supports an error of 250 ppm (see tsc_tolerance_ppm and 
adjust_tsc_khz in arch/x86/kvm/x86.c in the kernel source).

Can that code be added to libvirt as well?

--- Additional comment from Jiri Denemark on 2020-05-21 20:18:51 UTC ---

The domain XML shows invtsc is exposed to the guest.

Thanks Marcelo for the reference to the kernel code:

    /* tsc tolerance in parts per million - default to 1/2 of the NTP threshold */
    static u32 __read_mostly tsc_tolerance_ppm = 250;
    module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);

    static u32 adjust_tsc_khz(u32 khz, s32 ppm)
    {
        u64 v = (u64)khz * (1000000 + ppm);
        do_div(v, 1000000);
        return v;
    }

    static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz)
    {
        ...
        thresh_lo = adjust_tsc_khz(tsc_khz, -tsc_tolerance_ppm);
        thresh_hi = adjust_tsc_khz(tsc_khz, tsc_tolerance_ppm);
        if (user_tsc_khz < thresh_lo || user_tsc_khz > thresh_hi) {
            pr_debug("kvm: requested TSC rate %u falls outside tolerance [%u,%u]\n", user_tsc_khz, thresh_lo, thresh_hi);
            use_scaling = 1;
        }
        ...
    }

So for the host TSC frequency 2133408000 Hz a domain can request anything
within +/- 533 kHz of the host frequency. The only problem is that
tsc_tolerance_ppm is a parameter of the kvm module.
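
As a worked check of the +/- 533 kHz figure, here is a small Python sketch that mirrors the integer arithmetic of adjust_tsc_khz quoted above. It is only an illustration of the formula, not kernel or libvirt code, and tolerance_window_khz is a name made up for this example.

```python
TSC_TOLERANCE_PPM = 250  # kernel default quoted above; it is a kvm module parameter


def adjust_tsc_khz(khz, ppm):
    # Same steps as the kernel helper: integer multiply, then truncating divide.
    return khz * (1_000_000 + ppm) // 1_000_000


def tolerance_window_khz(host_khz, ppm=TSC_TOLERANCE_PPM):
    """Return (thresh_lo, thresh_hi) in kHz for a given host TSC frequency."""
    return adjust_tsc_khz(host_khz, -ppm), adjust_tsc_khz(host_khz, ppm)


lo, hi = tolerance_window_khz(2_133_408)
print(lo, hi)

# The two other frequencies reported for the same host in this bug.
for observed_khz in (2_133_406, 2_133_407):
    print(observed_khz, lo <= observed_khz <= hi)
```

Both frequencies observed on the affected host fall well inside this window, which is why a strict equality check looks unnecessarily tight.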

We have two options, either use the default value in libvirt and hope nobody
changed it on the host or we can try setting the frequency via KVM and check
the result. But looking at the kernel code I don't see how we could
detect that setting TSC failed because of unsupported TSC scaling rather than
some other reason.

Anyway it seems the kernel is less strict and allows setting TSC frequency
which is greater than host TSC frequency even without TSC scaling:

    static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
    {
        ...
        /* TSC scaling supported? */
        if (!kvm_has_tsc_control) {
            if (user_tsc_khz > tsc_khz) {
                vcpu->arch.tsc_catchup = 1;
                vcpu->arch.tsc_always_catchup = 1;
                return 0;
            } else {
                pr_warn_ratelimited("user requested TSC rate below hardware speed\n");
                return -1;
            }
        }
        ...
    }

I checked the code in QEMU and it calls KVM_SET_TSC_KHZ first and checks TSC
frequencies only if the call fails. In other words, libvirt is the only part
which needs fixing here because it is too strict.

Comment 1 Jiri Denemark 2020-06-04 08:32:40 UTC
Copying my comment from the original bug (https://bugzilla.redhat.com/show_bug.cgi?id=1821199#c28):

Marcelo, it seems the behavior does not match how I would understand the code
in QEMU and the kernel (in comment 26). On a host without TSC scaling QEMU
fails to set the frequency unless it is exactly the same as reported by the
kernel:

From virsh capabilities:

    <counter name='tsc' frequency='2903993000' scaling='no'/>

TSC requested 1 kHz below the host:

    $ /usr/bin/qemu-system-x86_64 -machine pc,accel=kvm -cpu host,invtsc=on,tsc-frequency=2903992000
    qemu-system-x86_64: warning: TSC frequency mismatch between VM (2903992 kHz) and host (2903993 kHz), and TSC scaling unavailable
    qemu-system-x86_64: kvm_init_vcpu failed: Operation not supported

TSC requested 1 kHz above the host:

    $ /usr/bin/qemu-system-x86_64 -machine pc,accel=kvm -cpu host,invtsc=on,tsc-frequency=2903994000
    qemu-system-x86_64: warning: TSC frequency mismatch between VM (2903994 kHz) and host (2903993 kHz), and TSC scaling unavailable
    qemu-system-x86_64: kvm_init_vcpu failed: Operation not supported

Exact TSC:

    $ /usr/bin/qemu-system-x86_64 -machine pc,accel=kvm -cpu host,invtsc=on,tsc-frequency=2903993000
    # QEMU runs happily here

The TSC tolerance is the default value (which corresponds to +/- 726 kHz
interval around the host frequency on this particular host):

    # cat /sys/module/kvm/parameters/tsc_tolerance_ppm
    250

I tried this on several hosts without TSC scaling and the behavior is the same
everywhere. Did I misunderstand anything? Or am I just doing it all wrong?
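
For reference, the tolerance check above can be scripted. This is a rough sketch that reads the module parameter from the sysfs path shown above and falls back to 250 when the file is missing or unreadable. The fallback behaviour and the function names are assumptions of this example, not something libvirt or QEMU is known to do.

```python
from pathlib import Path

DEFAULT_TSC_TOLERANCE_PPM = 250
PARAM_PATH = "/sys/module/kvm/parameters/tsc_tolerance_ppm"


def read_tsc_tolerance_ppm(path=PARAM_PATH):
    """Read the kvm tsc_tolerance_ppm parameter, or fall back to the default."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # kvm module not loaded, file unreadable, or unexpected content.
        return DEFAULT_TSC_TOLERANCE_PPM


def tolerance_khz(host_khz, ppm):
    """Half-width of the tolerance interval in kHz (not truncated)."""
    return host_khz * ppm / 1_000_000


ppm = read_tsc_tolerance_ppm()
print(ppm, tolerance_khz(2_903_993, ppm))
```

With the default of 250 ppm and the host frequency of 2903993 kHz this gives roughly 726 kHz, so the 1 kHz offsets tried above are far inside the interval.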

Comment 3 Jiri Denemark 2020-11-11 12:24:42 UTC
Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2020-November/msg00519.html

Comment 4 Jiri Denemark 2020-11-12 16:34:26 UTC
This is fixed upstream by

commit d8e5b4560006590668d4669f54a46b08ec14c1a2
Refs: v6.9.0-204-gd8e5b45600
Author:     Jiri Denemark <jdenemar>
AuthorDate: Mon May 25 11:35:12 2020 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Nov 12 17:29:16 2020 +0100

    qemu: Do not require TSC frequency to strictly match host

    Some CPUs provide a way to read exact TSC frequency, while measuring it
    is required on other CPUs. However, measuring is never exact and the
    result may slightly differ across reboots. For this reason both Linux
    kernel and QEMU recently started allowing for guests TSC frequency to
    fall into +/- 250 ppm tolerance interval around the host TSC frequency.

    Let's do the same to avoid unnecessary failures (esp. during migration)
    in case the host frequency does not exactly match the frequency
    configured in a domain XML.

    https://bugzilla.redhat.com/show_bug.cgi?id=1839095

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Daniel Henrique Barboza <danielhb413>

Comment 10 Lili Zhu 2020-12-21 12:01:06 UTC
Testing this feature with:
libvirt-client-6.10.0-1.module+el8.4.0+8898+a84e86e1.x86_64
qemu-kvm-5.2.0-2.module+el8.4.0+9186+ec44380f.x86_64

1. check host capabilities
# virsh capabilities |grep tsc
      <counter name='tsc' frequency='2399996000' scaling='no'/>
      <feature name='tsc_adjust'/>
      <feature name='invtsc'/>

2. prepare a guest with the following xml definition
...
 <cpu mode='host-model' check='partial'>
    <feature policy='require' name='invtsc'/>
  </cpu>
...
  <clock offset='utc'>
    <timer name='tsc' frequency='2399396001'/>
  </clock>

3. start the guest
# virsh start rhel8.4 
error: Failed to start domain rhel8.4
error: unsupported configuration: Requested TSC frequency 2399396001 Hz is outside tolerance range ([2399396001, 2400595999] Hz) around host frequency 2399996000 Hz and TSC scaling is not supported by the host CPU

4. prepare a guest with the tsc frequency 2400595999Hz
# virsh start rhel8.4 
error: Failed to start domain rhel8.4
error: unsupported configuration: Requested TSC frequency 2400595999 Hz is outside tolerance range ([2399396001, 2400595999] Hz) around host frequency 2399996000 Hz and TSC scaling is not supported by the host CPU

Hi, Jiri
I am confused about the frequency boundaries. AFAIK, square brackets mean the end point is included.
Please help to check.

Comment 11 Lili Zhu 2020-12-21 12:07:33 UTC
Also tested the frequencies between 2399396002 and 23993960999
# virsh dumpxml rhel8.4  |grep tsc
    <feature policy='require' name='invtsc'/>
    <timer name='tsc' frequency='2399396999'/>

# virsh start rhel8.4 
error: Failed to start domain rhel8.4
error: internal error: qemu unexpectedly closed the monitor: 2020-12-21T12:04:25.865140Z qemu-kvm: warning: TSC frequency mismatch between VM (2399396 kHz) and host (2399996 kHz), and TSC scaling unavailable
2020-12-21T12:04:25.865236Z qemu-kvm: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Operation not supported

Not hit the issue for the case near the upper boundary.

Comment 12 Jiri Denemark 2021-01-05 15:45:05 UTC
(In reply to Lili Zhu from comment #10)
> I am confused about the frequency boundaries. AFAIK, square brackets mean
> the end point is included.

Oops, yes, the kernel code fails if freq < min or freq > max while I didn't
include the boundaries when converting the code to succeed when within bounds.
I'll fix this.
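
To illustrate the boundary problem, here is a sketch of the two variants of the check. The range arithmetic is inferred from the error message in comment #10 (it reproduces the printed [2399396001, 2400595999] Hz interval), not copied from the libvirt source, and the function names are made up for this example.

```python
TOLERANCE_PPM = 250


def tolerance_range_hz(host_hz, ppm=TOLERANCE_PPM):
    """Return (min_hz, max_hz) around the host frequency, in Hz."""
    tolerance = host_hz * ppm // 1_000_000
    return host_hz - tolerance, host_hz + tolerance


def within_open(freq_hz, host_hz):
    # Behaviour reported in comment #10: the end points are rejected.
    lo, hi = tolerance_range_hz(host_hz)
    return lo < freq_hz < hi


def within_closed(freq_hz, host_hz):
    # Inverse of the kernel test (fail if freq < min or freq > max).
    lo, hi = tolerance_range_hz(host_hz)
    return lo <= freq_hz <= hi


HOST_HZ = 2_399_996_000
lo, hi = tolerance_range_hz(HOST_HZ)
for freq in (lo, hi):
    print(freq, within_open(freq, HOST_HZ), within_closed(freq, HOST_HZ))
```

The two end points are the only values on which the variants disagree, which matches the two failing cases in comment #10.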

Comment 13 Jiri Denemark 2021-01-05 15:51:34 UTC
(In reply to Lili Zhu from comment #11)
> Also tested the frequencies between 2399396002 and 23993960999
> # virsh dumpxml rhel8.4  |grep tsc
>     <feature policy='require' name='invtsc'/>
>     <timer name='tsc' frequency='2399396999'/>
> 
> # virsh start rhel8.4 
> error: Failed to start domain rhel8.4
> error: internal error: qemu unexpectedly closed the monitor:
> 2020-12-21T12:04:25.865140Z qemu-kvm: warning: TSC frequency mismatch
> between VM (2399396 kHz) and host (2399996 kHz), and TSC scaling unavailable
> 2020-12-21T12:04:25.865236Z qemu-kvm: kvm_init_vcpu: kvm_arch_init_vcpu
> failed (0): Operation not supported
> 
> Not hit the issue for the case near the upper boundary.

Unfortunately, asking for TSC frequency within the tolerance range around host
frequency may not be enough to make it work. And there's no way to ask the
kernel (other than trying to set the frequency) or even get a sensible error
from the kernel about why it does not work. So QEMU just asks the kernel to
set the guest frequency and reports the generic error when this fails.

The important thing here is that it's not libvirt which is refusing to start the
domain because of too strict requirements. Thus libvirt checks the interval to
report a reasonable error when the TSC frequency is significantly off and
leaves the rest for QEMU/KVM to deal with.
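
A quick numeric illustration of the case from comment #11, as a sketch only: the Hz to kHz truncation is an assumption based on the QEMU warning above (2399396999 Hz is reported as 2399396 kHz), and adjust_tsc_khz is a Python rendering of the kernel helper quoted earlier in this bug.

```python
def hz_to_khz(freq_hz):
    # Assumed truncation, consistent with the kHz values in the QEMU warning.
    return freq_hz // 1000


def adjust_tsc_khz(khz, ppm):
    # Integer arithmetic as in the kernel helper quoted earlier in this bug.
    return khz * (1_000_000 + ppm) // 1_000_000


requested_khz = hz_to_khz(2_399_396_999)
host_khz = hz_to_khz(2_399_996_000)
thresh_lo = adjust_tsc_khz(host_khz, -250)
thresh_hi = adjust_tsc_khz(host_khz, 250)

print(requested_khz, host_khz, thresh_lo, thresh_hi)
print(thresh_lo <= requested_khz <= thresh_hi)
```

By this formula the requested frequency sits inside the window, yet QEMU still failed to start, so passing the interval check alone does not guarantee that KVM accepts the frequency.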

Comment 14 Jiri Denemark 2021-01-06 10:28:31 UTC
(In reply to Lili Zhu from comment #10)
> error: unsupported configuration: Requested TSC frequency 2399396001 Hz is
> outside tolerance range ([2399396001, 2400595999] Hz) around host frequency
> 2399996000 Hz and TSC scaling is not supported by the host CPU
> 
> error: unsupported configuration: Requested TSC frequency 2400595999 Hz is
> outside tolerance range ([2399396001, 2400595999] Hz) around host frequency
> 2399996000 Hz and TSC scaling is not supported by the host CPU

This is now fixed upstream by

commit f7c40b5c716fea5d2a4179569146307ebebc76ba
Refs: v6.10.0-309-gf7c40b5c71
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jan 5 23:53:25 2021 +0100
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Jan 6 11:24:37 2021 +0100

    qemu: The TSC tolerance interval should be closed

    The kernel refuses to set guest TSC frequency less than a minimum
    frequency or greater than maximum frequency (both computed based on the
    host TSC frequency). When writing the libvirt code with a reversed logic
    (return success when the requested frequency falls within the tolerance
    interval) I forgot to include the boundaries.

    Fixes: d8e5b4560006590668d4669f54a46b08ec14c1a2
    https://bugzilla.redhat.com/show_bug.cgi?id=1839095

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Peter Krempa <pkrempa>

Comment 15 Lili Zhu 2021-01-08 12:23:12 UTC
1. Prepare 2 hosts that have the CPU feature "invtsc" and do not support TSC scaling.
The TSC frequencies of the 2 hosts are slightly different.
host A:
# virsh capabilities |grep counter
      <counter name='tsc' frequency='2399996000' scaling='no'/>

host B:
# virsh capabilities |grep counter
      <counter name='tsc' frequency='2399997000' scaling='no'/>

2. prepare a guest with the following xml definition on host A
# virsh dumpxml rhel8.4
...
 <cpu mode='host-model' check='partial'>
    <feature policy='require' name='invtsc'/>
  </cpu>
  <clock offset='utc'>
    <timer name='tsc' frequency='2399996000'/>
  </clock>
...

3. start the guest, then migrate the guest to host B
# virsh migrate rhel8.4 qemu+ssh://dell-per730-36.lab.eng.pek2.redhat.com/system --live --p2p --undefinesource --persistent --verbose 
Migration: [100 %]

4. check the guest state
# virsh list --all
 Id   Name      State
-------------------------
 2    rhel8.4   running


5. check the guest xml
# virsh dumpxml rhel8.4
....
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Haswell-noTSX-IBRS</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='pdcm'/>
    <feature policy='require' name='f16c'/>
    <feature policy='require' name='rdrand'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='umip'/>
    <feature policy='require' name='md-clear'/>
    <feature policy='require' name='stibp'/>
    <feature policy='require' name='arch-capabilities'/>
    <feature policy='require' name='ssbd'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='abm'/>
    <feature policy='require' name='ibpb'/>
    <feature policy='require' name='amd-stibp'/>
    <feature policy='require' name='amd-ssbd'/>
    <feature policy='require' name='skip-l1dfl-vmentry'/>
    <feature policy='require' name='pschange-mc-no'/>
    <feature policy='require' name='invtsc'/>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='tsc' frequency='2399996000'/>
  </clock>
...

Migration succeeds.

Comment 16 Lili Zhu 2021-01-11 08:13:14 UTC
As guest migration between two identical hosts without TSC scaling support and with slightly different
TSC frequencies succeeds now, I am marking the bug as verified.

Will track the issue in Comment 10 in another bug.

Comment 18 errata-xmlrpc 2021-05-25 06:42:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098

