Description of problem:

Reference: Bug 2125671

On a heterogeneous AMD cluster where each node has its own tsc-frequency, the lowest frequency is set on all nodes, as expected:

> $ for i in $(oc get node -o name);do echo $i;oc describe $i | grep tsc-freq; done
> node/cnv-qe-infra-25.cnvqe2.lab.eng.rdu2.redhat.com
> cpu-timer.node.kubevirt.io/tsc-frequency=1800000000
> scheduling.node.kubevirt.io/tsc-frequency-1800000000=true
> node/cnv-qe-infra-26.cnvqe2.lab.eng.rdu2.redhat.com
> cpu-timer.node.kubevirt.io/tsc-frequency=2500000000
> scheduling.node.kubevirt.io/tsc-frequency-1800000000=true
> scheduling.node.kubevirt.io/tsc-frequency-2500000000=true
> node/cnv-qe-infra-27.cnvqe2.lab.eng.rdu2.redhat.com
> cpu-timer.node.kubevirt.io/tsc-frequency=3000000000
> scheduling.node.kubevirt.io/tsc-frequency-1800000000=true
> scheduling.node.kubevirt.io/tsc-frequency-3000000000=true

The VM requests this frequency:

> bash-4.4$ virsh dumpxml 1 | grep tsc
> <timer name='tsc' frequency='1800000000'/>

However, the VM may log time jumps right after it starts or after a migration:

> Nov 15 13:22:28 rhel-tsc-10 systemd[4839]: Startup finished in 27ms.
> Nov 15 13:22:28 rhel-tsc-10 systemd[1]: Started User Manager for UID 1000.
> Nov 15 13:22:28 rhel-tsc-10 systemd[1]: Started Session 2 of user fedora.
> Nov 15 16:20:18 rhel-tsc-10 kernel: clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
> Nov 15 16:20:18 rhel-tsc-10 kernel: clocksource: 'kvm-clock' wd_now: a007fc3247f wd_last: 5368ba9ca5 mask: ffffffffffffffff
> Nov 15 16:20:18 rhel-tsc-10 kernel: clocksource: 'tsc' cs_now: 1200f50582d2 cs_last: 96329863ce mask: ffffffffffffffff
> Nov 15 16:20:18 rhel-tsc-10 kernel: tsc: Marking TSC unstable due to clocksource watchdog
> Nov 15 16:20:18 rhel-tsc-10 systemd[1]: Starting dnf makecache...

and may switch its clocksource from tsc to kvm-clock:

> # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> kvm-clock

Version-Release number of selected component (if applicable):
4.11
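
The two checks above (node labels and the VM's requested frequency) can be combined into a quick comparison. The following is only a minimal sketch: the function names `lowest_tsc_frequency` and `vm_tsc_frequency` are hypothetical helpers, not part of any tool, and they assume the label and XML text look exactly like the output quoted in this report.

```shell
# Hypothetical helper: print the lowest value of the
# cpu-timer.node.kubevirt.io/tsc-frequency label found on stdin.
lowest_tsc_frequency() {
    sed -n 's/.*cpu-timer\.node\.kubevirt\.io\/tsc-frequency=\([0-9][0-9]*\).*/\1/p' \
        | sort -n \
        | head -n 1
}

# Hypothetical helper: print the frequency of the tsc timer element
# found in domain XML on stdin (format as in the virsh output above).
vm_tsc_frequency() {
    sed -n "s/.*<timer name='tsc' frequency='\([0-9][0-9]*\)'.*/\1/p" \
        | head -n 1
}
```

Possible usage, assuming `oc` access to the cluster and `virsh` access inside the virt-launcher pod:

> $ oc describe nodes | lowest_tsc_frequency
> bash-4.4$ virsh dumpxml 1 | vm_tsc_frequency

If both commands print the same number, the VM is requesting the lowest node frequency, which is the expected state described above.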
As part of qe_test_coverage, we plan to add a test case that covers migration between nodes with different CPU frequencies.
Verified on OCP 4.11.20 + CNV v4.11.2-21.

The kernel on the node:

> $ uname -r
> 4.18.0-372.36.1.el8_6.x86_64

Created 15 VMs and migrated them multiple times: all VMs are accessible, no time jumps were observed, and the clocksource did not switch to kvm-clock.
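
The per-VM check used in this kind of verification could be scripted roughly as follows. This is a sketch under assumptions, not the procedure actually used: `guest_tsc_ok` and the `CLOCKSOURCE_FILE` variable are hypothetical names, the function is meant to run inside the guest, and it only looks for the watchdog message quoted in the description.

```shell
# Path of the guest's active clocksource; overridable for testing.
CLOCKSOURCE_FILE=${CLOCKSOURCE_FILE:-/sys/devices/system/clocksource/clocksource0/current_clocksource}

# Hypothetical helper, run inside the guest.
# $1: file containing kernel log text (for example saved from `journalctl -k`).
# Returns 0 if the guest still uses tsc and the log has no watchdog complaint.
guest_tsc_ok() {
    log_file=$1
    # Fail if the guest has fallen back to another clocksource (e.g. kvm-clock).
    [ "$(cat "$CLOCKSOURCE_FILE")" = "tsc" ] || return 1
    # Fail if the clocksource watchdog ever marked tsc as unstable.
    if grep -q "Marking clocksource 'tsc' as unstable" "$log_file"; then
        return 1
    fi
    return 0
}
```

Possible usage inside a guest after each migration:

> $ journalctl -k > /tmp/kernel.log
> $ guest_tsc_ok /tmp/kernel.log && echo OK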
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Virtualization 4.11.3 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:0621