Bug 2185076
Field | Value
---|---
Summary | delays injecting PIT timer interrupt with OpenShift
Product | Container Native Virtualization (CNV)
Component | Virtualization
Version | 4.11.3
Target Release | 4.14.0
Hardware | x86_64
OS | Linux
Type | Bug
Status | CLOSED ERRATA
Severity | high
Priority | high
Reporter | Marcelo Tosatti <mtosatti>
Assignee | lpivarc
QA Contact | zhe peng <zpeng>
CC | fbaudin, fdeutsch, gveitmic, jsuchane, lpivarc, mkletzan, pelauter, sgott, vromanso
Last Closed | 2023-11-08 14:05:27 UTC
Description
Marcelo Tosatti
2023-04-06 19:12:09 UTC
Pseudo-algorithm:

1) Retrieve the qemu process PID.

2) Retrieve the list of processes on the host. Search for the process named "kvm-pit/$pid-of-qemu", where $pid-of-qemu is an integer containing the qemu process PID. For example:

    "kvm-pit/35771"

   and

    $ cat /proc/35771/comm
    qemu-kvm

3) Add the TID of that kvm-pit process to the cgroup of the vcpus (cgroupManager.AttachTID API).

4) Assign the same cpumask as assigned to VCPU0 to the kvm-pit process (can use sched_getaffinity/sched_setaffinity).

Marcello, assigned this BZ to you as you did the work. Thanks for your help!

Issue fixed upstream: https://github.com/kubevirt/kubevirt/pull/9613

Backport for this problem under way: https://bugzilla.redhat.com/show_bug.cgi?id=2208674

Verified with build: CNV-v4.14.0.rhel9-1553

Steps:

1. Enable the static CPU manager policy.

    $ oc label node c01-zpeng-414-9xqx2-worker-0-9lgvv cpumanager=true
    $ oc edit machineconfigpools.machineconfiguration.openshift.io worker

   Add:

    ...
    labels:
      custom-kubelet: cpumanager-enabled
    ...

   Edit the KubeletConfig for CPU Manager:

    $ oc edit kubeletconfigs.machineconfiguration.openshift.io
    ....
    name: cpumanager-enabled
    resourceVersion: "157576"
    uid: 4b8a8bef-af63-4cf5-be8e-16e9cc96dd2d
    ...
    spec:
      kubeletConfig:
        cpuManagerPolicy: static
        cpuManagerReconcilePeriod: 5s
      machineConfigPoolSelector:
        matchLabels:
          custom-kubelet: cpumanager-enabled
          pools.operator.machineconfiguration.openshift.io/worker: ""
    ...

   Check the worker for the updated kubelet.conf:

    $ oc debug node/c01-zpeng-414-9xqx2-worker-0-9lgvv
    sh-4.4# cat /etc/kubernetes/kubelet.conf | grep cpuManager
    "cpuManagerPolicy": "static",
    "cpuManagerReconcilePeriod": "5s",

2. Create a VM with dedicatedCpuPlacement: true and with requests/limits set to equal integer values.

    ....
    domain:
      clock:
        timer:
          pit:
            tickPolicy: delay
      cpu:
        cores: 1
        dedicatedCpuPlacement: true
        isolateEmulatorThread: true
        model: host-passthrough
        sockets: 2
        threads: 1
      ......
      resources:
        limits:
          memory: 2Gi
        requests:
          memory: 2Gi

3. Before starting the VM, enable tracing on the host.

    sh-5.1# echo 140000 > /sys/kernel/debug/tracing/buffer_size_kb

   Enable the sched_switch and sched_waking tracepoints:

    sh-5.1# echo sched_switch > /sys/kernel/debug/tracing/set_event
    sh-5.1# echo sched_waking > /sys/kernel/debug/tracing/set_event

   Start collecting the trace and saving it to a file:

    cat /sys/kernel/debug/tracing/trace_pipe > /root/saved-trace.txt

4. Start the VM and check the trace file.

    .....
    kvm-pit/77-1016715 [001] d.h3. 11772.319302: sched_waking: comm=CPU 0/KVM pid=1016711 prio=120 target_cpu=001
    ....
    CPU 0/KVM-1016711 [001] d.h3. 11774.406453: sched_waking: comm=kvm-pit/77 pid=1016715 prio=120 target_cpu=001
    ....

    sh-5.1# cat /proc/1016715/status | grep Cpus_allowed
    Cpus_allowed: 02
    Cpus_allowed_list: 1

Move to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6817
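
For reference, steps 1, 2 and 4 of the pseudo-algorithm in the description can be sketched in Python roughly as below. This is only an illustration under stated assumptions, not the code from the upstream KubeVirt fix: the function names and the `proc_root` parameter are hypothetical, the cgroup step (3) is left out, and changing the affinity of a kernel thread is assumed to require root. Also note that in the verification trace the thread is named kvm-pit/77 while the vCPU thread has host PID 1016711, so the number in the name may not be the host PID when qemu runs in its own PID namespace; the caller is assumed to pass whichever PID actually appears in the thread name.

```python
import os
from pathlib import Path


def find_kvm_pit_tids(qemu_pid, proc_root="/proc"):
    """Return the IDs of tasks whose comm is exactly kvm-pit/<qemu_pid>.

    proc_root is a hypothetical parameter so the scan can be pointed at a
    fake directory tree instead of the real /proc.
    """
    wanted = f"kvm-pit/{qemu_pid}"
    tids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # task exited (or has no comm file) while scanning
        if comm == wanted:
            tids.append(int(entry.name))
    return sorted(tids)


def pin_kvm_pit_to_vcpu0(qemu_pid, vcpu0_tid):
    """Copy the CPU mask of the vCPU0 thread onto the kvm-pit thread(s).

    Assumption: the caller already knows the TID of the vCPU0 thread and
    has enough privileges to change the affinity of a kernel thread.
    """
    mask = os.sched_getaffinity(vcpu0_tid)
    for tid in find_kvm_pit_tids(qemu_pid):
        os.sched_setaffinity(tid, mask)
    return mask
```

With the example from the description, `find_kvm_pit_tids(35771)` would look for a task named kvm-pit/35771. After pinning, `grep Cpus_allowed /proc/<tid>/status` on the kvm-pit thread should report the same CPU list as vCPU0, which is what the verification step checks.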