Bug 2185076
| Summary: | delays injecting PIT timer interrupt with OpenShift | ||
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Marcelo Tosatti <mtosatti> |
| Component: | Virtualization | Assignee: | lpivarc |
| Status: | VERIFIED --- | QA Contact: | zhe peng <zpeng> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.11.3 | CC: | fbaudin, fdeutsch, gveitmic, jsuchane, lpivarc, mkletzan, pelauter, sgott, vromanso |
| Target Milestone: | --- | ||
| Target Release: | 4.14.0 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Pseudo-algorithm: 1) Retrieve the qemu process PID. 2) Retrieve the list of processes on the host. Search for the process named "kvm-pit/$pid-of-qemu", where $pid-of-qemu is an integer containing the qemu process PID. For example: "kvm-pit/35771" and $ cat /proc/35771/comm qemu-kvm 3) Add the TID of that kvm-pit process to the cgroup of the vcpus (cgroupManager.AttachTID API). 4) Assign the same cpumask as assigned to VCPU0 to kvm-pit process (can use sched_getaffinity/sched_setaffinity). Marcello, assigned this BZ to you as you did the work. Thanks for your help! Issue fixed upstream: https://github.com/kubevirt/kubevirt/pull/9613 Backport for this problem under way: https://bugzilla.redhat.com/show_bug.cgi?id=2208674 verify with build: CNV-v4.14.0.rhel9-1553
step:
1. enable static cpu manager policy
$ oc label node c01-zpeng-414-9xqx2-worker-0-9lgvv cpumanager=true
$ oc edit machineconfigpools.machineconfiguration.openshift.io worker
add:
...
labels:
custom-kubelet: cpumanager-enabled
...
edit KubeletConfig for CPU Manager
$ oc edit kubeletconfigs.machineconfiguration.openshift.io
....
name: cpumanager-enabled
resourceVersion: "157576"
uid: 4b8a8bef-af63-4cf5-be8e-16e9cc96dd2d
...
spec:
kubeletConfig:
cpuManagerPolicy: static
cpuManagerReconcilePeriod: 5s
machineConfigPoolSelector:
matchLabels:
custom-kubelet: cpumanager-enabled
pools.operator.machineconfiguration.openshift.io/worker: ""
...
Check the worker for the updated kubelet.conf:
$ oc debug node/c01-zpeng-414-9xqx2-worker-0-9lgvv
sh-4.4# cat /etc/kubernetes/kubelet.conf | grep cpuManager
"cpuManagerPolicy": "static",
"cpuManagerReconcilePeriod": "5s",
2: create a vm have dedicatedcpu: true flag,
requests/limits with integer values and equal.
....
domain:
clock:
timer:
pit:
tickPolicy: delay
cpu:
cores: 1
dedicatedCpuPlacement: true
isolateEmulatorThread: true
model: host-passthrough
sockets: 2
threads: 1
......
resources:
limits:
memory: 2Gi
requests:
memory: 2Gi
3: before start vm, enable trace in host
sh-5.1#echo 140000 > /sys/kernel/debug/tracing/buffer_size_kb
enable sched_switch, sched_waking tracepoint:
sh-5.1#echo sched_switch > /sys/kernel/debug/tracing/set_event
sh-5.1#echo sched_waking > /sys/kernel/debug/tracing/set_event
start collecting the trace and saving it to a file:
cat /sys/kernel/debug/tracing/trace_pipe > /root/saved-trace.txt
4: start vm, check trace file:
.....
kvm-pit/77-1016715 [001] d.h3. 11772.319302: sched_waking: comm=CPU 0/KVM pid=1016711 prio=120 target_cpu=001
....
CPU 0/KVM-1016711 [001] d.h3. 11774.406453: sched_waking: comm=kvm-pit/77 pid=1016715 prio=120 target_cpu=001
....
sh-5.1# cat /proc/1016715/status | grep Cpus_allowed
Cpus_allowed: 02
Cpus_allowed_list: 1
move to verified.
|
Description of problem: For using guests which require timely injection of PIT interrupts, kvm-pit process should be pinned to same CPU as vcpu0. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. start guest which uses PIT timer (RHEL-5 with clock=pit for example). 2. measure rate of interrupt injection with tracing. 3. Actual results: kvm-pit running on pCPU-0 might not execute, due to additional load on CPU-0, for longer periods of time. <idle>-0 [000] d.h. 102470.847586: hrtimer_expire_entry: hrtimer=00000000e3dea339 function=pit_timer_fn [kvm] now=102463903132805 <idle>-0 [000] d.h. 102470.847587: sched_waking: comm=kvm-pit/67 pid=1212317 prio=120 target_cpu=001 <idle>-0 [001] d... 102470.847590: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=kvm-pit/67 next_pid=1212317 next_prio=120 kvm-pit/67-1212317 [001] d... 102470.847594: sched_switch: prev_comm=kvm-pit/67 prev_pid=1212317 prev_prio=120 prev_state=D ==> next_comm=swapper/1 next_pid=0 next_prio=120 CPU 0/KVM-1212312 [006] d... 102470.847596: kvm_entry: vcpu 0, rip 0x35fd CPU 0/KVM-1212312 [006] d... 102470.847597: kvm_exit: vcpu 0 reason EOI_INDUCED rip 0x1bab info1 0x0000000000000022 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000 CPU 0/KVM-1212312 [006] d... 102470.847598: kvm_entry: vcpu 0, rip 0x1bab ... CPU 0/KVM-1212312 [006] d... 102470.847622: kvm_exit: vcpu 0 reason HLT rip 0x35fc info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000 <...>-64244 [000] d.h. 102470.857590: hrtimer_expire_entry: hrtimer=00000000e3dea339 function=pit_timer_fn [kvm] now=102463913135899 <...>-64244 [000] d.h. 102470.857599: sched_waking: comm=kvm-pit/67 pid=1212317 prio=120 target_cpu=001 kube-apiserver-64649 [004] d.h. 102470.860004: hrtimer_expire_entry: hrtimer=00000000f5450c52 function=pit_timer_fn [kvm] now=102463915550678 <...>-64245 [003] d.h. 102470.863029: hrtimer_expire_entry: hrtimer=000000004736fe96 function=pit_timer_fn [kvm] now=102463918575518 <...>-64244 [000] d.h. 102470.867587: hrtimer_expire_entry: hrtimer=00000000e3dea339 function=pit_timer_fn [kvm] now=102463923133350 <...>-64244 [000] d.h. 102470.869780: hrtimer_expire_entry: hrtimer=000000005deb41ca function=pit_timer_fn [kvm] now=102463925326017 <...>-64244 [000] d... 102470.877465: sched_switch: prev_comm=kube-apiserver prev_pid=64244 prev_prio=120 prev_state=S ==> next_comm=kvm-pit/67 next_pid=1212317 next_prio=120 kvm-pit/67-1212317 [000] d... 102470.877483: sched_switch: prev_comm=kvm-pit/67 prev_pid=1212317 prev_prio=120 prev_state=D ==> next_comm=cluster-policy- next_pid=60163 next_prio=120 CPU 0/KVM-1212312 [006] d... 102470.877486: kvm_entry: vcpu 0, rip 0x35fd CPU 0/KVM-1212312 [006] d... 102470.877490: kvm_exit: vcpu 0 reason EOI_INDUCED rip 0x1bab info1 0x0000000000000022 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000 CPU 0/KVM-1212312 [006] d... 102470.877491: sched_waking: comm=kvm-pit/67 pid=1212317 prio=120 target_cpu=000 CPU 0/KVM-1212312 [006] d... 102470.877494: kvm_entry: vcpu 0, rip 0x1bab Expected results: PIT timer interrupt injection should not deviate more than 500us from expected (armed) time, when period is 10ms. Additional info: To fix this kubevirt should pin kvm-pit thread, for that VM, to pCPU of vCPU-0.