Description of problem:
CRI-O CPU usage goes to over 300% when running 300 workload pods on a Single Node OpenShift (with OpenShiftSDN) with PAO installed and the RT kernel running. After the workload is cleaned up, the cri-o CPU usage does not stabilize and remains high (observed for over a couple of hours).

Cluster version: 4.10.0-0.nightly-2022-01-10-101431

PAO version:
NAME                                DISPLAY                      VERSION   REPLACES   PHASE
performance-addon-operator.v4.9.4   Performance Addon Operator   4.9.4                Succeeded

RT-Kernel:
[root@nchhabra-baremetal01 logs]# oc get nodes -o wide
NAME                   STATUS   ROLES           AGE    VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                         CONTAINER-RUNTIME
nchhabra-baremetal04   Ready    master,worker   3d2h   v1.22.1+6859754   10.95.147.247   <none>        Red Hat Enterprise Linux CoreOS 410.84.202201100616-0 (Ootpa)   4.18.0-305.30.1.rt7.102.el8_4.x86_64   cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8

Version-Release number of selected component (if applicable):

How reproducible:
Run about 300 workload pods on SNO with the RT kernel installed. Certain pods might exhibit "ContainerCreationErrors". Clean up the created pods and namespaces.

Steps to Reproduce:
1. Install PAO on SNO with OpenShiftSDN, with allocatable pods set to 1100.
2. Apply a performance profile enabling the RT kernel and wait for the node to reboot and the MCP to finish updating.
3. Run about 300 workload pods. Certain pods might exhibit "ContainerCreationErrors". Monitor CRI-O CPU usage.
4. Clean up the created pods and namespaces.
5. Monitor CRI-O CPU usage again. (A command-level sketch of steps 2-4 is included under Additional info below.)

Actual results:
CRI-O CPU utilization remains over 300% after cleanup.

Expected results:
CPU utilization stabilizes after the pods are cleaned up.

Additional info:
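Command-level sketch of steps 2-4, for reference only. The profile name, CPU sets, namespace, and pause image below are placeholders and not necessarily the exact values used in this run; adjust the isolated/reserved sets to the actual hardware.

# Step 2: apply a performance profile that enables the RT kernel (placeholder CPU sets)
cat <<'EOF' | oc apply -f -
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: sno-rt
spec:
  cpu:
    isolated: "4-47"   # placeholder isolated set
    reserved: "0-3"    # placeholder reserved set
  realTimeKernel:
    enabled: true
  nodeSelector:
    node-role.kubernetes.io/master: ""
EOF

# Steps 3-4: create ~300 pods in a disposable namespace, then clean up
oc create namespace crio-load
for i in $(seq 1 300); do
  oc run "pause-$i" -n crio-load --image=k8s.gcr.io/pause:3.6 --restart=Never
done
# ... monitor crio CPU while the pods are running and again after cleanup ...
oc delete namespace crio-load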
Snapshots of cri-o usage: https://snapshot.raintank.io/dashboard/snapshot/Qod2z6nbDcA18vYwNCBUPAhc3NeDIB4e
3-hour period: https://snapshot.raintank.io/dashboard/snapshot/OZH2lHoLdfe7wJMdY4wwGlo4o5OdMZt5
Sample top output from the node:

Tasks: 1988 total,   4 running, 1984 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.6 us,  1.2 sy,  0.0 ni, 92.0 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
MiB Mem :  64068.7 total,   4800.9 free,  43162.1 used,  16105.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  20163.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 891184 root      20   0   58.6g 486356  19664 S 343.2   0.7 489:02.10 crio
 885550 root      20   0 3692956   2.3g  34836 S  89.3   3.6 183:52.54 kube-apiserver
  24109 root      20   0   10.6g 486808 114536 S  71.8   0.7 163:36.46 etcd
 915386 root      20   0   74.5g 727468  10832 S  54.5   1.1 189:54.59 kubelet
 154357 1000420+  20   0  842240  67504  10452 S  30.2   0.1   6:48.49 operator
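For future runs, a minimal way to record the crio process CPU over time alongside the snapshots above (a sketch only; it assumes sysstat/pidstat is available on the node and uses an arbitrary 10-second interval):

pidstat -u -p "$(pidof crio)" 10 | tee crio-cpu.log     # per-interval %CPU of the crio process
top -b -d 10 -n 360 -p "$(pidof crio)" > crio-top.log   # alternative: one hour of batch-mode top samples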