Bug 2040485

Summary: SNO with PAO and RT-Kernel installed exhibiting high CRI-O utilization even after cleaning up the previously running workload
Product: OpenShift Container Platform Reporter: Noreen <nchhabra>
Component: Node Assignee: Peter Hunt <pehunt>
Node sub component: CRI-O QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED DUPLICATE Docs Contact:
Severity: medium    
Priority: medium CC: akrzos, aos-bugs, murali, pehunt
Version: 4.10   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-01-26 19:55:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Noreen 2022-01-13 19:57:09 UTC
Description of problem:
CRI-O CPU usage exceeds 300% when running 300 workload pods on a Single Node OpenShift (SNO) cluster (with OpenShiftSDN) that has PAO installed and the RT kernel running. After the workload is cleaned up, CRI-O CPU usage does not drop back and remains high (observed for over a couple of hours).

Cluster version: 4.10.0-0.nightly-2022-01-10-101431

PAO version:
NAME                                DISPLAY                      VERSION   REPLACES   PHASE 
performance-addon-operator.v4.9.4   Performance Addon Operator   4.9.4                Succeeded

RT-Kernel:
[root@nchhabra-baremetal01 logs]# oc get nodes -o wide
NAME                   STATUS   ROLES           AGE    VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                         CONTAINER-RUNTIME
nchhabra-baremetal04   Ready    master,worker   3d2h   v1.22.1+6859754   10.95.147.247   <none>        Red Hat Enterprise Linux CoreOS 410.84.202201100616-0 (Ootpa)   4.18.0-305.30.1.rt7.102.el8_4.x86_64   cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8

Version-Release number of selected component (if applicable):


How reproducible:
Run about 300 workload pods on SNO with RT-Kernel installed. Certain pods might exhibit "ContainerCreationErrors". Clean up created pods and namespaces. 

Steps to Reproduce:
1. Install PAO on SNO with OpenShiftSDN, with allocatable pods set to 1100.
2. Apply a performance profile that enables the RT kernel, then wait for the node to reboot and the MCP to finish updating.
3. Run about 300 workload pods. Certain pods might exhibit "ContainerCreationErrors". Monitor CRI-O CPU usage.
4. Clean up the created pods and namespaces.
5. Monitor CRI-O CPU usage.
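
One possible way to do the monitoring in steps 3 and 5 without watching top interactively is to sample the crio process from /proc on the node. This is only a sketch and not the method used in this report: the helper names (cpu_ticks, cpu_percent, sample) and the 5-second interval are assumptions made for illustration, and it assumes a Linux /proc filesystem and a PID obtained separately (for example with `pidof crio`).

```python
import os
import time


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before indexing the remaining fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the process state (field 3 of the full line), so
    # utime (field 14) and stime (field 15) are at indexes 11 and 12.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before: int, ticks_after: int,
                interval_s: float, ticks_per_s: int) -> float:
    """CPU usage over the interval; 100.0 means one fully busy core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def read_ticks(pid: int) -> int:
    """Read the current CPU tick count for a PID (hypothetical helper)."""
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks(f.read())


def sample(pid: int, interval_s: float = 5.0) -> float:
    """Measure CPU usage of one process over interval_s seconds."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    start = time.monotonic()
    before = read_ticks(pid)
    time.sleep(interval_s)
    after = read_ticks(pid)
    elapsed = time.monotonic() - start
    return cpu_percent(before, after, elapsed, ticks_per_s)
```

Calling sample(pid) in a loop before the workload, during it, and after cleanup gives a simple time series to compare. Because the counters in /proc/<pid>/stat cover all threads of the process, values above 100% are expected for a multi-threaded process such as crio.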


Actual results:

CRI-O CPU utilization stays over 300% after cleanup of the pods.

Expected results:

CPU utilization stabilizes after cleanup of the pods.

Additional info:

Comment 2 Noreen 2022-01-13 20:09:58 UTC
Tasks: 1988 total,   4 running, 1984 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.6 us,  1.2 sy,  0.0 ni, 92.0 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
MiB Mem :  64068.7 total,   4800.9 free,  43162.1 used,  16105.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  20163.3 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                                                
 891184 root      20   0   58.6g 486356  19664 S 343.2   0.7 489:02.10 crio                                                                                                                                                                   
 885550 root      20   0 3692956   2.3g  34836 S  89.3   3.6 183:52.54 kube-apiserver                                                                                                                                                         
  24109 root      20   0   10.6g 486808 114536 S  71.8   0.7 163:36.46 etcd                                                                                                                                                                   
 915386 root      20   0   74.5g 727468  10832 S  54.5   1.1 189:54.59 kubelet                                                                                                                                                                
 154357 1000420+  20   0  842240  67504  10452 S  30.2   0.1   6:48.49 operator
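
To compare snapshots like the one above over time, the process rows can be reduced to (command, %CPU) pairs. The following is a rough sketch under stated assumptions: it expects the default top column layout shown above (PID first, %CPU ninth, COMMAND last) and a decimal point in the numbers, and the function name parse_top_rows is made up for this example.

```python
def parse_top_rows(text: str) -> list:
    """Extract (command, cpu_percent) pairs from batch-mode top output."""
    rows = []
    for line in text.splitlines():
        # Split into at most 12 columns so a COMMAND containing spaces
        # stays in one piece.
        parts = line.split(None, 11)
        # Skip summary and header lines: process rows start with a numeric PID.
        if len(parts) < 12 or not parts[0].isdigit():
            continue
        rows.append((parts[11].strip(), float(parts[8])))
    return rows
```

For the snapshot above, the crio row yields ("crio", 343.2); filtering the result for "crio" across repeated captures shows whether usage drops after cleanup.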