Bug 2008193

Summary: High load average after DU reboot while pods are coming up
Product: OpenShift Container Platform
Reporter: Marius Cornea <mcornea>
Component: Telco Edge
Assignee: Ken Young <keyoung>
Telco Edge sub component: RAN
QA Contact: yliu1
Status: CLOSED DUPLICATE
Docs Contact:
Severity: high
Priority: unspecified
Version: 4.8
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-10-05 15:03:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
reboot_log none

Description Marius Cornea 2021-09-27 14:52:41 UTC
Created attachment 1826684 [details]
reboot_log

Description of problem:

After a DU node reboot, the load average spikes to ~130 and it takes around 10 minutes for all the pods to get into Running state.

Version-Release number of selected component (if applicable):

4.8.0-0.nightly-2021-09-18-202713
4.18.0-305.19.1.rt7.91.el8_4.x86_64

How reproducible:
100%

Steps to Reproduce:

1. Reboot a DU node via `sudo systemctl reboot`
2. Wait for the node to reboot
3. Capture the number of pods that are not yet in Running or Completed state, together with the uptime output from the DU node, once per loop iteration:
   while true; do
       oc get pods --no-headers -A | grep -v Running | grep -v Complete | wc -l >> reboot_log
       ssh core.lab.eng.rdu2.redhat.com -6 'uptime' >> reboot_log
       sleep 1
   done

Actual results:

Attached.

After the API becomes available, it takes around 10 minutes for all pods to reach the Running state, and the load average peaks at 132.33.
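
For anyone re-checking the peak value from the attached log, a minimal sketch of one way to extract it follows. It assumes reboot_log alternates pod-count lines with `uptime` lines in the usual English-locale format ("load average: 1.23, 4.56, 7.89"); the helper name `peak_load` is hypothetical and not part of any tool used in this report.

```shell
# Hypothetical helper: print the highest 1-minute load average found in
# `uptime` lines read from stdin. Lines without "load average: " (such as
# the pod counts written by `wc -l`) are ignored.
peak_load() {
  awk -F'load average: ' '
    NF > 1 {
      # $2 looks like "132.33, 30.50, 10.10"; the first value is the 1-minute average
      split($2, avg, ", ")
      if (!seen || avg[1] + 0 > max) { max = avg[1] + 0; seen = 1 }
    }
    END { if (seen) printf "%.2f\n", max }
  '
}
```

Usage: `peak_load < reboot_log`. The function prints nothing if the input contains no `uptime` lines.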
 

Expected results:

No high load average spikes and a faster pod recovery time.

Additional info:

The node has 2 CPUs assigned to the management workload, per:
http://registry.kni-qe-0.lab.eng.rdu2.redhat.com:3000/kni-qe/ztp-site-configs/src/kni-qe-1-4.8/siteconfig/kni-qe-1.yaml#L36

The node CPU is Intel(R) Xeon(R) Gold 6212U CPU @ 2.40GHz

Comment 1 Ken Young 2021-10-05 14:57:54 UTC
This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1975356

Comment 2 Marius Cornea 2021-10-05 15:03:45 UTC
(In reply to Ken Young from comment #1)
> This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1975356

Thanks, closed as a duplicate.

*** This bug has been marked as a duplicate of bug 1975356 ***