Bug 2008193 - High load average after DU reboot while pods are coming up
Summary: High load average after DU reboot while pods are coming up
Keywords:
Status: CLOSED DUPLICATE of bug 1975356
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Telco Edge
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ken Young
QA Contact: yliu1
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-09-27 14:52 UTC by Marius Cornea
Modified: 2021-10-05 15:03 UTC (History)
0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-05 15:03:45 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
reboot_log (9.43 KB, text/plain)
2021-09-27 14:52 UTC, Marius Cornea

Description Marius Cornea 2021-09-27 14:52:41 UTC
Created attachment 1826684
reboot_log

Description of problem:

After a DU node reboot, the load average spikes to ~130 and it takes around 10 minutes for all the pods to get into Running state.

Version-Release number of selected component (if applicable):

4.8.0-0.nightly-2021-09-18-202713
4.18.0-305.19.1.rt7.91.el8_4.x86_64

How reproducible:
100%

Steps to Reproduce:

1. Reboot a DU node via `sudo systemctl reboot`
2. Wait for the node to reboot
3. Once per second, log the number of pods that are not yet Running or Completed, together with the `uptime` output from the DU node:
   while true; do
     oc get pods --no-headers -A | grep -v Running | grep -v Complete | wc -l >> reboot_log
     ssh core.lab.eng.rdu2.redhat.com -6 'uptime' >> reboot_log
     sleep 1
   done
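
To pull the peak load out of a log captured this way, something like the following sketch may help. It assumes the `uptime` lines in reboot_log use the usual "load average: <1m>, <5m>, <15m>" layout with a dot as the decimal separator. The function name `max_load` is made up for illustration and is not part of any tool mentioned in this report.

```shell
# Hypothetical helper: print the highest 1-minute load average found in
# one or more log files that contain uptime-style lines.
# Lines without "load average: " (for example the pod counts) are skipped.
max_load() {
  awk -F'load average: ' '
    NF > 1 {
      # $2 looks like "1m, 5m, 15m"; keep only the 1-minute value.
      split($2, loads, ", ")
      if (loads[1] + 0 > max) max = loads[1] + 0
    }
    END { print max + 0 }
  ' "$@"
}
```

Running `max_load reboot_log` against the attached log should print a single number. If no uptime lines are found, it prints 0.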

Actual results:

Attached.

After the API becomes available, it takes around 10 minutes for all pods to reach the Running state, and the load average peaks at 132.33.
 

Expected results:

No high load average spikes, and a faster pod recovery time.

Additional info:

The node has 2 CPUs assigned to the management workload, per:
http://registry.kni-qe-0.lab.eng.rdu2.redhat.com:3000/kni-qe/ztp-site-configs/src/kni-qe-1-4.8/siteconfig/kni-qe-1.yaml#L36

The node CPU is Intel(R) Xeon(R) Gold 6212U CPU @ 2.40GHz

Comment 1 Ken Young 2021-10-05 14:57:54 UTC
This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1975356

Comment 2 Marius Cornea 2021-10-05 15:03:45 UTC
(In reply to Ken Young from comment #1)
> This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1975356

Thanks, closed as a duplicate.

*** This bug has been marked as a duplicate of bug 1975356 ***

