2008193 – High load average after DU reboot while pods are coming up

Summary: High load average after DU reboot while pods are coming up

Keywords:
Status:	CLOSED DUPLICATE of bug 1975356
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Telco Edge
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Ken Young
QA Contact:	yliu1
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-09-27 14:52 UTC by Marius Cornea
Modified:	2021-10-05 15:03 UTC (History)
CC List:	0 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-10-05 15:03:45 UTC
Target Upstream Version:
Embargoed:

Attachments
reboot_log (9.43 KB, text/plain) 2021-09-27 14:52 UTC, Marius Cornea

Description Marius Cornea 2021-09-27 14:52:41 UTC

Created attachment 1826684
reboot_log

Description of problem:

After a DU node reboot, the load average spikes to ~130 and it takes around 10 minutes for all the pods to get into Running state.

Version-Release number of selected component (if applicable):

4.8.0-0.nightly-2021-09-18-202713
4.18.0-305.19.1.rt7.91.el8_4.x86_64

How reproducible:
100%

Steps to Reproduce:

1. Reboot a DU node via `sudo systemctl reboot`
2. Wait for the node to reboot
3. Once per second, log the number of pods that are not yet Running or Completed, together with the uptime output from the DU node:
while true; do oc get pods --no-headers -A | grep -v Running | grep -v Complete | wc -l >> reboot_log; ssh core.lab.eng.rdu2.redhat.com -6 'uptime' >> reboot_log; sleep 1; done
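
The log written in step 3 can be summarized with a small helper along the following lines. This is only a sketch: the function name summarize_reboot_log is made up for illustration, and it assumes that reboot_log alternates between a bare pod count from `wc -l` and an `uptime` line containing "load average: 1m, 5m, 15m". It reports the peak 1-minute load average, the number of pod-count samples, and the index of the last sample that still had pending pods.

```shell
#!/usr/bin/env bash
# Sketch: summarize a reboot_log produced by the capture loop in step 3.
# Assumption (not verified against the attachment): lines alternate between
# a bare integer from `wc -l` and an `uptime` line with "load average: ...".

summarize_reboot_log() {
    awk '
        # uptime line: extract the 1-minute load average and track the peak
        /load average:/ {
            s = $0
            sub(/.*load average: */, "", s)   # drop everything before the numbers
            sub(/,.*/, "", s)                 # keep only the 1-minute value
            load = s + 0
            if (load > peak) peak = load
            next
        }
        # pod-count line: a bare integer, possibly padded with whitespace
        /^[ \t]*[0-9]+[ \t]*$/ {
            samples++
            if ($1 + 0 > 0) last_pending = samples
        }
        END {
            printf "peak_load=%.2f samples=%d last_pending_sample=%d\n",
                   peak, samples, last_pending
        }
    ' "$1"
}
```

Usage: `summarize_reboot_log reboot_log`. Two caveats apply. While the API is unreachable, `oc get pods` prints nothing to stdout, so the loop logs a count of 0 that does not mean the pods have recovered. Also, each iteration takes longer than the 1 second sleep, so the sample index underestimates the elapsed time in seconds.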

Actual results:

Attached.

After the API becomes available, it takes around 10 minutes for all pods to reach the Running state, and the load average peaks at 132.33.
 

Expected results:

No high load average spikes and a faster pod recovery time.

Additional info:

The node has 2 CPUs assigned to management workloads, per:
http://registry.kni-qe-0.lab.eng.rdu2.redhat.com:3000/kni-qe/ztp-site-configs/src/kni-qe-1-4.8/siteconfig/kni-qe-1.yaml#L36

The node CPU is Intel(R) Xeon(R) Gold 6212U CPU @ 2.40GHz

Comment 1 Ken Young 2021-10-05 14:57:54 UTC

This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1975356

Comment 2 Marius Cornea 2021-10-05 15:03:45 UTC

(In reply to Ken Young from comment #1)
> This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1975356

Thanks, closed as a duplicate.

*** This bug has been marked as a duplicate of bug 1975356 ***
