Bug 2006953 - Long reboot recovery time for DU node with RT kernel with large number of pods
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Telco Edge
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ian Miller
QA Contact: yliu1
URL:
Whiteboard:
Depends On:
Blocks: 2008604
Reported: 2021-09-22 17:48 UTC by Ian Miller
Modified: 2022-08-26 14:31 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2008604
Environment:
Last Closed: 2022-08-26 14:31:50 UTC
Target Upstream Version:
Embargoed:



Description Ian Miller 2021-09-22 17:48:52 UTC
Reboot recovery time of a DU node with the RT kernel is much longer than without the RT kernel. The root cause is BZ 1975356. The DU node profile needs to enable the fix by adding rcupdate.rcu_normal_after_boot=0 to the kernel command line.
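
One way to confirm that a node actually booted with the argument is to inspect its kernel command line. The following is only a sketch, not part of the fix: the helper names kernel_args and rcu_fix_enabled are hypothetical, and it assumes the command line text has already been read from the node (on Linux it is exposed at /proc/cmdline).

```python
import shlex


def kernel_args(cmdline: str) -> dict:
    """Parse a kernel command line into {name: value}.

    Flags without '=' (for example 'quiet') map to None. If a parameter
    is repeated, the last occurrence wins in this sketch.
    """
    args = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        args[name] = value if sep else None
    return args


def rcu_fix_enabled(cmdline: str) -> bool:
    """Return True if rcupdate.rcu_normal_after_boot=0 is present."""
    # Hypothetical helper: only checks for the argument named in this bug.
    return kernel_args(cmdline).get("rcupdate.rcu_normal_after_boot") == "0"
```

On a node, the contents of /proc/cmdline can be passed to rcu_fix_enabled; a False result would suggest the DU profile change has not taken effect on that node.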

Comment 1 yliu1 2021-09-22 17:59:59 UTC
A soft reboot test on a 4.9 load with the additional kernel argument took about 14 minutes to recover with 43 test pods.
Without this kernel argument, recovery could take more than half an hour.

