Created attachment 1826684
reboot_log

Description of problem:
After a DU node reboot, the load average spikes to ~130 and it takes around 10 minutes for all the pods to get into Running state.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-09-18-202713
4.18.0-305.19.1.rt7.91.el8_4.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Reboot a DU node via `sudo systemctl reboot`
2. Wait for the node to reboot
3. Capture the number of pods that are not yet Running or Completed, together with the uptime output from the DU node:

   while true; do oc get pods --no-headers -A | grep -v Running | grep -v Complete | wc -l >> reboot_log; ssh core.lab.eng.rdu2.redhat.com -6 'uptime' >> reboot_log; sleep 1; done

Actual results:
Attached. After the API becomes available, it takes around 10 minutes for all pods to get into Running state, and the load average peaks at 132.33.

Expected results:
No high load average spikes and a faster pod recovery time.

Additional info:
The node has 2 CPUs assigned to the management workload per:
http://registry.kni-qe-0.lab.eng.rdu2.redhat.com:3000/kni-qe/ztp-site-configs/src/kni-qe-1-4.8/siteconfig/kni-qe-1.yaml#L36

The node CPU is Intel(R) Xeon(R) Gold 6212U CPU @ 2.40GHz
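
For anyone reading a reboot_log captured with the loop in step 3, the following is a minimal sketch (not part of the original report) of how the peak 1-minute load average could be pulled out of the uptime lines. The name peak_load is hypothetical, and the sketch assumes uptime prints its default English "load average: a, b, c" format; the pod-count lines in the log are simply ignored.

```shell
# Hypothetical helper: print the highest 1-minute load average found in a
# log file that contains `uptime` output lines. Lines without a
# "load average:" field (e.g. the pod counts) are skipped.
peak_load() {
    # Extract the first of the three load values from each uptime line,
    # sort numerically, and keep the largest one.
    # LC_ALL=C keeps "." as the decimal separator for sed and sort.
    LC_ALL=C sed -n 's/.*load average: \([0-9.]*\),.*/\1/p' "$1" \
        | LC_ALL=C sort -n \
        | tail -n 1
}
```

Usage, assuming the log is in the current directory: `peak_load reboot_log`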
This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1975356
(In reply to Ken Young from comment #1)
> This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1975356

Thanks, closed as a duplicate.

*** This bug has been marked as a duplicate of bug 1975356 ***