The issue does not occur when CFS throttling (quota) is disabled, or on a periodic-tick (nohz=off) kernel. Probes fail because the CFS quota is exhausted. The quota is exhausted because the quota refill is delayed. The refill is delayed because hrtimers are delayed. The hrtimers are delayed because interrupts are missed. The interrupts are missed because of nohz mode.

Steps to Reproduce:
1. Install SNO.
2. Install https://gitlab.cee.redhat.com/cshulyup/vdu-workload-emulator/-/tree/base?ref_type=heads by running the script ./add_test-deployments.sh.
3. Monitor pod restarts for 10 (ten) days with the command `oc get pods -n test | grep ago`.

Actual results:
- Some pods are restarted after several days.

Expected results:
- Not a single pod restart during the 10 days (240 hours).

The fix:
- The patch "softirq: Wake ktimers thread also in softirq", already merged to other branches:
  - https://gitlab.com/redhat/rhel/src/kernel/rhel-9/-/merge_requests/695
  - https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2543/

Additional info:
- https://issues.redhat.com/browse/OCPBUGS-3547
- https://docs.google.com/document/d/14aYlleo7EaYAjY6QM45AUQ9bw6FUlL20zIkqoyx4PiY/edit#heading=h.jwttsod63yqc
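To illustrate how a late quota refill turns into throttling, here is a toy model of CFS bandwidth control. It is a simplified sketch, not kernel code. Everything in it is an assumption made for illustration: the hypothetical function `throttled_ms`, 1 ms ticks, a fixed amount of work arriving at the start of each period, and a refill that resets the remaining runtime to the quota (no burst). It only shows that a refill timer firing late leaves a runnable task throttled, and that the backlog can spill into the next period.

```python
def throttled_ms(period_ms, quota_ms, demand_ms, refill_delays_ms):
    """Toy model (hypothetical, not kernel code) of CFS bandwidth control.

    Assumptions:
    - Time advances in 1 ms ticks.
    - `demand_ms` of new CPU work arrives at the start of every period.
    - The refill timer for period k should fire at k * period_ms but fires
      refill_delays_ms[k] ms late (index 0 is unused: the cgroup starts
      with a full quota).
    - A refill resets the remaining runtime to `quota_ms` (no burst), so
      unused runtime is not carried over.
    Returns the number of ticks the task was runnable but out of quota.
    """
    periods = len(refill_delays_ms)
    for d in refill_delays_ms:
        if not 0 <= d < period_ms:
            raise ValueError("each delay must be in [0, period_ms)")
    refill_times = {k * period_ms + refill_delays_ms[k] for k in range(1, periods)}

    runtime = quota_ms  # remaining quota in the current period
    backlog = 0         # pending CPU work, in ms
    throttled = 0
    for t in range(periods * period_ms):
        if t % period_ms == 0:
            backlog += demand_ms   # new work (e.g. a probe) arrives
        if t in refill_times:
            runtime = quota_ms     # the (possibly late) refill timer fires
        if backlog > 0:
            if runtime > 0:
                runtime -= 1       # the task runs for one tick
                backlog -= 1
            else:
                throttled += 1     # runnable, but the quota is exhausted
    return throttled


# 100 ms period, 50 ms quota, 40 ms of work per period.
print(throttled_ms(100, 50, 40, [0, 0, 0, 0]))   # 0: refills on time
print(throttled_ms(100, 50, 40, [0, 0, 30, 0]))  # 20: one refill 30 ms late
print(throttled_ms(100, 50, 40, [0, 0, 90, 0]))  # 130: backlog spills into period 3
```

With on-time refills the 40 ms of work always fits in the 50 ms quota, so the task is never throttled. A single late refill is enough to stall it even though average demand stays below the quota, which matches the observation that the problem disappears when CFS quota is disabled.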
*** Bug 2128603 has been marked as a duplicate of this bug. ***
*** Bug 2210126 has been marked as a duplicate of this bug. ***
Hello guys, it looks like the fix has only landed in kernel-rt. Is there any chance it could make it into RHEL 8.9 (kernel)? We have Bug 2128603, which can be fixed by this change. Thanks,