Description of problem:
- When multiple CronJobs fail, they bring down worker nodes one by one across the entire cluster until all nodes are down and the cluster becomes unstable.

Version-Release number of selected component (if applicable):
- Tested the behavior in 4.8.24
- The customer also reports the behavior in ROSA 4.8.13 (not verified by me).

How reproducible:
100%

Steps to Reproduce:
1. Start with a cluster that has 3 worker nodes.
2. Create several CronJobs using the attached CronJob file (an illustrative sketch is shown below). Once they start failing, the first node goes down within an hour, and within roughly two more hours the entire cluster is down.
3. The failed Jobs do not release memory until the node crashes.

Actual results:
- Nodes enter the NotReady state one by one, and eventually the whole cluster goes down.

Expected results:
- Failed CronJob pods should be terminated and their memory released.

Additional info:
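The customer's CronJob file is attached to the bug and is not reproduced here. For illustration only, a minimal sketch of the kind of manifest that could trigger this behavior: a frequently scheduled CronJob whose pods always fail and that sets no resource requests or limits. The name, schedule, image, and command below are assumptions, not the customer's actual CronJob.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: failing-cronjob
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Allow
  jobTemplate:
    spec:
      backoffLimit: 6
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: fail
            image: busybox
            # Exits non-zero, so every run fails and the Job keeps retrying.
            command: ["/bin/sh", "-c", "sleep 30 && exit 1"]
            # No resources.requests/limits are set, so accumulating failed
            # pods can consume node memory unchecked.

With no limits set and new failed pods piling up every minute, memory pressure on the node grows until it becomes NotReady, matching the reported symptom.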
The patch was merged into 4.8 last night. Marking this one as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2042175

*** This bug has been marked as a duplicate of bug 2042175 ***