Description of problem: The field failedJobsHistoryLimit does not work as expected. The pods created by the cronjob show ContainerCannotRun Version-Release number of selected component (if applicable): How reproducible: 100% Steps to Reproduce: 1. Create cron job vi cronjob.yaml apiVersion: batch/v1beta1 kind: CronJob metadata: name: cron-job-test spec: schedule: "*/1 * * * *" successfulJobsHistoryLimit: 4 failedJobsHistoryLimit: 2 jobTemplate: spec: template: spec: containers: - name: cron-job-test image: perl command: ["perl5", "-Mbignum=bpi", "-wle", "print bpi(2000)"] restartPolicy: OnFailure ---> Here we are using perl5 (which is wrong), so that atleast 2 failed jobs are retained. 2. oc create -f cronjob.yaml 3. oc get jobs -w Actual results: A number of failed jobs get populated instead of maintaining 2(as mentioned in the yaml definition). Expected results: 2 failed jobs should be maintained. Additional info: The customer is testing this feature and came across this.
This is working as expected. When you set the restartPolicy to OnFailure the kubelet is responsible for restarting the pod. Iow. it will retry the pod about 6 times, each with a longer delay (10s, 20s, etc.), see [1] for details. This means that it will longer time for the pod to actually fail (the controller does not treat CrashLoopBackOff as a failure). But once it reaches the failed state (RunContainerError) the controller ensures there are no more than 2 failed jobs. When you set the restart policy to Never then you can tweak the backoffLimit parameter, there is more details about the topic in [2]. [1] https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy [2] https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#handling-pod-and-container-failures
Here's a sample failing cronjob: apiVersion: batch/v1beta1 kind: CronJob metadata: name: hello spec: failedJobsHistoryLimit: 1 schedule: "*/1 * * * *" jobTemplate: spec: backoffLimit: 1 template: metadata: name: hello labels: job: test spec: containers: - name: hello image: busybox command: ["/bin/sh", "-c", "exit 1"] restartPolicy: Never Notice two elements: - backoffLimit - which will allow the job to fail fast - the command that results in container to fail