Bug 1565048
Summary: | failedJobsHistoryLimit field does not work as expected in a cron job | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Fatima <fshaikh> |
Component: | Master | Assignee: | Maciej Szulik <maszulik> |
Status: | CLOSED NOTABUG | QA Contact: | Wang Haoran <haowang> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.9.0 | CC: | aos-bugs, byount, jokerman, maszulik, mfojtik, mmccomas, sgaikwad |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | 3.9.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Last Closed: | 2018-05-07 13:45:08 UTC | Type: | Bug |
Description
Fatima
2018-04-09 08:59:25 UTC
This is working as expected. When you set restartPolicy to OnFailure, the kubelet is responsible for restarting the pod. In other words, it will retry the pod about 6 times, each time with a longer delay (10s, 20s, etc.); see [1] for details. This means it takes longer for the pod to actually fail (the controller does not treat CrashLoopBackOff as a failure). But once the pod reaches the failed state (RunContainerError), the controller ensures there are no more than 2 failed jobs.

When you set the restart policy to Never, you can tweak the backoffLimit parameter; there are more details on the topic in [2].

[1] https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
[2] https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#handling-pod-and-container-failures

Here's a sample failing cronjob:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  failedJobsHistoryLimit: 1
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      backoffLimit: 1
      template:
        metadata:
          name: hello
          labels:
            job: test
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["/bin/sh", "-c", "exit 1"]
          restartPolicy: Never

Notice two elements:
- backoffLimit, which allows the job to fail fast
- the command, which causes the container to fail
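To give a rough sense of why a pod with restartPolicy OnFailure takes so long to be counted as failed, the kubelet back-off mentioned above can be sketched in a few lines of Python. This is only an illustration, not the kubelet's actual code: the function name restart_delays and its parameters are hypothetical. The 10-second starting delay, the doubling, and the five-minute cap are assumptions taken from the pod lifecycle documentation linked in [1], and the count of 6 restarts is the approximate figure quoted in this comment.

```python
from itertools import accumulate


def restart_delays(restarts, base=10, cap=300):
    """Return the assumed delay (in seconds) before each restart.

    Hypothetical helper: the delay starts at `base`, doubles after
    every restart, and is capped at `cap` (five minutes).
    """
    return [min(base * 2 ** i, cap) for i in range(restarts)]


delays = restart_delays(6)
print(delays)                    # [10, 20, 40, 80, 160, 300]
print(list(accumulate(delays)))  # [10, 30, 70, 150, 310, 610]
```

Under these assumptions, roughly ten minutes of back-off alone pass before the sixth restart, which is consistent with failed jobs showing up much later than the cron schedule might suggest.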