Bug 1702543
| Summary: | Job Controller does not work correctly as backoffLimit configured | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Daein Park <dapark> |
| Component: | Master | Assignee: | Maciej Szulik <maszulik> |
| Status: | CLOSED ERRATA | QA Contact: | zhou ying <yinzhou> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.11.0 | CC: | aos-bugs, jokerman, maszulik, mmccomas, yinzhou |
| Target Milestone: | --- | | |
| Target Release: | 4.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-04 10:47:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Daein Park
2019-04-24 05:32:00 UTC
There's https://github.com/kubernetes/kubernetes/pull/67859 which should solve the issue, but that will only be available in 4.1, since it is part of Kubernetes 1.13. Based on that I'm setting the target release to 4.1 and moving this to QA.

I can still reproduce the issue with this OCP version:

```
[root@dhcp-140-138 ~]# oc version
Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.0", GitCommit:"74c534b60", GitTreeState:"", BuildDate:"2019-04-21T21:13:18Z", GoVersion:"", Compiler:"", Platform:""}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.4+81fc896", GitCommit:"81fc896", GitTreeState:"clean", BuildDate:"2019-04-21T23:18:54Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Payload: 4.1.0-0.nightly-2019-04-22-192604
```

```yaml
[root@dhcp-140-138 ~]# oc get job test-job -o yaml
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2019-04-25T01:11:28Z"
  labels:
    controller-uid: 0d819ac4-66f7-11e9-ad9c-068b1b1786bc
    job-name: test-job
  name: test-job
  namespace: test6
  resourceVersion: "789545"
  selfLink: /apis/batch/v1/namespaces/test6/jobs/test-job
  uid: 0d819ac4-66f7-11e9-ad9c-068b1b1786bc
spec:
  backoffLimit: 6
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: 0d819ac4-66f7-11e9-ad9c-068b1b1786bc
  template:
    metadata:
      creationTimestamp: null
      labels:
        controller-uid: 0d819ac4-66f7-11e9-ad9c-068b1b1786bc
        job-name: test-job
    spec:
      containers:
      - command:
        - perl
        - -e
        - sleep 10; print "working\n"; exit 89
        image: perl
        imagePullPolicy: Always
        name: test-job
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastProbeTime: "2019-04-25T01:15:51Z"
    lastTransitionTime: "2019-04-25T01:15:51Z"
    message: Job has reached the specified backoff limit
    reason: BackoffLimitExceeded
    status: "True"
    type: Failed
  failed: 5
  startTime: "2019-04-25T01:11:28Z"
```

Note that the Job was marked `Failed` with `BackoffLimitExceeded` at `failed: 5`, even though `backoffLimit` is 6.

The mechanism underneath is not a strong guarantee but rather a best effort, so it may happen that the failure count sometimes does not reach the configured limit, as discussed in https://github.com/kubernetes/kubernetes/issues/64787 and https://github.com/kubernetes/kubernetes/issues/70251. In my tests the count reached the limit every time and I never hit the issue, but that may just be the author's luck. Please retest and report the failure rate, and accept the fix if jobs reach the limit roughly 80% of the time.

Retested, and the failure rate was below 20%: the first time I created 7 jobs, only 1 failed early; the second time I created 7 jobs, all succeeded. So I will accept it.

```
[zhouying@dhcp-140-138 ~]$ oc version
Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.0", GitCommit:"44e89e525", GitTreeState:"clean", BuildDate:"2019-04-25T22:42:17Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.4+5c41ab6", GitCommit:"5c41ab6", GitTreeState:"clean", BuildDate:"2019-04-24T20:42:36Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
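To see why the best-effort counting can land above or below `backoffLimit`, here is a minimal illustrative sketch (not the actual controller code; `sync_job` is a hypothetical helper). The controller only compares the failure count it observes at each reconcile pass against the limit, so failures that arrive in the same sync window are counted in a batch, and a pod still terminating at sync time may not be counted yet:

```python
# Hypothetical model of one Job-controller reconcile pass: count the
# failed pods visible in the (possibly stale) cache and decide whether
# the backoff limit has been reached. This is an assumption-laden
# simplification for illustration, not the upstream implementation.

def sync_job(observed_failed_pods, backoff_limit):
    """Return (failed_count, job_marked_failed) for one sync pass."""
    failed = len(observed_failed_pods)
    return failed, failed >= backoff_limit

# Two extra failures land in the same sync window: the Job is marked
# failed at failed=7 even though backoffLimit is 6 (overshoot).
print(sync_job(["pod-%d" % i for i in range(7)], backoff_limit=6))

# A pod that failed but is not yet reflected in the cache is missed:
# the recorded count stops short of the limit (undershoot), matching
# the failed: 5 / backoffLimit: 6 status seen in this bug.
print(sync_job(["pod-%d" % i for i in range(5)], backoff_limit=6))
```

The point of the sketch is only that the comparison is made against a snapshot, so the final `status.failed` is not guaranteed to equal `backoffLimit` exactly.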