+++ This bug was initially created as a clone of Bug #1689061 +++

Builds should get a failure reason for evicted pods and not be GenericBuildFailure. On 3.11 api.ci we see evictions frequently and they are hard to debug. Builds should report a reason for eviction.

Request backport to origin 3.11 so we can get this on api.ci.

---
apiVersion: build.openshift.io/v1
kind: Build
metadata:
  annotations:
    ci.openshift.io/job-spec: '{"type":"postsubmit","job":"branch-ci-openshift-release-controller-master-images","buildid":"53","prowjobid":"bc04d182-46d5-11e9-b760-0a58ac10b13f","refs":{"org":"openshift","repo":"release-controller","base_ref":"master","base_sha":"253573b4cccb254de4bdd621499bdc30c2769c29","base_link":"https://github.com/openshift/release-controller/compare/4375b4ac2e8d...253573b4cccb"}}'
    openshift.io/build.pod-name: release-controller-build
  creationTimestamp: 2019-03-15T03:53:29Z
  labels:
    build-id: "53"
    created-by-ci: "true"
    creates: release-controller
    job: branch-ci-openshift-release-controller-master-images
    persists-between-builds: "false"
    prow.k8s.io/id: bc04d182-46d5-11e9-b760-0a58ac10b13f
  name: release-controller
  namespace: ci-op-7g4vf063
  ownerReferences:
  - apiVersion: image.openshift.io/v1
    controller: true
    kind: ImageStream
    name: pipeline
    uid: d435bb30-46d5-11e9-9b95-42010a8e0003
  resourceVersion: "90888834"
  selfLink: /apis/build.openshift.io/v1/namespaces/ci-op-7g4vf063/builds/release-controller
  uid: e5043c52-46d5-11e9-9b95-42010a8e0003
spec:
  nodeSelector: null
  output:
    imageLabels:
    - name: vcs-type
      value: git
    - name: vcs-url
      value: https://github.com/openshift/release-controller
    - name: io.openshift.build.name
    - name: io.openshift.build.namespace
    - name: io.openshift.build.commit.ref
      value: master
    - name: io.openshift.build.source-location
      value: https://github.com/openshift/release-controller
    - name: vcs-ref
      value: 253573b4cccb254de4bdd621499bdc30c2769c29
    - name: io.openshift.build.commit.id
      value: 253573b4cccb254de4bdd621499bdc30c2769c29
    - name: io.openshift.build.commit.message
    - name: io.openshift.build.commit.author
    - name: io.openshift.build.commit.date
    - name: io.openshift.build.source-context-dir
    pushSecret:
      name: builder-dockercfg-k4g2k
    to:
      kind: ImageStreamTag
      name: pipeline:release-controller
      namespace: ci-op-7g4vf063
  postCommit: {}
  resources:
    limits:
      memory: 6Gi
    requests:
      cpu: 100m
      memory: 200Mi
  serviceAccount: builder
  source:
    images:
    - as:
      - "0"
      from:
        kind: ImageStreamTag
        name: pipeline:root
      paths: null
    - as: null
      from:
        kind: ImageStreamTag
        name: pipeline:src
      paths:
      - destinationDir: .
        sourcePath: /go/src/github.com/openshift/release-controller///.
    type: Image
  strategy:
    dockerStrategy:
      forcePull: true
      from:
        kind: ImageStreamTag
        name: pipeline:os
        namespace: ci-op-7g4vf063
      imageOptimizationPolicy: SkipLayers
      noCache: true
    type: Docker
  triggeredBy: null
status:
  completionTimestamp: 2019-03-15T03:53:53Z
  message: Generic Build failure - check logs for details.
  output: {}
  outputDockerImageReference: docker-registry.default.svc:5000/ci-op-7g4vf063/pipeline:release-controller
  phase: Failed
  reason: GenericBuildFailed
  startTimestamp: 2019-03-15T03:53:53Z
---
status:
  message: 'Pod The node was low on resource: [DiskPressure]. '
  phase: Failed
  reason: Evicted
  startTime: 2019-03-15T03:53:53Z

--- Additional comment from Clayton Coleman on 2019-03-15 04:17:58 UTC ---

Also note that's *ALL* the status the pod has, so that may be causing other failures in the build controller.
API PR: https://github.com/openshift/api/pull/256
Origin PR: https://github.com/openshift/origin/pull/22346
Let me give it a shot tomorrow.
$ git tag --contains 29cde93
[origin]$ git log --oneline 29cde93..HEAD
9b1e77773a (HEAD -> release-3.11, origin/release-3.11) Merge pull request #22443 from danwinship/sync-inuse-vnids-on-restart-3.11
c137ed0d25 Merge pull request #22397 from jcantrill/1676720
6f59b4eb4c Fix reinitialization of NetworkPolicy state on restart
a2aa67a169 Initialize NetworkPolicy which-namespaces-are-in-use properly on restart
a8f6aec707 Clean up NetworkPolicies on NetNamespace deletion
03b5b9e76a bug 1676720. Check clusterlogging curator for cronjob instead of DC

No 3.11 puddle contains the fix yet.
Sorry my bad ... checking ose repo now
[hongkliu@MiWiFi-R1CM-srv ose]$ git tag --contains 29cde93
v3.11.104-1
v3.11.105-1
Still saw `GenericBuildFailed`

Every 6.0s: oc get build -n testproject    Fri Apr 12 16:01:47 2019

NAME           TYPE     FROM          STATUS                        STARTED             DURATION
django-ex-7    Source   Git@0905223   Complete                      About an hour ago   1m16s
django-ex-8    Source   Git@0905223   Complete                      About an hour ago   1m12s
django-ex-9    Source   Git@0905223   Complete                      44 minutes ago      1m38s
django-ex-10   Source   Git@0905223   Failed (GenericBuildFailed)   41 minutes ago      2m6s
django-ex-12   Source   Git@0905223   Complete                      31 minutes ago      1m16s
django-ex-14   Source   Git           Failed (GenericBuildFailed)   22 minutes ago      40s
django-ex-15   Source   Git@0905223   Complete                      19 minutes ago      1m1s
django-ex-16   Source   Git@0905223   Failed (GenericBuildFailed)   18 minutes ago      53s
Only django-ex-10 and django-ex-16 are related to disk pressure. django-ex-14 failed for a different reason.
Not all evictions are reported to the pod (which is what the build controller uses). When reproducing eviction-related issues, always include the pod yaml of the build pod.
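To make the dependency on pod status concrete, here is a minimal sketch of how a build failure reason could be derived from the pod status. It is an illustration only, not the code from the Origin PR: the function `build_failure_reason` is hypothetical, while the reason strings and the pod status fields are the ones quoted in this bug. It deliberately reads only `phase` and `reason`, since an evicted pod may carry no other status.

```python
# Hypothetical sketch, NOT the actual build controller logic.
# Reason strings are taken from the build statuses quoted in this bug.
GENERIC_REASON = "GenericBuildFailed"
EVICTED_REASON = "BuildPodEvicted"


def build_failure_reason(pod_status):
    """Return a build failure reason for a pod status dict, or None if the pod did not fail.

    pod_status is the `status` section of the build pod, e.g. parsed from the pod yaml.
    """
    if pod_status.get("phase") != "Failed":
        return None
    # An evicted pod may have no container statuses at all, so rely only on
    # the top-level `reason` field.
    if pod_status.get("reason") == "Evicted":
        return EVICTED_REASON
    return GENERIC_REASON


evicted = {
    "message": "Pod The node was low on resource: [DiskPressure]. ",
    "phase": "Failed",
    "reason": "Evicted",
    "startTime": "2019-03-15T03:53:53Z",
}
print(build_failure_reason(evicted))
```

If the eviction never reaches the pod status, a mapping like this has nothing to key on, which is why the pod yaml is needed to tell the two cases apart.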
Sorry ... did not know the requirement of pod yaml.

A. If it is for the pod definition, then the build is triggered by the bc created by `oc new-app centos/python-35-centos7~https://github.com/sclorg/django-ex`.
B. If it is for the pod status, then I have to redo the test.

@Clayton, let me know if it is Case B above. Thanks.
@Hongkai we need case B - fetch the status of the pod. Can you please re-run the test and report your findings?
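For case B, a small helper along these lines could pull out the relevant fields once the pod has been dumped as JSON. This is a sketch under assumptions: it expects the text produced by something like `oc get pod <build-name>-build -n testproject -o json` (the `<build-name>-build` pod name follows the `openshift.io/build.pod-name` annotation pattern seen in the description), and `summarize_pod_status` is a hypothetical name.

```python
import json


def summarize_pod_status(pod_json_text):
    """Extract the status fields relevant to eviction from a pod JSON document."""
    pod = json.loads(pod_json_text)
    status = pod.get("status", {})
    return {
        "phase": status.get("phase"),
        "reason": status.get("reason"),
        "message": status.get("message"),
        # Evicted pods may report no container statuses at all.
        "has_container_statuses": bool(status.get("containerStatuses")),
    }


# Sample input mirroring the evicted pod status quoted in the description.
sample = json.dumps({
    "status": {
        "message": "Pod The node was low on resource: [DiskPressure]. ",
        "phase": "Failed",
        "reason": "Evicted",
        "startTime": "2019-03-15T03:53:53Z",
    }
})
print(summarize_pod_status(sample))
```

Attaching the full pod yaml is still preferable; the summary only highlights what to look at first.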
Sure. I will rerun it tomorrow.
Use the latest for the moment:

# yum list installed | grep openshift
atomic-openshift.x86_64    3.11.109-1.git.0.8f0b752.el7

Every 3.0s: oc get build -n testproject    Wed Apr 24 14:47:36 2019

NAME           TYPE     FROM          STATUS                     STARTED          DURATION
django-ex-5    Source   Git@0905223   Complete                   44 minutes ago   1m1s
django-ex-6    Source   Git@0905223   Complete                   43 minutes ago   1m3s
django-ex-7    Source   Git@0905223   Complete                   37 minutes ago   4m39s
django-ex-8    Source   Git@0905223   Complete                   32 minutes ago   1m22s
django-ex-9    Source   Git@0905223   Failed                     24 minutes ago   2m10s
django-ex-10   Source   Git@0905223   Complete                   17 minutes ago   1m0s
django-ex-11   Source   Git           Failed (BuildPodEvicted)   14 minutes ago   21s

http://file.rdu.redhat.com/~hongkliu/test_result/bz1690066/20190424/Screenshot%20from%202019-04-24%2010-44-00.png

This is different from the result in Comment 9. I think it is what we expect for the fix.

Only one nit:
Status of builds:
django-ex-9: Failed
django-ex-11: Failed (BuildPodEvicted)

For a moment, I saw `(BuildPodEvicted)` for ex-9, but it vanished quickly after. For ex-11, `(BuildPodEvicted)` is stable. From what I did, both of them failed due to the same issue - `low disk space`.

pod status files:
http://file.rdu.redhat.com/~hongkliu/test_result/bz1690066/20190424/

I think the important thing for this bug is that we should not see `GenericBuildFailed` as the build status, which IMO has been achieved. Please reopen if I missed the point.
http://file.rdu.redhat.com/~hongkliu/test_result/bz1690066/20190424/Screenshot%20from%202019-04-24%2011-10-44.png

Tested more; the unstable `(BuildPodEvicted)` seen on ex-9 did not show up again.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1605