Bug 1596440

Summary: OOMKilled build pod should surface that status on build object
Product: OpenShift Container Platform
Reporter: Clayton Coleman <ccoleman>
Component: Build
Assignee: Ben Parees <bparees>
Status: CLOSED ERRATA
QA Contact: wewang <wewang>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.10.0
CC: aos-bugs, erezende, wewang
Target Milestone: ---   
Target Release: 3.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Information about an OOMKilled build pod is propagated to the build object.
Reason: Debugging and discovering what went wrong is simpler when the appropriate failure reason is described to the user.
Result: The build controller correctly populates the status reason and message when the build pod is OOMKilled.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-11 07:20:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Clayton Coleman 2018-06-29 01:35:37 UTC
OOMKill is common, and we get a reason on the pod status when it happens.  The build should surface that as an explicit error type and message on the build object (rather than a generic error).  Expected: the build is reported as failed with reason "out of memory".  We should also double-check eviction and report that.

Would be good to have for 3.11
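
A rough sketch of the requested mapping, in Python for brevity (this is not the build controller's actual code). It assumes the pod is available as a dict in the Kubernetes API shape (status.containerStatuses[].state.terminated.reason). "OutOfMemoryKilled" is the status shown in the verification comment and the generic message is the one currently reported on the build; the reason names "GenericBuildFailed" and "BuildPodEvicted" and the OOM message wording are assumptions for illustration only.

```python
from typing import Tuple

OOM_KILLED = "OOMKilled"  # termination reason reported on the container state


def build_failure_reason(pod: dict) -> Tuple[str, str]:
    """Map a failed build pod to a (reason, message) pair for the build object."""
    status = pod.get("status") or {}

    # Eviction is reported on the pod status itself, not on a container.
    # "BuildPodEvicted" is a hypothetical reason name used for this sketch.
    if status.get("reason") == "Evicted":
        return ("BuildPodEvicted", status.get("message") or "The build pod was evicted.")

    # Check init containers as well as regular containers.
    containers = (status.get("initContainerStatuses") or []) + (
        status.get("containerStatuses") or []
    )
    for cs in containers:
        terminated = (cs.get("state") or {}).get("terminated") or {}
        if terminated.get("reason") == OOM_KILLED:
            name = cs.get("name", "<unknown>")
            # Message wording is an assumption, not the controller's text.
            return ("OutOfMemoryKilled", f"Container {name} was killed: out of memory.")

    # Fall back to the generic failure that the build reports today.
    # "GenericBuildFailed" is assumed here as the reason name.
    return ("GenericBuildFailed", "Generic Build failure - check logs for details.")
```

With the pod described below, the docker-build container would match the OOMKilled branch instead of falling through to the generic failure.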


Pod:
---

Name:           rpms-build
Namespace:      ci-op-hgb560gs
Node:           origin-ci-ig-n-s1lt/10.142.0.5
Start Time:     Thu, 28 Jun 2018 21:08:01 -0400
Labels:         openshift.io/build.name=rpms
Annotations:    openshift.io/build.name=rpms
                openshift.io/scc=privileged
Status:         Failed
IP:             172.16.42.16
Controlled By:  Build/rpms
Init Containers:
  manage-dockerfile:
    Container ID:  docker://6e7f97abeb85173302896d1075b23c5bd77a3a2b0faabe4aff565fa82a335c3c
    Image:         docker.io/openshift/origin-docker-builder:v3.10.0
    Image ID:      docker-pullable://docker.io/openshift/origin-docker-builder@sha256:a734a4d3394cbf9c0b0141c404c341c1c8ca0622d0a5997494d4e8763780d86d
    Port:          <none>
    Host Port:     <none>
    Command:
      openshift-manage-dockerfile
    Args:
      --loglevel=0
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 28 Jun 2018 21:08:03 -0400
      Finished:     Thu, 28 Jun 2018 21:08:04 -0400
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     6
      memory:  6Gi
    Requests:
      cpu:     2
      memory:  4Gi
    Environment:
      BUILD:  {"kind":"Build","apiVersion":"v1","metadata":{"name":"rpms","namespace":"ci-op-hgb560gs","selfLink":"/apis/build.openshift.io/v1/namespaces/ci-op-hgb560gs/builds/rpms","uid":"de7f0b50-7b38-11e8-bfdd-42010a8e0004","resourceVersion":"24457848","creationTimestamp":"2018-06-29T01:08:01Z","labels":{"build-id":"","created-by-ci":"true","creates":"rpms","job":"dev","persists-between-builds":"false"},"annotations":{"ci.openshift.io/job-spec":""},"ownerReferences":[{"apiVersion":"image.openshift.io/v1","kind":"ImageStream","name":"pipeline","uid":"2e1ad106-7b37-11e8-bfdd-42010a8e0004","controller":true}]},"spec":{"serviceAccount":"builder","source":{"type":"Dockerfile","dockerfile":"FROM pipeline:bin\nRUN [\"/bin/bash\", \"-c\", \"set -o errexit; umask 0002; make build-rpms; ln -s $( pwd )/_output/local/releases/rpms/ /srv/repo\"]"},"strategy":{"type":"Docker","dockerStrategy":{"from":{"kind":"DockerImage","name":"docker-registry.default.svc:5000/ci-op-hgb560gs/pipeline@sha256:b6728769bf92f153d5ea85d5c3c39a331774aed3b85b4bee5e29f688c57e0b2d"},"pullSecret":{"name":"builder-dockercfg-s45gl"},"noCache":true,"forcePull":true,"imageOptimizationPolicy":"SkipLayers"}},"output":{"to":{"kind":"DockerImage","name":"docker-registry.default.svc:5000/ci-op-hgb560gs/pipeline:rpms"},"pushSecret":{"name":"builder-dockercfg-s45gl"}},"resources":{"limits":{"cpu":"6","memory":"6Gi"},"requests":{"cpu":"2","memory":"4Gi"}},"postCommit":{},"nodeSelector":null,"triggeredBy":null},"status":{"phase":"New","outputDockerImageReference":"docker-registry.default.svc:5000/ci-op-hgb560gs/pipeline:rpms","output":{}}}

    Mounts:
      /tmp/build from buildworkdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from builder-token-dk2hc (ro)
Containers:
  docker-build:
    Container ID:  docker://1fa83966ddf3eb1680eb399d45f37f216845da199c350f70297d828a7350c062
    Image:         docker.io/openshift/origin-docker-builder:v3.10.0
    Image ID:      docker-pullable://docker.io/openshift/origin-docker-builder@sha256:a734a4d3394cbf9c0b0141c404c341c1c8ca0622d0a5997494d4e8763780d86d
    Port:          <none>
    Host Port:     <none>
    Command:
      openshift-docker-build
    Args:
      --loglevel=0
    State:      Terminated
      Reason:   OOMKilled
      Message:
Pulling image docker-registry.default.svc:5000/ci-op-hgb560gs/pipeline@sha256:b6728769bf92f153d5ea85d5c3c39a331774aed3b85b4bee5e29f688c57e0b2d ...
--> FROM docker-registry.default.svc:5000/ci-op-hgb560gs/pipeline@sha256:b6728769bf92f153d5ea85d5c3c39a331774aed3b85b4bee5e29f688c57e0b2d as 0
--> RUN ["/bin/bash","-c","set -o errexit; umask 0002; make build-rpms; ln -s $( pwd )/_output/local/releases/rpms/ /srv/repo"]
OS_ONLY_BUILD_PLATFORMS='linux/amd64' hack/build-rpms.sh
[INFO] [01:08:07+0000] Building release RPMs for /go/src/github.com/openshift/origin/origin.spec ...
[WARNING] [01:08:07+0000] Repository is not clean, performing fast build and reusing _output
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.AVu6PV
+ umask 022
+ cd /tmp/openshift/build-rpms/rpm/BUILD
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.lETqkE
+ umask 022
+ cd /tmp/openshift/build-rpms/rpm/BUILD
+ BUILD_PLATFORM=linux/amd64
+ OS_ONLY_BUILD_PLATFORMS=linux/amd64
+ OS_GIT_COMMIT=8157540
+ OS_GIT_TREE_STATE=dirty
+ OS_GIT_VERSION=v3.11.0-alpha.0+8157540-147-dirty
+ OS_GIT_MAJOR=3
+ OS_GIT_MINOR=11+
+ OS_GIT_PATCH=0
+ KUBE_GIT_MAJOR=1
+ KUBE_GIT_MINOR=10+
+ KUBE_GIT_COMMIT=b81c8f8
+ KUBE_GIT_VERSION=v1.10.0+b81c8f8
+ ETCD_GIT_VERSION=v3.2.16-0-g121edf0
+ ETCD_GIT_COMMIT=121edf0
+ OS_BUILD_RELEASE_ARCHIVES=n
+ make build-cross
make[1]: Entering directory `/go/src/github.com/openshift/origin'
hack/build-cross.sh
++ Building go targets for linux/amd64: images/pod
++ Building go targets for linux/amd64: pkg/network/sdn-cni-plugin vendor/github.com/containernetworking/plugins/plugins/ipam/host-local vendor/github.com/containernetworking/plugins/plugins/main/loopback
++ Building go targets for linux/amd64: cmd/hypershift cmd/openshift cmd/oc cmd/oadm cmd/template-service-broker cmd/openshift-node-config vendor/k8s.io/kubernetes/cmd/hyperkube

      Exit Code:    137
      Started:      Thu, 28 Jun 2018 21:08:04 -0400
      Finished:     Thu, 28 Jun 2018 21:08:28 -0400
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     6
      memory:  6Gi
    Requests:
      cpu:     2
      memory:  4Gi
    Environment:
      BUILD:  {"kind":"Build","apiVersion":"v1","metadata":{"name":"rpms","namespace":"ci-op-hgb560gs","selfLink":"/apis/build.openshift.io/v1/namespaces/ci-op-hgb560gs/builds/rpms","uid":"de7f0b50-7b38-11e8-bfdd-42010a8e0004","resourceVersion":"24457848","creationTimestamp":"2018-06-29T01:08:01Z","labels":{"build-id":"","created-by-ci":"true","creates":"rpms","job":"dev","persists-between-builds":"false"},"annotations":{"ci.openshift.io/job-spec":""},"ownerReferences":[{"apiVersion":"image.openshift.io/v1","kind":"ImageStream","name":"pipeline","uid":"2e1ad106-7b37-11e8-bfdd-42010a8e0004","controller":true}]},"spec":{"serviceAccount":"builder","source":{"type":"Dockerfile","dockerfile":"FROM pipeline:bin\nRUN [\"/bin/bash\", \"-c\", \"set -o errexit; umask 0002; make build-rpms; ln -s $( pwd )/_output/local/releases/rpms/ /srv/repo\"]"},"strategy":{"type":"Docker","dockerStrategy":{"from":{"kind":"DockerImage","name":"docker-registry.default.svc:5000/ci-op-hgb560gs/pipeline@sha256:b6728769bf92f153d5ea85d5c3c39a331774aed3b85b4bee5e29f688c57e0b2d"},"pullSecret":{"name":"builder-dockercfg-s45gl"},"noCache":true,"forcePull":true,"imageOptimizationPolicy":"SkipLayers"}},"output":{"to":{"kind":"DockerImage","name":"docker-registry.default.svc:5000/ci-op-hgb560gs/pipeline:rpms"},"pushSecret":{"name":"builder-dockercfg-s45gl"}},"resources":{"limits":{"cpu":"6","memory":"6Gi"},"requests":{"cpu":"2","memory":"4Gi"}},"postCommit":{},"nodeSelector":null,"triggeredBy":null},"status":{"phase":"New","outputDockerImageReference":"docker-registry.default.svc:5000/ci-op-hgb560gs/pipeline:rpms","output":{}}}

      PUSH_DOCKERCFG_PATH:  /var/run/secrets/openshift.io/push
      PULL_DOCKERCFG_PATH:  /var/run/secrets/openshift.io/pull
    Mounts:
      /tmp/build from buildworkdir (rw)
      /var/run/crio/crio.sock from crio-socket (rw)
      /var/run/docker.sock from docker-socket (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from builder-token-dk2hc (ro)
      /var/run/secrets/openshift.io/pull from builder-dockercfg-s45gl-pull (ro)
      /var/run/secrets/openshift.io/push from builder-dockercfg-s45gl-push (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  buildworkdir:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  docker-socket:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/docker.sock
    HostPathType:
  crio-socket:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/crio/crio.sock
    HostPathType:
  builder-dockercfg-s45gl-push:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  builder-dockercfg-s45gl
    Optional:    false
  builder-dockercfg-s45gl-pull:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  builder-dockercfg-s45gl
    Optional:    false
  builder-token-dk2hc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  builder-token-dk2hc
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  role=app
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type    Reason     Age   From                          Message
  ----    ------     ----  ----                          -------
  Normal  Scheduled  16m   default-scheduler             Successfully assigned rpms-build to origin-ci-ig-n-s1lt
  Normal  Pulled     16m   kubelet, origin-ci-ig-n-s1lt  Container image "docker.io/openshift/origin-docker-builder:v3.10.0" already present on machine
  Normal  Created    16m   kubelet, origin-ci-ig-n-s1lt  Created container
  Normal  Started    16m   kubelet, origin-ci-ig-n-s1lt  Started container
  Normal  Pulled     16m   kubelet, origin-ci-ig-n-s1lt  Container image "docker.io/openshift/origin-docker-builder:v3.10.0" already present on machine
  Normal  Created    16m   kubelet, origin-ci-ig-n-s1lt  Created container
  Normal  Started    16m   kubelet, origin-ci-ig-n-s1lt  Started container
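
For reference, the docker-build container above reports Exit Code 137 together with Reason OOMKilled. By the usual convention an exit code above 128 means 128 plus the number of the signal that terminated the process, so 137 corresponds to signal 9 (SIGKILL), which is what the kernel OOM killer sends. A minimal sketch of that decoding, with function names that are illustrative only:

```python
import signal
from typing import Optional


def signal_from_exit_code(exit_code: int) -> Optional[int]:
    """Return the signal number if the code follows the 128+N convention, else None."""
    if 128 < exit_code < 256:
        return exit_code - 128
    return None


def describe_exit_code(exit_code: int) -> str:
    """Render an exit code as a short human-readable string."""
    signum = signal_from_exit_code(exit_code)
    if signum is None:
        return f"exited with code {exit_code}"
    try:
        # Signal names depend on the platform's signal table.
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown signal"
    return f"killed by signal {signum} ({name})"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

The exit code alone does not prove an OOM kill, since any SIGKILL produces 137; the Reason field on the container state is what distinguishes it.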


Build:
------
Name:		rpms
Namespace:	ci-op-hgb560gs
Created:	15 minutes ago
Labels:		build-id=
		created-by-ci=true
		creates=rpms
		job=dev
		persists-between-builds=false
Annotations:	ci.openshift.io/job-spec=
		openshift.io/build.pod-name=rpms-build

Status:		Failed (Generic Build failure - check logs for details.)
Started:	Thu, 28 Jun 2018 21:08:01 EDT
Duration:	28s

Build Pod:	rpms-build

Strategy:	Docker
Dockerfile:
  FROM pipeline:bin
  RUN ["/bin/bash", "-c", "set -o errexit; umask 0002; make build-rpms; ln -s $( pwd )/_output/local/releases/rpms/ /srv/repo"]
From Image:	ImageStreamTag ci-op-hgb560gs/pipeline:bin
No Cache:	true
Force Pull:	true
Output to:	ImageStreamTag ci-op-hgb560gs/pipeline:rpms
Empty Source:	no input source provided
Push Secret:	builder-dockercfg-s45gl

Build trigger cause:	<unknown>

Log Tail:	make[1]: Entering directory `/go/src/github.com/openshift/origin'
		hack/build-cross.sh
		++ Building go targets for linux/amd64: images/pod
		++ Building go targets for linux/amd64: pkg/network/sdn-cn...ithub.com/containernetworking/plugins/plugins/main/loopback
		++ Building go targets for linux/amd64: cmd/hypershift cmd...penshift-node-config vendor/k8s.io/kubernetes/cmd/hyperkube
Events:
  Type		Reason		Age	From				Message
  ----		------		----	----				-------
  Normal	Scheduled	15m	default-scheduler		Successfully assigned rpms-build to origin-ci-ig-n-s1lt
  Normal	Pulled		15m	kubelet, origin-ci-ig-n-s1lt	Container image "docker.io/openshift/origin-docker-builder:v3.10.0" already present on machine
  Normal	Created		15m	kubelet, origin-ci-ig-n-s1lt	Created container
  Normal	Started		15m	kubelet, origin-ci-ig-n-s1lt	Started container
  Normal	Pulled		15m	kubelet, origin-ci-ig-n-s1lt	Container image "docker.io/openshift/origin-docker-builder:v3.10.0" already present on machine
  Normal	Created		15m	kubelet, origin-ci-ig-n-s1lt	Created container
  Normal	Started		15m	kubelet, origin-ci-ig-n-s1lt	Started container
  Normal	BuildStarted	15m	build-controller		Build ci-op-hgb560gs/rpms is now running
  Normal	BuildFailed	15m	build-controller		Build ci-op-hgb560gs/rpms failed

Comment 1 openshift-github-bot 2018-07-19 10:33:26 UTC
Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/a7acc5b46aeba921c9ee3b7dc6f631937340e5d9
Bug 1596440 - surface OOMKilled pod to build

https://github.com/openshift/origin/commit/6b0c54066a9b718e16485e7a8f30a12035e9b015
Merge pull request #20297 from wozniakjan/bug-1596440/surface_oomkilled_in_build

Bug 1596440 - surface OOMKilled pod to build

Comment 2 Ben Parees 2018-07-31 15:21:08 UTC
*** Bug 1610437 has been marked as a duplicate of this bug. ***

Comment 5 wewang 2018-08-24 03:08:21 UTC
Verified in openshift v3.11.0-0.21.0
The build now reports the exact error: OpenShift killed the process because of OOM.

steps:
1. Create a project and set a low resource limit with the following LimitRange:
        {
            "apiVersion": "v1",
            "kind": "LimitRange",
            "metadata": {
                "creationTimestamp": null,
                "name": "resource-limits"
            },
            "spec": {
                "limits": [
                    {
                        "type": "Pod",
                        "max": {
                            "cpu": "1",
                            "memory": "100Mi"
                        },
                        "min": {
                            "cpu": "10m",
                            "memory": "50Mi"
                        }
                    },
                    {
                        "type": "Container",
                        "default": {
                            "cpu": "50m",
                            "memory": "100Mi"
                        },
                        "defaultRequest": {
                            "cpu": "10m",
                            "memory": "50Mi"
                        }
                    }
                ]
            }
        }

2. Check the limit range:
[root@qe-wewang-testmaster-etcd-1 ~]# oc describe limitrange -n wewang2
Name:       resource-limits
Namespace:  wewang2
Type        Resource  Min   Max    Default Request  Default Limit  Max Limit/Request Ratio
----        --------  ---   ---    ---------------  -------------  -----------------------
Pod         memory    50Mi  100Mi  -                -              -
Pod         cpu       10m   1      -                -              -
Container   cpu       -     -      10m              50m            -
Container   memory    -     -      50Mi             100Mi          -

3. Create an app in the limited project, then check the build and the build pod:
$ oc new-app nodejs-mongodb-example -n wewang2
$ oc get builds
NAME                       TYPE      FROM          STATUS                       STARTED         DURATION
nodejs-mongodb-example-1   Source    Git@b078bcf   Failed (OutOfMemoryKilled)   6 minutes ago   1m26s
$ oc get pods
NAME                             READY     STATUS      RESTARTS   AGE
mongodb-1-deploy                 1/1       Running     0          7m
nodejs-mongodb-example-1-build   0/1       OOMKilled   0          7m
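
The steps above rely on the LimitRange: a container that sets no memory limit of its own picks up the Container default (100Mi here), which is presumably too small for the nodejs build in this run. A minimal sketch of reading that default from the LimitRange in step 1, assuming the JSON has been loaded into a dict; the helper names are illustrative and only the binary suffixes (Ki, Mi, Gi, Ti) are handled.

```python
import re
from typing import Optional

# Binary suffixes only; other Kubernetes quantity forms are out of scope here.
_BINARY = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '100Mi' to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)?", quantity)
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(number) * _BINARY.get(suffix, 1)


def default_memory_limit(limit_range: dict) -> Optional[int]:
    """Return the Container default memory limit in bytes, if one is set."""
    for item in limit_range["spec"]["limits"]:
        default = item.get("default") or {}
        if item.get("type") == "Container" and "memory" in default:
            return parse_memory(default["memory"])
    return None


if __name__ == "__main__":
    # Trimmed copy of the LimitRange from step 1.
    limit_range = {
        "spec": {
            "limits": [
                {"type": "Pod", "max": {"memory": "100Mi"}, "min": {"memory": "50Mi"}},
                {"type": "Container", "default": {"memory": "100Mi"},
                 "defaultRequest": {"memory": "50Mi"}},
            ]
        }
    }
    print(default_memory_limit(limit_range))
```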

Comment 7 errata-xmlrpc 2018-10-11 07:20:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652