Description of problem:
When a readiness probe fails, no event is recorded with the failure info. This bug was first reported against OCP 3.3 (https://bugzilla.redhat.com/show_bug.cgi?id=1367204) and now reproduces again on 3.6. 3.5 does not have this bug.

Version-Release number of selected component (if applicable):
openshift v3.6.78
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

How reproducible:
Always

Steps to Reproduce:
1. Create a pod with a failing readiness probe
# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/pods/tc533910/readiness-probe-exec.yaml -n dma
pod "hello-pod" created

2. When the pod becomes Running, describe the pod and check the events.
# oc get po -n dma
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   0/1       Running   0          23s
[root@ip-172-18-10-205 ~]# oc get po -n dma
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   0/1       Running   0          25s
[root@ip-172-18-10-205 ~]# oc describe po hello-pod -n dma
Name:            hello-pod
Namespace:       dma
Security Policy: anyuid
Node:            ip-172-18-5-59.ec2.internal/172.18.5.59
Start Time:      Thu, 18 May 2017 23:07:42 -0400
Labels:          name=hello-pod
Annotations:     openshift.io/scc=anyuid
Status:          Running
IP:              10.128.0.13
Controllers:     <none>
Containers:
  hello-pod:
    Container ID:  docker://5816f2b9f500813fd9462daf6b809e71b5547397bb60066b1c5f7b0f13dc1038
    Image:         docker.io/ocpqe/hello-pod:latest
    Image ID:      docker-pullable://docker.io/ocpqe/hello-pod@sha256:90b815d55c95fffafd7b68a997787d0b939cdae1bca785c6f52b5d3ffa70714f
    Port:          8080/TCP
    State:         Running
      Started:     Thu, 18 May 2017 23:08:02 -0400
    Ready:         False
    Restart Count: 0
    Readiness:     exec [/bin/hello] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:   <none>
    Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-n9vrh (ro)
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-n9vrh:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-n9vrh
    Optional:   false
QoS Class:      BestEffort
Node-Selectors: <none>
Tolerations:    <none>
Events:
  FirstSeen  LastSeen  Count  From                                  SubObjectPath               Type    Reason     Message
  ---------  --------  -----  ----                                  -------------               ------  ------     -------
  35s        35s       1      default-scheduler                                                 Normal  Scheduled  Successfully assigned hello-pod to ip-172-18-5-59.ec2.internal
  34s        34s       1      kubelet, ip-172-18-5-59.ec2.internal  spec.containers{hello-pod}  Normal  Pulling    pulling image "docker.io/ocpqe/hello-pod:latest"
  15s        15s       1      kubelet, ip-172-18-5-59.ec2.internal  spec.containers{hello-pod}  Normal  Pulled     Successfully pulled image "docker.io/ocpqe/hello-pod:latest"
  15s        15s       1      kubelet, ip-172-18-5-59.ec2.internal  spec.containers{hello-pod}  Normal  Created    Created container with id 5816f2b9f500813fd9462daf6b809e71b5547397bb60066b1c5f7b0f13dc1038
  14s        14s       1      kubelet, ip-172-18-5-59.ec2.internal  spec.containers{hello-pod}  Normal  Started    Started container with id 5816f2b9f500813fd9462daf6b809e71b5547397bb60066b1c5f7b0f13dc1038

Actual results:
2. The events contain no readiness failure entry.

Expected results:
2. The events should contain the readiness failure, e.g.:
$ oc get event -n dma|grep "Readiness probe failed"
4s  4m  25  nginx  Pod  spec.containers{nginx}  Warning  Unhealthy  {kubelet openshift-159.lab.sjc.redhat.com}  Readiness probe failed: Get http://10.1.0.6:112/: dial tcp 10.1.0.6:112: getsockopt: connection refused

Additional info:
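For reference, the test file used in step 1 is roughly equivalent to the pod spec below, reconstructed from the `oc describe` output above (exec probe on the nonexistent `/bin/hello`, timeout=1s, period=10s, an emptyDir mounted at /tmp); the exact file may differ in details:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-pod
  labels:
    name: hello-pod
spec:
  containers:
  - name: hello-pod
    image: docker.io/ocpqe/hello-pod:latest
    ports:
    - containerPort: 8080
    readinessProbe:
      exec:
        # /bin/hello does not exist in the image, so the probe always fails
        command: ["/bin/hello"]
      timeoutSeconds: 1
      periodSeconds: 10
    volumeMounts:
    - mountPath: /tmp
      name: tmp
  volumes:
  - name: tmp
    emptyDir: {}
```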
When a DeploymentConfig creates a pod without specifying cpu/memory under a quota that requires them, the forbidden event is also missing.

Steps:
1. Create a quota
# oc new-app --file=https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/quota/quota_template.yaml --param=CPU_VALUE\=20 --param=MEM_VALUE\=1Gi --param=PV_VALUE\=10 --param=POD_VALUE\=10 --param=RC_VALUE\=20 --param=RQ_VALUE\=1 --param=SECRET_VALUE\=10 --param=SVC_VALUE\=5 -n dma2

3. Create a dc without specifying cpu/memory, then check the events
# oc describe quota myquota -n dma2
Name:       myquota
Namespace:  dma2
Resource                Used  Hard
--------                ----  ----
cpu                     0     20
memory                  0     1Gi
persistentvolumeclaims  0     10
pods                    0     10
replicationcontrollers  0     20
resourcequotas          1     1
secrets                 9     10
services                0     5

# oc run nginx -n dma2 --image=nginx --replicas 1
deploymentconfig "nginx" created
[root@ip-172-18-10-205 ~]# oc get event -n dma2
LASTSEEN  FIRSTSEEN  COUNT  NAME   KIND              SUBOBJECT  TYPE    REASON             SOURCE                       MESSAGE
13s       13s        1      nginx  DeploymentConfig             Normal  DeploymentCreated  deploymentconfig-controller  Created new replication controller "nginx-1" for version 1
# oc get event -n dma2
LASTSEEN  FIRSTSEEN  COUNT  NAME   KIND              SUBOBJECT  TYPE    REASON             SOURCE                       MESSAGE
22s       22s        1      nginx  DeploymentConfig             Normal  DeploymentCreated  deploymentconfig-controller  Created new replication controller "nginx-1" for version 1

Actual results:
3. The forbidden event is missing.

Expected results:
3. As in the previous version (OCP 3.5), a forbidden event should be shown:
Warning FailedRetry deployments-controller nginx-1: About to stop retrying nginx-1: couldn't create deployer pod for ve-m2/nginx-1: pods "nginx-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory
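Based on the `oc describe quota` output above, the quota created by the template is roughly equivalent to this ResourceQuota (a sketch; the actual template may differ):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: myquota
spec:
  hard:
    # cpu/memory here force every pod in the namespace to specify
    # requests, which is what makes the deployer pod forbidden
    cpu: "20"
    memory: 1Gi
    persistentvolumeclaims: "10"
    pods: "10"
    replicationcontrollers: "20"
    resourcequotas: "1"
    secrets: "10"
    services: "5"
```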
I did the following:
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/pods/tc533910/readiness-probe-exec.yaml

I see the following in the logs:
I0522 20:09:53.632838   23246 prober.go:106] Readiness probe for "hello-pod_myproject(0f482ed8-3f2a-11e7-87c5-c85b76cda386):hello-pod" failed (failure): rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"/bin/hello\\\": stat /bin/hello: no such file or directory\"\n"
E0522 20:09:53.632872   23246 event.go:259] Could not construct reference to: '&v1.ObjectReference{Kind:"Pod", Namespace:"myproject", Name:"hello-pod", UID:"0f482ed8-3f2a-11e7-87c5-c85b76cda386", APIVersion:"v1", ResourceVersion:"1004", FieldPath:"spec.containers{hello-pod}"}' due to: 'object does not implement the List interfaces'. Will not report event: 'Warning' 'Unhealthy' 'Readiness probe failed: rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"/bin/hello\\\": stat /bin/hello: no such file or directory\"\n"

I am debugging further.
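The error above shows the kubelet dropping the Unhealthy event because it cannot construct an ObjectReference for the pod. Below is a simplified, dependency-free Go sketch of that failure mode; `ObjectReference` and `getReference` are hypothetical stand-ins for the real k8s.io API machinery, not the actual kubelet code:

```go
package main

import (
	"errors"
	"fmt"
)

// ObjectReference is a trimmed-down stand-in for v1.ObjectReference.
type ObjectReference struct {
	Kind, Namespace, Name, FieldPath string
}

// getReference mimics the pre-fix behavior: when the object's type/metadata
// cannot be resolved, reference construction fails and the caller drops the event.
func getReference(hasMeta bool, ns, name, container string) (*ObjectReference, error) {
	if !hasMeta {
		return nil, errors.New("object does not implement the List interfaces")
	}
	return &ObjectReference{
		Kind:      "Pod",
		Namespace: ns,
		Name:      name,
		FieldPath: fmt.Sprintf("spec.containers{%s}", container),
	}, nil
}

func main() {
	// Pre-fix path: reference construction fails, the Unhealthy event is dropped.
	if _, err := getReference(false, "dma", "hello-pod", "hello-pod"); err != nil {
		fmt.Println("Will not report event:", err)
	}
	// Post-fix path: a valid reference is built from the pod itself,
	// so the Warning/Unhealthy event shows up in `oc describe po`.
	ref, _ := getReference(true, "dma", "hello-pod", "hello-pod")
	fmt.Println("Reporting Unhealthy event for", ref.Namespace+"/"+ref.Name, ref.FieldPath)
}
```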
I was able to reproduce the same on kubernetes upstream.
opened upstream issue: https://github.com/kubernetes/kubernetes/issues/46241
Upstream PR: https://github.com/kubernetes/kubernetes/pull/46246
Origin PR: https://github.com/openshift/origin/pull/14299
Verified on openshift v3.6.94:
# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/pods/tc533910/readiness-probe-exec.yaml -n dma
pod "hello-pod" created
# oc get po -n dma
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   0/1       Running   0          57s
# oc describe po hello-pod -n dma
Name:            hello-pod
Namespace:       dma
Security Policy: anyuid
Node:            qe-dma36-node-registry-router-1/10.240.0.4
Start Time:      Mon, 05 Jun 2017 05:26:39 -0400
Labels:          name=hello-pod
Annotations:     openshift.io/scc=anyuid
Status:          Running
IP:              10.129.0.59
Controllers:     <none>
Containers:
  hello-pod:
    Container ID:  docker://89cbe8ab7d05d16222dad27a1c09ca434cb663c38019895226a037c180ac070f
    Image:         docker.io/ocpqe/hello-pod:latest
    Image ID:      docker-pullable://docker.io/ocpqe/hello-pod@sha256:90b815d55c95fffafd7b68a997787d0b939cdae1bca785c6f52b5d3ffa70714f
    Port:          8080/TCP
    State:         Running
      Started:     Mon, 05 Jun 2017 05:26:43 -0400
    Ready:         False
    Restart Count: 0
    Readiness:     exec [/bin/hello] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:   <none>
    Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-g18x9 (ro)
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-g18x9:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-g18x9
    Optional:   false
QoS Class:      BestEffort
Node-Selectors: <none>
Tolerations:    <none>
Events:
  FirstSeen  LastSeen  Count  From                                      SubObjectPath               Type     Reason     Message
  ---------  --------  -----  ----                                      -------------               ------   ------     -------
  1m         1m        1      default-scheduler                                                     Normal   Scheduled  Successfully assigned hello-pod to qe-dma36-node-registry-router-1
  1m         1m        1      kubelet, qe-dma36-node-registry-router-1  spec.containers{hello-pod}  Normal   Pulled     Container image "docker.io/ocpqe/hello-pod:latest" already present on machine
  1m         1m        1      kubelet, qe-dma36-node-registry-router-1  spec.containers{hello-pod}  Normal   Created    Created container with id 89cbe8ab7d05d16222dad27a1c09ca434cb663c38019895226a037c180ac070f
  1m         1m        1      kubelet, qe-dma36-node-registry-router-1  spec.containers{hello-pod}  Normal   Started    Started container with id 89cbe8ab7d05d16222dad27a1c09ca434cb663c38019895226a037c180ac070f
  55s        5s        6      kubelet, qe-dma36-node-registry-router-1  spec.containers{hello-pod}  Warning  Unhealthy  Readiness probe failed: rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"/bin/hello\\\": stat /bin/hello: no such file or directory\"\n"

The readiness failure is now recorded as a Warning/Unhealthy event.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716