Bug 1452442 - Event should show full failed reason when readiness probe failed
Summary: Event should show full failed reason when readiness probe failed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Derek Carr
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-05-19 03:09 UTC by DeShuai Ma
Modified: 2017-09-26 13:51 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-10 05:25:32 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1716 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.6 RPM Release Advisory 2017-08-10 09:02:50 UTC

Description DeShuai Ma 2017-05-19 03:09:10 UTC
Description of problem:
Currently, when the readiness probe fails, no event is recorded with the failure information. This bug was first reported on OCP 3.3 (https://bugzilla.redhat.com/show_bug.cgi?id=1367204) and now reproduces again on 3.6; 3.5 does not have this bug.

Version-Release number of selected component (if applicable):
openshift v3.6.78
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

How reproducible:
Always

Steps to Reproduce:
1. Create a pod whose readiness probe fails (a rough sketch of the pod spec is included under Additional info below):
# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/pods/tc533910/readiness-probe-exec.yaml -n dma
pod "hello-pod" created

2. When the pod becomes Running, describe the pod and check the events.
# oc get po -n dma
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   0/1       Running   0          23s
[root@ip-172-18-10-205 ~]# oc get po -n dma
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   0/1       Running   0          25s
[root@ip-172-18-10-205 ~]# oc describe po hello-pod -n dma
Name:			hello-pod
Namespace:		dma
Security Policy:	anyuid
Node:			ip-172-18-5-59.ec2.internal/172.18.5.59
Start Time:		Thu, 18 May 2017 23:07:42 -0400
Labels:			name=hello-pod
Annotations:		openshift.io/scc=anyuid
Status:			Running
IP:			10.128.0.13
Controllers:		<none>
Containers:
  hello-pod:
    Container ID:	docker://5816f2b9f500813fd9462daf6b809e71b5547397bb60066b1c5f7b0f13dc1038
    Image:		docker.io/ocpqe/hello-pod:latest
    Image ID:		docker-pullable://docker.io/ocpqe/hello-pod@sha256:90b815d55c95fffafd7b68a997787d0b939cdae1bca785c6f52b5d3ffa70714f
    Port:		8080/TCP
    State:		Running
      Started:		Thu, 18 May 2017 23:08:02 -0400
    Ready:		False
    Restart Count:	0
    Readiness:		exec [/bin/hello] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:	<none>
    Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-n9vrh (ro)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  tmp:
    Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:	
  default-token-n9vrh:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-n9vrh
    Optional:	false
QoS Class:	BestEffort
Node-Selectors:	<none>
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From					SubObjectPath			Type		Reason		Message
  ---------	--------	-----	----					-------------			--------	------		-------
  35s		35s		1	default-scheduler							Normal		Scheduled	Successfully assigned hello-pod to ip-172-18-5-59.ec2.internal
  34s		34s		1	kubelet, ip-172-18-5-59.ec2.internal	spec.containers{hello-pod}	Normal		Pulling		pulling image "docker.io/ocpqe/hello-pod:latest"
  15s		15s		1	kubelet, ip-172-18-5-59.ec2.internal	spec.containers{hello-pod}	Normal		Pulled		Successfully pulled image "docker.io/ocpqe/hello-pod:latest"
  15s		15s		1	kubelet, ip-172-18-5-59.ec2.internal	spec.containers{hello-pod}	Normal		Created		Created container with id 5816f2b9f500813fd9462daf6b809e71b5547397bb60066b1c5f7b0f13dc1038
  14s		14s		1	kubelet, ip-172-18-5-59.ec2.internal	spec.containers{hello-pod}	Normal		Started		Started container with id 5816f2b9f500813fd9462daf6b809e71b5547397bb60066b1c5f7b0f13dc1038

Actual results:
2. The pod events above contain no Readiness probe failure event.
Expected results:
2. The events should contain the Readiness probe failure event, for example:
$ oc get event -n dma|grep "Readiness probe failed"
4s         4m          25        nginx     Pod       spec.containers{nginx}   Warning   Unhealthy   {kubelet openshift-159.lab.sjc.redhat.com}   Readiness probe failed: Get http://10.1.0.6:112/: dial tcp 10.1.0.6:112: getsockopt: connection refused


Additional info:
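For reference, here is a rough sketch of what the readiness-probe-exec.yaml pod spec looks like, reconstructed from the describe output above rather than copied from the linked test file (the exact field layout and the emptyDir volume are assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: hello-pod
  labels:
    name: hello-pod
spec:
  containers:
  - name: hello-pod
    image: docker.io/ocpqe/hello-pod:latest
    ports:
    - containerPort: 8080
    readinessProbe:
      exec:
        command: ["/bin/hello"]   # /bin/hello does not exist in the image, so the probe always fails
      initialDelaySeconds: 0
      timeoutSeconds: 1
      periodSeconds: 10
      successThreshold: 1
      failureThreshold: 3
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}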

Comment 1 DeShuai Ma 2017-05-19 03:35:51 UTC
When a deploymentconfig creates a pod without specifying cpu/memory, the forbidden event is also missing.

Steps:
1. Create a quota
# oc new-app --file=https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/quota/quota_template.yaml --param=CPU_VALUE\=20 --param=MEM_VALUE\=1Gi --param=PV_VALUE\=10 --param=POD_VALUE\=10 --param=RC_VALUE\=20 --param=RQ_VALUE\=1 --param=SECRET_VALUE\=10 --param=SVC_VALUE\=5 -n dma2

3. Create a dc without specifying cpu/memory, then check the events:
# oc describe quota myquota -n dma2
Name:			myquota
Namespace:		dma2
Resource		Used	Hard
--------		----	----
cpu			0	20
memory			0	1Gi
persistentvolumeclaims	0	10
pods			0	10
replicationcontrollers	0	20
resourcequotas		1	1
secrets			9	10
services		0	5
# oc run nginx -n dma2  --image=nginx --replicas 1
deploymentconfig "nginx" created
[root@ip-172-18-10-205 ~]# oc get event -n dma2
LASTSEEN   FIRSTSEEN   COUNT     NAME      KIND               SUBOBJECT   TYPE      REASON              SOURCE                        MESSAGE
13s        13s         1         nginx     DeploymentConfig               Normal    DeploymentCreated   deploymentconfig-controller   Created new replication controller "nginx-1" for version 1
# oc get event -n dma2
LASTSEEN   FIRSTSEEN   COUNT     NAME      KIND               SUBOBJECT   TYPE      REASON              SOURCE                        MESSAGE
22s        22s         1         nginx     DeploymentConfig               Normal    DeploymentCreated   deploymentconfig-controller   Created new replication controller "nginx-1" for version 1


Actual results:
3. The forbidden event is missing.


Expected results:
3. In the previous version (OCP 3.5), a forbidden event was shown:
 Warning   FailedRetry         deployments-controller        nginx-1: About to stop retrying nginx-1: couldn't create deployer pod for ve-m2/nginx-1: pods "nginx-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory
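(This quota admission behaviour is standard Kubernetes: because myquota sets hard limits on cpu and memory, every pod created in the namespace must declare those compute resources, and the deployer pod nginx-1-deploy created for the dc does not, which is why it is forbidden. A minimal, purely illustrative resources stanza that would satisfy the quota for the nginx container is sketched below; the values are not taken from any test file.)

resources:
  requests:
    cpu: 100m      # illustrative values; any amounts within the quota would do
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 128Mi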

Comment 2 Derek Carr 2017-05-22 20:13:46 UTC
I did the following:

oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/pods/tc533910/readiness-probe-exec.yaml

I see the following in the logs:

I0522 20:09:53.632838   23246 prober.go:106] Readiness probe for "hello-pod_myproject(0f482ed8-3f2a-11e7-87c5-c85b76cda386):hello-pod" failed (failure): rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"/bin/hello\\\": stat /bin/hello: no such file or directory\"\n"

E0522 20:09:53.632872   23246 event.go:259] Could not construct reference to: '&v1.ObjectReference{Kind:"Pod", Namespace:"myproject", Name:"hello-pod", UID:"0f482ed8-3f2a-11e7-87c5-c85b76cda386", APIVersion:"v1", ResourceVersion:"1004", FieldPath:"spec.containers{hello-pod}"}' due to: 'object does not implement the List interfaces'. Will not report event: 'Warning' 'Unhealthy' 'Readiness probe failed: rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"/bin/hello\\\": stat /bin/hello: no such file or directory\"\n"

I am debugging further.

Comment 3 Derek Carr 2017-05-22 20:21:50 UTC
I was able to reproduce the same on kubernetes upstream.

Comment 4 Derek Carr 2017-05-22 20:28:06 UTC
opened upstream issue:
https://github.com/kubernetes/kubernetes/issues/46241

Comment 5 Derek Carr 2017-05-22 21:29:06 UTC
Upstream PR:
https://github.com/kubernetes/kubernetes/pull/46246

Comment 6 Derek Carr 2017-05-23 14:45:22 UTC
Origin PR:
https://github.com/openshift/origin/pull/14299

Comment 8 DeShuai Ma 2017-06-05 09:29:23 UTC
Verified on openshift v3.6.94.

# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/pods/tc533910/readiness-probe-exec.yaml -n dma
pod "hello-pod" created
# oc get po -n dma
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   0/1       Running   0          57s
# oc describe po hello-pod -n dma
Name:			hello-pod
Namespace:		dma
Security Policy:	anyuid
Node:			qe-dma36-node-registry-router-1/10.240.0.4
Start Time:		Mon, 05 Jun 2017 05:26:39 -0400
Labels:			name=hello-pod
Annotations:		openshift.io/scc=anyuid
Status:			Running
IP:			10.129.0.59
Controllers:		<none>
Containers:
  hello-pod:
    Container ID:	docker://89cbe8ab7d05d16222dad27a1c09ca434cb663c38019895226a037c180ac070f
    Image:		docker.io/ocpqe/hello-pod:latest
    Image ID:		docker-pullable://docker.io/ocpqe/hello-pod@sha256:90b815d55c95fffafd7b68a997787d0b939cdae1bca785c6f52b5d3ffa70714f
    Port:		8080/TCP
    State:		Running
      Started:		Mon, 05 Jun 2017 05:26:43 -0400
    Ready:		False
    Restart Count:	0
    Readiness:		exec [/bin/hello] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:	<none>
    Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-g18x9 (ro)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  tmp:
    Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:	
  default-token-g18x9:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-g18x9
    Optional:	false
QoS Class:	BestEffort
Node-Selectors:	<none>
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From						SubObjectPath			Type		Reason		Message
  ---------	--------	-----	----						-------------			--------	------		-------
  1m		1m		1	default-scheduler								Normal		Scheduled	Successfully assigned hello-pod to qe-dma36-node-registry-router-1
  1m		1m		1	kubelet, qe-dma36-node-registry-router-1	spec.containers{hello-pod}	Normal		Pulled		Container image "docker.io/ocpqe/hello-pod:latest" already present on machine
  1m		1m		1	kubelet, qe-dma36-node-registry-router-1	spec.containers{hello-pod}	Normal		Created		Created container with id 89cbe8ab7d05d16222dad27a1c09ca434cb663c38019895226a037c180ac070f
  1m		1m		1	kubelet, qe-dma36-node-registry-router-1	spec.containers{hello-pod}	Normal		Started		Started container with id 89cbe8ab7d05d16222dad27a1c09ca434cb663c38019895226a037c180ac070f
  55s		5s		6	kubelet, qe-dma36-node-registry-router-1	spec.containers{hello-pod}	Warning		Unhealthy	Readiness probe failed: rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"/bin/hello\\\": stat /bin/hello: no such file or directory\"\n"

Comment 10 errata-xmlrpc 2017-08-10 05:25:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716

