Bug 1465801

Summary:	Some event records in OCP 3.6 differ from OCP 3.5
Product:	OpenShift Container Platform	Reporter:	DeShuai Ma <dma>
Component:	Node	Assignee:	Derek Carr <decarr>
Status:	CLOSED ERRATA	QA Contact:	DeShuai Ma <dma>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3.6.0	CC:	aos-bugs, eparis, jokerman, mmccomas, wmeng, xtian, yadu
Target Milestone:	---
Target Release:	3.7.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-11-28 21:58:46 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description DeShuai Ma 2017-06-28 08:55:18 UTC

Description of problem:
When a quota forbids a dc from creating a pod, describing the dc shows no event recording the failure.
For an rc, a 'FailedCreate' event is recorded.

Version-Release number of selected component (if applicable):
openshift v3.6.126
kubernetes v1.6.1+5115d708d7
etcd 3.2.0

How reproducible:
Always

Steps to Reproduce:
1. Create a quota in the project
#oc new-app --file=https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/quota/quota_template.yaml --param=CPU_VALUE\=20 --param=MEM_VALUE\=1Gi --param=PV_VALUE\=10 --param=POD_VALUE\=10 --param=RC_VALUE\=20 --param=RQ_VALUE\=1 --param=SECRET_VALUE\=10 --param=SVC_VALUE\=5 -n dma

2. Create a dc without cpu and memory info
[root@qe-dma126-master-1 dma]# oc run hello-pod --image=docker.io/ocpqe/hello-pod --replicas 1 -n dma
deploymentconfig "hello-pod" created

3. Describe the dc
[root@qe-dma126-master-1 dma]# oc get dc -n dma
NAME        REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-pod   1          1         0         config
[root@qe-dma126-master-1 dma]# oc describe dc/hello-pod -n dma
Name:		hello-pod
Namespace:	dma
Created:	2 minutes ago
Labels:		run=hello-pod
Annotations:	<none>
Latest Version:	1
Selector:	run=hello-pod
Replicas:	1
Triggers:	Config
Strategy:	Rolling
Template:
Pod Template:
  Labels:	run=hello-pod
  Containers:
   hello-pod:
    Image:		docker.io/ocpqe/hello-pod
    Port:		
    Environment:	<none>
    Mounts:		<none>
  Volumes:		<none>

Deployment #1 (latest):
	Name:		hello-pod-1
	Created:	2 minutes ago
	Status:		New
	Replicas:	0 current / 0 desired
	Selector:	deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
	Labels:		openshift.io/deployment-config.name=hello-pod,run=hello-pod
	Pods Status:	0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----				-------------	--------	------			-------
  2m		2m		1	deploymentconfig-controller			Normal		DeploymentCreated	Created new replication controller "hello-pod-1" for version 1

Actual results:
No event reports the quota failure; only the DeploymentCreated event is shown.

Expected results:
There should be an event: Error creating: pods "hello-pod-" is forbidden: failed quota: myquota: must specify cpu,memory

Additional info:
1) When only a pod is created, the forbidden event is recorded
2) When only an rc is created, this event is also recorded

Comment 1 DeShuai Ma 2017-06-28 09:28:54 UTC

Description of problem:
Another event-related issue:
When a container fails to start, the event gives no detailed failure reason and the pod stays in ContainerCreating.
In OCP 3.5 the event gave a detailed reason, "...Error syncing pod. .... shm_rmid_forced: invalid argument", but in OCP 3.6 the event does not show what is wrong.

Version-Release number of selected component (if applicable):
openshift v3.6.126
kubernetes v1.6.1+5115d708d7
etcd 3.2.0

Steps:
[root@qe-dma126-master-1 ~]# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/pods/sysctls/pod-sysctl-safe-invalid2.yaml -n test1
pod "hello-pod" created

[root@qe-dma126-master-1 ~]# oc get po -n test1
NAME        READY     STATUS              RESTARTS   AGE
hello-pod   0/1       ContainerCreating   0          1m
[root@qe-dma126-master-1 ~]# oc describe po hello-pod -n test1
Name:			hello-pod
Namespace:		test1
Security Policy:	anyuid
Node:			qe-dma126-node-registry-router-1/10.240.0.50
Start Time:		Wed, 28 Jun 2017 05:20:00 -0400
Labels:			name=hello-pod
Annotations:		openshift.io/scc=anyuid
			security.alpha.kubernetes.io/sysctls=kernel.shm_rmid_forced=hello
Status:			Pending
IP:			
Controllers:		<none>
Containers:
  hello-pod:
    Container ID:	
    Image:		docker.io/ocpqe/hello-pod:latest
    Image ID:		
    Port:		8080/TCP
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Environment:	<none>
    Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qv1mb (ro)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  tmp:
    Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:	
  default-token-qv1mb:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-qv1mb
    Optional:	false
QoS Class:	BestEffort
Node-Selectors:	<none>
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From						SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----						-------------	--------	------			-------
  1m		1m		1	default-scheduler						Normal		Scheduled		Successfully assigned hello-pod to qe-dma126-node-registry-router-1
  1m		7s		32	kubelet, qe-dma126-node-registry-router-1			Warning		DNSSearchForming	Found and omitted duplicated dns domain in host search line: 'cluster.local' during merging with cluster dns domains
  1m		5s		32	kubelet, qe-dma126-node-registry-router-1			Normal		SandboxChanged		Pod sandbox changed, it will be killed and re-created.
  1m		4s		33	kubelet, qe-dma126-node-registry-router-1			Warning		FailedSync		Error syncing pod

Expected results:
The event should show the container failure reason "/proc/sys/kernel/shm_rmid_forced: invalid argument".

Additional info:
[root@qe-dma126-master-1 ~]# docker run --sysctl='kernel.shm_rmid_forced=hello' docker.io/ocpqe/hello-pod
container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"write /proc/sys/kernel/shm_rmid_forced: invalid argument\""
/usr/bin/docker-current: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:359: container init caused \\\"write /proc/sys/kernel/shm_rmid_forced: invalid argument\\\"\"\n".

Comment 2 Yan Du 2017-06-29 08:49:39 UTC

Hit the same issue with an invalid annotation in a pod:
If we add invalid egress-bandwidth/ingress-bandwidth values to the pod,
e.g.:      "annotations": {
            "kubernetes.io/egress-bandwidth": "-10M",
            "kubernetes.io/ingress-bandwidth": "-3M"

We get a meaningful message when describing the pod in OCP-3.5:
[root@host-8-175-59 ~]# oc describe pod iperf
Name:			iperf
Namespace:		d1
Security Policy:	anyuid
Node:			host-8-175-81.host.centralci.eng.rdu2.redhat.com/10.8.175.81
Start Time:		Thu, 29 Jun 2017 04:42:53 -0400
Labels:			<none>
Status:			Pending
IP:			
Controllers:		<none>
Containers:
  iperf:
    Container ID:	
    Image:		yadu/hello-openshift-iperf
    Image ID:		
    Port:		
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4hg0k (ro)
    Environment Variables:	<none>
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  default-token-4hg0k:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-4hg0k
QoS Class:	BestEffort
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From								SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----								-------------	--------	------		-------
  21s		21s		1	{default-scheduler }								Normal		Scheduled	Successfully assigned iperf to host-8-175-81.host.centralci.eng.rdu2.redhat.com
  20s		1s		20	{kubelet host-8-175-81.host.centralci.eng.rdu2.redhat.com}			Warning		FailedSync	Error syncing pod, skipping: failed to "SetupNetwork" for "iperf_d1" with SetupNetworkError: "Failed to setup network for pod \"iperf_d1(f107bae3-5ca6-11e7-ab56-fa163e27a692)\" using network plugins \"cni\": CNI request failed with status 400: 'failed to parse pod bandwidth: resource is unreasonably small (< 1kbit)\n'; Skipping pod"

But in OCP-3.6 we no longer get such warnings.
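
The OCP-3.5 message above points to a lower-bound check on the bandwidth annotations. A rough sketch of such a check follows. The function name `parse_bandwidth_bits`, the supported suffixes, and the simplified quantity parsing are assumptions made for illustration and do not reproduce the actual network plugin code; only the 1 kbit minimum is taken from the error text.

```python
# Hypothetical, simplified stand-in for the plugin's bandwidth validation.
SUFFIX_MULTIPLIERS = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9}


def parse_bandwidth_bits(value):
    """Convert an annotation value such as "10M" to bits per second.

    Raises ValueError when the value is malformed or below 1 kbit.
    """
    number, suffix = value, ""
    if value and value[-1] in "kMG":
        number, suffix = value[:-1], value[-1]
    bits = int(number) * SUFFIX_MULTIPLIERS[suffix]  # int() rejects malformed numbers
    if bits < 1000:
        raise ValueError("resource is unreasonably small (< 1kbit)")
    return bits


if __name__ == "__main__":
    for annotation in ("10M", "-10M", "-3M"):
        try:
            print(annotation, "->", parse_bandwidth_bits(annotation))
        except ValueError as err:
            print(annotation, "-> rejected:", err)
```

Both values from the example annotations ("-10M" and "-3M") are rejected by this check, which matches the FailedSync warning reported on OCP-3.5.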

Comment 3 Derek Carr 2017-06-29 21:13:28 UTC

Scenario 1:

$ oc new-app --file=https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/quota/quota_template.yaml --param=CPU_VALUE\=20 --param=MEM_VALUE\=1Gi --param=PV_VALUE\=10 --param=POD_VALUE\=10 --param=RC_VALUE\=20 --param=RQ_VALUE\=1 --param=SECRET_VALUE\=10 --param=SVC_VALUE\=5 
$ oc run hello-pod --image=openshift/hello-openshift --replicas=1
$ oc describe dc/hello-pod
No event is shown.

After waiting ~3m, I see the following event:

$ oc describe dc
Name:		hello-pod
Namespace:	myproject
Created:	4 minutes ago
Labels:		run=hello-pod
Annotations:	<none>
Latest Version:	1
Selector:	run=hello-pod
Replicas:	1
Triggers:	Config
Strategy:	Rolling
Template:
Pod Template:
  Labels:	run=hello-pod
  Containers:
   hello-pod:
    Image:		openshift/hello-openshift
    Port:		
    Environment:	<none>
    Mounts:		<none>
  Volumes:		<none>

Deployment #1 (latest):
	Name:		hello-pod-1
	Created:	4 minutes ago
	Status:		New
	Replicas:	0 current / 0 desired
	Selector:	deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
	Labels:		openshift.io/deployment-config.name=hello-pod,run=hello-pod
	Pods Status:	0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----				-------------	--------	------			-------
  4m		4m		1	deploymentconfig-controller			Normal		DeploymentCreated	Created new replication controller "hello-pod-1" for version 1
  1m		1m		1	deployer-controller				Warning		FailedRetry		hello-pod-1: About to stop retrying "hello-pod-1": couldn't create deployer pod for "myproject/hello-pod-1": pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory

Comment 4 Derek Carr 2017-06-29 21:18:07 UTC

To improve the experience, I have opened a PR to send a FailedCreate event when the deployment controller is unable to create the deploy pod right away.

See: https://github.com/openshift/origin/pull/14970

$ oc describe dc
Name:		hello-pod
Namespace:	myproject
Created:	2 minutes ago
Labels:		run=hello-pod
Annotations:	<none>
Latest Version:	1
Selector:	run=hello-pod
Replicas:	1
Triggers:	Config
Strategy:	Rolling
Template:
Pod Template:
  Labels:	run=hello-pod
  Containers:
   hello-pod:
    Image:		openshift/hello-openshift
    Port:		
    Environment:	<none>
    Mounts:		<none>
  Volumes:		<none>

Deployment #1 (latest):
	Name:		hello-pod-1
	Created:	2 minutes ago
	Status:		New
	Replicas:	0 current / 0 desired
	Selector:	deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
	Labels:		openshift.io/deployment-config.name=hello-pod,run=hello-pod
	Pods Status:	0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----				-------------	--------	------			-------
  2m		2m		1	deploymentconfig-controller			Normal		DeploymentCreated	Created new replication controller "hello-pod-1" for version 1
  13s		13s		1	deployer-controller				Warning		FailedRetry		Stop retrying: couldn't create deployer pod for "myproject/hello-pod-1": pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory
  2m		13s		16	deployer-controller				Warning		FailedCreate		Error creating deployer pod: pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory
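
The behaviour visible in the events above (a FailedCreate warning for each failed attempt, then a FailedRetry warning once retries are exhausted) can be sketched as follows. This is an illustrative model only, not the Go code from the PR: `create_deployer_pod`, `create_pod`, `record_event`, and `max_attempts` are hypothetical names, and the real controller's retry policy is not reproduced here.

```python
def create_deployer_pod(create_pod, record_event, max_attempts=3):
    """Try to create the deployer pod, recording an event for every failure.

    create_pod and record_event are hypothetical callables standing in for
    the API call and the event recorder.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return create_pod()
        except Exception as err:  # e.g. a quota rejection from the API server
            last_error = err
            # Report the failure right away instead of waiting for retries to run out.
            record_event("Warning", "FailedCreate",
                         f"Error creating deployer pod: {err}")
    record_event("Warning", "FailedRetry",
                 f"Stop retrying: couldn't create deployer pod: {last_error}")
    return None


if __name__ == "__main__":
    events = []

    def rejected_by_quota():
        raise RuntimeError('pods "hello-pod-1-deploy" is forbidden: '
                           "failed quota: myquota: must specify cpu,memory")

    create_deployer_pod(rejected_by_quota, lambda *event: events.append(event))
    for event in events:
        print(*event, sep="\t")
```

The point of the sketch is the ordering: the user sees a warning on the first failed attempt rather than only after the controller gives up.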

Comment 5 Derek Carr 2017-06-30 14:36:46 UTC

The kubelet FailedSync event was intentionally changed to reduce etcd event spam in PR: https://github.com/openshift/origin/pull/14693

We will look to add a more distinct network failure event in the future.

Comment 6 Eric Paris 2017-06-30 15:54:28 UTC

We consider
https://github.com/openshift/origin/pull/14970
the fix for this bug.

Comment 8 DeShuai Ma 2017-07-05 08:44:48 UTC

Verified on openshift v3.6.133.

# oc run hello-pod --image=docker.io/ocpqe/hello-pod --replicas 1 -n dma
deploymentconfig "hello-pod" created
[root@ip-172-18-4-137 ~]# oc get dc -n dma
NAME        REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-pod   1          1         0         config
[root@ip-172-18-4-137 ~]# oc describe dc/hello-pod -n dma
Name:		hello-pod
Namespace:	dma
Created:	30 seconds ago
Labels:		run=hello-pod
Annotations:	<none>
Latest Version:	1
Selector:	run=hello-pod
Replicas:	1
Triggers:	Config
Strategy:	Rolling
Template:
Pod Template:
  Labels:	run=hello-pod
  Containers:
   hello-pod:
    Image:		docker.io/ocpqe/hello-pod
    Port:		
    Environment:	<none>
    Mounts:		<none>
  Volumes:		<none>

Deployment #1 (latest):
	Name:		hello-pod-1
	Created:	30 seconds ago
	Status:		New
	Replicas:	0 current / 0 desired
	Selector:	deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
	Labels:		openshift.io/deployment-config.name=hello-pod,run=hello-pod
	Pods Status:	0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----				-------------	--------	------			-------
  30s		30s		1	deploymentconfig-controller			Normal		DeploymentCreated	Created new replication controller "hello-pod-1" for version 1
  30s		9s		13	deployer-controller				Warning		FailedCreate		Error creating deployer pod: pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory
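
The manual check in the verification above can be approximated in a script. The following is a minimal sketch and not part of any OpenShift tooling: the function name `event_reasons` and the regular expression are assumptions made for illustration, and the parsing relies only on the whitespace-separated Events table layout shown in the `oc describe` output above.

```python
import re

# Matches the Type, Reason and Message columns of one Events row.
EVENT_ROW = re.compile(r"\s(Normal|Warning)\s+(\S+)\s+(.*)$")

SAMPLE = (
    "Events:\n"
    "  FirstSeen\tLastSeen\tCount\tFrom\t\t\t\tSubObjectPath\tType\t\tReason\t\t\tMessage\n"
    "  ---------\t--------\t-----\t----\t\t\t\t-------------\t--------\t------\t\t\t-------\n"
    '  30s\t\t30s\t\t1\tdeploymentconfig-controller\t\t\tNormal\t\tDeploymentCreated\t'
    'Created new replication controller "hello-pod-1" for version 1\n'
    "  30s\t\t9s\t\t13\tdeployer-controller\t\t\t\tWarning\t\tFailedCreate\t\t"
    'Error creating deployer pod: pods "hello-pod-1-deploy" is forbidden: '
    "failed quota: myquota: must specify cpu,memory\n"
)


def event_reasons(describe_output, event_type="Warning"):
    """Return the Reason of every event of the given type, in table order."""
    reasons = []
    in_events = False
    for line in describe_output.splitlines():
        if line.startswith("Events:"):
            in_events = True  # rows before this header are not events
            continue
        if not in_events:
            continue
        match = EVENT_ROW.search(line)
        if match and match.group(1) == event_type:
            reasons.append(match.group(2))
    return reasons


if __name__ == "__main__":
    print(event_reasons(SAMPLE))
```

A test could then assert that "FailedCreate" appears in the returned list shortly after the dc is created.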

Comment 12 errata-xmlrpc 2017-11-28 21:58:46 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188