Bug 1465801 - Some events record in ocp36 is different from ocp35
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assigned To: Derek Carr
QA Contact: DeShuai Ma
Reported: 2017-06-28 04:55 EDT by DeShuai Ma
Modified: 2017-11-28 16:58 EST (History)
Doc Type: No Doc Update
Last Closed: 2017-11-28 16:58:46 EST
Type: Bug
Description DeShuai Ma 2017-06-28 04:55:18 EDT
Description of problem:
When a quota forbids a dc from creating a pod, describing the dc shows no event record.
For an rc, there is a 'FailedCreate' event.

Version-Release number of selected component (if applicable):
openshift v3.6.126
kubernetes v1.6.1+5115d708d7
etcd 3.2.0

How reproducible:
Always

Steps to Reproduce:
1. Create a quota in the project
#oc new-app --file=https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/quota/quota_template.yaml --param=CPU_VALUE\=20 --param=MEM_VALUE\=1Gi --param=PV_VALUE\=10 --param=POD_VALUE\=10 --param=RC_VALUE\=20 --param=RQ_VALUE\=1 --param=SECRET_VALUE\=10 --param=SVC_VALUE\=5 -n dma

2. Create a dc without cpu and memory info
[root@qe-dma126-master-1 dma]# oc run hello-pod --image=docker.io/ocpqe/hello-pod --replicas 1 -n dma
deploymentconfig "hello-pod" created

3. Describe the dc
[root@qe-dma126-master-1 dma]# oc get dc -n dma
NAME        REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-pod   1          1         0         config
[root@qe-dma126-master-1 dma]# oc describe dc/hello-pod -n dma
Name:		hello-pod
Namespace:	dma
Created:	2 minutes ago
Labels:		run=hello-pod
Annotations:	<none>
Latest Version:	1
Selector:	run=hello-pod
Replicas:	1
Triggers:	Config
Strategy:	Rolling
Template:
Pod Template:
  Labels:	run=hello-pod
  Containers:
   hello-pod:
    Image:		docker.io/ocpqe/hello-pod
    Port:		
    Environment:	<none>
    Mounts:		<none>
  Volumes:		<none>

Deployment #1 (latest):
	Name:		hello-pod-1
	Created:	2 minutes ago
	Status:		New
	Replicas:	0 current / 0 desired
	Selector:	deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
	Labels:		openshift.io/deployment-config.name=hello-pod,run=hello-pod
	Pods Status:	0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----				-------------	--------	------			-------
  2m		2m		1	deploymentconfig-controller			Normal		DeploymentCreated	Created new replication controller "hello-pod-1" for version 1

Actual results:


Expected results:
There should be an event: Error creating: pods "hello-pod-" is forbidden: failed quota: myquota: must specify cpu,memory

Additional info:
1) When only a pod is created, the forbidden event is recorded.
2) When only an rc is created, the event is also recorded.
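
The check described above (is a given event reason present on the object or not) can be sketched as a small shell helper. This is only an illustration: `has_event_reason` is a hypothetical name, and the helper assumes the tabular Events layout shown in the `oc describe` output in this report, where the Type column (Normal or Warning) is immediately followed by the Reason column.

```shell
# has_event_reason REASON
# Hypothetical helper: reads `oc describe` output on stdin and succeeds (exit 0)
# if the Events table contains a row whose Reason column equals REASON.
# Assumes each event row has a Type token (Normal/Warning) followed by the Reason.
has_event_reason() {
  awk -v reason="$1" '
    /^Events:/ { in_events = 1; next }   # only inspect lines after the Events header
    in_events {
      for (i = 1; i < NF; i++) {
        if (($i == "Normal" || $i == "Warning") && $(i + 1) == reason) {
          found = 1
        }
      }
    }
    END { exit (found ? 0 : 1) }
  '
}
```

With a cluster at hand, it could be used as `oc describe dc/hello-pod -n dma | has_event_reason FailedCreate && echo "event present"`.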
Comment 1 DeShuai Ma 2017-06-28 05:28:54 EDT
Description of problem:
Another event-related issue:
When a container fails to start, the event does not include any detailed failure reason and the pod stays in ContainerCreating.
In ocp3.5 the event gave a detailed reason such as "...Error syncing pod. .... shm_rmid_forced: invalid argument", but in ocp36 the event does not show what went wrong.

Version-Release number of selected component (if applicable):
openshift v3.6.126
kubernetes v1.6.1+5115d708d7
etcd 3.2.0

Steps:
[root@qe-dma126-master-1 ~]# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/pods/sysctls/pod-sysctl-safe-invalid2.yaml -n test1
pod "hello-pod" created

[root@qe-dma126-master-1 ~]# oc get po -n test1
NAME        READY     STATUS              RESTARTS   AGE
hello-pod   0/1       ContainerCreating   0          1m
[root@qe-dma126-master-1 ~]# oc describe po hello-pod -n test1
Name:			hello-pod
Namespace:		test1
Security Policy:	anyuid
Node:			qe-dma126-node-registry-router-1/10.240.0.50
Start Time:		Wed, 28 Jun 2017 05:20:00 -0400
Labels:			name=hello-pod
Annotations:		openshift.io/scc=anyuid
			security.alpha.kubernetes.io/sysctls=kernel.shm_rmid_forced=hello
Status:			Pending
IP:			
Controllers:		<none>
Containers:
  hello-pod:
    Container ID:	
    Image:		docker.io/ocpqe/hello-pod:latest
    Image ID:		
    Port:		8080/TCP
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Environment:	<none>
    Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qv1mb (ro)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  tmp:
    Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:	
  default-token-qv1mb:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-qv1mb
    Optional:	false
QoS Class:	BestEffort
Node-Selectors:	<none>
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From						SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----						-------------	--------	------			-------
  1m		1m		1	default-scheduler						Normal		Scheduled		Successfully assigned hello-pod to qe-dma126-node-registry-router-1
  1m		7s		32	kubelet, qe-dma126-node-registry-router-1			Warning		DNSSearchForming	Found and omitted duplicated dns domain in host search line: 'cluster.local' during merging with cluster dns domains
  1m		5s		32	kubelet, qe-dma126-node-registry-router-1			Normal		SandboxChanged		Pod sandbox changed, it will be killed and re-created.
  1m		4s		33	kubelet, qe-dma126-node-registry-router-1			Warning		FailedSync		Error syncing pod

Expected results:
The event should show the container failure reason "/proc/sys/kernel/shm_rmid_forced: invalid argument".

Additional info:
[root@qe-dma126-master-1 ~]# docker run --sysctl='kernel.shm_rmid_forced=hello' docker.io/ocpqe/hello-pod
container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"write /proc/sys/kernel/shm_rmid_forced: invalid argument\""
/usr/bin/docker-current: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:359: container init caused \\\"write /proc/sys/kernel/shm_rmid_forced: invalid argument\\\"\"\n".
Comment 2 Yan Du 2017-06-29 04:49:39 EDT
Met the same issue with an invalid annotation in a pod:
If we add an invalid egress-bandwidth/ingress-bandwidth annotation to a pod,
eg:      "annotations": {
            "kubernetes.io/egress-bandwidth": "-10M",
            "kubernetes.io/ingress-bandwidth": "-3M"

We get a meaningful message when describing the pod in OCP-3.5:
[root@host-8-175-59 ~]# oc describe pod iperf
Name:			iperf
Namespace:		d1
Security Policy:	anyuid
Node:			host-8-175-81.host.centralci.eng.rdu2.redhat.com/10.8.175.81
Start Time:		Thu, 29 Jun 2017 04:42:53 -0400
Labels:			<none>
Status:			Pending
IP:			
Controllers:		<none>
Containers:
  iperf:
    Container ID:	
    Image:		yadu/hello-openshift-iperf
    Image ID:		
    Port:		
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4hg0k (ro)
    Environment Variables:	<none>
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  default-token-4hg0k:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-4hg0k
QoS Class:	BestEffort
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From								SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----								-------------	--------	------		-------
  21s		21s		1	{default-scheduler }								Normal		Scheduled	Successfully assigned iperf to host-8-175-81.host.centralci.eng.rdu2.redhat.com
  20s		1s		20	{kubelet host-8-175-81.host.centralci.eng.rdu2.redhat.com}			Warning		FailedSync	Error syncing pod, skipping: failed to "SetupNetwork" for "iperf_d1" with SetupNetworkError: "Failed to setup network for pod \"iperf_d1(f107bae3-5ca6-11e7-ab56-fa163e27a692)\" using network plugins \"cni\": CNI request failed with status 400: 'failed to parse pod bandwidth: resource is unreasonably small (< 1kbit)\n'; Skipping pod"

But in OCP-3.6, no such warning is shown.
Comment 3 Derek Carr 2017-06-29 17:13:28 EDT
Scenario 1:

$ oc new-app --file=https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/quota/quota_template.yaml --param=CPU_VALUE\=20 --param=MEM_VALUE\=1Gi --param=PV_VALUE\=10 --param=POD_VALUE\=10 --param=RC_VALUE\=20 --param=RQ_VALUE\=1 --param=SECRET_VALUE\=10 --param=SVC_VALUE\=5 
$ oc run hello-pod --image=openshift/hello-openshift --replicas=1
$ oc describe dc/hello-pod
No event is shown.

After waiting ~3m, I see the following event:

$ oc describe dc
Name:		hello-pod
Namespace:	myproject
Created:	4 minutes ago
Labels:		run=hello-pod
Annotations:	<none>
Latest Version:	1
Selector:	run=hello-pod
Replicas:	1
Triggers:	Config
Strategy:	Rolling
Template:
Pod Template:
  Labels:	run=hello-pod
  Containers:
   hello-pod:
    Image:		openshift/hello-openshift
    Port:		
    Environment:	<none>
    Mounts:		<none>
  Volumes:		<none>

Deployment #1 (latest):
	Name:		hello-pod-1
	Created:	4 minutes ago
	Status:		New
	Replicas:	0 current / 0 desired
	Selector:	deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
	Labels:		openshift.io/deployment-config.name=hello-pod,run=hello-pod
	Pods Status:	0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----				-------------	--------	------			-------
  4m		4m		1	deploymentconfig-controller			Normal		DeploymentCreated	Created new replication controller "hello-pod-1" for version 1
  1m		1m		1	deployer-controller				Warning		FailedRetry		hello-pod-1: About to stop retrying "hello-pod-1": couldn't create deployer pod for "myproject/hello-pod-1": pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory
Comment 4 Derek Carr 2017-06-29 17:18:07 EDT
To improve the experience, I have opened a PR to send a FailedCreate event when the deployment controller is unable to create the deploy pod right away.

See: https://github.com/openshift/origin/pull/14970

$ oc describe dc
Name:		hello-pod
Namespace:	myproject
Created:	2 minutes ago
Labels:		run=hello-pod
Annotations:	<none>
Latest Version:	1
Selector:	run=hello-pod
Replicas:	1
Triggers:	Config
Strategy:	Rolling
Template:
Pod Template:
  Labels:	run=hello-pod
  Containers:
   hello-pod:
    Image:		openshift/hello-openshift
    Port:		
    Environment:	<none>
    Mounts:		<none>
  Volumes:		<none>

Deployment #1 (latest):
	Name:		hello-pod-1
	Created:	2 minutes ago
	Status:		New
	Replicas:	0 current / 0 desired
	Selector:	deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
	Labels:		openshift.io/deployment-config.name=hello-pod,run=hello-pod
	Pods Status:	0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----				-------------	--------	------			-------
  2m		2m		1	deploymentconfig-controller			Normal		DeploymentCreated	Created new replication controller "hello-pod-1" for version 1
  13s		13s		1	deployer-controller				Warning		FailedRetry		Stop retrying: couldn't create deployer pod for "myproject/hello-pod-1": pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory
  2m		13s		16	deployer-controller				Warning		FailedCreate		Error creating deployer pod: pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory
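
To compare how often each event fired (for example, the repeated FailedCreate versus the single FailedRetry above), the Events table can be reduced to type, reason and count. The following is a rough sketch, not part of the fix: `event_summary` is a hypothetical name, and it assumes the column layout shown in this report (Count in the third column, Type followed by Reason).

```shell
# event_summary
# Hypothetical helper: reads `oc describe` output on stdin and prints
# "TYPE REASON COUNT" for every row of the Events table.
# Header and separator rows are skipped because they contain no Type token.
event_summary() {
  awk '
    /^Events:/ { in_events = 1; next }
    in_events {
      # Start at field 4: fields 1-3 are FirstSeen, LastSeen and Count.
      for (i = 4; i < NF; i++) {
        if ($i == "Normal" || $i == "Warning") {
          print $i, $(i + 1), $3
          break   # stop at the Type column; do not scan the message text
        }
      }
    }
  '
}
```

A possible invocation is `oc describe dc/hello-pod | event_summary`.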
Comment 5 Derek Carr 2017-06-30 10:36:46 EDT
The kubelet FailedSync event was intentionally changed to reduce etcd event spam in PR: https://github.com/openshift/origin/pull/14693

We will look to get a more distinct network failure event in the future.
Comment 6 Eric Paris 2017-06-30 11:54:28 EDT
We consider https://github.com/openshift/origin/pull/14970 the fix for this bug.
Comment 8 DeShuai Ma 2017-07-05 04:44:48 EDT
Verified on openshift v3.6.133.

# oc run hello-pod --image=docker.io/ocpqe/hello-pod --replicas 1 -n dma
deploymentconfig "hello-pod" created
[root@ip-172-18-4-137 ~]# oc get dc -n dma
NAME        REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-pod   1          1         0         config
[root@ip-172-18-4-137 ~]# oc describe dc/hello-pod -n dma
Name:		hello-pod
Namespace:	dma
Created:	30 seconds ago
Labels:		run=hello-pod
Annotations:	<none>
Latest Version:	1
Selector:	run=hello-pod
Replicas:	1
Triggers:	Config
Strategy:	Rolling
Template:
Pod Template:
  Labels:	run=hello-pod
  Containers:
   hello-pod:
    Image:		docker.io/ocpqe/hello-pod
    Port:		
    Environment:	<none>
    Mounts:		<none>
  Volumes:		<none>

Deployment #1 (latest):
	Name:		hello-pod-1
	Created:	30 seconds ago
	Status:		New
	Replicas:	0 current / 0 desired
	Selector:	deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
	Labels:		openshift.io/deployment-config.name=hello-pod,run=hello-pod
	Pods Status:	0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----				-------------	--------	------			-------
  30s		30s		1	deploymentconfig-controller			Normal		DeploymentCreated	Created new replication controller "hello-pod-1" for version 1
  30s		9s		13	deployer-controller				Warning		FailedCreate		Error creating deployer pod: pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory
Comment 12 errata-xmlrpc 2017-11-28 16:58:46 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
