Description of problem:
When a quota prevents a dc from creating its pod, describing the dc shows no event recording the failure. For an rc, a 'FailedCreate' event is recorded.

Version-Release number of selected component (if applicable):
openshift v3.6.126
kubernetes v1.6.1+5115d708d7
etcd 3.2.0

How reproducible:
Always

Steps to Reproduce:
1. Create a quota in the project
# oc new-app --file=https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/quota/quota_template.yaml --param=CPU_VALUE\=20 --param=MEM_VALUE\=1Gi --param=PV_VALUE\=10 --param=POD_VALUE\=10 --param=RC_VALUE\=20 --param=RQ_VALUE\=1 --param=SECRET_VALUE\=10 --param=SVC_VALUE\=5 -n dma

2. Create a dc without cpu and memory info
[root@qe-dma126-master-1 dma]# oc run hello-pod --image=docker.io/ocpqe/hello-pod --replicas 1 -n dma
deploymentconfig "hello-pod" created

3. Describe the dc
[root@qe-dma126-master-1 dma]# oc get dc -n dma
NAME        REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-pod   1          1         0         config
[root@qe-dma126-master-1 dma]# oc describe dc/hello-pod -n dma
Name:           hello-pod
Namespace:      dma
Created:        2 minutes ago
Labels:         run=hello-pod
Annotations:    <none>
Latest Version: 1
Selector:       run=hello-pod
Replicas:       1
Triggers:       Config
Strategy:       Rolling
Template:
Pod Template:
  Labels:       run=hello-pod
  Containers:
   hello-pod:
    Image:          docker.io/ocpqe/hello-pod
    Port:
    Environment:    <none>
    Mounts:         <none>
  Volumes:          <none>

Deployment #1 (latest):
  Name:         hello-pod-1
  Created:      2 minutes ago
  Status:       New
  Replicas:     0 current / 0 desired
  Selector:     deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
  Labels:       openshift.io/deployment-config.name=hello-pod,run=hello-pod
  Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen  LastSeen  Count  From                         SubObjectPath  Type      Reason             Message
  ---------  --------  -----  ----                         -------------  --------  ------             -------
  2m         2m        1      deploymentconfig-controller                 Normal    DeploymentCreated  Created new replication controller "hello-pod-1" for version 1

Actual results:

Expected results:
The dc should have an event:
Error creating: pods "hello-pod-" is forbidden: failed quota: myquota: must specify cpu,memory

Additional info:
1) Creating only a pod produces the forbidden event.
2) Creating only an rc also produces this event.
Description of problem:
Another event-related issue: when a container fails to start, the event gives no detailed failure reason and the pod stays in ContainerCreating. In OCP 3.5 the event gave a detailed reason: "...Error syncing pod. .... shm_rmid_forced: invalid argument". In OCP 3.6 the event does not show what went wrong.

Version-Release number of selected component (if applicable):
openshift v3.6.126
kubernetes v1.6.1+5115d708d7
etcd 3.2.0

Steps:
[root@qe-dma126-master-1 ~]# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/pods/sysctls/pod-sysctl-safe-invalid2.yaml -n test1
pod "hello-pod" created
[root@qe-dma126-master-1 ~]# oc get po -n test1
NAME        READY     STATUS              RESTARTS   AGE
hello-pod   0/1       ContainerCreating   0          1m
[root@qe-dma126-master-1 ~]# oc describe po hello-pod -n test1
Name:               hello-pod
Namespace:          test1
Security Policy:    anyuid
Node:               qe-dma126-node-registry-router-1/10.240.0.50
Start Time:         Wed, 28 Jun 2017 05:20:00 -0400
Labels:             name=hello-pod
Annotations:        openshift.io/scc=anyuid
                    security.alpha.kubernetes.io/sysctls=kernel.shm_rmid_forced=hello
Status:             Pending
IP:
Controllers:        <none>
Containers:
  hello-pod:
    Container ID:
    Image:          docker.io/ocpqe/hello-pod:latest
    Image ID:
    Port:           8080/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qv1mb (ro)
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-qv1mb:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-qv1mb
    Optional:   false
QoS Class:      BestEffort
Node-Selectors: <none>
Tolerations:    <none>
Events:
  FirstSeen  LastSeen  Count  From                                       SubObjectPath  Type      Reason            Message
  ---------  --------  -----  ----                                       -------------  --------  ------            -------
  1m         1m        1      default-scheduler                                         Normal    Scheduled         Successfully assigned hello-pod to qe-dma126-node-registry-router-1
  1m         7s        32     kubelet, qe-dma126-node-registry-router-1                 Warning   DNSSearchForming  Found and omitted duplicated dns domain in host search line: 'cluster.local' during merging with cluster dns domains
  1m         5s        32     kubelet, qe-dma126-node-registry-router-1                 Normal    SandboxChanged    Pod sandbox changed, it will be killed and re-created.
  1m         4s        33     kubelet, qe-dma126-node-registry-router-1                 Warning   FailedSync        Error syncing pod

Expected results:
The event should show the container failure reason "/proc/sys/kernel/shm_rmid_forced: invalid argument".

Additional info:
[root@qe-dma126-master-1 ~]# docker run --sysctl='kernel.shm_rmid_forced=hello' docker.io/ocpqe/hello-pod
container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"write /proc/sys/kernel/shm_rmid_forced: invalid argument\""
/usr/bin/docker-current: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:359: container init caused \\\"write /proc/sys/kernel/shm_rmid_forced: invalid argument\\\"\"\n".
Met the same issue with invalid annotations in a pod.

If we add an invalid egress-bandwidth/ingress-bandwidth to the pod, e.g.:
"annotations": {
    "kubernetes.io/egress-bandwidth": "-10M",
    "kubernetes.io/ingress-bandwidth": "-3M"
}

In OCP-3.5, describing the pod gave a meaningful message:
[root@host-8-175-59 ~]# oc describe pod iperf
Name:               iperf
Namespace:          d1
Security Policy:    anyuid
Node:               host-8-175-81.host.centralci.eng.rdu2.redhat.com/10.8.175.81
Start Time:         Thu, 29 Jun 2017 04:42:53 -0400
Labels:             <none>
Status:             Pending
IP:
Controllers:        <none>
Containers:
  iperf:
    Container ID:
    Image:          yadu/hello-openshift-iperf
    Image ID:
    Port:
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4hg0k (ro)
    Environment Variables:  <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  default-token-4hg0k:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-4hg0k
QoS Class:      BestEffort
Tolerations:    <none>
Events:
  FirstSeen  LastSeen  Count  From                                                        SubObjectPath  Type      Reason      Message
  ---------  --------  -----  ----                                                        -------------  --------  ------      -------
  21s        21s       1      {default-scheduler }                                                       Normal    Scheduled   Successfully assigned iperf to host-8-175-81.host.centralci.eng.rdu2.redhat.com
  20s        1s        20     {kubelet host-8-175-81.host.centralci.eng.rdu2.redhat.com}                 Warning   FailedSync  Error syncing pod, skipping: failed to "SetupNetwork" for "iperf_d1" with SetupNetworkError: "Failed to setup network for pod \"iperf_d1(f107bae3-5ca6-11e7-ab56-fa163e27a692)\" using network plugins \"cni\": CNI request failed with status 400: 'failed to parse pod bandwidth: resource is unreasonably small (< 1kbit)\n'; Skipping pod"

In OCP-3.6 we no longer get such warnings.
Scenario 1:

$ oc new-app --file=https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/quota/quota_template.yaml --param=CPU_VALUE\=20 --param=MEM_VALUE\=1Gi --param=PV_VALUE\=10 --param=POD_VALUE\=10 --param=RC_VALUE\=20 --param=RQ_VALUE\=1 --param=SECRET_VALUE\=10 --param=SVC_VALUE\=5
$ oc run hello-pod --image=openshift/hello-openshift --replicas=1
$ oc describe dc/hello-pod

At first I see no event. After waiting ~3m, I see the following event:

$ oc describe dc
Name:           hello-pod
Namespace:      myproject
Created:        4 minutes ago
Labels:         run=hello-pod
Annotations:    <none>
Latest Version: 1
Selector:       run=hello-pod
Replicas:       1
Triggers:       Config
Strategy:       Rolling
Template:
Pod Template:
  Labels:       run=hello-pod
  Containers:
   hello-pod:
    Image:          openshift/hello-openshift
    Port:
    Environment:    <none>
    Mounts:         <none>
  Volumes:          <none>

Deployment #1 (latest):
  Name:         hello-pod-1
  Created:      4 minutes ago
  Status:       New
  Replicas:     0 current / 0 desired
  Selector:     deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
  Labels:       openshift.io/deployment-config.name=hello-pod,run=hello-pod
  Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen  LastSeen  Count  From                         SubObjectPath  Type      Reason             Message
  ---------  --------  -----  ----                         -------------  --------  ------             -------
  4m         4m        1      deploymentconfig-controller                 Normal    DeploymentCreated  Created new replication controller "hello-pod-1" for version 1
  1m         1m        1      deployer-controller                         Warning   FailedRetry        hello-pod-1: About to stop retrying "hello-pod-1": couldn't create deployer pod for "myproject/hello-pod-1": pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory
To improve the experience, I have opened a PR to send a FailedCreate event when the deployment controller is unable to create the deploy pod right away.

See: https://github.com/openshift/origin/pull/14970

$ oc describe dc
Name:           hello-pod
Namespace:      myproject
Created:        2 minutes ago
Labels:         run=hello-pod
Annotations:    <none>
Latest Version: 1
Selector:       run=hello-pod
Replicas:       1
Triggers:       Config
Strategy:       Rolling
Template:
Pod Template:
  Labels:       run=hello-pod
  Containers:
   hello-pod:
    Image:          openshift/hello-openshift
    Port:
    Environment:    <none>
    Mounts:         <none>
  Volumes:          <none>

Deployment #1 (latest):
  Name:         hello-pod-1
  Created:      2 minutes ago
  Status:       New
  Replicas:     0 current / 0 desired
  Selector:     deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
  Labels:       openshift.io/deployment-config.name=hello-pod,run=hello-pod
  Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen  LastSeen  Count  From                         SubObjectPath  Type      Reason             Message
  ---------  --------  -----  ----                         -------------  --------  ------             -------
  2m         2m        1      deploymentconfig-controller                 Normal    DeploymentCreated  Created new replication controller "hello-pod-1" for version 1
  13s        13s       1      deployer-controller                         Warning   FailedRetry        Stop retrying: couldn't create deployer pod for "myproject/hello-pod-1": pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory
  2m         13s       16     deployer-controller                         Warning   FailedCreate       Error creating deployer pod: pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory
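For readers who want the intended behavior in miniature, here is a rough Python model of it: every failed attempt to create the deployer pod records a FailedCreate warning, and giving up records a single FailedRetry. This is only a sketch and not the code from the PR. The names EventLog, QuotaForbidden, create_deployer_pod, sync_deployment and the max_retries parameter are all hypothetical, and the real controller is written in Go with its own retry policy.

```python
from collections import OrderedDict


class QuotaForbidden(Exception):
    """Stand-in for the API server rejecting a pod because of quota."""


class EventLog:
    """Toy event recorder (hypothetical): identical events are counted, not duplicated."""

    def __init__(self):
        self.events = OrderedDict()  # (event_type, reason, message) -> count

    def record(self, event_type, reason, message):
        key = (event_type, reason, message)
        self.events[key] = self.events.get(key, 0) + 1

    def count(self, reason):
        return sum(n for (_, r, _), n in self.events.items() if r == reason)


def create_deployer_pod(pod_name, has_resources):
    """Hypothetical create call: fails when cpu/memory are missing under quota."""
    if not has_resources:
        raise QuotaForbidden(
            'pods "%s" is forbidden: failed quota: myquota: must specify cpu,memory'
            % pod_name)
    return pod_name


def sync_deployment(log, namespace, rc_name, has_resources, max_retries=3):
    """Try to create the deployer pod, recording an event for every failure."""
    pod_name = rc_name + "-deploy"
    last_error = None
    for _ in range(max_retries):
        try:
            create_deployer_pod(pod_name, has_resources)
            return True
        except QuotaForbidden as err:
            last_error = err
            # Surface the failure immediately instead of only when retries run out.
            log.record("Warning", "FailedCreate",
                       "Error creating deployer pod: %s" % err)
    log.record("Warning", "FailedRetry",
               'Stop retrying: couldn\'t create deployer pod for "%s/%s": %s'
               % (namespace, rc_name, last_error))
    return False


if __name__ == "__main__":
    log = EventLog()
    sync_deployment(log, "myproject", "hello-pod-1", has_resources=False)
    for (event_type, reason, message), n in log.events.items():
        print(n, event_type, reason, message)
```

The point of the model is that the quota error becomes visible on the first failed attempt, with the count growing on each retry, rather than only after the controller stops retrying.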
The kubelet FailedSync event was intentionally changed to reduce etcd event spam in PR https://github.com/openshift/origin/pull/14693. We will look to add a more distinct network failure event in the future.
We consider https://github.com/openshift/origin/pull/14970 the fix for this bug.
Verified on openshift v3.6.133.

# oc run hello-pod --image=docker.io/ocpqe/hello-pod --replicas 1 -n dma
deploymentconfig "hello-pod" created
[root@ip-172-18-4-137 ~]# oc get dc -n dma
NAME        REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-pod   1          1         0         config
[root@ip-172-18-4-137 ~]# oc describe dc/hello-pod -n dma
Name:           hello-pod
Namespace:      dma
Created:        30 seconds ago
Labels:         run=hello-pod
Annotations:    <none>
Latest Version: 1
Selector:       run=hello-pod
Replicas:       1
Triggers:       Config
Strategy:       Rolling
Template:
Pod Template:
  Labels:       run=hello-pod
  Containers:
   hello-pod:
    Image:          docker.io/ocpqe/hello-pod
    Port:
    Environment:    <none>
    Mounts:         <none>
  Volumes:          <none>

Deployment #1 (latest):
  Name:         hello-pod-1
  Created:      30 seconds ago
  Status:       New
  Replicas:     0 current / 0 desired
  Selector:     deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
  Labels:       openshift.io/deployment-config.name=hello-pod,run=hello-pod
  Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen  LastSeen  Count  From                         SubObjectPath  Type      Reason             Message
  ---------  --------  -----  ----                         -------------  --------  ------             -------
  30s        30s       1      deploymentconfig-controller                 Normal    DeploymentCreated  Created new replication controller "hello-pod-1" for version 1
  30s        9s        13     deployer-controller                         Warning   FailedCreate       Error creating deployer pod: pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188