Description of problem:
The pod was stuck in the ContainerCreating state. The pod was created from a build config.

Events noted:

Events:
  FirstSeen  LastSeen  Count  From                       SubObjectPath  Type     Reason          Message
  ---------  --------  -----  ----                       -------------  ----     ------          -------
  4m         4m        1      default-scheduler                         Normal   Scheduled       Successfully assigned mongodb-replicaset-blockchain-7-build to ip-10-x-x-x.abc
  4m         4m        12     kubelet, ip-10-x-x-x.abc                  Normal   SandboxChanged  Pod sandbox changed, it will be killed and re-created.
  4m         4m        13     kubelet, ip-10-x-x-x.abc                  Warning  FailedSync      Error syncing pod

On the node, the following error message was logged a number of times:

<snip>
Jun 18 17:16:30 xyz.com atomic-openshift-node[68991]: E0618 17:16:30.569465 69062 pod_workers.go:182] Error syncing pod 4d850f4a-72dd-11e8-8add-02a300ce655d ("mongodb-replica-5-build_abc(4d850f4a-72dd-11e8-8add-02a300ce655d)"), skipping: failed to "CreatePodSandbox" for "mongodb-replica-5-build_abc(4d850f4a-72dd-11e8-8add-02a300ce655d)" with CreatePodSandboxError: "CreatePodSandbox for pod \"mongodb-replica-5-build_abc(4d850f4a-72dd-11e8-8add-02a300ce655d)\" failed: rpc error: code = 2 desc = failed to start sandbox container for pod \"mongodb-replica-5-build\": Error response from daemon: {\"message\":\"invalid header field value \\\"oci runtime error: container_linux.go:247: starting container process caused \\\\\\\"process_linux.go:286: decoding sync type from init pipe caused \\\\\\\\\\\\\\\"read parent: connection reset by peer\\\\\\\\\\\\\\\"\\\\\\\"\\\\n\\\"\"}"
</snip>

The pod ran once the container limits were removed from the build config. The limits were, for example:

  Limits:
    cpu:     1
    memory:  2

Actual results:
The pod went into the ContainerCreating state.

Expected results:
The logs should specify the root cause.

Additional info:
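For reference, a minimal way to reproduce the symptom with a bare pod (a sketch; the pod and container names below are illustrative). The key point is that a bare "memory: 2" is parsed as 2 bytes, far below the roughly 4 MiB minimum the container runtime will accept:

cat <<'EOF' | oc create -f -
apiVersion: v1
kind: Pod
metadata:
  name: lowmem-repro
spec:
  containers:
  - name: c1
    image: openshift/hello-openshift
    resources:
      limits:
        cpu: "1"
        memory: "2"
EOF
oc describe pod lowmem-repro   # stays in ContainerCreating; see the Events section

The same applies to the resources stanza of the build config: any memory limit below the runtime minimum leaves the build pod stuck in the same way.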
This is tricky, unfortunately. https://github.com/kubernetes/kubernetes/issues/49442 The problem is that the minimum memory limit is runtime dependent, so we can't code that knowledge into the kubelet. docker does not return a meaningful error to us through the CRI, so that is really an issue with docker. I'll send this over to the Containers team to see if they want to improve the error message in that situation, i.e. when a container tries to start with a memory limit below the 4MB minimum that docker enforces.
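For illustration, the minimum can be seen by asking docker for a below-minimum limit directly (a representative run; the exact wording and threshold vary across docker/runc versions):

docker run --rm -m 1m busybox true
# rejected with an error along the lines of:
# "Minimum memory limit allowed is 4MB"

The kubelet cannot hard-code that value because it is owned by the runtime and can change between runtime releases.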
We can fix this in CRI-O to improve the error message when the memory is too low.
https://github.com/kubernetes-incubator/cri-o/pull/1723 created for cri-o 1.11
Checked with cri-o-1.11.1-2.rhaos3.11.git1759204.el7.x86_64, and the fix is still not in, so moving this back to MODIFIED.
Checked on:

# oc version
oc v3.11.0-0.28.0
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://qe-wjiang-master-etcd-1:8443
openshift v3.11.0-0.28.0
kubernetes v1.11.0+d4cacc0

with cri-o-1.11.2-1.rhaos3.11.git3eac3b2.el7.x86_64, and it looks like the patch is still not in.

# oc describe pods pod-lks6v
Name:               pod-lks6v
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               qe-wjiang-node-registry-router-1/10.240.0.115
Start Time:         Thu, 06 Sep 2018 22:29:08 -0400
Labels:             <none>
Annotations:        openshift.io/scc=anyuid
Status:             Pending
IP:
Containers:
  c1:
    Container ID:
    Image:          openshift/hello-openshift:latest
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Ki
    Requests:
      cpu:        1
      memory:     1Ki
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-6s2kq (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  default-token-6s2kq:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-6s2kq
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason                  Age               From                                       Message
  ----     ------                  ----              ----                                       -------
  Normal   Scheduled               1m                default-scheduler                          Successfully assigned default/pod-lks6v to qe-wjiang-node-registry-router-1
  Warning  FailedCreatePodSandBox  15s (x8 over 1m)  kubelet, qe-wjiang-node-registry-router-1  Failed create pod sandbox: rpc error: code = Unknown desc = signal: killed
Checked with cri-o-1.11.4-2.rhaos3.11.gite0c89d8.el7_5.x86_64 on:

# oc version
oc v3.11.6
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://qe-wjiang-master-etcd-1:8443
openshift v3.11.6
kubernetes v1.11.0+d4cacc0

And the issue is still not fixed in 1.11.4-2.

# oc describe pods cluster-capacity-stub-container
Name:               cluster-capacity-stub-container
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               qe-wjiang-master-etcd-1/10.240.0.112
Start Time:         Mon, 17 Sep 2018 04:33:56 -0400
Labels:             <none>
Annotations:        openshift.io/scc=anyuid
Status:             Pending
IP:
Containers:
  cluster-capacity-stub-container:
    Container ID:
    Image:          gcr.io/google_containers/pause:2.0
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  1Ki
    Requests:
      cpu:        100m
      memory:     1Ki
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-sbldp (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  default-token-sbldp:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-sbldp
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  node-role.kubernetes.io/master=true
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason                  Age   From                              Message
  ----     ------                  ----  ----                              -------
  Normal   Scheduled               11s   default-scheduler                 Successfully assigned default/cluster-capacity-stub-container to qe-wjiang-master-etcd-1
  Warning  FailedCreatePodSandBox  11s   kubelet, qe-wjiang-master-etcd-1  Failed create pod sandbox: rpc error: code = Unknown desc = signal: killed
Checked with:

$ oc version
oc v4.0.0-0.82.0
kubernetes v1.11.0+6855010f70
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://192.168.99.6:8443
openshift v4.0.0-0.81.0
kubernetes v1.11.0+ba8f89f8f9

$ oc get nodes -o wide
NAME                                  STATUS    ROLES     AGE       VERSION              INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                      KERNEL-VERSION              CONTAINER-RUNTIME
app-node-0.wjiang-ocp.example.com     Ready     compute   1h        v1.11.0+6855010f70   192.168.99.12   <none>        Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.1.3.el7.x86_64   cri-o://1.12.0
infra-node-0.wjiang-ocp.example.com   Ready     infra     1h        v1.11.0+6855010f70   192.168.99.15   <none>        Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.1.3.el7.x86_64   cri-o://1.12.0
master-0.wjiang-ocp.example.com       Ready     master    1h        v1.11.0+6855010f70   192.168.99.6    <none>        Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-957.1.3.el7.x86_64   cri-o://1.12.0

$ rpm -qa | grep -i cri-o
cri-o-1.12.0-3.rhaos4.0.git774a29e.el7.x86_64

And we still do not get the expected error message:

$ oc describe pods h-3-x975w
Name:               h-3-x975w
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               app-node-0.wjiang-ocp.example.com/192.168.99.12
Start Time:         Tue, 04 Dec 2018 23:38:02 -0500
Labels:             deployment=h-3
                    deploymentconfig=h
                    run=h
Annotations:        openshift.io/deployment-config.latest-version=3
                    openshift.io/deployment-config.name=h
                    openshift.io/deployment.name=h-3
                    openshift.io/scc=restricted
Status:             Pending
IP:
Controlled By:      ReplicationController/h-3
Containers:
  h:
    Container ID:
    Image:          openshift/hello-openshift
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  2Mi
    Requests:
      cpu:        100m
      memory:     2Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wrxzg (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  default-token-wrxzg:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-wrxzg
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason                  Age   From                                        Message
  ----     ------                  ----  ----                                        -------
  Normal   Scheduled               10s   default-scheduler                           Successfully assigned default/h-3-x975w to app-node-0.wjiang-ocp.example.com
  Warning  FailedCreatePodSandBox  9s    kubelet, app-node-0.wjiang-ocp.example.com  Failed create pod sandbox: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF
Checked, and the output is still the same as https://bugzilla.redhat.com/show_bug.cgi?id=1595256#c17

➜  ~ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-04-10-182914   True        False         128m    Cluster version is 4.0.0-0.nightly-2019-04-10-182914

➜  ~ oc describe pods -l run=h
Name:               h-1-qllgc
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               compute-0.gpei-vsp-0412.qe.devcluster.openshift.com/139.178.73.5
Start Time:         Fri, 12 Apr 2019 17:29:11 +0800
Labels:             deployment=h-1
                    deploymentconfig=h
                    run=h
Annotations:        openshift.io/deployment-config.latest-version: 1
                    openshift.io/deployment-config.name: h
                    openshift.io/deployment.name: h-1
Status:             Pending
IP:
Controlled By:      ReplicationController/h-1
Containers:
  h:
    Container ID:
    Image:          openshift/hello-openshift:latest
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  1Mi
    Requests:
      cpu:        100m
      memory:     1Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b99kl (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  default-token-b99kl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-b99kl
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age               From                                                           Message
  ----     ------                  ----              ----                                                           -------
  Normal   Scheduled               36s               default-scheduler                                              Successfully assigned default/h-1-qllgc to compute-0.gpei-vsp-0412.qe.devcluster.openshift.com
  Warning  FailedCreatePodSandBox  5s (x3 over 34s)  kubelet, compute-0.gpei-vsp-0412.qe.devcluster.openshift.com   Failed create pod sandbox: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF

➜  ~ oc version
Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.0", GitCommit:"7ad663e77", GitTreeState:"", BuildDate:"2019-04-11T22:43:58Z", GoVersion:"", Compiler:"", Platform:""}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.4+f174899", GitCommit:"f174899", GitTreeState:"clean", BuildDate:"2019-04-10T17:18:27Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Recent changes in runc have raised the minimum required memory. We will update the code to account for that.
Being fixed in https://github.com/cri-o/cri-o/pull/2263
The fix was merged and a new cri-o has been built: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=882052
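For re-verification, one quick way to hit this path is to create a pod with a below-minimum memory limit and then check its events, e.g. (assuming the --limits flag of oc run is available on this release):

oc run h --image=openshift/hello-openshift --limits='cpu=100m,memory=1Mi'
oc describe pods -l run=h   # the FailedCreatePodSandBox event should now state the minimum required memory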
Checked with 4.1.0-0.nightly-2019-04-22-005054, and the issue is finally fixed, thanks.

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-04-22-005054   True        False         130m    Cluster version is 4.1.0-0.nightly-2019-04-22-005054

# oc get nodes -o wide
NAME                                         STATUS    ROLES     AGE    VERSION             INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                    KERNEL-VERSION         CONTAINER-RUNTIME
ip-10-0-135-161.eu-west-1.compute.internal   Ready     worker    139m   v1.13.4+da48e8391   10.0.135.161   <none>        Red Hat Enterprise Linux CoreOS 410.8.20190418.1 (Ootpa)   4.18.0-80.el8.x86_64   cri-o://1.13.6-4.rhaos4.1.gita4b40b7.el8
ip-10-0-140-63.eu-west-1.compute.internal    Ready     master    144m   v1.13.4+da48e8391   10.0.140.63    <none>        Red Hat Enterprise Linux CoreOS 410.8.20190418.1 (Ootpa)   4.18.0-80.el8.x86_64   cri-o://1.13.6-4.rhaos4.1.gita4b40b7.el8
ip-10-0-147-114.eu-west-1.compute.internal   Ready     worker    139m   v1.13.4+da48e8391   10.0.147.114   <none>        Red Hat Enterprise Linux CoreOS 410.8.20190418.1 (Ootpa)   4.18.0-80.el8.x86_64   cri-o://1.13.6-4.rhaos4.1.gita4b40b7.el8
ip-10-0-157-194.eu-west-1.compute.internal   Ready     master    144m   v1.13.4+da48e8391   10.0.157.194   <none>        Red Hat Enterprise Linux CoreOS 410.8.20190418.1 (Ootpa)   4.18.0-80.el8.x86_64   cri-o://1.13.6-4.rhaos4.1.gita4b40b7.el8
ip-10-0-163-90.eu-west-1.compute.internal    Ready     worker    139m   v1.13.4+da48e8391   10.0.163.90    <none>        Red Hat Enterprise Linux CoreOS 410.8.20190418.1 (Ootpa)   4.18.0-80.el8.x86_64   cri-o://1.13.6-4.rhaos4.1.gita4b40b7.el8
ip-10-0-167-208.eu-west-1.compute.internal   Ready     master    144m   v1.13.4+da48e8391   10.0.167.208   <none>        Red Hat Enterprise Linux CoreOS 410.8.20190418.1 (Ootpa)   4.18.0-80.el8.x86_64   cri-o://1.13.6-4.rhaos4.1.gita4b40b7.el8

# oc describe pods h-1-dn9jm
Name:               h-1-dn9jm
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               ip-10-0-135-161.eu-west-1.compute.internal/10.0.135.161
Start Time:         Mon, 22 Apr 2019 00:55:33 -0400
Labels:             deployment=h-1
                    deploymentconfig=h
                    run=h
Annotations:        openshift.io/deployment-config.latest-version: 1
                    openshift.io/deployment-config.name: h
                    openshift.io/deployment.name: h-1
Status:             Pending
IP:
Controlled By:      ReplicationController/h-1
Containers:
  h:
    Container ID:
    Image:          openshift/hello-openshift:latest
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  1Mi
    Requests:
      cpu:        100m
      memory:     1Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-p8297 (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  default-token-p8297:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-p8297
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age        From                                                 Message
  ----     ------                  ----       ----                                                 -------
  Normal   Scheduled               <invalid>  default-scheduler                                    Successfully assigned default/h-1-dn9jm to ip-10-0-135-161.eu-west-1.compute.internal
  Warning  FailedCreatePodSandBox  <invalid>  kubelet, ip-10-0-135-161.eu-west-1.compute.internal  Failed create pod sandbox: rpc error: code = Unknown desc = pod set memory limit 1048576 too low; should be at least 4194304
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758