Bug 1465801
Summary: | Some events record in ocp36 is different from ocp35 | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | DeShuai Ma <dma> |
Component: | Node | Assignee: | Derek Carr <decarr> |
Status: | CLOSED ERRATA | QA Contact: | DeShuai Ma <dma> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 3.6.0 | CC: | aos-bugs, eparis, jokerman, mmccomas, wmeng, xtian, yadu |
Target Milestone: | --- | ||
Target Release: | 3.7.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-11-28 21:58:46 UTC | Type: | Bug |
Description
DeShuai Ma
2017-06-28 08:55:18 UTC
Description of problem:
Another event-related issue: when a container fails to start, the events do not include any detailed failure reason and the pod stays in ContainerCreating.
In OCP 3.5 the event gave a detailed reason: "...Error syncing pod. .... shm_rmid_forced: invalid argument". In OCP 3.6 the events do not show what went wrong.

Version-Release number of selected component (if applicable):
openshift v3.6.126
kubernetes v1.6.1+5115d708d7
etcd 3.2.0

Steps:
[root@qe-dma126-master-1 ~]# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/pods/sysctls/pod-sysctl-safe-invalid2.yaml -n test1
pod "hello-pod" created
[root@qe-dma126-master-1 ~]# oc get po -n test1
NAME        READY     STATUS              RESTARTS   AGE
hello-pod   0/1       ContainerCreating   0          1m
[root@qe-dma126-master-1 ~]# oc describe po hello-pod -n test1
Name:             hello-pod
Namespace:        test1
Security Policy:  anyuid
Node:             qe-dma126-node-registry-router-1/10.240.0.50
Start Time:       Wed, 28 Jun 2017 05:20:00 -0400
Labels:           name=hello-pod
Annotations:      openshift.io/scc=anyuid
                  security.alpha.kubernetes.io/sysctls=kernel.shm_rmid_forced=hello
Status:           Pending
IP:
Controllers:      <none>
Containers:
  hello-pod:
    Container ID:
    Image:          docker.io/ocpqe/hello-pod:latest
    Image ID:
    Port:           8080/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qv1mb (ro)
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  tmp:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-qv1mb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qv1mb
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  FirstSeen  LastSeen  Count  From                                       SubObjectPath  Type      Reason            Message
  ---------  --------  -----  ----                                       -------------  --------  ------            -------
  1m         1m        1      default-scheduler                                         Normal    Scheduled         Successfully assigned hello-pod to qe-dma126-node-registry-router-1
  1m         7s        32     kubelet, qe-dma126-node-registry-router-1                 Warning   DNSSearchForming  Found and omitted duplicated dns domain in host search line: 'cluster.local' during merging with cluster dns domains
  1m         5s        32     kubelet, qe-dma126-node-registry-router-1                 Normal    SandboxChanged    Pod sandbox changed, it will be killed and re-created.
  1m         4s        33     kubelet, qe-dma126-node-registry-router-1                 Warning   FailedSync        Error syncing pod

Expected results:
The event should show the container failure reason "/proc/sys/kernel/shm_rmid_forced: invalid argument".

Additional info:
[root@qe-dma126-master-1 ~]# docker run --sysctl='kernel.shm_rmid_forced=hello' docker.io/ocpqe/hello-pod
container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"write /proc/sys/kernel/shm_rmid_forced: invalid argument\""
/usr/bin/docker-current: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:359: container init caused \\\"write /proc/sys/kernel/shm_rmid_forced: invalid argument\\\"\"\n".
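
For illustration, the kind of check that would flag the invalid value in the annotation above before the pod is created can be sketched in Python. This is not how the kubelet or the container runtime validates sysctls. It assumes the annotation value is a comma-separated list of name=value pairs and that kernel.shm_rmid_forced only accepts 0 or 1. The names BOOLEAN_SYSCTLS, parse_sysctls, and find_invalid_sysctls are hypothetical.

```python
# Illustrative pre-flight check only; NOT the kubelet's or the runtime's validation.
# Assumption: the annotation value is a comma-separated list of name=value pairs.
# Assumption: kernel.shm_rmid_forced only accepts the integers 0 or 1.

BOOLEAN_SYSCTLS = {"kernel.shm_rmid_forced"}  # hypothetical list for this sketch


def parse_sysctls(annotation):
    """Split 'a=1,b=2' into {'a': '1', 'b': '2'}."""
    result = {}
    for pair in annotation.split(","):
        if not pair.strip():
            continue  # ignore empty entries such as a trailing comma
        name, sep, value = pair.partition("=")
        if not sep:
            raise ValueError("sysctl %r is not in name=value form" % pair)
        result[name.strip()] = value.strip()
    return result


def find_invalid_sysctls(annotation):
    """Return one message per sysctl whose value this sketch considers invalid."""
    problems = []
    for name, value in parse_sysctls(annotation).items():
        if name in BOOLEAN_SYSCTLS and value not in ("0", "1"):
            problems.append("%s: invalid argument %r" % (name, value))
    return problems


if __name__ == "__main__":
    for problem in find_invalid_sysctls("kernel.shm_rmid_forced=hello"):
        print(problem)
```

A check like this only covers the sysctls it knows about; the runtime error shown in the docker output remains the authoritative failure reason.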
I hit the same issue with an invalid annotation in a pod.

If we add an invalid egress-bandwidth/ingress-bandwidth to a pod, e.g.:
"annotations": {
    "kubernetes.io/egress-bandwidth": "-10M",
    "kubernetes.io/ingress-bandwidth": "-3M"

we get a meaningful message when describing the pod in OCP 3.5:
[root@host-8-175-59 ~]# oc describe pod iperf
Name:             iperf
Namespace:        d1
Security Policy:  anyuid
Node:             host-8-175-81.host.centralci.eng.rdu2.redhat.com/10.8.175.81
Start Time:       Thu, 29 Jun 2017 04:42:53 -0400
Labels:           <none>
Status:           Pending
IP:
Controllers:      <none>
Containers:
  iperf:
    Container ID:
    Image:          yadu/hello-openshift-iperf
    Image ID:
    Port:
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4hg0k (ro)
    Environment Variables:  <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  default-token-4hg0k:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4hg0k
QoS Class:    BestEffort
Tolerations:  <none>
Events:
  FirstSeen  LastSeen  Count  From                                                        SubObjectPath  Type      Reason      Message
  ---------  --------  -----  ----                                                        -------------  --------  ------      -------
  21s        21s       1      {default-scheduler }                                                       Normal    Scheduled   Successfully assigned iperf to host-8-175-81.host.centralci.eng.rdu2.redhat.com
  20s        1s        20     {kubelet host-8-175-81.host.centralci.eng.rdu2.redhat.com}                 Warning   FailedSync  Error syncing pod, skipping: failed to "SetupNetwork" for "iperf_d1" with SetupNetworkError: "Failed to setup network for pod \"iperf_d1(f107bae3-5ca6-11e7-ab56-fa163e27a692)\" using network plugins \"cni\": CNI request failed with status 400: 'failed to parse pod bandwidth: resource is unreasonably small (< 1kbit)\n'; Skipping pod"

But now in OCP 3.6 we no longer get such warnings.
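
As a rough sketch of why values such as "-10M" are rejected, the lower-bound check implied by the error message above can be written in Python. This is a simplified stand-in, not the Kubernetes or CNI plugin code: it assumes only decimal SI suffixes (k, M, G, T, P) and takes the 1 kbit minimum from the message text. The names parse_bandwidth and validate_bandwidth are hypothetical.

```python
from decimal import Decimal, InvalidOperation

# Assumption: only decimal SI suffixes are handled; real Kubernetes quantities
# accept more forms than this sketch does.
SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15}
MIN_BITS_PER_SECOND = 1000  # 1 kbit, taken from the error message quoted above


def parse_bandwidth(text):
    """Convert a string such as '10M' into a number of bits per second."""
    text = text.strip()
    multiplier = 1
    if text and text[-1] in SUFFIXES:
        multiplier = SUFFIXES[text[-1]]
        text = text[:-1]
    try:
        return Decimal(text) * multiplier
    except InvalidOperation:
        raise ValueError("cannot parse bandwidth %r" % text)


def validate_bandwidth(text):
    """Return the parsed value, or raise ValueError if it is below 1 kbit."""
    value = parse_bandwidth(text)
    if value < MIN_BITS_PER_SECOND:
        raise ValueError("resource is unreasonably small (< 1kbit)")
    return value


if __name__ == "__main__":
    for candidate in ("10M", "-10M", "-3M"):
        try:
            print(candidate, "->", validate_bandwidth(candidate))
        except ValueError as err:
            print(candidate, "-> rejected:", err)
```

Both annotation values from the comment above fall below the minimum because they are negative, which matches the 3.5 event text.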
Scenario 1:

$ oc new-app --file=https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/quota/quota_template.yaml --param=CPU_VALUE\=20 --param=MEM_VALUE\=1Gi --param=PV_VALUE\=10 --param=POD_VALUE\=10 --param=RC_VALUE\=20 --param=RQ_VALUE\=1 --param=SECRET_VALUE\=10 --param=SVC_VALUE\=5
$ oc run hello-pod --image=openshift/hello-openshift --replicas=1
$ oc describe dc/hello-pod

At first no events are shown. After waiting ~3m, I see the following event:

$ oc describe dc
Name:            hello-pod
Namespace:       myproject
Created:         4 minutes ago
Labels:          run=hello-pod
Annotations:     <none>
Latest Version:  1
Selector:        run=hello-pod
Replicas:        1
Triggers:        Config
Strategy:        Rolling
Template:
Pod Template:
  Labels:  run=hello-pod
  Containers:
   hello-pod:
    Image:        openshift/hello-openshift
    Port:
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>

Deployment #1 (latest):
  Name:         hello-pod-1
  Created:      4 minutes ago
  Status:       New
  Replicas:     0 current / 0 desired
  Selector:     deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
  Labels:       openshift.io/deployment-config.name=hello-pod,run=hello-pod
  Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen  LastSeen  Count  From                         SubObjectPath  Type      Reason             Message
  ---------  --------  -----  ----                         -------------  --------  ------             -------
  4m         4m        1      deploymentconfig-controller                 Normal    DeploymentCreated  Created new replication controller "hello-pod-1" for version 1
  1m         1m        1      deployer-controller                         Warning   FailedRetry        hello-pod-1: About to stop retrying "hello-pod-1": couldn't create deployer pod for "myproject/hello-pod-1": pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory

To improve the experience, I have opened a PR to send a FailedCreate event when the deployment controller is unable to create the deployer pod right away.
See: https://github.com/openshift/origin/pull/14970

$ oc describe dc
Name:            hello-pod
Namespace:       myproject
Created:         2 minutes ago
Labels:          run=hello-pod
Annotations:     <none>
Latest Version:  1
Selector:        run=hello-pod
Replicas:        1
Triggers:        Config
Strategy:        Rolling
Template:
Pod Template:
  Labels:  run=hello-pod
  Containers:
   hello-pod:
    Image:        openshift/hello-openshift
    Port:
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>

Deployment #1 (latest):
  Name:         hello-pod-1
  Created:      2 minutes ago
  Status:       New
  Replicas:     0 current / 0 desired
  Selector:     deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
  Labels:       openshift.io/deployment-config.name=hello-pod,run=hello-pod
  Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen  LastSeen  Count  From                         SubObjectPath  Type      Reason             Message
  ---------  --------  -----  ----                         -------------  --------  ------             -------
  2m         2m        1      deploymentconfig-controller                 Normal    DeploymentCreated  Created new replication controller "hello-pod-1" for version 1
  13s        13s       1      deployer-controller                         Warning   FailedRetry        Stop retrying: couldn't create deployer pod for "myproject/hello-pod-1": pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory
  2m         13s       16     deployer-controller                         Warning   FailedCreate       Error creating deployer pod: pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory

The kubelet FailedSync event was intentionally changed to reduce etcd event spam in PR:
https://github.com/openshift/origin/pull/14693

We will look to get a more distinct network failure event in the future.

We consider https://github.com/openshift/origin/pull/14970 the fix for this PR.

Verified on openshift v3.6.133.
# oc run hello-pod --image=docker.io/ocpqe/hello-pod --replicas 1 -n dma
deploymentconfig "hello-pod" created
[root@ip-172-18-4-137 ~]# oc get dc -n dma
NAME        REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-pod   1          1         0         config
[root@ip-172-18-4-137 ~]# oc describe dc/hello-pod -n dma
Name:            hello-pod
Namespace:       dma
Created:         30 seconds ago
Labels:          run=hello-pod
Annotations:     <none>
Latest Version:  1
Selector:        run=hello-pod
Replicas:        1
Triggers:        Config
Strategy:        Rolling
Template:
Pod Template:
  Labels:  run=hello-pod
  Containers:
   hello-pod:
    Image:        docker.io/ocpqe/hello-pod
    Port:
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>

Deployment #1 (latest):
  Name:         hello-pod-1
  Created:      30 seconds ago
  Status:       New
  Replicas:     0 current / 0 desired
  Selector:     deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
  Labels:       openshift.io/deployment-config.name=hello-pod,run=hello-pod
  Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen  LastSeen  Count  From                         SubObjectPath  Type      Reason             Message
  ---------  --------  -----  ----                         -------------  --------  ------             -------
  30s        30s       1      deploymentconfig-controller                 Normal    DeploymentCreated  Created new replication controller "hello-pod-1" for version 1
  30s        9s        13     deployer-controller                         Warning   FailedCreate       Error creating deployer pod: pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
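
The FirstSeen, LastSeen, and Count columns in the event tables above show repeated events folded into a single record, which is how one FailedCreate row can stand for 13 or 16 occurrences. A minimal Python sketch of that folding follows. It is illustrative only and is not the Kubernetes event correlator: the key fields, the AggregatedEvent type, and the aggregate function are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class AggregatedEvent:
    # Hypothetical record type for this sketch; field names mirror the table columns.
    source: str
    type: str
    reason: str
    message: str
    first_seen: float
    last_seen: float
    count: int = 1


def aggregate(events):
    """Fold identical events into one record each.

    events: iterable of (timestamp, source, type, reason, message) tuples.
    Assumption: two events are "identical" when source, type, reason, and
    message all match.
    """
    merged = {}
    for timestamp, source, etype, reason, message in events:
        key = (source, etype, reason, message)
        record = merged.get(key)
        if record is None:
            merged[key] = AggregatedEvent(
                source, etype, reason, message, timestamp, timestamp
            )
        else:
            record.first_seen = min(record.first_seen, timestamp)
            record.last_seen = max(record.last_seen, timestamp)
            record.count += 1
    return list(merged.values())


if __name__ == "__main__":
    raw = [
        (0.0, "deploymentconfig-controller", "Normal", "DeploymentCreated", "created"),
        (0.0, "deployer-controller", "Warning", "FailedCreate", "failed quota"),
        (10.0, "deployer-controller", "Warning", "FailedCreate", "failed quota"),
        (21.0, "deployer-controller", "Warning", "FailedCreate", "failed quota"),
    ]
    for event in aggregate(raw):
        print(event.reason, event.count, event.first_seen, event.last_seen)
```

Keying on the full message means a failure whose text changes on every retry would not be folded, which is one reason very detailed messages can inflate the number of stored events.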