Bug 1465801
| Summary: | Some events record in ocp36 is different from ocp35 | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | DeShuai Ma <dma> |
| Component: | Node | Assignee: | Derek Carr <decarr> |
| Status: | CLOSED ERRATA | QA Contact: | DeShuai Ma <dma> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.6.0 | CC: | aos-bugs, eparis, jokerman, mmccomas, wmeng, xtian, yadu |
| Target Milestone: | --- | ||
| Target Release: | 3.7.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-11-28 21:58:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
DeShuai Ma
2017-06-28 08:55:18 UTC
Description of problem:
Another event-related issue: when a container fails to start, the events give no detailed failure reason and the pod stays in ContainerCreating. In OCP 3.5 the event carried the detailed reason, "...Error syncing pod. .... shm_rmid_forced: invalid argument", but in OCP 3.6 the events do not show what went wrong.

Version-Release number of selected component (if applicable):
openshift v3.6.126
kubernetes v1.6.1+5115d708d7
etcd 3.2.0

Steps:
[root@qe-dma126-master-1 ~]# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/pods/sysctls/pod-sysctl-safe-invalid2.yaml -n test1
pod "hello-pod" created
[root@qe-dma126-master-1 ~]# oc get po -n test1
NAME READY STATUS RESTARTS AGE
hello-pod 0/1 ContainerCreating 0 1m
[root@qe-dma126-master-1 ~]# oc describe po hello-pod -n test1
Name: hello-pod
Namespace: test1
Security Policy: anyuid
Node: qe-dma126-node-registry-router-1/10.240.0.50
Start Time: Wed, 28 Jun 2017 05:20:00 -0400
Labels: name=hello-pod
Annotations: openshift.io/scc=anyuid
security.alpha.kubernetes.io/sysctls=kernel.shm_rmid_forced=hello
Status: Pending
IP:
Controllers: <none>
Containers:
hello-pod:
Container ID:
Image: docker.io/ocpqe/hello-pod:latest
Image ID:
Port: 8080/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qv1mb (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-qv1mb:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-qv1mb
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1m 1m 1 default-scheduler Normal Scheduled Successfully assigned hello-pod to qe-dma126-node-registry-router-1
1m 7s 32 kubelet, qe-dma126-node-registry-router-1 Warning DNSSearchForming Found and omitted duplicated dns domain in host search line: 'cluster.local' during merging with cluster dns domains
1m 5s 32 kubelet, qe-dma126-node-registry-router-1 Normal SandboxChanged Pod sandbox changed, it will be killed and re-created.
1m 4s 33 kubelet, qe-dma126-node-registry-router-1 Warning FailedSync Error syncing pod

Expected results:
The event should show the container failure reason "/proc/sys/kernel/shm_rmid_forced: invalid argument".

Additional info:
[root@qe-dma126-master-1 ~]# docker run --sysctl='kernel.shm_rmid_forced=hello' docker.io/ocpqe/hello-pod
container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"write /proc/sys/kernel/shm_rmid_forced: invalid argument\""
/usr/bin/docker-current: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:359: container init caused \\\"write /proc/sys/kernel/shm_rmid_forced: invalid argument\\\"\"\n".

Met the same issue with an invalid annotation in a pod:
If we add an invalid egress-bandwidth/ingress-bandwidth annotation to a pod,
e.g.: "annotations": {
"kubernetes.io/egress-bandwidth": "-10M",
"kubernetes.io/ingress-bandwidth": "-3M"
}
In OCP 3.5, describing the pod gives a meaningful message:
[root@host-8-175-59 ~]# oc describe pod iperf
Name: iperf
Namespace: d1
Security Policy: anyuid
Node: host-8-175-81.host.centralci.eng.rdu2.redhat.com/10.8.175.81
Start Time: Thu, 29 Jun 2017 04:42:53 -0400
Labels: <none>
Status: Pending
IP:
Controllers: <none>
Containers:
iperf:
Container ID:
Image: yadu/hello-openshift-iperf
Image ID:
Port:
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-4hg0k (ro)
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-4hg0k:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-4hg0k
QoS Class: BestEffort
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
21s 21s 1 {default-scheduler } Normal Scheduled Successfully assigned iperf to host-8-175-81.host.centralci.eng.rdu2.redhat.com
20s 1s 20 {kubelet host-8-175-81.host.centralci.eng.rdu2.redhat.com} Warning FailedSync Error syncing pod, skipping: failed to "SetupNetwork" for "iperf_d1" with SetupNetworkError: "Failed to setup network for pod \"iperf_d1(f107bae3-5ca6-11e7-ab56-fa163e27a692)\" using network plugins \"cni\": CNI request failed with status 400: 'failed to parse pod bandwidth: resource is unreasonably small (< 1kbit)\n'; Skipping pod"
But in OCP 3.6 we no longer get such warnings.
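To make the OCP 3.5 message above concrete, here is a minimal sketch of the kind of check that would reject these annotation values. It is not the actual CNI plugin code: the function names, the suffix table, and the 1 kbit lower bound (taken only from the quoted error text) are assumptions for illustration.

```python
import re

# Assumed lower bound, taken from the quoted error text ("< 1kbit").
MIN_BITS_PER_SEC = 1_000

# Decimal suffixes only; a real quantity parser accepts more forms.
_SUFFIXES = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9}
_QUANTITY = re.compile(r"^([+-]?\d+)([kMG]?)$")

BANDWIDTH_KEYS = (
    "kubernetes.io/egress-bandwidth",
    "kubernetes.io/ingress-bandwidth",
)


def parse_bandwidth(value):
    """Return bits per second for a string such as "10M", or raise ValueError."""
    match = _QUANTITY.match(value.strip())
    if match is None:
        raise ValueError(f"failed to parse pod bandwidth: invalid quantity {value!r}")
    number, suffix = match.groups()
    bits = int(number) * _SUFFIXES[suffix]
    if bits < MIN_BITS_PER_SEC:
        raise ValueError(
            "failed to parse pod bandwidth: resource is unreasonably small (< 1kbit)"
        )
    return bits


def check_annotations(annotations):
    """Collect one readable message per invalid bandwidth annotation."""
    problems = []
    for key in BANDWIDTH_KEYS:
        if key in annotations:
            try:
                parse_bandwidth(annotations[key])
            except ValueError as err:
                problems.append(f"{key}: {err}")
    return problems
```

Calling `check_annotations` with the two annotations from this comment returns one message per key, which is the level of detail the 3.5 event exposed to the user.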
Scenario 1:
$ oc new-app --file=https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/quota/quota_template.yaml --param=CPU_VALUE\=20 --param=MEM_VALUE\=1Gi --param=PV_VALUE\=10 --param=POD_VALUE\=10 --param=RC_VALUE\=20 --param=RQ_VALUE\=1 --param=SECRET_VALUE\=10 --param=SVC_VALUE\=5
$ oc run hello-pod --image=openshift/hello-openshift --replicas=1
$ oc describe dc/hello-pod

At first there is no event. After waiting ~3m, I see the following event:

$ oc describe dc
Name: hello-pod
Namespace: myproject
Created: 4 minutes ago
Labels: run=hello-pod
Annotations: <none>
Latest Version: 1
Selector: run=hello-pod
Replicas: 1
Triggers: Config
Strategy: Rolling
Template:
Pod Template:
Labels: run=hello-pod
Containers:
hello-pod:
Image: openshift/hello-openshift
Port:
Environment: <none>
Mounts: <none>
Volumes: <none>
Deployment #1 (latest):
Name: hello-pod-1
Created: 4 minutes ago
Status: New
Replicas: 0 current / 0 desired
Selector: deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
Labels: openshift.io/deployment-config.name=hello-pod,run=hello-pod
Pods Status: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
4m 4m 1 deploymentconfig-controller Normal DeploymentCreated Created new replication controller "hello-pod-1" for version 1
1m 1m 1 deployer-controller Warning FailedRetry hello-pod-1: About to stop retrying "hello-pod-1": couldn't create deployer pod for "myproject/hello-pod-1": pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory

To improve the experience, I have opened a PR to send a FailedCreate event when the deployment controller is unable to create the deploy pod right away.
See: https://github.com/openshift/origin/pull/14970

$ oc describe dc
Name: hello-pod
Namespace: myproject
Created: 2 minutes ago
Labels: run=hello-pod
Annotations: <none>
Latest Version: 1
Selector: run=hello-pod
Replicas: 1
Triggers: Config
Strategy: Rolling
Template:
Pod Template:
Labels: run=hello-pod
Containers:
hello-pod:
Image: openshift/hello-openshift
Port:
Environment: <none>
Mounts: <none>
Volumes: <none>
Deployment #1 (latest):
Name: hello-pod-1
Created: 2 minutes ago
Status: New
Replicas: 0 current / 0 desired
Selector: deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
Labels: openshift.io/deployment-config.name=hello-pod,run=hello-pod
Pods Status: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2m 2m 1 deploymentconfig-controller Normal DeploymentCreated Created new replication controller "hello-pod-1" for version 1
13s 13s 1 deployer-controller Warning FailedRetry Stop retrying: couldn't create deployer pod for "myproject/hello-pod-1": pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory
2m 13s 16 deployer-controller Warning FailedCreate Error creating deployer pod: pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory

The kubelet FailedSync event was intentionally changed to reduce etcd event spam in PR: https://github.com/openshift/origin/pull/14693
We will look to get a more distinct network failure event in the future.

We consider https://github.com/openshift/origin/pull/14970 the fix for this bug.

Verified on openshift v3.6.133.
# oc run hello-pod --image=docker.io/ocpqe/hello-pod --replicas 1 -n dma
deploymentconfig "hello-pod" created
[root@ip-172-18-4-137 ~]# oc get dc -n dma
NAME REVISION DESIRED CURRENT TRIGGERED BY
hello-pod 1 1 0 config
[root@ip-172-18-4-137 ~]# oc describe dc/hello-pod -n dma
Name: hello-pod
Namespace: dma
Created: 30 seconds ago
Labels: run=hello-pod
Annotations: <none>
Latest Version: 1
Selector: run=hello-pod
Replicas: 1
Triggers: Config
Strategy: Rolling
Template:
Pod Template:
Labels: run=hello-pod
Containers:
hello-pod:
Image: docker.io/ocpqe/hello-pod
Port:
Environment: <none>
Mounts: <none>
Volumes: <none>
Deployment #1 (latest):
Name: hello-pod-1
Created: 30 seconds ago
Status: New
Replicas: 0 current / 0 desired
Selector: deployment=hello-pod-1,deploymentconfig=hello-pod,run=hello-pod
Labels: openshift.io/deployment-config.name=hello-pod,run=hello-pod
Pods Status: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
30s 30s 1 deploymentconfig-controller Normal DeploymentCreated Created new replication controller "hello-pod-1" for version 1
30s 9s 13 deployer-controller Warning FailedCreate Error creating deployer pod: pods "hello-pod-1-deploy" is forbidden: failed quota: myquota: must specify cpu,memory
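The FirstSeen, LastSeen and Count columns in the event tables above reflect repeated occurrences of the same event being folded into one record instead of being stored separately. The sketch below illustrates that idea only. It is not the Kubernetes event recorder, and the `Event` and `EventAggregator` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    source: str
    type: str
    reason: str
    message: str
    first_seen: float
    last_seen: float
    count: int = 1


class EventAggregator:
    """Fold identical events into one record with a running count."""

    def __init__(self):
        self._events = {}

    def record(self, source, type_, reason, message, now):
        # Events are treated as identical when all four fields match
        # (an assumption made for this sketch).
        key = (source, type_, reason, message)
        event = self._events.get(key)
        if event is None:
            event = Event(source, type_, reason, message,
                          first_seen=now, last_seen=now)
            self._events[key] = event
        else:
            event.count += 1
            event.last_seen = now
        return event

    def events(self):
        # Insertion order, i.e. ordered by first occurrence.
        return list(self._events.values())
```

Recording the same FailedCreate message 13 times through `record` yields a single entry with `count == 13`, which matches the shape of the last row in the verification output. It also shows why the message text matters: if the message is reduced to "Error syncing pod", the single stored record carries no failure detail.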
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188 |