Bug 1469572
Summary: | Build pod status shows wrong value ExitCode:0 | ||||||
---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Vikas Laad <vlaad> | ||||
Component: | Node | Assignee: | Seth Jennings <sjenning> | ||||
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | DeShuai Ma <dma> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 3.6.0 | CC: | aos-bugs, decarr, jokerman, mifiedle, mmccomas, vlaad, wmeng | ||||
Target Milestone: | --- | ||||||
Target Release: | 3.6.z | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2017-07-12 15:58:58 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
From the attached pod YAML file, the CLI is operating as expected. If the pod.status.containerStatuses[x].state.terminated.reason value is populated, we show that value. If the pod.status.containerStatuses[x].state.terminated.reason is empty, we show the ExitCode:%d value as reported. The pod.status.containerStatuses[x].state.terminated reason value is populated to "Completed" by default in dockershim docker_container.go if the container had a startedAt and finishedAt value reported from the container runtime. Absent those values, no reason is provided. From the attached pod YAML file, we can see that finishedAt and startedAt are actually null: state: terminated: containerID: docker://da46510645225c5d3cd86cc59e08ff232c35b08c15edff18673de27c003b5157 exitCode: 0 finishedAt: null startedAt: null The timestamps are taken from the output of a docker inspect call on the container. Are there any node logs that report: "failed to parse timestamp for container: %q: %v" Hi Derek, I do not see this string on any of the nodes in logs. I suspect that this is the docker inspect call timing out and not filling in the container status. This results in everything except the ConatinerID being empty: https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet_pods.go#L1265-L1273 Without the log entry to prove it, the only thing I can think is to look at the kubelet latency metrics on the docker inspect calls and see if the 99th percentile is up against the timeout. I do not think this should block a release as it had no impact on the actual workload running to completion. Instead, the kubelet was unable to derive the state of the container from the inspect call. Absent a log, I do not know how to proceed. If we encounter this bug again, can we re-open with log detail so we can verify what we believe happened? I still have the env, I just did not find that string in logs. Please let me know if someone wants to look at the env. All the pods that have this issue are over a day old. # oc get pods --all-namespaces | grep ExitCode:0 proj13 cakephp-mysql-example-33-build 0/1 ExitCode:0 0 1d proj20 cakephp-mysql-example-33-build 0/1 ExitCode:0 0 1d proj22 cakephp-mysql-example-33-build 0/1 ExitCode:0 0 1d proj26 cakephp-mysql-example-33-build 0/1 ExitCode:0 0 1d proj3 cakephp-mysql-example-33-build 0/1 ExitCode:0 0 1d proj8 cakephp-mysql-example-33-build 0/1 ExitCode:0 0 1d I doubt you will find anything in the logs on any of them with the verbosity of the logging. We need a fresh pod that exhibits the behavior and capture the node log with all the context around it if we are to look into this. |
Created attachment 1296263 [details] pod yaml Description of problem: After doing lots of concurrent builds some build pod status show wrong value "ExitCode:0". In this case I saw it when I did around 20 rounds of 30 concurrent builds on 2 m4.xlarge compute nodes. Version-Release number of selected component (if applicable): openshift v3.6.114 kubernetes v1.6.1+5115d708d7 etcd 3.2.0 Client: Version: 1.12.6 API version: 1.24 Package version: docker-1.12.6-28.git1398f24.el7.x86_64 Go version: go1.7.4 Git commit: 1398f24/1.12.6 Built: Wed May 17 01:16:44 2017 OS/Arch: linux/amd64 Server: Version: 1.12.6 API version: 1.24 Package version: docker-1.12.6-28.git1398f24.el7.x86_64 Go version: go1.7.4 Git commit: 1398f24/1.12.6 Built: Wed May 17 01:16:44 2017 OS/Arch: linux/amd64 Actual results: Build pod status show wrong value. Expected results: Build pod status should show correct value (Complete, Error etc) Additional info: Please see pod yaml attached.