Bug 1537237
| Summary: | Get unexpected error when viewing pod logs for deleted pod | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
| Component: | Node | Assignee: | Joel Smith <joelsmith> |
| Status: | CLOSED ERRATA | QA Contact: | DeShuai Ma <dma> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.9.0 | CC: | aos-bugs, avagarwa, jokerman, mmccomas, sjenning, wjiang |
| Target Milestone: | --- | | |
| Target Release: | 3.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-03-28 14:21:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Origin issue: https://github.com/openshift/origin/issues/18173

Setting severity high because this is an API regression: the client gets an unexpected error for missing containers instead of the proper error.

I was able to reproduce this, but my results were not quite as described. I was only ever able to reproduce it with a container that had exited and was waiting to be restarted (i.e., in a crash-loop backoff situation).
For testing, I ran `oc logs mypod` in a loop to observe the logging behavior during the various stages of the pod lifecycle.
On 3.9, I see three phases:

1. The pod exists and the docker container exists. I get the logs back.
2. The pod is being deleted and the container no longer exists. I get the error mentioned above:

   ```
   failed to get container status {"" ""}: rpc error: code = OutOfRange desc = EOF
   ```

3. The pod finishes terminating. I get the error:

   ```
   Error from server (NotFound): pods "mypod" not found
   ```
On 3.7, I see the same three phases, but with a different error message in phase 2:

```
failed to get container status {"" ""}: rpc error: code = 2 desc = json: cannot unmarshal array into Go value of type types.ContainerJSON
```
So I'm not sure this is much of a regression, at least compared to 3.7. I'm now looking into where to catch the error so we can return something better.
Opened upstream issue: https://github.com/kubernetes/kubernetes/issues/59296

Opened upstream PR: https://github.com/kubernetes/kubernetes/pull/59297

Just to close the loop: in Comment #3 above, I say in my phase 2 description that the container no longer exists, but as it turns out, the container (and its logs) do exist. It's just that the container ID is not present in the current container status while the pod is terminating. The container ID is available via the container state's lastState field, so the fix is to look for the ID there if it can't be found in the current state.

Checked with:

```
# openshift version
openshift v3.9.0-0.47.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8
```

and the error message looks good now:

```
# oc logs -f hello-1-ctq5r
Error from server (BadRequest): container "hello" in pod "hello-1-ctq5r" is terminated
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
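The lastState fallback described above can be sketched roughly as follows. This is a simplified illustration, not the real kubelet code: the struct and function names (`ContainerState`, `ContainerStatus`, `containerIDForLogs`) are stand-ins for the actual Kubernetes container-status types.

```go
package main

import "fmt"

// ContainerState is a simplified, hypothetical stand-in for a container
// state entry; only the container ID matters for this sketch.
type ContainerState struct {
	ContainerID string
}

// ContainerStatus mimics the shape of a pod's container status: the
// current state plus the last terminated state (the "lastState" field
// mentioned in the comment above).
type ContainerStatus struct {
	Name                 string
	State                ContainerState
	LastTerminationState ContainerState
}

// containerIDForLogs prefers the current state's container ID, and falls
// back to the last terminated state's ID while the pod is terminating,
// which is conceptually what the upstream fix does.
func containerIDForLogs(s ContainerStatus) (string, error) {
	if s.State.ContainerID != "" {
		return s.State.ContainerID, nil
	}
	if s.LastTerminationState.ContainerID != "" {
		return s.LastTerminationState.ContainerID, nil
	}
	// No ID anywhere: return a clean, client-friendly error instead of
	// leaking a low-level runtime error.
	return "", fmt.Errorf("container %q is terminated", s.Name)
}

func main() {
	cases := []ContainerStatus{
		{Name: "hello", State: ContainerState{ContainerID: "docker://abc"}},                // running
		{Name: "hello", LastTerminationState: ContainerState{ContainerID: "docker://old"}}, // crash-looping
		{Name: "hello"}, // fully gone: no ID in either state
	}
	for _, s := range cases {
		if id, err := containerIDForLogs(s); err != nil {
			fmt.Println("error:", err)
		} else {
			fmt.Println("fetch logs for container ID:", id)
		}
	}
}
```

With the fallback in place, the crash-looping case still resolves to a usable container ID, so logs remain available while the pod terminates.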
3.9 master, using the json-file driver with docker 1.12.6-68. I had just deleted this pod, and I briefly get this when I call logs:

```
$ oc logs -n kube-system sts/prometheus -c prometheus
failed to get container status {"" ""}: rpc error: code = OutOfRange desc = EOF
```

We should never be leaking these sorts of errors; the standard behavior should be that the kubelet returns a known error type and the API server passes it back to the client.