Bug 1537237 - Get unexpected error when viewing pod logs for deleted pod
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assigned To: Joel Smith
QA Contact: DeShuai Ma
Docs Contact:
Depends On:
Blocks:
Reported: 2018-01-22 12:58 EST by Clayton Coleman
Modified: 2018-03-28 10:22 EDT
CC List: 6 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-03-28 10:21:18 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: none


External Trackers
Tracker: Red Hat Product Errata    Tracker ID: RHBA-2018:0489    Priority: None    Status: None    Summary: None    Last Updated: 2018-03-28 10:22 EDT

Description Clayton Coleman 2018-01-22 12:58:01 EST
3.9 master, using json-file driver with docker 1.12.6-68

I had just deleted this pod. For a brief window after the deletion, requesting its logs returns this:

$ oc logs -n kube-system sts/prometheus -c prometheus
failed to get container status {"" ""}: rpc error: code = OutOfRange desc = EOF

We should never leak these sorts of errors; the standard behavior should be that the kubelet returns a known error type and the API server passes it back to the client.
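
Concretely, a "known error type" here means one of apimachinery's structured status errors, which callers can test with helpers such as apierrors.IsNotFound or apierrors.IsBadRequest instead of parsing raw rpc text. A minimal, illustrative Go sketch (not code from this bug; the sample message is borrowed from Comment 8 below):

package main

import (
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// describeLogError branches on the structured API errors a client can expect
// from a log request, instead of string-matching an opaque rpc message.
func describeLogError(err error) string {
	switch {
	case err == nil:
		return "logs retrieved"
	case apierrors.IsNotFound(err):
		return "pod (or container) no longer exists"
	case apierrors.IsBadRequest(err):
		return "container is not in a state where logs can be served"
	default:
		// This is the bucket the leaked rpc EOF above falls into.
		return fmt.Sprintf("unexpected error: %v", err)
	}
}

func main() {
	// Synthetic error mirroring the message eventually returned after the fix.
	err := apierrors.NewBadRequest(`container "hello" in pod "hello-1-ctq5r" is terminated`)
	fmt.Println(describeLogError(err))
}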
Comment 1 Seth Jennings 2018-01-22 14:04:58 EST
Origin issue:
https://github.com/openshift/origin/issues/18173
Comment 2 Clayton Coleman 2018-01-23 11:37:20 EST
Setting severity high because this is an API regression (the client gets an unexpected error for missing containers, instead of the proper error).
Comment 3 Joel Smith 2018-01-29 13:36:27 EST
I was able to reproduce this, but my results were not quite as described. I was only ever able to reproduce it with a container that had exited and was waiting to be restarted (e.g., in a crash-loop backoff situation).

For testing, I ran "oc logs mypod" in a loop to observe the logging behavior during various stages of the pod lifecycle; a programmatic version of that loop is sketched below.
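
A rough client-go equivalent of that loop, polling the same log endpoint (the namespace, pod name, and kubeconfig path are placeholders, and a current client-go API is assumed):

package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Hit the log endpoint once per second and print whatever comes back, so
	// the error can be observed at each stage of the pod lifecycle.
	for {
		raw, err := client.CoreV1().
			Pods("default").
			GetLogs("mypod", &corev1.PodLogOptions{}).
			Do(context.TODO()).
			Raw()
		if err != nil {
			fmt.Printf("%s  error: %v\n", time.Now().Format(time.RFC3339), err)
		} else {
			fmt.Printf("%s  got %d bytes of logs\n", time.Now().Format(time.RFC3339), len(raw))
		}
		time.Sleep(time.Second)
	}
}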

On 3.9, I get these 3 phases:

1. The pod exists and the docker container exists. I get the logs back.
2. The pod is being deleted and the container no longer exists. I get the error mentioned above:
failed to get container status {"" ""}: rpc error: code = OutOfRange desc = EOF
3. The pod finishes terminating. I get the error:
Error from server (NotFound): pods "mypod" not found

On 3.7, I get the same 3 phases, but I get a different error message for phase 2:
failed to get container status {"" ""}: rpc error: code = 2 desc = json: cannot unmarshal array into Go value of type types.ContainerJSON

So I'm not sure this is much of a regression, at least compared to 3.7. I'm now looking into where to catch the error so that we can return something better.
Comment 4 Joel Smith 2018-02-02 19:32:24 EST
Opened upstream issue:
https://github.com/kubernetes/kubernetes/issues/59296

Opened upstream PR:
https://github.com/kubernetes/kubernetes/pull/59297
Comment 5 Avesh Agarwal 2018-02-08 11:13:41 EST
Origin PR: https://github.com/openshift/origin/pull/18515
Comment 6 Joel Smith 2018-02-08 22:00:44 EST
Just to close the loop, in Comment #3 above, I say in my phase 2 section that the container no longer exists, but as it turns out, the container (and its logs) do exist. It's just that the container ID is not present in the current container status while the pod is terminating. The container ID is available via the container state's lastState field (and so the fix is to look for the ID there if it can't be found in the current state).
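
In terms of the v1 API objects, that fallback reads roughly as follows (a simplified Go sketch of the idea described above, not the actual kubelet patch; the helper name and sample values are invented):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// containerIDForLogs illustrates the fallback: prefer the ID from the current
// container status, and if it is empty (e.g. the container is between
// restarts), fall back to the ID recorded in the last terminated state.
func containerIDForLogs(status corev1.ContainerStatus) (string, bool) {
	if status.ContainerID != "" {
		return status.ContainerID, true
	}
	if term := status.LastTerminationState.Terminated; term != nil && term.ContainerID != "" {
		return term.ContainerID, true
	}
	return "", false
}

func main() {
	// Hypothetical status for a container in crash-loop backoff: no current ID,
	// but the previous run's ID is still recorded under lastState.
	status := corev1.ContainerStatus{
		Name: "prometheus",
		LastTerminationState: corev1.ContainerState{
			Terminated: &corev1.ContainerStateTerminated{
				ContainerID: "docker://0123456789abcdef",
				ExitCode:    1,
			},
		},
	}
	if id, ok := containerIDForLogs(status); ok {
		fmt.Println("read logs for container", id)
	} else {
		fmt.Println("no container ID available; return a typed error instead of a raw rpc error")
	}
}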
Comment 8 weiwei jiang 2018-02-22 02:05:17 EST
Checked with 
# openshift version 
openshift v3.9.0-0.47.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8

and the error message looks good now:
# oc logs -f hello-1-ctq5r 
Error from server (BadRequest): container "hello" in pod "hello-1-ctq5r" is terminated
Comment 11 errata-xmlrpc 2018-03-28 10:21:18 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
