Bug 1537237 - Get unexpected error when viewing pod logs for deleted pod
Summary: Get unexpected error when viewing pod logs for deleted pod
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assignee: Joel Smith
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-01-22 17:58 UTC by Clayton Coleman
Modified: 2018-03-28 14:22 UTC
CC: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-28 14:21:18 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2018:0489 (last updated 2018-03-28 14:22:06 UTC)

Description Clayton Coleman 2018-01-22 17:58:01 UTC
3.9 master, using json-file driver with docker 1.12.6-68

I had just deleted this pod. Briefly after the deletion, calling logs returns this:

$ oc logs -n kube-system sts/prometheus -c prometheus
failed to get container status {"" ""}: rpc error: code = OutOfRange desc = EOF

We should never leak these sorts of errors; the standard behavior should be that the kubelet returns a known error type and the API server passes it back to the client.
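
As a rough illustration of that expectation (a sketch only, not the actual kubelet code; the function name and message below are hypothetical), a structured API error built with the k8s.io/apimachinery error helpers gives the client a stable, checkable reason instead of a raw rpc/EOF string:

// Sketch of the desired behavior: surface a known API error type that the
// API server can pass through and the client can classify.
package main

import (
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// logStatusError is a hypothetical stand-in for the kubelet-side check.
func logStatusError(podName, containerName string) error {
	return apierrors.NewBadRequest(
		fmt.Sprintf("container %q in pod %q is terminated", containerName, podName))
}

func main() {
	err := logStatusError("prometheus-0", "prometheus")
	// Clients can check the reason instead of parsing an opaque error string.
	fmt.Println(apierrors.IsBadRequest(err), err)
}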

Comment 1 Seth Jennings 2018-01-22 19:04:58 UTC
Origin issue:
https://github.com/openshift/origin/issues/18173

Comment 2 Clayton Coleman 2018-01-23 16:37:20 UTC
Setting severity high because this is an API regression (the client gets an unexpected error for missing containers, instead of the proper error).

Comment 3 Joel Smith 2018-01-29 18:36:27 UTC
I was able to reproduce this, but my results were not quite as described. I was only ever able to reproduce it with a container that had exited and was waiting to be restarted (e.g., in a crash-loop backoff situation).

For testing, I ran "oc logs mypod" in a loop to observe the logging behavior during the various stages of the pod lifecycle.
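
For reference, a hypothetical Go stand-in for that loop (it just shells out to the same command once per second; assumes oc is on PATH and the pod is named mypod):

// Hypothetical helper, not part of the fix: polls "oc logs mypod" and prints
// whatever the command returns at each stage of the pod lifecycle.
package main

import (
	"fmt"
	"os/exec"
	"time"
)

func main() {
	for {
		out, err := exec.Command("oc", "logs", "mypod").CombinedOutput()
		fmt.Printf("--- err: %v\n%s", err, out)
		time.Sleep(1 * time.Second)
	}
}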

On 3.9, I get these 3 phases:

1. The pod exists and the docker container exists. I get the logs back.
2. The pod is being deleted and the container no longer exists. I get the error mentioned above:
failed to get container status {"" ""}: rpc error: code = OutOfRange desc = EOF
3. The pod finishes terminating. I get the error:
Error from server (NotFound): pods "mypod" not found

On 3.7, I get the same 3 phases, but I get a different error message for phase 2:
failed to get container status {"" ""}: rpc error: code = 2 desc = json: cannot unmarshal array into Go value of type types.ContainerJSON

So I'm not sure this is much of a regression, at least compared to 3.7. I'm now looking into where to catch the error so that we can return something better.

Comment 4 Joel Smith 2018-02-03 00:32:24 UTC
Opened upstream issue:
https://github.com/kubernetes/kubernetes/issues/59296

Opened upstream PR:
https://github.com/kubernetes/kubernetes/pull/59297

Comment 5 Avesh Agarwal 2018-02-08 16:13:41 UTC
Origin PR: https://github.com/openshift/origin/pull/18515

Comment 6 Joel Smith 2018-02-09 03:00:44 UTC
Just to close the loop on Comment #3 above: in my phase 2 description I say that the container no longer exists, but as it turns out, the container (and its logs) do still exist. It's just that the container ID is not present in the current container status while the pod is terminating. The ID is available via the container status's lastState field, so the fix is to look for the ID there when it can't be found in the current state.
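
A minimal sketch of that lookup order, assuming the v1.ContainerStatus type from k8s.io/api/core/v1 (this illustrates the approach; it is not the exact change in the upstream PR):

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// containerIDForLogs prefers the ID from the current status and falls back to
// the last termination state, which is where it lives while the pod terminates.
func containerIDForLogs(status v1.ContainerStatus) (string, error) {
	if status.ContainerID != "" {
		return status.ContainerID, nil
	}
	if t := status.LastTerminationState.Terminated; t != nil && t.ContainerID != "" {
		return t.ContainerID, nil
	}
	return "", fmt.Errorf("container %q is terminated and has no recorded ID", status.Name)
}

func main() {
	// Example: a terminating container whose ID only remains in lastState.
	status := v1.ContainerStatus{
		Name: "prometheus",
		LastTerminationState: v1.ContainerState{
			Terminated: &v1.ContainerStateTerminated{ContainerID: "docker://abc123"},
		},
	}
	fmt.Println(containerIDForLogs(status))
}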

Comment 8 weiwei jiang 2018-02-22 07:05:17 UTC
Checked with 
# openshift version 
openshift v3.9.0-0.47.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8

and the error message looks good now:
# oc logs -f hello-1-ctq5r 
Error from server (BadRequest): container "hello" in pod "hello-1-ctq5r" is terminated

Comment 11 errata-xmlrpc 2018-03-28 14:21:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

