Bug 1537237
| Summary: | Get unexpected error when viewing pod logs for deleted pod | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
| Component: | Node | Assignee: | Joel Smith <joelsmith> |
| Status: | CLOSED ERRATA | QA Contact: | DeShuai Ma <dma> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.9.0 | CC: | aos-bugs, avagarwa, jokerman, mmccomas, sjenning, wjiang |
| Target Milestone: | --- | | |
| Target Release: | 3.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-03-28 14:21:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Origin issue: https://github.com/openshift/origin/issues/18173

Setting severity high because this is an API regression: the client gets an unexpected error for missing containers instead of the proper error.

I was able to reproduce this, but my results were not quite as described. I was only ever able to reproduce it with a container that had exited and was waiting to be restarted (i.e., in a crash-loop backoff situation).
For testing, I ran `oc logs mypod` in a loop to observe the logging behavior during the various stages of the pod lifecycle.
On 3.9, I see three phases:

1. The pod exists and the docker container exists. I get the logs back.
2. The pod is being deleted and the container no longer exists. I get the error mentioned above:

   ```
   failed to get container status {"" ""}: rpc error: code = OutOfRange desc = EOF
   ```

3. The pod finishes terminating. I get the error:

   ```
   Error from server (NotFound): pods "mypod" not found
   ```
On 3.7, I see the same three phases, but with a different error message in phase 2:

```
failed to get container status {"" ""}: rpc error: code = 2 desc = json: cannot unmarshal array into Go value of type types.ContainerJSON
```
So I'm not sure this is much of a regression, at least compared to 3.7. I'm now looking into where to catch the error so we can return something better.
Opened upstream issue: https://github.com/kubernetes/kubernetes/issues/59296

Opened upstream PR: https://github.com/kubernetes/kubernetes/pull/59297

Just to close the loop: in Comment #3 above, I say in my phase 2 description that the container no longer exists, but as it turns out, the container (and its logs) do exist. It's just that the container ID is not present in the current container status while the pod is terminating. The container ID is available via the container state's lastState field, so the fix is to look for the ID there if it can't be found in the current state.

Checked with:

```
# openshift version
openshift v3.9.0-0.47.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8
```

and the error message looks good now:

```
# oc logs -f hello-1-ctq5r
Error from server (BadRequest): container "hello" in pod "hello-1-ctq5r" is terminated
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
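The lastState fallback described above can be sketched roughly as follows. This is a simplified illustration, not the real kubelet code: the struct and function names (`ContainerState`, `ContainerStatus`, `containerIDForLogs`) are stand-ins for the actual Kubernetes container-status types.

```go
package main

import "fmt"

// ContainerState is a simplified, hypothetical stand-in for a container
// state entry; only the container ID matters for this sketch.
type ContainerState struct {
	ContainerID string
}

// ContainerStatus mimics the shape of a pod's container status: the
// current state plus the last terminated state (the "lastState" field
// mentioned in the comment above).
type ContainerStatus struct {
	Name                 string
	State                ContainerState
	LastTerminationState ContainerState
}

// containerIDForLogs prefers the current state's container ID, and falls
// back to the last terminated state's ID while the pod is terminating,
// which is conceptually what the upstream fix does.
func containerIDForLogs(s ContainerStatus) (string, error) {
	if s.State.ContainerID != "" {
		return s.State.ContainerID, nil
	}
	if s.LastTerminationState.ContainerID != "" {
		return s.LastTerminationState.ContainerID, nil
	}
	// No ID anywhere: return a clean, client-friendly error instead of
	// leaking a low-level runtime error.
	return "", fmt.Errorf("container %q is terminated", s.Name)
}

func main() {
	cases := []ContainerStatus{
		{Name: "hello", State: ContainerState{ContainerID: "docker://abc"}},                // running
		{Name: "hello", LastTerminationState: ContainerState{ContainerID: "docker://old"}}, // crash-looping
		{Name: "hello"}, // fully gone: no ID in either state
	}
	for _, s := range cases {
		if id, err := containerIDForLogs(s); err != nil {
			fmt.Println("error:", err)
		} else {
			fmt.Println("fetch logs for container ID:", id)
		}
	}
}
```

With the fallback in place, the crash-looping case still resolves to a usable container ID, so logs remain available while the pod terminates.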
3.9 master, using the json-file driver with docker 1.12.6-68. I had just deleted this pod, and I briefly get this when I call logs:

```
$ oc logs -n kube-system sts/prometheus -c prometheus
failed to get container status {"" ""}: rpc error: code = OutOfRange desc = EOF
```

We should never be leaking these sorts of errors; the standard behavior should be that the kubelet returns a known error type and the API server passes it back to the client.