3.9 master, using the json-file logging driver with docker 1.12.6-68. I had just deleted this pod, and calling logs briefly after the deletion returns this:

$ oc logs -n kube-system sts/prometheus -c prometheus
failed to get container status {"" ""}: rpc error: code = OutOfRange desc = EOF

We should never leak these sorts of errors; the standard behavior should be that the kubelet returns a known error type and the API server passes it back to the client.
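For illustration only, here is a minimal sketch of the kind of structured error we would expect the kubelet to surface instead of the raw gRPC EOF. The helper name is made up and the exact error type/wording is an assumption (it happens to match the message eventually verified below), not the actual kubelet code:

package main

import (
    "fmt"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// terminatedContainerError builds a structured API error that the API
// server can relay to the client, instead of leaking a raw runtime error.
func terminatedContainerError(podName, containerName string) error {
    return apierrors.NewBadRequest(fmt.Sprintf(
        "container %q in pod %q is terminated", containerName, podName))
}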
Origin issue: https://github.com/openshift/origin/issues/18173
Setting severity high because this is an API regression (the client gets an unexpected error for missing containers, instead of the proper error).
I was able to reproduce this, but my results were not quite as described. I was only ever able to reproduce it with a container that had exited and was waiting to be restarted (e.g., in a crash loop backoff situation). For testing, I ran "oc logs mypod" in a loop (see the command below) to observe the logging behavior during the various stages of the pod lifecycle.

On 3.9, I get these 3 phases:

1. The pod exists and the docker container exists. I get the logs back.
2. The pod is being deleted and the container no longer exists. I get the error mentioned above:
   failed to get container status {"" ""}: rpc error: code = OutOfRange desc = EOF
3. The pod finishes terminating. I get the error:
   Error from server (NotFound): pods "mypod" not found

On 3.7, I get the same 3 phases, but a different error message for phase 2:

failed to get container status {"" ""}: rpc error: code = 2 desc = json: cannot unmarshal array into Go value of type types.ContainerJSON

So I'm not sure this is much of a regression, at least compared to 3.7. I'm now looking into where to catch the error so we can return something better.
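For reference, the loop was just something like this (illustrative; the pod name and interval are arbitrary):

$ while true; do oc logs mypod; sleep 1; done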
Opened upstream issue: https://github.com/kubernetes/kubernetes/issues/59296
Opened upstream PR: https://github.com/kubernetes/kubernetes/pull/59297
Origin PR: https://github.com/openshift/origin/pull/18515
Just to close the loop: in Comment #3 above, I say in my phase 2 section that the container no longer exists, but as it turns out, the container (and its logs) do exist. It's just that the container ID is not present in the current container status while the pod is terminating. The container ID is still available via the container status's lastState field, so the fix is to look for the ID there when it can't be found in the current state (sketched below).
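A minimal sketch of that fallback, assuming the core/v1 ContainerStatus type; the helper name is made up and this is not the actual kubelet code:

package main

import (
    "fmt"

    v1 "k8s.io/api/core/v1"
)

// containerIDForLogs picks the container ID to read logs from: prefer the
// ID in the current status, and fall back to the last termination state
// (where it remains available while the pod is terminating).
func containerIDForLogs(status v1.ContainerStatus) (string, error) {
    if status.ContainerID != "" {
        return status.ContainerID, nil
    }
    if t := status.LastTerminationState.Terminated; t != nil && t.ContainerID != "" {
        return t.ContainerID, nil
    }
    return "", fmt.Errorf("container %q has no ID in current or last state", status.Name)
}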
Checked with:

# openshift version
openshift v3.9.0-0.47.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8

and the error message looks good now:

# oc logs -f hello-1-ctq5r
Error from server (BadRequest): container "hello" in pod "hello-1-ctq5r" is terminated
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489