Bug 1274598 - A deleted pod stays in Terminating status until the image is fully pulled to the node
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OKD
Classification: Red Hat
Component: Pod
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Jan Chaloupka
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-10-23 06:13 UTC by Meng Bo
Modified: 2016-03-16 21:53 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-09 15:19:42 UTC
Target Upstream Version:
Embargoed:



Description Meng Bo 2015-10-23 06:13:00 UTC
Description of problem:
Create a pod that uses a large image, such as ubuntu or fedora. (The image must exist in the Docker registry.)
Delete the pod before the image has been pulled to the node.
The pod stays in Terminating status until the image has been fully pulled to the node.

Version-Release number of selected component (if applicable):
openshift v1.0.6-873-ga24b901-dirty
kubernetes v1.2.0-alpha.1-1107-g4c8e6f4
etcd 2.1.2


How reproducible:
always

Steps to Reproduce:
1. Create a pod with a large image that does not exist on the node but exists on Docker Hub
e.g.
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "hello-pod",
    "labels": {
      "name": "hello-pod"
    }
  },
  "spec": {
    "containers": [{
      "name": "hello-pod",
      "image": "pweil/hello-nginx-docker"
    }]
  }
}

2. Delete the pod before the Docker image has been fully pulled to the node
3. Check the pod status

Actual results:
The pod stays in Terminating status until the image has been fully pulled to the node.

Expected results:
The pod should be deleted once the default grace period of 30s has elapsed, regardless of the image pull status on the node.

Additional info:
[user1@master ~]$ oc get po -o wide
NAME        READY     STATUS    RESTARTS   AGE       NODE
hello-pod   0/1       Pending   0          3s        node3.bmeng.local
[user1@master ~]$ oc get po -o wide
NAME        READY     STATUS        RESTARTS   AGE       NODE
hello-pod   0/1       Terminating   0          9s        node3.bmeng.local
[user1@master ~]$ oc get po -o wide
NAME        READY     STATUS        RESTARTS   AGE       NODE
hello-pod   0/1       Terminating   0          55s       node3.bmeng.local
[user1@master ~]$ oc get po -o wide
NAME        READY     STATUS        RESTARTS   AGE       NODE
hello-pod   0/1       Terminating   0          2m        node3.bmeng.local


The pod is deleted immediately when the option --grace_period=0 is added.

Comment 1 Andy Goldstein 2015-10-29 19:00:14 UTC
The syncPod function pulls the image, then tries to run the container. My guess is that there's some state while the image is being pulled that is keeping the pod from being deleted, and then that state changes after the image has been pulled, allowing the deletion. Marking UpcomingRelease as this isn't a 3.1 blocker. We'll take a look again once 3.1 is out the door...

Comment 2 Andy Goldstein 2016-01-05 14:11:21 UTC
This is an upstream kubelet issue. Moving to Node component. Lowering severity as this isn't a critical issue.

Comment 3 Avesh Agarwal 2016-03-09 14:04:07 UTC
Hi Jan,

If you need help with this, please let me know. I spent some time on it yesterday but forgot to ask whether you are still actively working on it. If not, let me know.

Comment 4 Jan Chaloupka 2016-03-09 14:55:13 UTC
Hi Avesh,

I spent some time tracing the call path to the dockermanager methods. At the moment I am more involved in CI, so if you have a good idea of what is going on and what needs to be fixed, you can take the issue.

Jan

Comment 5 Avesh Agarwal 2016-03-09 15:03:46 UTC
Hi Jan,

I think I understand, to some extent, what is going on in the code. I noticed that even when the pod is in the Terminating state, its container(s) still go through the creation/start/killing stages, which seems incorrect to me. In particular, the image pull, creation, start, and killing stages appear to happen in sequence, and only once SyncPod comes out of this sequence does the kubelet realize that the pod is in the Terminating state and start to kill the pod's container(s).

Anyway, that said, I do not have any solution yet.

Thanks
Avesh

Comment 6 Andy Goldstein 2016-03-09 15:06:15 UTC
This doesn't really feel like a bug to me. If you create a pod but don't want to wait for it to run (which includes pulling the image), deleting with a grace period of 0 will get you what you want.

Comment 7 Avesh Agarwal 2016-03-09 15:12:21 UTC
Hi Andy, 

I had similar thoughts: it is not a bug with respect to pulling the image.

But I am now noticing another issue: once a pod is in the Terminating state, why do its containers still have to be created and started after the image has been pulled successfully?

IOW:

1. Image is being pulled
<Pod goes into Terminating state>
2. Creation/Start of pod's container

So again, why does the 2nd step have to be carried out if the kubelet could know that the pod is already in the Terminating state? I think it is happening because steps 1 and 2 run in sequence. I am not sure whether this sequence could be broken.

Comment 8 Andy Goldstein 2016-03-09 15:19:42 UTC
Yes, it's because once SyncPod starts for a given pod, it continues to completion (start infra container, pull container image, create container, run container). I don't think it's worthwhile to add the complexity to check the pod's status in between when we start pulling the image and when we go to start the container. I'm going to go ahead and close this as working as designed.

Meng Bo, if you feel strongly that this behavior needs to be adjusted, please open an issue in the Kubernetes GitHub repository and make sure to @ mention at least me (ncdc).
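
The run-to-completion behavior described in this comment can be illustrated with a small simulation. This is a minimal sketch in Python, not kubelet code: `Pod`, `sync_pod`, `delete_during`, and `check_between_steps` are hypothetical names, and the step labels simply mirror the sequence listed above. The `check_between_steps` flag models the extra status check that the comment argues is not worth the added complexity.

```python
from dataclasses import dataclass, field

# Step labels mirror the sequence listed in the comment above; they are
# illustrative only and are not real kubelet function names.
STEPS = [
    "start infra container",
    "pull container image",
    "create container",
    "run container",
]


@dataclass
class Pod:
    name: str
    terminating: bool = False              # set when the user deletes the pod
    log: list = field(default_factory=list)


def sync_pod(pod, delete_during=None, check_between_steps=False):
    """Run every step in order, optionally re-checking the pod between steps.

    delete_during is a hypothetical hook: the label of the step during which
    a simulated user deletes the pod.
    """
    for step in STEPS:
        if check_between_steps and pod.terminating:
            pod.log.append("abort: pod is terminating")
            return pod.log
        pod.log.append(step)
        if step == delete_during:
            pod.terminating = True         # the delete arrives mid-step
    return pod.log


# Behavior as described in this bug: the delete is not noticed mid-sync.
current = sync_pod(Pod("hello-pod"), delete_during="pull container image")

# Hypothetical variant with a status check between steps.
checked = sync_pod(Pod("hello-pod"), delete_during="pull container image",
                   check_between_steps=True)

print(current)
print(checked)
```

In the first call all four steps run even though the pod was deleted during the pull, which matches the create/start activity observed in comment 5. In the second call the sequence stops before the container is created. Even in this variant the pull itself still runs to the end, since the check only happens between steps.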

Comment 9 Meng Bo 2016-03-14 03:11:21 UTC
@Andy

It is not that important whether this issue gets fixed or not.
But I do not understand why --grace-period=0 can break the sequence (as described in comment #5), while --grace-period=<any_other_value_or_default_value> cannot.

Comment 10 Andy Goldstein 2016-03-14 14:45:29 UTC
A grace period of 0 means "I want to delete this immediately." The pod is removed from the apiserver when grace is 0. I'd be willing to bet that the node is still pulling the image even when you delete with a grace of 0, because there's no way to short-circuit the sync loop in the node.
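
The difference between the two deletion modes can be sketched as a toy model. This is a hedged illustration in Python, not the real API server: `ApiServer`, `node_finished_sync`, and the `"<gone>"` marker are hypothetical. The immediate removal for a grace period of 0 follows the comment above. That a non-zero grace period only marks the pod, and that the pod disappears once the node has finished its sync, is an assumption made here to match the output shown in the description.

```python
class ApiServer:
    """Toy stand-in for the API server's pod store (hypothetical)."""

    def __init__(self):
        self.pods = {}                      # pod name -> status string

    def create(self, name):
        self.pods[name] = "Pending"

    def delete(self, name, grace_period=30):
        if grace_period == 0:
            # Per the comment above: the object is removed right away.
            del self.pods[name]
        else:
            # Assumption: the pod is only marked and waits for the node.
            self.pods[name] = "Terminating"

    def node_finished_sync(self, name):
        # Assumption: once the node's sync completes, a Terminating pod
        # is removed. Nothing here interrupts the node's own work.
        if self.pods.get(name) == "Terminating":
            del self.pods[name]

    def status(self, name):
        return self.pods.get(name, "<gone>")


api = ApiServer()

# Default grace period: the pod stays visible until the node is done.
api.create("hello-pod")
api.delete("hello-pod")
graceful_before = api.status("hello-pod")
api.node_finished_sync("hello-pod")
graceful_after = api.status("hello-pod")

# Grace period 0: the pod disappears from the listing immediately.
api.create("hello-pod")
api.delete("hello-pod", grace_period=0)
forced = api.status("hello-pod")

print(graceful_before, graceful_after, forced)
```

In this model a grace period of 0 does not break the node's sequence at all. It only changes what the listing shows, which is consistent with the guess above that the node keeps pulling the image either way.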

