Bug 1318681

Summary: The pod's state is different from web UI and CLI
Product: OpenShift Container Platform Reporter: Eric Rich <erich>
Component: Management Console Assignee: Samuel Padgett <spadgett>
Status: CLOSED ERRATA QA Contact: Yadan Pei <yapei>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.1.0 CC: agoldste, aos-bugs, dma, jokerman, mmccomas, nicholas_schuetz, tdawson
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1318497 Environment:
Last Closed: 2016-05-12 16:33:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1318497    
Bug Blocks:    
Attachments:
Description Flags
webconsole output none

Description Eric Rich 2016-03-17 13:40:59 UTC
+++ This bug was initially created as a clone of Bug #1318497 +++

Description of problem:

According to the user's report, the pod shows "Terminating" in the CLI, while the web UI shows "Pending".

Version-Release number of selected component (if applicable):

- v3.1.1.6

How reproducible:

- Unclear

Actual results:

- The pod remains in "Terminating" status in the CLI, while the UI shows "Pending"

Expected results:

- The pod state shown in the UI and the CLI should match

Additional info:

Comment 1 Andy Goldstein 2016-03-17 19:04:41 UTC
Avesh, please review the support case, triage, and let me know if you think this is something we need to handle or if it's a console (UI) issue.

Comment 2 Avesh Agarwal 2016-03-17 20:35:39 UTC
I tested it with the latest origin using both the UI and the CLI. I can reproduce it to some extent, but with a difference: the CLI (oc get and oc describe) shows Terminating, which is correct, whereas the UI shows "creatingcontainer" (I forgot to capture the screen, will post shortly). So right now it really seems to be a UI issue, unless there is another way to reproduce it.

Output from CLI:

[root@localhost origin]# oc get pods
NAME        READY     STATUS        RESTARTS   AGE
hello-pod   0/1       Terminating   0          37s
[root@localhost origin]# oc describe pod
Name:                           hello-pod
Namespace:                      default
Node:                           192.168.122.253/192.168.122.253
Start Time:                     Thu, 17 Mar 2016 16:16:00 -0400
Labels:                         name=hello-pod
Status:                         Terminating (expires Thu, 17 Mar 2016 16:16:40 -0400)
Termination Grace Period:       30s
IP:
Controllers:                    <none>
Containers:
  hello-pod:
    Container ID:
    Image:              pweil/hello-nginx-docker
    Image ID:
    Port:
    QoS Tier:
      cpu:              BestEffort
      memory:           BestEffort
    State:              Waiting
      Reason:           ContainerCreating
    Ready:              False
    Restart Count:      0
    Environment Variables:
Conditions:
  Type          Status
  Ready         False
Volumes:
  default-token-28x1v:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-28x1v
Events:
  FirstSeen     LastSeen        Count   From                            SubobjectPath                   Type            Reason          Message
  ---------     --------        -----   ----                            -------------                   --------        ------          -------
  38s           38s             1       {default-scheduler }                                            Normal          Scheduled       Successfully assigned hello-pod to 192.168.122.253
  34s           34s             1       {kubelet 192.168.122.253}       spec.containers{hello-pod}      Normal          Pulling         pulling image "pweil/hello-nginx-docker"

Comment 3 Avesh Agarwal 2016-03-17 20:36:10 UTC
I will test on 3.1.1.6 soon, but wanted to test the UI first with the latest origin.

Comment 4 Andy Goldstein 2016-03-17 20:52:20 UTC
We need an sosreport before we can debug much more

Comment 5 Avesh Agarwal 2016-03-17 21:42:27 UTC
Summary: I tried some different steps to reproduce it with the latest origin. This time the pod (running a busybox container with a sleep command and a very high value such as 9999999) was in the Running state. When I deleted it with a grace period of 200 seconds, both oc get and oc describe showed the pod as Terminating, but the web UI showed the "Running" state.

Steps:
1. One master and one node with latest origin
2. busybox image was already pulled. 
3. oc create -f /root/test-pod.yaml
cat /root/test-pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: test
spec:  # specification of the pod's contents
  restartPolicy: Never
  containers:
  - name: busybox
    image: "busybox"
    command: ["sleep", "999999"]
4. oc delete -f /root/test-pod.yaml  --grace-period=200 
  

The difference from the previous test is that there the image was still being pulled when oc delete was executed, whereas here the image already exists and the pod is in the Running state. After oc delete, the pod stays in Terminating until the grace period expires, because the container does not exit before then.
Output:

[root@localhost origin]# oc describe pod -n test
Name:				busybox
Namespace:			test
Node:				192.168.122.253/192.168.122.253
Start Time:			Thu, 17 Mar 2016 17:30:11 -0400
Labels:				<none>
Status:				Terminating (expires Thu, 17 Mar 2016 17:34:22 -0400)
Termination Grace Period:	200s
IP:				172.17.0.2
Controllers:			<none>
Containers:
  busybox:
    Container ID:	docker://2c385747bee291749e19f531a9c75c75bd58f78ec1c6430ebed188221f85b912
    Image:		busybox
    Image ID:		docker://559d41a5eba1c166e63a27a766321d494946ad220c1636be26c83b23ff7549f2
    Port:		
    Command:
      sleep
      999999
    QoS Tier:
      memory:		BestEffort
      cpu:		BestEffort
    State:		Running
      Started:		Thu, 17 Mar 2016 17:30:14 -0400
    Ready:		True
    Restart Count:	0
    Environment Variables:
Conditions:
  Type		Status
  Ready 	True 
Volumes:
  default-token-mrbqp:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-mrbqp
Events:
  FirstSeen	LastSeen	Count	From				SubobjectPath			Type		Reason		Message
  ---------	--------	-----	----				-------------			--------	------		-------
  2m		2m		1	{default-scheduler }						Normal		Scheduled	Successfully assigned busybox to 192.168.122.253
  2m		2m		1	{kubelet 192.168.122.253}	spec.containers{busybox}	Normal		Pulling		pulling image "busybox"
  2m		2m		1	{kubelet 192.168.122.253}	spec.containers{busybox}	Normal		Pulled		Successfully pulled image "busybox"
  2m		2m		1	{kubelet 192.168.122.253}	spec.containers{busybox}	Normal		Created		Created container with docker id 2c385747bee2
  2m		2m		1	{kubelet 192.168.122.253}	spec.containers{busybox}	Normal		Started		Started container with docker id 2c385747bee2


[root@localhost origin]# oc get pod -n test
NAME      READY     STATUS        RESTARTS   AGE
busybox   1/1       Terminating   0          2m
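
As a rough cross-check of the timing, the sketch below works backwards from the "expires" time in the oc describe output. It assumes the expiry shown is the time the delete was requested plus the grace period; the helper name delete_requested_at is hypothetical and only used for illustration.

```python
from datetime import datetime, timedelta

# Format of the timestamps printed in the oc describe output above.
FMT = "%a, %d %b %Y %H:%M:%S %z"


def delete_requested_at(expires, grace_seconds):
    """Estimate when oc delete was issued: expiry minus grace period (assumption)."""
    return datetime.strptime(expires, FMT) - timedelta(seconds=grace_seconds)


# Values copied from the busybox pod's describe output.
start = datetime.strptime("Thu, 17 Mar 2016 17:30:11 -0400", FMT)
requested = delete_requested_at("Thu, 17 Mar 2016 17:34:22 -0400", 200)

print(requested.strftime("%H:%M:%S"))            # 17:31:02
print(int((requested - start).total_seconds()))  # 51
```

Under that assumption, the delete was requested about 51 seconds after the pod started, which is after the container start time of 17:30:14, so the pod was already Running when it was deleted.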

Comment 6 Avesh Agarwal 2016-03-17 21:45:25 UTC
I have attached a PDF of the webconsole output showing the Running state.

Comment 7 Avesh Agarwal 2016-03-17 21:45:51 UTC
Created attachment 1137530 [details]
webconsole output

Comment 8 Avesh Agarwal 2016-03-17 21:47:54 UTC
If you compare the timestamps in oc describe and the webconsole, you can see that they show different states at the same time (around 17:30:14).

Comment 9 Avesh Agarwal 2016-03-17 21:56:10 UTC
So it seems to me that in both tests the webconsole is reporting the state from before oc delete was run, and is not updating to Terminating after oc delete.
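
To illustrate the mismatch seen in the two tests, here is a minimal Python sketch. It is not the actual console or CLI code. It assumes that a pod being deleted carries metadata.deletionTimestamp (consistent with the "Terminating (expires ...)" line in oc describe), and that a view which only looks at status.phase and container state keeps showing the pre-delete value. The function names and the trimmed pod dictionaries are hypothetical, and the "Pending" phase for hello-pod is assumed rather than taken from the output.

```python
def pre_delete_status(pod):
    """Status as shown by a view that ignores deletion (assumed behavior)."""
    status = pod.get("status", {})
    # Prefer a waiting container's reason (e.g. ContainerCreating) over the phase.
    for cs in status.get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting")
        if waiting and waiting.get("reason"):
            return waiting["reason"]
    return status.get("phase", "Unknown")


def deletion_aware_status(pod):
    """Same as above, but a pod with a deletionTimestamp is reported as Terminating."""
    if pod.get("metadata", {}).get("deletionTimestamp"):
        return "Terminating"
    return pre_delete_status(pod)


# Trimmed, hand-written pod objects modeled on the two tests in this bug.
hello_pod = {
    "metadata": {"name": "hello-pod", "deletionTimestamp": "2016-03-17T20:16:40Z"},
    "status": {
        "phase": "Pending",  # assumed; not shown in the describe output
        "containerStatuses": [{"state": {"waiting": {"reason": "ContainerCreating"}}}],
    },
}
busybox = {
    "metadata": {"name": "busybox", "deletionTimestamp": "2016-03-17T21:34:22Z"},
    "status": {"phase": "Running", "containerStatuses": [{"state": {"running": {}}}]},
}

for pod in (hello_pod, busybox):
    print(pod["metadata"]["name"], pre_delete_status(pod), deletion_aware_status(pod))
```

For both pods the deletion-aware function returns Terminating, while the pre-delete view returns ContainerCreating and Running respectively, which matches what was observed in the CLI and the webconsole.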

Comment 10 Andy Goldstein 2016-03-18 13:01:58 UTC
Jessica, sending this your way since it appears to be UI related

Comment 11 Samuel Padgett 2016-03-18 16:21:13 UTC
https://github.com/openshift/origin/pull/8127

Comment 12 Yadan Pei 2016-03-21 06:28:38 UTC
Checked on the latest 3.2 puddle, AtomicOpenShift-errata/3.2/latest/RH7-RHAOS-3.2/x86_64/os/Packages/

The fix is not merged yet; I will check again when a new puddle is ready.

Comment 13 Yadan Pei 2016-03-23 06:42:39 UTC
Checked on the 2016-03-21.4 puddle:

1) Create "busybox" pod
$ oc create -f test-pod.yaml

2) Check pod status through CLI
$ oc get pods
NAME             READY     STATUS              RESTARTS   AGE
busybox          0/1       ContainerCreating   0          14s

On the web console, it is also ContainerCreating

3) Delete pod 
$ oc delete pod busybox --grace-period=200

4) Check pod status after deleting
$ oc get pods
NAME             READY     STATUS        RESTARTS   AGE
busybox          1/1       Terminating   0          1m

5) On the web console, it is also shown as "Terminating"

Move to VERIFIED

Comment 15 errata-xmlrpc 2016-05-12 16:33:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064