Bug 1343157

Summary: Project delete leads to unexpected items in namespace
Product: OpenShift Container Platform
Reporter: Vikas Laad <vlaad>
Component: Node
Assignee: Derek Carr <decarr>
Status: CLOSED DUPLICATE
QA Contact: DeShuai Ma <dma>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.2.0
CC: aos-bugs, jokerman, mmccomas
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-15 17:23:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description | Flags
journalctl_atomic-openshift-master | none
journalctl_atomic-openshift-node | none
grep for project eap64-mysql-s2i-user-402-171-469-216 in /var/log/messages | none
different pods stuck on the cluster | none

Description Vikas Laad 2016-06-06 16:02:54 UTC
Created attachment 1165272 [details]
journalctl_atomic-openshift-master

Description of problem:
Pods are stuck in Terminating state, and the logs are filled with the following error:

Jun  6 00:42:24 ip-172-31-39-29 atomic-openshift-master: E0606 00:42:24.950699   11186 namespace_controller.go:139] unexpected items still remain in namespace: eap64-mysql-s2i-user-402-171-469-216 for gvr: { v1 pods}
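
For anyone triaging similar logs, here is a minimal sketch for counting how often this error recurs per namespace and resource. The regular expression is modeled only on the single line quoted above, so it is an assumption that other releases use the same wording; the function name `count_stuck_namespaces` is hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in this report; other
# releases may word or format the message differently.
PATTERN = re.compile(
    r"unexpected items still remain in namespace: (?P<namespace>\S+) "
    r"for gvr: \{(?P<gvr>[^}]*)\}"
)


def count_stuck_namespaces(lines):
    """Count matching errors per (namespace, resource) pair."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            # The gvr prints as "{group version resource}"; the group is
            # empty for core resources, so take the last field as the resource.
            fields = match.group("gvr").split()
            resource = fields[-1] if fields else ""
            counts[(match.group("namespace"), resource)] += 1
    return counts


sample = (
    "Jun  6 00:42:24 ip-172-31-39-29 atomic-openshift-master: E0606 "
    "00:42:24.950699   11186 namespace_controller.go:139] unexpected items "
    "still remain in namespace: eap64-mysql-s2i-user-402-171-469-216 "
    "for gvr: { v1 pods}"
)
print(count_stuck_namespaces([sample]))
```

Feeding it the lines of the attached master journal would show whether one namespace or several are affected.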

Version-Release number of selected component (if applicable):
openshift v3.2.0.45
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

Docker version 1.10.3-25.el7 

How reproducible:
Observed on the first attempt to run reliability tests against Docker 1.10

Steps to Reproduce:
1. Create a cluster with 1 infra, 1 master and 2 nodes
2. Start reliability tests, which continuously create, access, and delete projects
3. Monitor the cluster for the duration of tests

Actual results:
Pods get stuck in Terminating state after "oc delete project" is issued.

Expected results:
Project should be deleted without problem.

Additional info:
This bug is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1322538

Comment 1 Vikas Laad 2016-06-06 16:05:18 UTC
Created attachment 1165274 [details]
journalctl_atomic-openshift-node

Comment 2 Vikas Laad 2016-06-06 16:07:39 UTC
Created attachment 1165276 [details]
grep for project eap64-mysql-s2i-user-402-171-469-216 in /var/log/messages

Comment 3 Vikas Laad 2016-06-06 16:11:07 UTC
Created attachment 1165278 [details]
different pods stuck on the cluster

Comment 4 Derek Carr 2016-08-15 17:23:23 UTC
The logs showed a single pod stuck in Terminating status: eap-app-5-s05at

The master logs showed the namespace controller repeatedly observing that the pod was not yet deleted (it was stuck pending deletion by the kubelet, or by the node controller if the node was no longer healthy).

There was no specific log action from the kubelet for pod: eap-app-5-s05at.  

Given the current logs, it's not possible to know whether this was the only pod stuck terminating on this node.  If it was, it's possible that the symptom for this bug is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1364243, where pods in Terminating status would not get deleted by the node controller if they were the only pod on the node.

I am closing this issue as duplicate of 1364243.

If the symptom repeats itself, please include the YAML output for `oc get pods --all-namespaces`, and the YAML output for `oc get nodes` so we can check heartbeats and pod->node assignment.
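
As a rough aid for the pod->node check, the sketch below lists pods that are pending deletion together with the node they are assigned to. It assumes the pod list was saved as JSON (for example with `oc get pods --all-namespaces -o json`) rather than YAML, only to avoid a third-party YAML parser. The function name `terminating_pods` and all sample values are hypothetical.

```python
import json


def terminating_pods(pod_list):
    """Return (namespace, name, node, deletionTimestamp) for pods pending deletion."""
    stuck = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        # A pod that has been marked for deletion carries
        # metadata.deletionTimestamp until it is actually removed.
        if "deletionTimestamp" in meta:
            stuck.append((
                meta.get("namespace"),
                meta.get("name"),
                pod.get("spec", {}).get("nodeName"),
                meta["deletionTimestamp"],
            ))
    return stuck


# Invented sample shaped like a pod list; none of these values come from
# the logs attached to this bug.
sample = json.loads("""
{
  "items": [
    {"metadata": {"namespace": "example-project", "name": "example-pod",
                  "deletionTimestamp": "2016-06-06T00:00:00Z"},
     "spec": {"nodeName": "node-1"}},
    {"metadata": {"namespace": "example-project", "name": "healthy-pod"},
     "spec": {"nodeName": "node-2"}}
  ]
}
""")

for namespace, name, node, deleted_at in terminating_pods(sample):
    print(namespace, name, node, deleted_at)
```

Grouping the result by node would show whether a stuck pod is the only pod on its node, which is the condition described in bug 1364243.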

*** This bug has been marked as a duplicate of bug 1364243 ***