Bug 1343157

Summary:

Project delete leads to unexpected items in namespace

Product:

OpenShift Container Platform

Reporter:

Vikas Laad <vlaad>

Component:

Node

Assignee:

Derek Carr <decarr>

Status:

CLOSED DUPLICATE

QA Contact:

DeShuai Ma <dma>

Severity:

medium

Docs Contact:

Priority:

medium

Version:

3.2.0

CC:

aos-bugs, jokerman, mmccomas

Target Milestone:

---

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Last Closed:

2016-08-15 17:23:23 UTC

Type:

Bug

Attachments:

Description	Flags
journalctl_atomic-openshift-master	none
journalctl_atomic-openshift-node	none
grep for project eap64-mysql-s2i-user-402-171-469-216 in /var/log/messages	none
different pods stuck on the cluster	none

Description Vikas Laad 2016-06-06 16:02:54 UTC

Created attachment 1165272 [details]
journalctl_atomic-openshift-master

Description of problem:
Pods are stuck in Terminating state, and the logs are filled with the following error:

Jun  6 00:42:24 ip-172-31-39-29 atomic-openshift-master: E0606 00:42:24.950699   11186 namespace_controller.go:139] unexpected items still remain in namespace: eap64-mysql-s2i-user-402-171-469-216 for gvr: { v1 pods}
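
To see which namespaces and resource types are affected when this error repeats across a large log, the occurrences can be tallied. The following is a minimal sketch, assuming every message matches the format of the line above exactly; the regular expression and the function name `count_stuck_namespaces` are illustrative and not part of OpenShift.

```python
import re
from collections import Counter

# Pattern for the namespace controller error quoted above. The gvr is logged
# as "{group version resource}"; the group is empty for core resources.
PATTERN = re.compile(
    r"unexpected items still remain in namespace: (?P<namespace>\S+) "
    r"for gvr: \{(?P<gvr>[^}]*)\}"
)


def count_stuck_namespaces(lines):
    """Count error occurrences per (namespace, version, resource)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue
        fields = match.group("gvr").split()
        if len(fields) < 2:
            continue
        # Assumption: the last two fields are the version and the resource.
        version, resource = fields[-2], fields[-1]
        counts[(match.group("namespace"), version, resource)] += 1
    return counts


sample = (
    "Jun  6 00:42:24 ip-172-31-39-29 atomic-openshift-master: E0606 "
    "00:42:24.950699   11186 namespace_controller.go:139] unexpected items "
    "still remain in namespace: eap64-mysql-s2i-user-402-171-469-216 "
    "for gvr: { v1 pods}"
)
print(count_stuck_namespaces([sample]))
```

Feeding it the lines of the attached master journal would show whether the error is confined to one namespace or spread across several.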

Version-Release number of selected component (if applicable):
openshift v3.2.0.45
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

Docker version 1.10.3-25.el7 

How reproducible:
First attempt to run reliability tests against docker 1.10

Steps to Reproduce:
1. Create a cluster with 1 infra, 1 master and 2 nodes
2. Start reliability tests, which continuously create, access, and delete projects
3. Monitor the cluster for the duration of tests

Actual results:
Pods remain stuck in Terminating state after "oc delete project" is issued.

Expected results:
The project should be deleted without problems.

Additional info:
This bug is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1322538

Comment 1 Vikas Laad 2016-06-06 16:05:18 UTC

Created attachment 1165274 [details]
journalctl_atomic-openshift-node

Comment 2 Vikas Laad 2016-06-06 16:07:39 UTC

Created attachment 1165276 [details]
grep for project eap64-mysql-s2i-user-402-171-469-216 in /var/log/messages

Comment 3 Vikas Laad 2016-06-06 16:11:07 UTC

Created attachment 1165278 [details]
different pods stuck on the cluster

Comment 4 Derek Carr 2016-08-15 17:23:23 UTC

The logs showed a single pod stuck in Terminating status: eap-app-5-s05at

The master logs showed the namespace controller repeatedly observing that the pod was not yet deleted (it was stuck pending deletion by the kubelet, or by the node controller if the node was no longer healthy).

There was no specific log action from the kubelet for pod: eap-app-5-s05at.  

It's not possible to know from the current logs whether this was the only pod stuck terminating on this node.  If so, it's possible that the symptom for this bug is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1364243, where pods in Terminating status would not get deleted by the node controller if they were the only pod on the node.

I am closing this issue as a duplicate of 1364243.

If the symptom repeats itself, please include the YAML output for `oc get pods --all-namespaces`, and the YAML output for `oc get nodes` so we can check heartbeats and pod->node assignment.
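
The pod->node check described above could be sketched as follows once the pod list is available as JSON (for example from `oc get pods --all-namespaces -o json`, used here instead of YAML so that only the Python standard library is needed). This is an illustration only: the function name `terminating_pods_by_node`, the node names, and the timestamp in the sample data are hypothetical. It relies on the standard pod fields `metadata.deletionTimestamp` (set while a pod is Terminating) and `spec.nodeName`.

```python
import json
from collections import defaultdict


def terminating_pods_by_node(pod_list):
    """Group pods by node and report nodes that hold Terminating pods.

    Returns {node: {"terminating": [...], "total": n, "only_pod": bool}}.
    """
    all_pods = defaultdict(list)
    terminating = defaultdict(list)
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        node = pod.get("spec", {}).get("nodeName", "<unscheduled>")
        name = "{}/{}".format(meta.get("namespace"), meta.get("name"))
        all_pods[node].append(name)
        # A pod with deletionTimestamp set is shown as Terminating.
        if meta.get("deletionTimestamp"):
            terminating[node].append(name)
    report = {}
    for node, names in terminating.items():
        report[node] = {
            "terminating": names,
            "total": len(all_pods[node]),
            # True when the stuck pod is the only pod on the node,
            # the situation described in bug 1364243.
            "only_pod": len(all_pods[node]) == 1,
        }
    return report


# Hypothetical sample data; node names and the timestamp are made up.
sample = json.loads("""
{"items": [
  {"metadata": {"namespace": "eap64-mysql-s2i-user-402-171-469-216",
                "name": "eap-app-5-s05at",
                "deletionTimestamp": "2016-06-06T00:40:00Z"},
   "spec": {"nodeName": "node-a"}},
  {"metadata": {"namespace": "default", "name": "router-1-abcde"},
   "spec": {"nodeName": "node-b"}}
]}
""")
print(terminating_pods_by_node(sample))
```

Node heartbeats would still need to be checked separately from the `oc get nodes` output.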

*** This bug has been marked as a duplicate of bug 1364243 ***