Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1365657

Summary: Cannot delete project after shutting down the nodes
Product: OpenShift Container Platform
Reporter: Weibin Liang <weliang>
Component: Node
Assignee: Andy Goldstein <agoldste>
Status: CLOSED DUPLICATE
QA Contact: DeShuai Ma <dma>
Severity: medium
Priority: unspecified
Version: 3.2.0
CC: aos-bugs, jokerman, mmccomas
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Last Closed: 2016-08-12 15:07:12 UTC
Type: Bug

Description Weibin Liang 2016-08-09 18:53:16 UTC
Description of problem:
Create one master and four nodes; create a new project, a router, services, and two pods; then shut down the two nodes on which the service pods are scheduled. When I try to delete the new project, it cannot be removed.

Version-Release number of selected component (if applicable):
[root@dhcp-41-74 ~]# oc version
oc v3.2.1.9-1-g2265530
kubernetes v1.2.0-36-g4a3f9c5
[root@dhcp-41-74 ~]# cat /etc/system-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[root@dhcp-41-74 ~]# 

How reproducible:
Easy to reproduce; follow the steps below.

Steps to Reproduce:
Create one master and four nodes, then label the nodes:

oc label nodes dhcp-41-142.bos.redhat.com "infra=ha-router"
oc label nodes dhcp-41-70.bos.redhat.com "infra=ha-router"
oc label nodes dhcp-41-62.bos.redhat.com "infra=ha-router"
oc label nodes dhcp-41-61.bos.redhat.com "infra=ha-router"
oc label nodes dhcp-41-74.bos.redhat.com "infra=ha-router"


[root@dhcp-41-74 ~]# oc new-project pro-ipfailover
Now using project "pro-ipfailover" on server "https://dhcp-41-74.bos.redhat.com:8443".

You can add applications to this project with the 'new-app' command. For example, try:

    $ oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-hello-world.git

to build a new hello-world application in Ruby.
[root@dhcp-41-74 ~]# oc create serviceaccount harp -n pro-ipfailover
serviceaccount "harp" created
[root@dhcp-41-74 ~]# oadm policy add-scc-to-user privileged system:serviceaccount:pro-ipfailover:harp
[root@dhcp-41-74 ~]# 
[root@dhcp-41-74 ~]# oadm router ha-router --replicas=2 --selector="infra=ha-router" --labels="infra=ha-router" \
> --service-account=harp
info: password for stats user admin has been set to 83thdSDo72
error: serviceaccounts "harp" already exists
error: rolebinding "router-ha-router-role" already exists
deploymentconfig "ha-router" created
service "ha-router" created
[root@dhcp-41-74 ~]# 
[root@dhcp-41-74 ~]# oadm ipfailover ipf-har --replicas=4 --watch-port=80 --selector="infra=ha-router" \
> --virtual-ips="10.245.2.201-205" --credentials=/etc/origin/master/openshift-router.kubeconfig --service-account=harp --create
deploymentconfig "ipf-har" created
[root@dhcp-41-74 ~]# oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/hello-openshift-twopods.json
route "hello-route" created
service "hello-service" created
pod "hello-pod-1" created
pod "hello-pod-2" created
[root@dhcp-41-74 ~]# 
[root@dhcp-41-74 ~]# oc get project
NAME               DISPLAY NAME   STATUS
pro-ipfailover                    Active
default                           Active
logging                           Active
management-infra                  Active
openshift                         Active
openshift-infra                   Active
[root@dhcp-41-74 ~]# 

Wait 60 seconds, then shut down the two nodes on which the service pods are scheduled.

oc delete project pro-ipfailover, but pro-ipfailover in Terminating forever 
[root@dhcp-41-74 ~]# oc delete project pro-ipfailover
project "pro-ipfailover" deleted
[root@dhcp-41-74 ~]# oc get project
NAME               DISPLAY NAME   STATUS
pro-ipfailover                    Terminating
default                           Active
logging                           Active
management-infra                  Active
openshift                         Active
openshift-infra                   Active
[root@dhcp-41-74 ~]# oc new-project pro-ipfailover
Error from server: project "pro-ipfailover" already exists
Sleep 30 seconds, then check again:
[root@dhcp-41-74 ~]# oc get project
NAME               DISPLAY NAME   STATUS
pro-ipfailover                    Terminating
default                           Active
logging                           Active
management-infra                  Active
openshift                         Active
openshift-infra                   Active
[root@dhcp-41-74 ~]# 

Start the two nodes on which the service pods are scheduled again; project pro-ipfailover is then deleted:
[root@dhcp-41-74 ~]# oc get project
NAME               DISPLAY NAME   STATUS
logging                           Active
management-infra                  Active
openshift                         Active
openshift-infra                   Active
default                           Active

Actual results:
The newly created project cannot be deleted; it stays in Terminating while the two nodes are down.

Expected results:
The newly created project should be removed.

Additional info:

Comment 1 David Eads 2016-08-09 19:32:19 UTC
"oc delete project pro-ipfailover, but pro-ipfailover in Terminating forever"

@decarr, this looks like some finalizer is waiting for the node to gracefully clean something up?

Comment 2 Derek Carr 2016-08-12 15:06:46 UTC
Pod deletion happens in phases.

When a project is terminated, its pods are marked for deletion.

Either of the following must then happen:

1. The kubelet affirms the deletion (this requires the node to be running).
2. The node controller observes that the node is unhealthy and deletes the pod.

If we depend on the node controller to perform the deletion, it will take more than 5 minutes for the pod to be terminated, so a 30-second sleep is insufficient.

There is a separate bug related to the node controller:

https://bugzilla.redhat.com/show_bug.cgi?id=1364243

I am marking this defect as a duplicate, but note that the test scenario of waiting 30 seconds is insufficient. If nothing happens after waiting more than 5 minutes, this bug is the same as the referenced bug.
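A minimal sketch, assuming bash and a logged-in `oc` client, of how the reproduction could wait past the node controller's window instead of sleeping 30 seconds. The function name `wait_for_project_gone` and its default timeout of 420 seconds (the "more than 5 minutes" above plus some margin) are illustrative assumptions, not part of OpenShift. It relies only on `oc get project NAME` succeeding while the project still exists (including while it is Terminating) and failing once it is gone.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc): poll until a project disappears or a timeout expires.
# Usage: wait_for_project_gone NAME [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_project_gone() {
  local name="$1"
  local timeout="${2:-420}"   # assumed default: a little more than the ~5 minute node-controller window
  local interval="${3:-10}"   # seconds between checks
  local waited=0

  # `oc get project` keeps succeeding while the project still exists (e.g. Terminating).
  while oc get project "$name" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "project $name still present after ${waited}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done

  echo "project $name deleted after about ${waited}s"
}
```

For example, after shutting down the nodes, run `oc delete project pro-ipfailover && wait_for_project_gone pro-ipfailover`. A non-zero return after the timeout would point at the node controller problem tracked in the referenced bug rather than at an insufficient wait.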

Comment 3 Derek Carr 2016-08-12 15:07:12 UTC

*** This bug has been marked as a duplicate of bug 1364243 ***