Bug 1692594

Summary: OCP 4.1: oc delete project -l <label> command hangs intermittently while projects and their resources get actually deleted
Product: OpenShift Container Platform
Reporter: Walid A. <wabouham>
Component: oc
Assignee: Maciej Szulik <maszulik>
Status: CLOSED ERRATA
QA Contact: Walid A. <wabouham>
Severity: low
Docs Contact:
Priority: medium
Version: 4.1.0
CC: aos-bugs, jokerman, mfojtik, mifiedle, mmccomas, xxia
Target Milestone: ---
Target Release: 4.2.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-10-16 06:27:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Walid A. 2019-03-26 01:21:03 UTC
Description of problem:
This is happening on an AWS m5.xlarge cluster with 3 masters and 3 worker nodes.
When deleting labelled projects, with or without resources, using "oc delete project -l <project_label>", the oc command frequently takes several minutes, and at times longer than an hour, to return. This happens about 30% of the time, even when deleting as few as 20 labelled projects. The projects are deleted along with their resources, but the oc command does not return once the deletion is complete.

Last output of "oc delete project -l purpose=test" before it eventually returns:
.
.
.
project.project.openshift.io "clusterproject5" deleted
project.project.openshift.io "clusterproject6" deleted
project.project.openshift.io "clusterproject7" deleted
project.project.openshift.io "clusterproject8" deleted
project.project.openshift.io "clusterproject9" deleted

<======= hangs here for several minutes to over an hour 

Version-Release number of selected component (if applicable):
# ./openshift-install version
./openshift-install v4.0.22-201903220117-dirty
built from commit 135a5a3709461fb1bc26fb261e8f5e5f393a0e8c

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-23-222829   True        False         19h     Cluster version is 4.0.0-0.nightly-2019-03-23-222829

# oc version
Client Version: version.Info{Major:"4", Minor:"0+", GitVersion:"v4.0.22", GitCommit:"219bbe2f0c", GitTreeState:"", BuildDate:"2019-03-10T22:23:11Z", GoVersion:"", Compiler:"", Platform:""}
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.4+8156b0c", GitCommit:"8156b0c", GitTreeState:"clean", BuildDate:"2019-03-22T10:37:27Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

# oc version --short
Client Version: v4.0.22
Server Version: v1.12.4+8156b0c

# oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-135-200.us-east-2.compute.internal   Ready    master   21h   v1.12.4+d14915559e
ip-10-0-141-228.us-east-2.compute.internal   Ready    worker   20h   v1.12.4+d14915559e
ip-10-0-145-186.us-east-2.compute.internal   Ready    worker   20h   v1.12.4+d14915559e
ip-10-0-154-63.us-east-2.compute.internal    Ready    master   21h   v1.12.4+d14915559e
ip-10-0-167-72.us-east-2.compute.internal    Ready    master   21h   v1.12.4+d14915559e
ip-10-0-169-178.us-east-2.compute.internal   Ready    worker   20h   v1.12.4+d14915559e

How reproducible:
Frequently. The hang can occur when deleting any number of projects greater than 15.
I also hit this issue with earlier versions of the oc client (oc_v4.0.0-0.185.0
and oc_v4.0.0-0.164.0) and on different jump hosts.



Steps to Reproduce:
1. Install an OCP 4.1 cluster on AWS (3 master and 3 worker nodes, m5.xlarge) with OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-23-222829

2. Create 20 projects (clusterproject0 through clusterproject19):
for i in {0..19}; do echo "==== Creating project:  $i"; oc new-project clusterproject${i} ; done

3. Label the 20 projects with "purpose=test":
for i in {0..19}; do echo "==== Labeling project:  $i";  oc label --overwrite namespace clusterproject${i} purpose=test ; done

4. Delete all the projects labeled "purpose=test":

oc delete projects -l purpose=test

5. Repeat steps 2-4 a few times; the 'oc delete projects -l purpose=test' command hangs about 1 time out of 3
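
The create, label, and delete steps above can be sketched as a small shell helper that also times the delete, so a hang is easy to spot. This is only an illustration: the function names (create_labelled_projects, timed_delete) and the PROJECT_COUNT and LABEL variables are hypothetical and not part of the original report. Only the oc invocations are taken from the steps.

```shell
# Hypothetical helper for the create/label/delete steps; names are illustrative.
PROJECT_COUNT="${PROJECT_COUNT:-20}"   # number of projects to create (assumed knob)
LABEL="${LABEL:-purpose=test}"         # label used to select the projects

# Create clusterproject0 .. clusterproject(N-1) and label each one.
create_labelled_projects() {
    local i
    i=0
    while [ "${i}" -lt "${PROJECT_COUNT}" ]; do
        oc new-project "clusterproject${i}" || return 1
        oc label --overwrite namespace "clusterproject${i}" "${LABEL}" || return 1
        i=$((i + 1))
    done
}

# Delete the labelled projects and report how long the oc command took to return.
timed_delete() {
    local start rc elapsed
    start=$(date +%s)
    oc delete projects -l "${LABEL}"
    rc=$?
    elapsed=$(( $(date +%s) - start ))
    echo "oc delete exited with status ${rc} after ${elapsed}s"
    return "${rc}"
}
```

After sourcing the file, one round of the reproduction would be "create_labelled_projects && timed_delete". An elapsed time far above the expected 10-15 seconds would indicate the hang.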

Actual results:
All 20 projects actually get deleted, but the 'oc delete projects -l purpose=test' command hangs about 1 time out of 3
and can take over an hour to return
.
.

project.project.openshift.io "clusterproject0" deleted

<hangs here>
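
One way to check whether the time is spent in the server-side deletion or in the client waiting is to issue the delete without waiting and then poll for the labelled projects. The sketch below is a diagnostic idea only, not a confirmed root cause or fix. It assumes the oc client accepts --wait=false on "oc delete", as upstream kubectl delete does. The function name delete_and_poll and the POLL_INTERVAL and MAX_WAIT variables are hypothetical.

```shell
# Hypothetical diagnostic: delete without blocking, then poll until the projects are gone.
LABEL="${LABEL:-purpose=test}"
POLL_INTERVAL="${POLL_INTERVAL:-5}"   # seconds between checks (assumed value)
MAX_WAIT="${MAX_WAIT:-120}"           # give up after this many seconds (assumed value)

delete_and_poll() {
    local waited remaining
    # Assumption: --wait=false makes oc return without waiting for finalizers.
    oc delete projects -l "${LABEL}" --wait=false || return 1
    waited=0
    while [ "${waited}" -le "${MAX_WAIT}" ]; do
        # -o name prints one "type/name" line per remaining project, nothing when none match.
        if ! remaining=$(oc get projects -l "${LABEL}" -o name 2>/dev/null); then
            echo "oc get failed; cannot tell whether the projects are gone" >&2
            return 2
        fi
        if [ -z "${remaining}" ]; then
            echo "all projects labelled ${LABEL} are gone after about ${waited}s"
            return 0
        fi
        sleep "${POLL_INTERVAL}"
        waited=$((waited + POLL_INTERVAL))
    done
    echo "projects still present after ${MAX_WAIT}s:" >&2
    echo "${remaining}" >&2
    return 1
}
```

If delete_and_poll reports that the projects are gone within seconds while a plain "oc delete projects -l purpose=test" is still blocked, that would point at the client-side wait rather than at the deletion itself.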

Expected results:
All 20 created projects get deleted within 10-15 seconds


Additional info:

Comment 6 Maciej Szulik 2019-08-23 10:46:07 UTC
Is this still a problem with latest version?

Comment 7 Walid A. 2019-08-26 01:39:03 UTC
We have not seen this issue on a 4.2 AWS cluster with nightly build version:  4.2.0-0.nightly-2019-08-14-112500

Comment 8 Maciej Szulik 2019-08-26 09:32:21 UTC
Based on previous comment, moving to qa in that case.

Comment 9 Xingxing Xia 2019-08-30 06:49:13 UTC
Walid, could you verify the bug when you have time? Thx in advance

Comment 10 Walid A. 2019-08-30 08:36:16 UTC
Xingxing, this was verified on two 4.2 builds:
4.2.0-0.nightly-2019-08-22-153337
4.2.0-0.nightly-2019-08-14-112500 (comment 7)

I am also verifying on 4.1.13 today.  Will update the BZ after I run the tests. Thanks.

Comment 11 Walid A. 2019-08-30 10:03:36 UTC
Verified on OCP 4.1.13
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.13    True        False         56m     Cluster version is 4.1.13

Comment 12 errata-xmlrpc 2019-10-16 06:27:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922