Bug 1692594 - OCP 4.1: oc delete project -l <label> command hangs intermittently while projects and their resources get actually deleted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.1.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 4.2.0
Assignee: Maciej Szulik
QA Contact: Walid A.
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-03-26 01:21 UTC by Walid A.
Modified: 2019-10-16 06:28 UTC
CC List: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:27:58 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2019:2922  0        None      None    None     2019-10-16 06:28:10 UTC

Description Walid A. 2019-03-26 01:21:03 UTC
Description of problem:
This is happening on an AWS m5.xlarge cluster with 3 masters and 3 worker nodes.
When deleting labelled projects (with or without resources) using "oc delete project -l <project_label>", the oc command frequently takes several minutes, and at times more than an hour, to return. This happens about 30% of the time, when deleting as few as 20 labelled projects. The projects are deleted along with their resources, but the oc command does not return after all the projects and their resources have been deleted.

The last output of "oc delete project -l purpose=test" before it eventually returns:
.
.
.
project.project.openshift.io "clusterproject5" deleted
project.project.openshift.io "clusterproject6" deleted
project.project.openshift.io "clusterproject7" deleted
project.project.openshift.io "clusterproject8" deleted
project.project.openshift.io "clusterproject9" deleted

<======= hangs here for several minutes to over an hour 
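To confirm that the projects are really gone while "oc delete" is still blocked, one option is to poll for the labelled projects from a second terminal and note when none remain. This is a minimal sketch, assuming a logged-in oc client; wait_for_projects_gone and its three arguments are hypothetical names, not part of oc.

```shell
#!/usr/bin/env bash
# Sketch: poll until no project matches a label selector, to compare that
# moment with when "oc delete project -l <label>" actually returns.
# wait_for_projects_gone is a hypothetical helper, not an oc subcommand.
# Assumes a logged-in oc client.

wait_for_projects_gone() {
  local selector="${1:-purpose=test}"  # label selector used for the delete
  local interval="${2:-5}"             # seconds between polls
  local max_wait="${3:-3600}"          # give up after this many seconds
  local start elapsed names
  start=$(date +%s)
  while true; do
    elapsed=$(( $(date +%s) - start ))
    # Projects still in the Terminating phase are listed, so they count as present.
    if names=$(oc get projects -l "${selector}" -o name); then
      if [ -z "${names}" ]; then
        echo "no projects match ${selector} after ${elapsed}s"
        return 0
      fi
      echo "$(printf '%s\n' "${names}" | wc -l | tr -d ' ') project(s) still present after ${elapsed}s"
    else
      # A failed query is not treated as "gone"; just try again.
      echo "oc get projects failed after ${elapsed}s; retrying" >&2
    fi
    if [ "${elapsed}" -ge "${max_wait}" ]; then
      echo "gave up after ${elapsed}s" >&2
      return 1
    fi
    sleep "${interval}"
  done
}
```

For example, run "wait_for_projects_gone purpose=test 5 3600" in one terminal while "oc delete project -l purpose=test" runs in another. If the poll reports no matching projects well before the delete returns, that matches the behaviour described above.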

Version-Release number of selected component (if applicable):
# ./openshift-install version
./openshift-install v4.0.22-201903220117-dirty
built from commit 135a5a3709461fb1bc26fb261e8f5e5f393a0e8c

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-23-222829   True        False         19h     Cluster version is 4.0.0-0.nightly-2019-03-23-222829

# oc version
Client Version: version.Info{Major:"4", Minor:"0+", GitVersion:"v4.0.22", GitCommit:"219bbe2f0c", GitTreeState:"", BuildDate:"2019-03-10T22:23:11Z", GoVersion:"", Compiler:"", Platform:""}
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.4+8156b0c", GitCommit:"8156b0c", GitTreeState:"clean", BuildDate:"2019-03-22T10:37:27Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

# oc version --short
Client Version: v4.0.22
Server Version: v1.12.4+8156b0c

# oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-135-200.us-east-2.compute.internal   Ready    master   21h   v1.12.4+d14915559e
ip-10-0-141-228.us-east-2.compute.internal   Ready    worker   20h   v1.12.4+d14915559e
ip-10-0-145-186.us-east-2.compute.internal   Ready    worker   20h   v1.12.4+d14915559e
ip-10-0-154-63.us-east-2.compute.internal    Ready    master   21h   v1.12.4+d14915559e
ip-10-0-167-72.us-east-2.compute.internal    Ready    master   21h   v1.12.4+d14915559e
ip-10-0-169-178.us-east-2.compute.internal   Ready    worker   20h   v1.12.4+d14915559e

How reproducible:
Frequently (about 1 in 3 attempts). The hang can happen when deleting any number of projects larger than 15.
I also hit this issue with earlier versions of the oc client (oc_v4.0.0-0.185.0 and oc_v4.0.0-0.164.0) and on different jump hosts.



Steps to Reproduce:
1. Install an OCP 4.1 cluster on AWS (3 master and 3 worker m5.xlarge nodes) with OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-23-222829

2. Create 20 projects:
for i in {0..19}; do echo "==== Creating project i:  $i"; oc new-project clusterproject${i} ; done

3. Label the 20 projects with "purpose=test":
for i in {0..19}; do echo "==== Labeling project:  $i";  oc label --overwrite namespace clusterproject${i} purpose=test ; done

4. Delete all the projects labeled "purpose=test":

oc delete projects -l purpose=test

5. Repeat steps 2-4 a few times; the 'oc delete projects -l purpose=test' command hangs about 1 out of 3 times.
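The create, label and delete steps can be scripted so that each delete is timed and a hang is flagged automatically. This is a sketch under stated assumptions: run_iteration is a hypothetical helper, GNU coreutils timeout is installed, and the 900-second default limit is an arbitrary threshold chosen for illustration, not a value from this report.

```shell
#!/usr/bin/env bash
# Sketch of the create/label/delete steps: create and label N projects,
# then time the delete.
# Assumptions: run_iteration is a hypothetical helper, GNU coreutils timeout
# is installed, and the time limit is an arbitrary hang threshold.

run_iteration() {
  local count="${1:-20}"           # number of projects (clusterproject0..N-1)
  local label="${2:-purpose=test}" # label applied to every project
  local limit="${3:-900}"          # seconds before the delete counts as hung
  local i start rc
  for i in $(seq 0 $((count - 1))); do
    oc new-project "clusterproject${i}" >/dev/null || return 2
    oc label --overwrite namespace "clusterproject${i}" "${label}" >/dev/null || return 2
  done
  start=$(date +%s)
  timeout "${limit}" oc delete projects -l "${label}"
  rc=$?
  # timeout(1) exits with status 124 when the limit is reached
  if [ "${rc}" -eq 124 ]; then
    echo "HUNG: delete did not return within ${limit}s"
  else
    echo "delete returned rc=${rc} after $(( $(date +%s) - start ))s"
  fi
  return "${rc}"
}
```

Call "run_iteration 20 purpose=test 900" a few times. Between iterations, wait until the previous projects have finished terminating; otherwise oc new-project will likely fail because the names are still in use. Because timeout kills the client at the limit, this flags a hang but does not measure how long it would have lasted.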

Actual results:
All 20 projects are actually deleted, but the 'oc delete projects -l purpose=test' command hangs about 1 out of 3 times and can take over an hour to return:
.
.

project.project.openshift.io "clusterproject0" deleted

<hangs here>

Expected results:
All 20 created projects are deleted, and the command returns, within 10-15 seconds.


Additional info:

Comment 6 Maciej Szulik 2019-08-23 10:46:07 UTC
Is this still a problem with latest version?

Comment 7 Walid A. 2019-08-26 01:39:03 UTC
We have not seen this issue on a 4.2 AWS cluster with nightly build version: 4.2.0-0.nightly-2019-08-14-112500

Comment 8 Maciej Szulik 2019-08-26 09:32:21 UTC
Based on the previous comment, moving to QA.

Comment 9 Xingxing Xia 2019-08-30 06:49:13 UTC
Walid, could you verify the bug when you have time? Thx in advance

Comment 10 Walid A. 2019-08-30 08:36:16 UTC
Xingxing, this was verified on two 4.2 builds:
4.2.0-0.nightly-2019-08-22-153337
4.2.0-0.nightly-2019-08-14-112500 (comment 7)

I am also verifying on 4.1.13 today and will update the BZ after I run the tests. Thanks.

Comment 11 Walid A. 2019-08-30 10:03:36 UTC
Verified on OCP 4.1.13:
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.13    True        False         56m     Cluster version is 4.1.13

Comment 12 errata-xmlrpc 2019-10-16 06:27:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

