Description of problem:

This is happening on an AWS cluster with 3 master and 3 worker nodes (m5.xlarge). When deleting labelled projects, with or without resources, using "oc delete project -l <project_label>", the oc command frequently takes several minutes, and at times longer than an hour, to return. This happens about 30% of the time, when deleting as few as 20 labelled projects. The projects are deleted along with their resources, but the oc command does not return once all the projects and their resources are gone.

The last output of "oc delete project -l purpose=test" before it eventually returns shows:

. . .
project.project.openshift.io "clusterproject5" deleted
project.project.openshift.io "clusterproject6" deleted
project.project.openshift.io "clusterproject7" deleted
project.project.openshift.io "clusterproject8" deleted
project.project.openshift.io "clusterproject9" deleted
<======= hangs here for several minutes to over an hour

Version-Release number of selected component (if applicable):

# ./openshift-install version
./openshift-install v4.0.22-201903220117-dirty
built from commit 135a5a3709461fb1bc26fb261e8f5e5f393a0e8c

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-23-222829   True        False         19h     Cluster version is 4.0.0-0.nightly-2019-03-23-222829

# oc version
Client Version: version.Info{Major:"4", Minor:"0+", GitVersion:"v4.0.22", GitCommit:"219bbe2f0c", GitTreeState:"", BuildDate:"2019-03-10T22:23:11Z", GoVersion:"", Compiler:"", Platform:""}
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.4+8156b0c", GitCommit:"8156b0c", GitTreeState:"clean", BuildDate:"2019-03-22T10:37:27Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

# oc version --short
Client Version: v4.0.22
Server Version: v1.12.4+8156b0c

# oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-135-200.us-east-2.compute.internal   Ready    master   21h   v1.12.4+d14915559e
ip-10-0-141-228.us-east-2.compute.internal   Ready    worker   20h   v1.12.4+d14915559e
ip-10-0-145-186.us-east-2.compute.internal   Ready    worker   20h   v1.12.4+d14915559e
ip-10-0-154-63.us-east-2.compute.internal    Ready    master   21h   v1.12.4+d14915559e
ip-10-0-167-72.us-east-2.compute.internal    Ready    master   21h   v1.12.4+d14915559e
ip-10-0-169-178.us-east-2.compute.internal   Ready    worker   20h   v1.12.4+d14915559e

How reproducible:

Frequently. The hang can happen when deleting any number of projects larger than 15. I also hit this issue with earlier versions of the oc client (oc_v4.0.0-0.185.0 and oc_v4.0.0-0.164.0) and on different jump hosts.

Steps to Reproduce:

1. Install an OCP 4.1 cluster on AWS (3 master and 3 worker nodes, m5.xlarge) with
   OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-23-222829
2. Create 20 projects:
   for i in {0..20}; do echo "==== Creating project i: $i"; oc new-project clusterproject${i} ; done
3. Label the 20 projects with "purpose=test":
   for i in {0..20}; do echo "==== Labeling project: $i"; oc label --overwrite namespace clusterproject${i} purpose=test ; done
4. Delete all the projects labeled "purpose=test":
   oc delete projects -l purpose=test
5. Repeat steps 2-4 a few times; the 'oc delete projects -l purpose=test' command will hang about 1 out of 3 times.

Actual results:

All 20 projects actually get deleted, but the 'oc delete projects -l purpose=test' command hangs about 1 out of 3 times and can take over an hour to return.

. .
project.project.openshift.io "clusterproject0" deleted
<hangs here>

Expected results:

All 20 created projects are deleted within 10-15 seconds.

Additional info:
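The reproduction steps can be wrapped in a small script that times each delete, which makes the roughly 1-in-3 hang easier to spot across repeated runs. This is only a sketch: the helper names create_and_label and timed_delete, the OC, COUNT, LABEL and LIMIT variables, and the 60-second threshold are assumptions introduced here, not part of the original report. It creates exactly COUNT projects, numbered from 0.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction loop with timing. All helper and variable
# names below are hypothetical and chosen for illustration only.

OC="${OC:-oc}"                  # CLI to invoke; can be overridden with a stub
COUNT="${COUNT:-20}"            # number of projects to create
LABEL="${LABEL:-purpose=test}"  # label used to select the projects
LIMIT="${LIMIT:-60}"            # seconds after which a delete counts as a hang (assumed)

# Create COUNT projects (clusterproject0 .. clusterproject<COUNT-1>) and label them.
create_and_label() {
  local i
  for ((i = 0; i < COUNT; i++)); do
    "$OC" new-project "clusterproject${i}" >/dev/null
    "$OC" label --overwrite namespace "clusterproject${i}" "$LABEL" >/dev/null
  done
}

# Delete all labelled projects and report how long the command took to return.
timed_delete() {
  local start elapsed
  start=$SECONDS
  "$OC" delete projects -l "$LABEL"
  elapsed=$((SECONDS - start))
  if ((elapsed > LIMIT)); then
    echo "HANG: delete took ${elapsed}s"
  else
    echo "OK: delete took ${elapsed}s"
  fi
}
```

To repeat the cycle a few times against a live cluster, something like "for run in 1 2 3; do create_and_label; timed_delete; done" could be appended after the function definitions.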
Is this still a problem with latest version?
We have not seen this issue on a 4.2 AWS cluster with nightly build version 4.2.0-0.nightly-2019-08-14-112500.
Based on previous comment, moving to qa in that case.
Walid, could you verify the bug when you have time? Thx in advance
Xingxing, this was verified on two 4.2 builds: 4.2.0-0.nightly-2019-08-22-153337 and 4.2.0-0.nightly-2019-08-14-112500 (comment 7). I am also verifying on 4.1.13 today and will update the BZ after I run the tests. Thanks.
Verified on OCP 4.1.13

# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.13    True        False         56m     Cluster version is 4.1.13
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922