Description of problem:
The deletion of Compliance-Operator namespace "openshift-compliance" stuck in `Terminating` state on `profilebundle.compliance` objects

$ oc get project openshift-compliance; date
NAME                   DISPLAY NAME   STATUS
openshift-compliance                  Terminating
Friday 17 July 2020 11:44:28 AM IST

$ date; oc api-resources --verbs=list --namespaced -o name | xargs -n 1 oc get --show-kind --ignore-not-found -nopenshift-compliance
Friday 17 July 2020 11:48:06 AM IST
NAME                                                  PHASE   RESULT
compliancescan.compliance.openshift.io/workers-scan
NAME                                           CONTENTIMAGE                           STATUS
profilebundle.compliance.openshift.io/ocp4     quay.io/complianceascode/ocp4:latest   VALID
profilebundle.compliance.openshift.io/rhcos4   quay.io/complianceascode/ocp4:latest   VALID

$ oc get project openshift-compliance; date
NAME                   DISPLAY NAME   STATUS
openshift-compliance                  Terminating
Friday 17 July 2020 12:28:08 PM IST

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-07-16-234354

How reproducible:
Always

Steps to Reproduce:
1. Create 'openshift-compliance' namespace
$ oc create -f compliance-operator/deploy/ns.yaml

2. Switch to 'openshift-compliance' namespace
$ oc project openshift-compliance

3. Deploy all CustomResourceDefinition.
$ for f in $(ls -1 compliance-operator/deploy/crds/*crd.yaml); do oc create -f $f; done

4. Deploy compliance-operator
$ oc create -f compliance-operator/deploy/

5. Deploy compliancesuite CR
$ oc create -f - <<EOF
apiVersion: compliance.openshift.io/v1alpha1
kind: ComplianceSuite
metadata:
  name: example-compliancesuite
spec:
  autoApplyRemediations: false
  schedule: "0 1 * * *"
  scans:
    - name: workers-scan
      profile: xccdf_org.ssgproject.content_profile_moderate
      content: ssg-rhcos4-ds.xml
      contentImage: quay.io/complianceascode/ocp4:latest
      debug: true
      nodeSelector:
        node-role.kubernetes.io/worker: ""
EOF

6. Delete the 'openshift-compliance' namespace
$ oc delete project openshift-compliance

7. Check status namespace, it stuck in Terminating state
$ oc get project openshift-compliance; date

Actual results:
The Compliance-Operator namespace "openshift-compliance" stuck in Terminating state on profilebundle.compliance objects

$ oc get all
No resources found in openshift-compliance namespace.

$ oc get profilebundle.compliance
NAME     CONTENTIMAGE                           STATUS
ocp4     quay.io/complianceascode/ocp4:latest   VALID
rhcos4   quay.io/complianceascode/ocp4:latest   VALID

$ oc get compliancescan.compliance
NAME           PHASE   RESULT
workers-scan

Expected results:
The Compliance-Operator namespace "openshift-compliance" should get deleted without sticking in 'Terminating' state

Additional info:
Going to look into this next.
As discussed today, going to turn this into a doc issue.
Some more details on *why* we are turning this into a doc issue:

- our code uses finalizers, which means that k8s just tags the objects for deletion by setting a non-zero deletionTimestamp and lets the operator handle the rest
- deleting a namespace deletes all objects in it, in no predictable order
- what happens here is that the namespace is deleted, then the profilebundle objects are tagged for deletion, but before the operator can process them the operator itself goes away, so the objects are never removed

There's really no sane way of fixing this. And contrary to what I thought earlier, this issue does not affect deleting the operator via the UI, which would have been pretty bad.
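To see the state described above on a cluster, something like the following sketch could be used. It lists each ProfileBundle with its `metadata.deletionTimestamp` and `metadata.finalizers`; an object with both fields set is tagged for deletion and waiting for the operator. The function name `list_pending_profilebundles` is made up for illustration and is not part of the compliance-operator repository; the default namespace is the one from this report.

```shell
# Hypothetical helper, not shipped with compliance-operator.
# Prints one line per ProfileBundle: name, deletionTimestamp, finalizers.
# A non-empty deletionTimestamp together with a non-empty finalizers list
# means the object is waiting for the operator to finish its cleanup.
list_pending_profilebundles() {
    pb_ns="${1:-openshift-compliance}"   # namespace to inspect
    oc get profilebundle.compliance -n "$pb_ns" \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.deletionTimestamp}{"\t"}{.metadata.finalizers}{"\n"}{end}'
}
```

Calling `list_pending_profilebundles` with no argument inspects `openshift-compliance`; if the operator pod is already gone and the timestamps are set, the objects will presumably stay in that state until the finalizers are removed by hand.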
PR: https://github.com/openshift/compliance-operator/pull/401
merged as https://github.com/openshift/compliance-operator/commit/4a3c6e5b4697a21adbf06a19071a9464225d0ad1
Looks good now. I am able to delete the resources and the namespace successfully with the following sequence of steps.

Verified on: 4.6.0-0.nightly-2020-09-12-230035

$ oc get pods
NAME                                   READY   STATUS    RESTARTS   AGE
compliance-operator-869646dd4f-jcdkw   1/1     Running   0          3m14s
ocp4-pp-6786c5f5b-6r8nn                1/1     Running   0          2m28s
rhcos4-pp-78c8cc9d44-4rvzq             1/1     Running   0          2m28s

[1] Remove the profilebundles, which are the main cause of the namespace getting stuck in the 'Terminating' state after the namespace is deleted.

$ oc get profilebundle.compliance
NAME     CONTENTIMAGE                           STATUS
ocp4     quay.io/complianceascode/ocp4:latest   VALID
rhcos4   quay.io/complianceascode/ocp4:latest   VALID

$ oc delete profilebundle.compliance ocp4 rhcos4
profilebundle.compliance.openshift.io "ocp4" deleted
profilebundle.compliance.openshift.io "rhcos4" deleted

$ oc get profilebundle.compliance
No resources found in openshift-compliance namespace.

$ oc get all
NAME                                       READY   STATUS    RESTARTS   AGE
pod/compliance-operator-869646dd4f-jcdkw   1/1     Running   0          4m19s

NAME                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/compliance-operator-metrics   ClusterIP   172.30.173.209   <none>        8383/TCP,8686/TCP   3m36s

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/compliance-operator   1/1     1            1           4m26s

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/compliance-operator-869646dd4f   1         1         1       4m26s

[2] Delete the compliance operator deployment

$ oc delete deployment.apps/compliance-operator
deployment.apps "compliance-operator" deleted

[3] Remove all clusterroles & clusterrolebindings resources

$ oc delete clusterroles compliance-operator api-resource-collector
clusterrole.rbac.authorization.k8s.io "compliance-operator" deleted
clusterrole.rbac.authorization.k8s.io "api-resource-collector" deleted

$ oc delete clusterrolebindings compliance-operator api-resource-collector
clusterrolebinding.rbac.authorization.k8s.io "compliance-operator" deleted
clusterrolebinding.rbac.authorization.k8s.io "api-resource-collector" deleted

[4] Remove all CRDs

$ for f in $(oc get CustomResourceDefinition |grep compliance); do oc delete CustomResourceDefinition $f; done
customresourcedefinition.apiextensions.k8s.io "compliancecheckresults.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "complianceremediations.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "compliancescans.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "compliancesuites.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "profilebundles.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "profiles.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "rules.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "scansettingbindings.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "scansettings.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "tailoredprofiles.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "variables.compliance.openshift.io" deleted

[5] Finally, delete the namespace.

$ oc delete project openshift-compliance
project.project.openshift.io "openshift-compliance" deleted

$ oc get project/openshift-compliance
Error from server (NotFound): namespaces "openshift-compliance" not found

The workaround also works fine if the namespace is already stuck in the 'Terminating' state.

$ oc get pods
NAME                                   READY   STATUS    RESTARTS   AGE
compliance-operator-869646dd4f-4qxk2   1/1     Running   0          5m16s
ocp4-pp-6786c5f5b-lb6w2                1/1     Running   0          4m35s
rhcos4-pp-78c8cc9d44-hmkvq             1/1     Running   0          4m35s

[1] Delete the namespace 'openshift-compliance'

$ oc delete project openshift-compliance
project.project.openshift.io "openshift-compliance" deleted

$ oc get project/openshift-compliance
NAME                   DISPLAY NAME   STATUS
openshift-compliance                  Terminating

[2] Check which resources are left in the namespace and delete their `finalizers` attributes manually, e.g.

$ oc get profilebundle.compliance
NAME     CONTENTIMAGE                           STATUS
ocp4     quay.io/complianceascode/ocp4:latest   VALID
rhcos4   quay.io/complianceascode/ocp4:latest   VALID

finalizers:
- profilebundle.finalizers.compliance.openshift.io   <<-----

$ oc edit profilebundle.compliance
profilebundle.compliance.openshift.io/ocp4 edited
profilebundle.compliance.openshift.io/rhcos4 edited

[3] Check the namespace status

$ oc get project/openshift-compliance
Error from server (NotFound): namespaces "openshift-compliance" not found
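For anyone who wants to script step [2] of the workaround instead of using `oc edit`, a minimal sketch follows. It assumes that clearing `metadata.finalizers` with a JSON merge patch has the same effect as the manual edit above; that is an assumption, not something verified in this bug. The function name `clear_profilebundle_finalizers` is hypothetical. It should only be needed once the operator is gone and can no longer process the finalizers itself.

```shell
# Hypothetical helper, not shipped with compliance-operator.
# Removes the finalizers from every ProfileBundle left in the namespace so
# that a namespace stuck in 'Terminating' can finish deleting.
clear_profilebundle_finalizers() {
    cpf_ns="${1:-openshift-compliance}"   # namespace that is stuck
    # "-o name" prints entries like profilebundle.compliance.openshift.io/ocp4
    for cpf_obj in $(oc get profilebundle.compliance -n "$cpf_ns" -o name); do
        # Merge patch: setting the field to null drops the whole finalizers list.
        oc patch "$cpf_obj" -n "$cpf_ns" --type=merge \
            -p '{"metadata":{"finalizers":null}}'
    done
}
```

Usage would be `clear_profilebundle_finalizers openshift-compliance`, followed by `oc get project/openshift-compliance` to check that the namespace is gone. Other resource kinds left behind with finalizers would need the same treatment.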
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196
I know it's an old and already fixed bug, but I had the same issue. I'm using ArgoCD to manage compliance operator and while removing finalizers on dependent resources helped, ArgoCD now can't re-create the namespace. ArgoCD creates it and then immediately deletes it, leading to "namespace openshift-compliance not found" error when attempting to create objects in that namespace. I noticed that it only happens when app.kubernetes.io/instance label is set on the namespace. Does anyone have any idea what it might be?