Bug 1858186 - The deletion of Compliance-Operator namespace stuck in `Terminating` state on `profilebundle.compliance` objects
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Compliance Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Jakub Hrozek
QA Contact: Prashant Dhamdhere
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-17 07:03 UTC by Prashant Dhamdhere
Modified: 2022-03-18 05:22 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:15:30 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:16:01 UTC

Description Prashant Dhamdhere 2020-07-17 07:03:13 UTC
Description of problem:


Deletion of the Compliance Operator namespace "openshift-compliance" gets stuck in the `Terminating` state on `profilebundle.compliance` objects.

$ oc get project openshift-compliance; date 
NAME                   DISPLAY NAME   STATUS 
openshift-compliance                  Terminating 
Friday 17 July 2020 11:44:28 AM IST 

$ date; oc api-resources --verbs=list --namespaced -o name | xargs -n 1 oc get --show-kind --ignore-not-found -nopenshift-compliance 
Friday 17 July 2020 11:48:06 AM IST 
NAME                                                  PHASE   RESULT 
compliancescan.compliance.openshift.io/workers-scan            
NAME                                           CONTENTIMAGE                           STATUS 
profilebundle.compliance.openshift.io/ocp4     quay.io/complianceascode/ocp4:latest   VALID 
profilebundle.compliance.openshift.io/rhcos4   quay.io/complianceascode/ocp4:latest   VALID 

$ oc get project openshift-compliance; date
NAME                   DISPLAY NAME   STATUS
openshift-compliance                  Terminating
Friday 17 July 2020 12:28:08 PM IST
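
The repeated `oc get project ...; date` checks above can be scripted. A minimal sketch follows; the helper name `wait_ns_gone` and its default timeout and interval are assumptions made for illustration, not part of oc or the operator:

```shell
# Hypothetical helper (not part of oc): poll a namespace until it is gone.
# Usage: wait_ns_gone <namespace> [timeout_seconds] [interval_seconds]
# Returns 0 once the namespace no longer exists, 1 on timeout, 2 if oc fails.
wait_ns_gone() {
  local ns="$1"
  local timeout="${2:-120}"
  local interval="${3:-5}"
  local waited=0
  local phase
  while [ "$waited" -le "$timeout" ]; do
    # With --ignore-not-found, oc prints nothing once the namespace is gone.
    if ! phase="$(oc get namespace "$ns" --ignore-not-found -o jsonpath='{.status.phase}')"; then
      echo "$ns: oc get failed" >&2
      return 2
    fi
    if [ -z "$phase" ]; then
      echo "$ns: gone after ${waited}s"
      return 0
    fi
    echo "$ns: $phase after ${waited}s"
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}
```

For example, `wait_ns_gone openshift-compliance 300 10` would report the phase every 10 seconds and return non-zero if the namespace is still present after 5 minutes.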


Version-Release number of selected component (if applicable):

4.6.0-0.nightly-2020-07-16-234354 

How reproducible:

Always

Steps to Reproduce:

1. Create 'openshift-compliance' namespace 

$ oc create -f compliance-operator/deploy/ns.yaml   

2. Switch to 'openshift-compliance' namespace 

$ oc project openshift-compliance 

3. Deploy all CustomResourceDefinitions. 

$ for f in $(ls -1 compliance-operator/deploy/crds/*crd.yaml); do oc create -f $f; done 

4. Deploy compliance-operator 

$ oc create -f compliance-operator/deploy/ 

5. Deploy compliancesuite CR  

$ oc create -f - <<EOF 
apiVersion: compliance.openshift.io/v1alpha1 
kind: ComplianceSuite 
metadata: 
  name: example-compliancesuite 
spec: 
  autoApplyRemediations: false 
  schedule: "0 1 * * *" 
  scans: 
    - name: workers-scan 
      profile: xccdf_org.ssgproject.content_profile_moderate 
      content: ssg-rhcos4-ds.xml 
      contentImage: quay.io/complianceascode/ocp4:latest 
      debug: true 
      nodeSelector: 
        node-role.kubernetes.io/worker: "" 
EOF 

6. Delete the 'openshift-compliance' namespace 

$ oc delete project openshift-compliance 

7. Check the namespace status; it is stuck in the Terminating state 

$ oc get project openshift-compliance; date  


Actual results:

The Compliance Operator namespace "openshift-compliance" is stuck in the Terminating state on profilebundle.compliance objects

$ oc get all
No resources found in openshift-compliance namespace.

$ oc get profilebundle.compliance
NAME     CONTENTIMAGE                           STATUS
ocp4     quay.io/complianceascode/ocp4:latest   VALID
rhcos4   quay.io/complianceascode/ocp4:latest   VALID

$ oc get compliancescan.compliance
NAME           PHASE   RESULT
workers-scan           


Expected results:

The Compliance Operator namespace "openshift-compliance" should be deleted without getting stuck in the 'Terminating' state 


Additional info:

Comment 2 Jakub Hrozek 2020-08-12 15:40:47 UTC
Going to look into this next.

Comment 3 Jakub Hrozek 2020-08-17 11:01:31 UTC
As discussed today, going to turn this into a doc issue.

Comment 4 Jakub Hrozek 2020-08-17 11:19:10 UTC
Some more details on *why* we are turning this into a doc issue:
 - our code uses finalizers, which means that k8s just tags the objects for deletion by setting a non-zero deletionTimestamp and lets the operator handle the rest
 - deleting a namespace deletes all objects in it, with no predictable order
 - what happens here is that the namespace is deleted, then the profilebundle objects are tagged for deletion but before the operator can process them, the operator itself goes away, so the objects are never removed

There's really no sane way of fixing this. And contrary to what I thought earlier, this issue does not affect deleting the operator via the UI, which would have been pretty bad.
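
To see the state described above on a stuck namespace, one can print each object's deletionTimestamp next to its finalizers. This is only a sketch: the function name `show_pending_deletes` is hypothetical, and it relies on the standard `-o custom-columns` output of `oc get`:

```shell
# Hypothetical helper: list objects of one resource type together with
# their deletion timestamp and finalizers.
# Usage: show_pending_deletes <namespace> <resource>
show_pending_deletes() {
  local ns="$1"
  local resource="$2"
  # An object with a DELETION timestamp and a non-empty FINALIZERS list is
  # waiting for its controller to finish cleanup and remove the finalizer.
  oc get "$resource" -n "$ns" \
    -o custom-columns='NAME:.metadata.name,DELETION:.metadata.deletionTimestamp,FINALIZERS:.metadata.finalizers'
}
```

Running `show_pending_deletes openshift-compliance profilebundle.compliance` after the operator pod is gone should show objects that have a deletion timestamp but still carry their finalizer, which is what keeps the namespace in Terminating.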

Comment 5 Jakub Hrozek 2020-08-17 11:28:32 UTC
PR: https://github.com/openshift/compliance-operator/pull/401

Comment 9 Prashant Dhamdhere 2020-09-15 12:40:04 UTC
Looks good now, I am able to delete the resources and the namespace successfully with the following sequence of steps:


verified on : 4.6.0-0.nightly-2020-09-12-230035


$ oc get pods
NAME                                   READY   STATUS    RESTARTS   AGE
compliance-operator-869646dd4f-jcdkw   1/1     Running   0          3m14s
ocp4-pp-6786c5f5b-6r8nn                1/1     Running   0          2m28s
rhcos4-pp-78c8cc9d44-4rvzq             1/1     Running   0          2m28s

[1] Remove the profilebundles, which are the main cause of the namespace getting stuck in the 'Terminating' state 
    after the namespace is deleted.

$ oc get profilebundle.compliance
NAME     CONTENTIMAGE                           STATUS
ocp4     quay.io/complianceascode/ocp4:latest   VALID
rhcos4   quay.io/complianceascode/ocp4:latest   VALID

$ oc delete profilebundle.compliance ocp4 rhcos4
profilebundle.compliance.openshift.io "ocp4" deleted
profilebundle.compliance.openshift.io "rhcos4" deleted

$ oc get profilebundle.compliance
No resources found in openshift-compliance namespace.

$ oc get all
NAME                                       READY   STATUS    RESTARTS   AGE
pod/compliance-operator-869646dd4f-jcdkw   1/1     Running   0          4m19s

NAME                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/compliance-operator-metrics   ClusterIP   172.30.173.209   <none>        8383/TCP,8686/TCP   3m36s

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/compliance-operator   1/1     1            1           4m26s

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/compliance-operator-869646dd4f   1         1         1       4m26s

[2] Delete the compliance operator deployment 

$ oc delete deployment.apps/compliance-operator
deployment.apps "compliance-operator" deleted

[3] Remove all clusterroles & clusterrolebindings resources

$ oc delete clusterroles compliance-operator api-resource-collector
clusterrole.rbac.authorization.k8s.io "compliance-operator" deleted
clusterrole.rbac.authorization.k8s.io "api-resource-collector" deleted

$ oc delete clusterrolebindings compliance-operator api-resource-collector
clusterrolebinding.rbac.authorization.k8s.io "compliance-operator" deleted
clusterrolebinding.rbac.authorization.k8s.io "api-resource-collector" deleted

[4] Remove all CRDs

$ for f in $(oc get CustomResourceDefinition -o name | grep compliance); do oc delete $f; done
customresourcedefinition.apiextensions.k8s.io "compliancecheckresults.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "complianceremediations.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "compliancescans.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "compliancesuites.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "profilebundles.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "profiles.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "rules.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "scansettingbindings.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "scansettings.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "tailoredprofiles.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "variables.compliance.openshift.io" deleted

[5] Finally, delete the namespace.

$ oc delete project openshift-compliance
project.project.openshift.io "openshift-compliance" deleted

$ oc get project/openshift-compliance
Error from server (NotFound): namespaces "openshift-compliance" not found
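
Steps [1] to [5] above could be wrapped in one function so the order is always the same. This is a rough sketch, not an official uninstall procedure: the function name is hypothetical, `--all` is used instead of naming ocp4 and rhcos4 explicitly, and the CRD filter assumes all operator CRDs end in `compliance.openshift.io` as in the output above:

```shell
# Hypothetical helper: tear down the Compliance Operator in the order that
# was verified above. Stops at the first failing step.
# Usage: teardown_compliance_operator [namespace]
teardown_compliance_operator() {
  local ns="${1:-openshift-compliance}"
  local crd
  # [1] Objects with finalizers go first, while the operator still runs.
  oc delete profilebundle.compliance --all -n "$ns" || return 1
  # [2] Only then remove the operator itself.
  oc delete deployment.apps/compliance-operator -n "$ns" || return 1
  # [3] Cluster-scoped RBAC objects.
  oc delete clusterroles compliance-operator api-resource-collector || return 1
  oc delete clusterrolebindings compliance-operator api-resource-collector || return 1
  # [4] CRDs of the compliance.openshift.io group.
  for crd in $(oc get customresourcedefinition -o name | grep 'compliance\.openshift\.io$'); do
    oc delete "$crd" || return 1
  done
  # [5] Finally, the namespace.
  oc delete project "$ns"
}
```

The key point is that step [1] runs while the operator deployment still exists, so the finalizers can be processed before anything else is removed.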


A workaround also works fine if the namespace is already stuck in the 'Terminating' state:

$ oc get pods 
NAME                                   READY   STATUS    RESTARTS   AGE
compliance-operator-869646dd4f-4qxk2   1/1     Running   0          5m16s
ocp4-pp-6786c5f5b-lb6w2                1/1     Running   0          4m35s
rhcos4-pp-78c8cc9d44-hmkvq             1/1     Running   0          4m35s


[1] Delete the namespace 'openshift-compliance'

$ oc delete project openshift-compliance
project.project.openshift.io "openshift-compliance" deleted

$ oc get project/openshift-compliance
NAME                   DISPLAY NAME   STATUS
openshift-compliance                  Terminating


[2] Check which resources are left in the namespace and delete their `finalizers` attributes manually

E.g.

$ oc get profilebundle.compliance
NAME     CONTENTIMAGE                           STATUS
ocp4     quay.io/complianceascode/ocp4:latest   VALID
rhcos4   quay.io/complianceascode/ocp4:latest   VALID

finalizers:
- profilebundle.finalizers.compliance.openshift.io  <<-----

$ oc edit profilebundle.compliance
profilebundle.compliance.openshift.io/ocp4 edited
profilebundle.compliance.openshift.io/rhcos4 edited

[3] Check the namespace status

$ oc get project/openshift-compliance
Error from server (NotFound): namespaces "openshift-compliance" not found
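
The manual `oc edit` in step [2] can probably also be done non-interactively with `oc patch` and a JSON merge patch. A sketch under that assumption; the function name `clear_finalizers` is hypothetical, and note that dropping finalizers this way skips whatever cleanup the operator would have done:

```shell
# Hypothetical helper: strip metadata.finalizers from every object of one
# resource type in a namespace, using a JSON merge patch.
# Usage: clear_finalizers <namespace> <resource>
clear_finalizers() {
  local ns="$1"
  local resource="$2"
  local obj
  # "-o name" prints one <type>/<name> entry per line, which oc patch accepts.
  for obj in $(oc get "$resource" -n "$ns" -o name); do
    # Setting the field to null in a merge patch removes it from the object.
    oc patch "$obj" -n "$ns" --type=merge -p '{"metadata":{"finalizers":null}}' || return 1
  done
}
```

For the case above this would be called as `clear_finalizers openshift-compliance profilebundle.compliance`, and repeated for any other resource type that is still listed in the namespace.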

Comment 11 errata-xmlrpc 2020-10-27 16:15:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 12 Aleksey Usov 2022-03-18 05:22:15 UTC
I know it's an old and already fixed bug, but I had the same issue. I'm using ArgoCD to manage the Compliance Operator, and while removing finalizers on dependent resources helped, ArgoCD now can't re-create the namespace. ArgoCD creates it and then immediately deletes it, leading to a "namespace openshift-compliance not found" error when attempting to create objects in that namespace. I noticed that it only happens when the app.kubernetes.io/instance label is set on the namespace.
Does anyone have any idea what it might be?

