Bug 1858186
| Summary: | The deletion of the Compliance-Operator namespace gets stuck in `Terminating` state on `profilebundle.compliance` objects | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Prashant Dhamdhere <pdhamdhe> |
| Component: | Compliance Operator | Assignee: | Jakub Hrozek <jhrozek> |
| Status: | CLOSED ERRATA | QA Contact: | Prashant Dhamdhere <pdhamdhe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.6 | CC: | ausov, josorior, mrogers, nkinder, xiyuan |
| Target Milestone: | --- | | |
| Target Release: | 4.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-27 16:15:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Going to look into this next.

As discussed today, going to turn this into a doc issue. Some more details on *why* we are turning this into a doc issue:
- our code uses finalizers, which means that k8s just tags the objects for deletion by setting a non-zero `deletionTimestamp` and lets the operator handle the rest
- deleting a namespace deletes all objects in it, with no predictable order
- what happens here is that the namespace is deleted, then the profilebundle objects are tagged for deletion, but before the operator can process them, the operator itself goes away, so the objects are never removed

There's really no sane way of fixing this. And contrary to what I thought earlier, this issue does not affect deleting the operator via the UI, which would have been pretty bad.

Merged as https://github.com/openshift/compliance-operator/commit/4a3c6e5b4697a21adbf06a19071a9464225d0ad1

Looks good now. I am able to delete the resources and the namespace successfully with the following sequence of steps:
Verified on: 4.6.0-0.nightly-2020-09-12-230035
$ oc get pods
NAME READY STATUS RESTARTS AGE
compliance-operator-869646dd4f-jcdkw 1/1 Running 0 3m14s
ocp4-pp-6786c5f5b-6r8nn 1/1 Running 0 2m28s
rhcos4-pp-78c8cc9d44-4rvzq 1/1 Running 0 2m28s
[1] Remove the profilebundles, which are the main reason the namespace gets stuck in the 'Terminating' state after the namespace is deleted.
$ oc get profilebundle.compliance
NAME CONTENTIMAGE STATUS
ocp4 quay.io/complianceascode/ocp4:latest VALID
rhcos4 quay.io/complianceascode/ocp4:latest VALID
$ oc delete profilebundle.compliance ocp4 rhcos4
profilebundle.compliance.openshift.io "ocp4" deleted
profilebundle.compliance.openshift.io "rhcos4" deleted
$ oc get profilebundle.compliance
No resources found in openshift-compliance namespace.
$ oc get all
NAME READY STATUS RESTARTS AGE
pod/compliance-operator-869646dd4f-jcdkw 1/1 Running 0 4m19s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/compliance-operator-metrics ClusterIP 172.30.173.209 <none> 8383/TCP,8686/TCP 3m36s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/compliance-operator 1/1 1 1 4m26s
NAME DESIRED CURRENT READY AGE
replicaset.apps/compliance-operator-869646dd4f 1 1 1 4m26s
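Step [1] names each ProfileBundle explicitly. Because the finalizers can only be processed while the operator is still running (see the explanation above), a minimal sketch can delete all ProfileBundles in one command and fail loudly if any of them is left behind. It relies on `oc delete` waiting for the objects (finalizers included) to be gone by default; the namespace default and the 120-second timeout are assumptions, and `delete_profilebundles` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove every ProfileBundle while the operator pod is
# still running, so the operator can process and drop the finalizer.
delete_profilebundles() {
  local ns="${1:-openshift-compliance}" remaining
  # 'oc delete' waits for the objects to be gone (finalizers included) by
  # default; --timeout bounds that wait (120s is an assumed value).
  oc delete profilebundle.compliance --all -n "$ns" --timeout=120s || return 1
  # Anything still listed here is stuck on its finalizer.
  remaining="$(oc get profilebundle.compliance -n "$ns" -o name)"
  if [ -n "$remaining" ]; then
    echo "still present: $remaining" >&2
    return 1
  fi
  echo "all profilebundles removed from $ns"
}
```

Usage: `delete_profilebundles openshift-compliance`, run before step [2] removes the operator deployment.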
[2] Delete the compliance operator deployment
$ oc delete deployment.apps/compliance-operator
deployment.apps "compliance-operator" deleted
[3] Remove all clusterrole and clusterrolebinding resources
$ oc delete clusterroles compliance-operator api-resource-collector
clusterrole.rbac.authorization.k8s.io "compliance-operator" deleted
clusterrole.rbac.authorization.k8s.io "api-resource-collector" deleted
$ oc delete clusterrolebindings compliance-operator api-resource-collector
clusterrolebinding.rbac.authorization.k8s.io "compliance-operator" deleted
clusterrolebinding.rbac.authorization.k8s.io "api-resource-collector" deleted
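Step [3] can also be written as a single command: `oc`, like `kubectl`, accepts a comma-separated list of resource types followed by names and looks up each name under each type, while `--ignore-not-found` keeps the step from failing if it is re-run after a partial cleanup. A minimal sketch; `delete_cluster_rbac` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: step [3] as one re-runnable command.
delete_cluster_rbac() {
  # "TYPE1,TYPE2 NAME..." deletes each name under each listed type.
  # --ignore-not-found makes already-deleted objects a no-op, not an error.
  oc delete clusterrole,clusterrolebinding \
    compliance-operator api-resource-collector --ignore-not-found
}
```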
[4] Remove all CRDs
$ for f in $(oc get CustomResourceDefinition |grep compliance); do oc delete CustomResourceDefinition $f; done
customresourcedefinition.apiextensions.k8s.io "compliancecheckresults.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "complianceremediations.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "compliancescans.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "compliancesuites.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "profilebundles.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "profiles.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "rules.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "scansettingbindings.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "scansettings.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "tailoredprofiles.compliance.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "variables.compliance.openshift.io" deleted
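As written, the loop in step [4] iterates over every whitespace-separated word of the `oc get CustomResourceDefinition` table, so besides the CRD names it would also pass the CREATED AT timestamps to `oc delete` (any resulting NotFound errors do not appear in the transcript above). A sketch that avoids this uses `-o name`, which prints exactly one resource reference per line. It assumes all Compliance Operator CRDs end in `.compliance.openshift.io`, which holds for the eleven listed above; `delete_compliance_crds` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete only the Compliance Operator CRDs.
delete_compliance_crds() {
  local crds
  # 'oc get crd -o name' prints one
  # customresourcedefinition.apiextensions.k8s.io/<name> per line, so no
  # CREATED AT timestamps end up in the list (unlike the table output).
  crds="$(oc get crd -o name | grep '\.compliance\.openshift\.io$')"
  if [ -z "$crds" ]; then
    echo "no compliance CRDs found"
    return 0
  fi
  # Splitting on newlines is intended here: one argument per CRD.
  # shellcheck disable=SC2086
  oc delete $crds
}
```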
[5] Finally, delete the namespace.
$ oc delete project openshift-compliance
project.project.openshift.io "openshift-compliance" deleted
$ oc get project/openshift-compliance
Error from server (NotFound): namespaces "openshift-compliance" not found
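Step [5] checks the namespace once. A minimal polling sketch waits for the namespace to disappear and, if it does not, prints the namespace status conditions (such as `NamespaceFinalizersRemaining` or `NamespaceContentRemaining` on recent Kubernetes versions), which name what is still blocking deletion. The 24 attempts at 5-second intervals (about two minutes) are assumptions, and `wait_for_namespace_gone` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until a namespace is gone; on timeout, show why.
# Arguments (all optional, defaults are assumptions):
#   $1 namespace, $2 number of attempts, $3 seconds between attempts
wait_for_namespace_gone() {
  local ns="${1:-openshift-compliance}" tries="${2:-24}" interval="${3:-5}" i=0
  while [ "$i" -lt "$tries" ]; do
    # --ignore-not-found makes 'oc get' print nothing and exit 0 once the
    # namespace no longer exists, instead of reporting NotFound.
    if [ -z "$(oc get namespace "$ns" -o name --ignore-not-found)" ]; then
      echo "namespace $ns deleted"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  # Still there (typically stuck in Terminating): print its status
  # conditions, one "Type: message" per line.
  oc get namespace "$ns" \
    -o jsonpath='{range .status.conditions[*]}{.type}: {.message}{"\n"}{end}'
  return 1
}
```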
The following workaround also works if the namespace is already stuck in the 'Terminating' state:
$ oc get pods
NAME READY STATUS RESTARTS AGE
compliance-operator-869646dd4f-4qxk2 1/1 Running 0 5m16s
ocp4-pp-6786c5f5b-lb6w2 1/1 Running 0 4m35s
rhcos4-pp-78c8cc9d44-hmkvq 1/1 Running 0 4m35s
[1] Delete the namespace 'openshift-compliance'
$ oc delete project openshift-compliance
project.project.openshift.io "openshift-compliance" deleted
$ oc get project/openshift-compliance
NAME DISPLAY NAME STATUS
openshift-compliance Terminating
[2] Check which resources are left in the namespace and remove their `finalizers` attributes manually
E.g.:
$ oc get profilebundle.compliance
NAME CONTENTIMAGE STATUS
ocp4 quay.io/complianceascode/ocp4:latest VALID
rhcos4 quay.io/complianceascode/ocp4:latest VALID
finalizers:
- profilebundle.finalizers.compliance.openshift.io <<-----
$ oc edit profilebundle.compliance
profilebundle.compliance.openshift.io/ocp4 edited
profilebundle.compliance.openshift.io/rhcos4 edited
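Step [2] removes the finalizers interactively with `oc edit`. A non-interactive sketch, assuming the stuck objects are ProfileBundles in `openshift-compliance` as in this report, applies a JSON merge patch that sets `metadata.finalizers` to null, and only touches objects that already carry a `deletionTimestamp`, i.e. ones Kubernetes has already tagged for deletion. As with the manual workaround, this skips whatever cleanup the operator would have done for those objects. `strip_profilebundle_finalizers` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: non-interactive version of the 'oc edit' step.
strip_profilebundle_finalizers() {
  local ns="${1:-openshift-compliance}"
  # List name + deletionTimestamp; custom-columns prints <none> for objects
  # that are not being deleted, so awk keeps only the stuck ones.
  oc get profilebundle.compliance -n "$ns" --no-headers \
    -o custom-columns=NAME:.metadata.name,DELETED:.metadata.deletionTimestamp \
    | awk '$2 != "<none>" { print $1 }' \
    | while read -r name; do
        # A merge patch that sets finalizers to null removes the whole list,
        # which is what deleting those lines in 'oc edit' does by hand.
        oc patch profilebundle.compliance "$name" -n "$ns" \
          --type=merge -p '{"metadata":{"finalizers":null}}'
      done
}
```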
[3] Check the namespace status
$ oc get project/openshift-compliance
Error from server (NotFound): namespaces "openshift-compliance" not found
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

I know it's an old and already fixed bug, but I had the same issue. I'm using ArgoCD to manage the compliance operator, and while removing the finalizers on dependent resources helped, ArgoCD now can't re-create the namespace: it creates it and then immediately deletes it, leading to a "namespace openshift-compliance not found" error when attempting to create objects in that namespace. I noticed that it only happens when the `app.kubernetes.io/instance` label is set on the namespace. Does anyone have any idea what it might be?
Description of problem:
The deletion of the Compliance-Operator namespace "openshift-compliance" gets stuck in `Terminating` state on `profilebundle.compliance` objects.
$ oc get project openshift-compliance; date
NAME DISPLAY NAME STATUS
openshift-compliance Terminating
Friday 17 July 2020 11:44:28 AM IST
$ date; oc api-resources --verbs=list --namespaced -o name | xargs -n 1 oc get --show-kind --ignore-not-found -nopenshift-compliance
Friday 17 July 2020 11:48:06 AM IST
NAME PHASE RESULT
compliancescan.compliance.openshift.io/workers-scan
NAME CONTENTIMAGE STATUS
profilebundle.compliance.openshift.io/ocp4 quay.io/complianceascode/ocp4:latest VALID
profilebundle.compliance.openshift.io/rhcos4 quay.io/complianceascode/ocp4:latest VALID
$ oc get project openshift-compliance; date
NAME DISPLAY NAME STATUS
openshift-compliance Terminating
Friday 17 July 2020 12:28:08 PM IST

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-07-16-234354

How reproducible:
Always

Steps to Reproduce:
1. Create the 'openshift-compliance' namespace
$ oc create -f compliance-operator/deploy/ns.yaml
2. Switch to the 'openshift-compliance' namespace
$ oc project openshift-compliance
3. Deploy all CustomResourceDefinitions
$ for f in $(ls -1 compliance-operator/deploy/crds/*crd.yaml); do oc create -f $f; done
4. Deploy the compliance-operator
$ oc create -f compliance-operator/deploy/
5. Deploy a ComplianceSuite CR
$ oc create -f - <<EOF
apiVersion: compliance.openshift.io/v1alpha1
kind: ComplianceSuite
metadata:
  name: example-compliancesuite
spec:
  autoApplyRemediations: false
  schedule: "0 1 * * *"
  scans:
    - name: workers-scan
      profile: xccdf_org.ssgproject.content_profile_moderate
      content: ssg-rhcos4-ds.xml
      contentImage: quay.io/complianceascode/ocp4:latest
      debug: true
      nodeSelector:
        node-role.kubernetes.io/worker: ""
EOF
6. Delete the 'openshift-compliance' namespace
$ oc delete project openshift-compliance
7. Check the namespace status; it is stuck in the Terminating state
$ oc get project openshift-compliance; date

Actual results:
The Compliance-Operator namespace "openshift-compliance" is stuck in the Terminating state on profilebundle.compliance objects.
$ oc get all
No resources found in openshift-compliance namespace.
$ oc get profilebundle.compliance
NAME CONTENTIMAGE STATUS
ocp4 quay.io/complianceascode/ocp4:latest VALID
rhcos4 quay.io/complianceascode/ocp4:latest VALID
$ oc get compliancescan.compliance
NAME PHASE RESULT
workers-scan

Expected results:
The Compliance-Operator namespace "openshift-compliance" should be deleted without getting stuck in the 'Terminating' state.

Additional info: