Description of problem:
Once the operator is installed in the cluster I try to delete it using `oc delete ns/assisted-installer` and the command never ends.

Version-Release number of selected component (if applicable):

How reproducible:
I tried only once, this is the only env I have and I can't redeploy the operator without destroying the whole cluster.

Steps to Reproduce:
1. install the operator
2. `oc delete ns/assisted-installer`

Actual results:
The command never ends.

$ oc describe ns/assisted-installer:
```
Name:         assisted-installer
Labels:       kubernetes.io/metadata.name=assisted-installer
              name=assisted-installer
Annotations:  openshift.io/sa.scc.mcs: s0:c25,c20
              openshift.io/sa.scc.supplemental-groups: 1000640000/10000
              openshift.io/sa.scc.uid-range: 1000640000/10000
Status:       Terminating

No resource quota.

No LimitRange resource.
```

Expected results:
The namespace should be deleted.

Additional info:
# oc get ns/assisted-installer -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{},"labels":{"name":"assisted-installer"},"name":"assisted-installer"}}
    openshift.io/sa.scc.mcs: s0:c25,c20
    openshift.io/sa.scc.supplemental-groups: 1000640000/10000
    openshift.io/sa.scc.uid-range: 1000640000/10000
  creationTimestamp: "2021-06-07T06:04:52Z"
  deletionTimestamp: "2021-06-07T09:49:29Z"
  labels:
    kubernetes.io/metadata.name: assisted-installer
    name: assisted-installer
  name: assisted-installer
  resourceVersion: "319665"
  uid: 40edd6ab-8361-47fd-9740-c7aaff74682c
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2021-06-07T09:49:41Z"
    message: All resources successfully discovered
    reason: ResourcesDiscovered
    status: "False"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2021-06-07T09:49:41Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2021-06-07T09:50:07Z"
    message: All content successfully deleted, may be waiting on finalization
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2021-06-07T09:49:41Z"
    message: 'Some resources are remaining: agentclusterinstalls.extensions.hive.openshift.io has 1 resource instances, agents.agent-install.openshift.io has 1 resource instances, clusterdeployments.hive.openshift.io has 1 resource instances'
    reason: SomeResourcesRemain
    status: "True"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2021-06-07T09:49:41Z"
    message: 'Some content in the namespace has finalizers remaining: agent.agent-install.openshift.io/ai-deprovision in 1 resource instances, agentclusterinstall.agent-install.openshift.io/ai-deprovision in 1 resource instances, clusterdeployments.agent-install.openshift.io/ai-deprovision in 1 resource instances'
    reason: SomeFinalizersRemain
    status: "True"
    type: NamespaceFinalizersRemaining
  phase: Terminating
Looks like this is happening because the assisted-service pod may have been deleted before the rest of the resources. This leaves the rest of the resources stuck, since their finalizers may never complete.

> I tried only once, this is the only env I have and I can't redeploy the operator without destroying the whole cluster.

You should be able to remove the finalizers from the various resources.

For 4.8
==
We may not be able to provide a better user experience for this case in 4.8.0. Instead, we could document how to remove the finalizers and what a "healthy" cleanup workflow looks like.

For 4.9
==
We may want to think about something that would provide a better user experience.

That said, I think this is not specific to the operator/platform but rather to the integration with Hive and the other CRs.
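
The finalizer removal suggested above could look roughly like the following. This is only a sketch: the three resource types are taken from the NamespaceContentRemaining condition in this report, and the function name `strip_finalizers` is made up for illustration. Clearing finalizers by hand skips whatever deprovision work the `ai-deprovision` finalizers were guarding, so it is best treated as a last resort.

```shell
#!/bin/sh
# Sketch only: clear finalizers from the resources that block namespace deletion.
# strip_finalizers is an illustrative name, not part of oc or the operator.

strip_finalizers() {
    ns="$1"
    # Resource types reported as remaining in the namespace status above.
    for kind in \
        agents.agent-install.openshift.io \
        agentclusterinstalls.extensions.hive.openshift.io \
        clusterdeployments.hive.openshift.io
    do
        # "-o name" prints each instance as type/name, one per line.
        for res in $(oc get "$kind" -n "$ns" -o name); do
            # A JSON merge patch with a null value drops metadata.finalizers,
            # which lets the API server finish deleting the object.
            oc patch "$res" -n "$ns" --type=merge \
                -p '{"metadata":{"finalizers":null}}'
        done
    done
}
```

Running `strip_finalizers assisted-installer` should then allow the pending namespace deletion to finish, assuming no other finalizers are left in the namespace.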
How does Hive handle it? They probably have the same issue. @dgoodwin?
Hive doesn't do anything to handle this; I believe this is working as expected. Finalizers block deletion until the controllers that placed them remove them. If the controllers are dead, the finalizers stay in place, and I don't think there's any viable option there. The best you could do would be to document a clean teardown process.
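
A clean teardown along those lines might be sketched as below. The idea is inferred from this thread rather than taken from any documented procedure: delete the custom resources while the assisted-service controllers are still running, so that they can remove their own finalizers, and delete the namespace only afterwards. The function name `clean_teardown` and the 10-minute timeout are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: delete the custom resources first, then the namespace.
# clean_teardown is an illustrative name; the timeout value is arbitrary.

clean_teardown() {
    ns="$1"
    # Delete all instances of the three types seen in this report.
    # --wait blocks until the objects are really gone, i.e. until the
    # still-running controllers have removed their finalizers.
    oc delete \
        agents.agent-install.openshift.io,agentclusterinstalls.extensions.hive.openshift.io,clusterdeployments.hive.openshift.io \
        --all -n "$ns" --wait=true --timeout=10m || return 1

    # Only now remove the namespace, which takes the controllers with it.
    oc delete namespace "$ns"
}
```

If the first `oc delete` times out, the function stops before touching the namespace, so the controllers stay available for another attempt.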
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759