Bug 1968423
| Summary: | [master] CR finalizers block resource deletions if the assisted-service POD is not available | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Yoni Bettan <ybettan> |
| Component: | assisted-installer | Assignee: | Fred Rolland <frolland> |
| assisted-installer sub component: | Deployment Operator | QA Contact: | bjacot |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | low | | |
| Priority: | low | CC: | aos-bugs, fpercoco, frolland |
| Version: | 4.8 | Keywords: | Triaged |
| Target Milestone: | --- | | |
| Target Release: | 4.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | KNI-EDGE-JUKE-4.8 AI-Team-Hive | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1969967 | Environment: | |
| Last Closed: | 2021-10-18 17:32:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1969967 | | |
# oc get ns/assisted-installer -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{},"labels":{"name":"assisted-installer"},"name":"assisted-installer"}}
    openshift.io/sa.scc.mcs: s0:c25,c20
    openshift.io/sa.scc.supplemental-groups: 1000640000/10000
    openshift.io/sa.scc.uid-range: 1000640000/10000
  creationTimestamp: "2021-06-07T06:04:52Z"
  deletionTimestamp: "2021-06-07T09:49:29Z"
  labels:
    kubernetes.io/metadata.name: assisted-installer
    name: assisted-installer
  name: assisted-installer
  resourceVersion: "319665"
  uid: 40edd6ab-8361-47fd-9740-c7aaff74682c
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2021-06-07T09:49:41Z"
    message: All resources successfully discovered
    reason: ResourcesDiscovered
    status: "False"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2021-06-07T09:49:41Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2021-06-07T09:50:07Z"
    message: All content successfully deleted, may be waiting on finalization
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2021-06-07T09:49:41Z"
    message: 'Some resources are remaining: agentclusterinstalls.extensions.hive.openshift.io
      has 1 resource instances, agents.agent-install.openshift.io has 1 resource instances,
      clusterdeployments.hive.openshift.io has 1 resource instances'
    reason: SomeResourcesRemain
    status: "True"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2021-06-07T09:49:41Z"
    message: 'Some content in the namespace has finalizers remaining: agent.agent-install.openshift.io/ai-deprovision
      in 1 resource instances, agentclusterinstall.agent-install.openshift.io/ai-deprovision
      in 1 resource instances, clusterdeployments.agent-install.openshift.io/ai-deprovision
      in 1 resource instances'
    reason: SomeFinalizersRemain
    status: "True"
    type: NamespaceFinalizersRemaining
  phase: Terminating
Looks like this is happening because the assisted-service pod may have been deleted before the rest of the resources. The remaining resources then stay stuck, because their finalizers cannot complete without the pod.
> I tried only once, this is the only env I have and I can't redeploy the operator without destroying the whole cluster.
You should be able to remove the finalizers from the various resources.
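A minimal sketch of that workaround, assuming the three resource types named in the namespace status above are the only ones still holding finalizers. The function name `clear_finalizers` and the `NAMESPACE`, `OC`, and `KINDS` variables are illustrative, not part of any tool. The merge patch sets `metadata.finalizers` to null, which removes every finalizer on the object, so whatever cleanup the `ai-deprovision` finalizer was meant to perform is skipped.

```shell
#!/bin/sh
# Sketch only: strip finalizers from the custom resources that keep the
# namespace in Terminating. Names below are assumptions for illustration.
NAMESPACE="${NAMESPACE:-assisted-installer}"
OC="${OC:-oc}"
# Resource types reported under NamespaceContentRemaining in this bug.
KINDS="agents.agent-install.openshift.io agentclusterinstalls.extensions.hive.openshift.io clusterdeployments.hive.openshift.io"

clear_finalizers() {
  for kind in $KINDS; do
    # `-o name` prints one "<resource>.<group>/<name>" entry per line.
    "$OC" get "$kind" -n "$NAMESPACE" -o name | while read -r name; do
      [ -n "$name" ] || continue
      # A merge patch with a null value deletes the whole finalizers list.
      "$OC" patch "$name" -n "$NAMESPACE" --type=merge \
        -p '{"metadata":{"finalizers":null}}' </dev/null
    done
  done
}
```

After sourcing the file, run `clear_finalizers`; the pending namespace deletion should then be able to finish once no finalizers remain.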
For 4.8
==
We may not be able to provide a better user experience for this case in 4.8.0. Instead, we could document how to remove the finalizers and what a "healthy" cleanup workflow looks like:
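The report does not spell that workflow out. One possible ordering, sketched under the assumption that the assisted-service pod is still running in the namespace, is to delete the custom resources first, so the controller can process their finalizers, and delete the namespace only afterwards. The function name `clean_teardown`, the variables, the order of the three resource types, and the 120s timeout are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: delete the custom resources while their controller is still
# alive, then delete the namespace. Names and ordering are assumptions.
NAMESPACE="${NAMESPACE:-assisted-installer}"
OC="${OC:-oc}"
TIMEOUT="${TIMEOUT:-120s}"
KINDS="agents.agent-install.openshift.io agentclusterinstalls.extensions.hive.openshift.io clusterdeployments.hive.openshift.io"

clean_teardown() {
  for kind in $KINDS; do
    # --wait blocks until finalizers have run and the objects are gone;
    # stop on failure so the namespace (and the pod) is left untouched.
    "$OC" delete "$kind" --all -n "$NAMESPACE" --wait=true \
      --timeout="$TIMEOUT" || return 1
  done
  # Only now remove the namespace that hosts the assisted-service pod.
  "$OC" delete "ns/$NAMESPACE"
}
```

If any delete times out, the function returns early and the namespace is not deleted, which keeps the controller available for another attempt.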
For 4.9
==
We may want to think about something that would provide a better user experience.
That being said, I think this is not specific to the operator/platform but rather to the integration with Hive and the other CRs.
How does Hive handle it? They probably have the same issue, @dgoodwin?
Hive doesn't do anything to handle this; I believe this is working as expected. Finalizers block deletion until the controllers that placed them remove them. If the controllers are dead, the finalizers will stay around, and I don't think there's any viable option there. The best you could do would be to document a clean teardown process.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:3759
Description of problem:
Once the operator is installed in the cluster I try to delete it using `oc delete ns/assisted-installer` and the command never ends.

Version-Release number of selected component (if applicable):

How reproducible:
I tried only once, this is the only env I have and I can't redeploy the operator without destroying the whole cluster.

Steps to Reproduce:
1. Install the operator
2. `oc delete ns/assisted-installer`

Actual results:
The command never ends.
$ oc describe ns/assisted-installer:
```
Name:         assisted-installer
Labels:       kubernetes.io/metadata.name=assisted-installer
              name=assisted-installer
Annotations:  openshift.io/sa.scc.mcs: s0:c25,c20
              openshift.io/sa.scc.supplemental-groups: 1000640000/10000
              openshift.io/sa.scc.uid-range: 1000640000/10000
Status:       Terminating

No resource quota.

No LimitRange resource.
```

Expected results:
The namespace should be deleted.

Additional info: