Bug 1969967 - [4.8.0] CR finalizers block resource deletions if the assisted-service POD is not available
Summary: [4.8.0] CR finalizers block resource deletions if the assisted-service POD is not available
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Infrastructure Operator
Version: rhacm-2.3
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: rhacm-2.3.1
Assignee: Michael Filanov
QA Contact: bjacot
Docs Contact: Derek
URL:
Whiteboard: KNI-EDGE-JUKE-4.8 AI-Team-Hive
Depends On: 1968423
Blocks:
 
Reported: 2021-06-09 14:45 UTC by Michael Filanov
Modified: 2023-09-15 01:09 UTC
CC List: 10 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1968423
Environment:
Last Closed: 2021-10-08 03:15:42 UTC
Target Upstream Version:
Embargoed:
Flags: rhacm-2.3+




Links
- GitHub open-cluster-management/backlog issue 15294 (last updated 2021-08-16 17:58:08 UTC)
- GitHub openshift/assisted-service pull 2080: [ocm-2.3] Bug 1969967: Document kube api teardown process (open; last updated 2021-06-24 05:10:35 UTC)
- Red Hat Bugzilla 1968423: [master] CR finalizers block resource deletions if the assisted-service POD is not available (low, CLOSED; last updated 2021-10-18 17:33:23 UTC)

Description Michael Filanov 2021-06-09 14:45:37 UTC
+++ This bug was initially created as a clone of Bug #1968423 +++

Description of problem:

Once the operator is installed in the cluster, I try to delete it with `oc delete ns/assisted-installer`, but the command never completes.

Version-Release number of selected component (if applicable):


How reproducible:

I tried only once; this is the only environment I have, and I can't redeploy the operator without destroying the whole cluster.


Steps to Reproduce:
1. Install the operator.
2. Run `oc delete ns/assisted-installer`.

Actual results:

The command never completes; the namespace is stuck in the Terminating phase.

$ oc describe ns/assisted-installer:
```
Name:         assisted-installer
Labels:       kubernetes.io/metadata.name=assisted-installer
              name=assisted-installer
Annotations:  openshift.io/sa.scc.mcs: s0:c25,c20
              openshift.io/sa.scc.supplemental-groups: 1000640000/10000
              openshift.io/sa.scc.uid-range: 1000640000/10000
Status:       Terminating

No resource quota.

No LimitRange resource.
```


Expected results:

The namespace should be deleted.


Additional info:

--- Additional comment from ybettan on 20210607T11:45:23

# oc get ns/assisted-installer -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{},"labels":{"name":"assisted-installer"},"name":"assisted-installer"}}
    openshift.io/sa.scc.mcs: s0:c25,c20
    openshift.io/sa.scc.supplemental-groups: 1000640000/10000
    openshift.io/sa.scc.uid-range: 1000640000/10000
  creationTimestamp: "2021-06-07T06:04:52Z"
  deletionTimestamp: "2021-06-07T09:49:29Z"
  labels:
    kubernetes.io/metadata.name: assisted-installer
    name: assisted-installer
  name: assisted-installer
  resourceVersion: "319665"
  uid: 40edd6ab-8361-47fd-9740-c7aaff74682c
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2021-06-07T09:49:41Z"
    message: All resources successfully discovered
    reason: ResourcesDiscovered
    status: "False"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2021-06-07T09:49:41Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2021-06-07T09:50:07Z"
    message: All content successfully deleted, may be waiting on finalization
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2021-06-07T09:49:41Z"
    message: 'Some resources are remaining: agentclusterinstalls.extensions.hive.openshift.io
      has 1 resource instances, agents.agent-install.openshift.io has 1 resource instances,
      clusterdeployments.hive.openshift.io has 1 resource instances'
    reason: SomeResourcesRemain
    status: "True"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2021-06-07T09:49:41Z"
    message: 'Some content in the namespace has finalizers remaining: agent.agent-install.openshift.io/ai-deprovision
      in 1 resource instances, agentclusterinstall.agent-install.openshift.io/ai-deprovision
      in 1 resource instances, clusterdeployments.agent-install.openshift.io/ai-deprovision
      in 1 resource instances'
    reason: SomeFinalizersRemain
    status: "True"
    type: NamespaceFinalizersRemaining
  phase: Terminating
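
The finalizers blocking the deletion can be read directly off the stuck CRs. A diagnostic sketch (resource types taken from the status conditions above; adjust to your environment):

```
oc get agentclusterinstalls.extensions.hive.openshift.io,agents.agent-install.openshift.io,clusterdeployments.hive.openshift.io \
  -n assisted-installer \
  -o custom-columns=KIND:.kind,NAME:.metadata.name,FINALIZERS:.metadata.finalizers
```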

--- Additional comment from fpercoco on 20210609T06:09:42

Looks like this is happening because the assisted-service pod was deleted before the rest of the resources. With the controller gone, the remaining resources are stuck because nothing is left to process and remove their finalizers.

> I tried only once, this is the only env I have and I can't redeploy the operator without destroying the whole cluster.

You should be able to remove the finalizers from the various resources.
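
For example, a minimal sketch that clears the ai-deprovision finalizers so the namespace deletion can finish (resource types taken from the namespace status above; double-check the names in your environment first):

```
for crd in agentclusterinstalls.extensions.hive.openshift.io \
           agents.agent-install.openshift.io \
           clusterdeployments.hive.openshift.io; do
  for res in $(oc get "$crd" -n assisted-installer -o name); do
    # Clearing finalizers skips the controller's cleanup logic entirely,
    # so only do this when the controller is gone for good.
    oc patch "$res" -n assisted-installer --type=merge \
      -p '{"metadata":{"finalizers":null}}'
  done
done
```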


For 4.8
==

We may not be able to provide a better user experience for this case in 4.8.0. Instead, we could document how to remove the finalizers and what a "healthy" cleanup workflow looks like:
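
A sketch of one possible ordering, assuming the assisted-service pod is still running so its controllers can process the finalizers:

```
# Delete the CRs first, while assisted-service is still up to run
# the ai-deprovision finalizers:
oc delete clusterdeployments.hive.openshift.io --all -n assisted-installer
oc delete agentclusterinstalls.extensions.hive.openshift.io --all -n assisted-installer
oc delete agents.agent-install.openshift.io --all -n assisted-installer
# Only once those are gone, remove the operator and the namespace:
oc delete ns/assisted-installer
```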



For 4.9
==

We may want to think about something that would provide a better user experience.

That being said, I think this is not specific to the operator/platform but rather to the integration with Hive and the other CRs.

--- Additional comment from mfilanov on 20210609T10:20:46

How does Hive handle it? They probably have the same issue. @dgoodwin?

--- Additional comment from dgoodwin on 20210609T11:19:02

Hive doesn't do anything to handle this; I believe this is working as expected. Finalizers block deletion until the controllers that placed them remove them. If the controllers are dead, the finalizers can't be removed, and I don't think there's any viable option there. The best you could do would be to document a clean teardown process.

Comment 5 bjacot 2021-07-28 18:11:00 UTC
Needs to be verified on a downstream ACM version. ACM has not cut us a new release image. This bug should not block the release.

Comment 8 bjacot 2021-08-05 14:37:11 UTC
The bug needs to be verified on downstream ACM. It should not block the release.

Comment 11 bjacot 2021-08-12 17:01:57 UTC
Moving the bug off the OpenShift product to RHACM. This bug will need to be verified on a downstream RHACM with the Assisted Service image.

Comment 13 Ronnie Lazar 2021-08-17 09:09:53 UTC
@frolland Do you know why this clone is on the ACM product and not AI's component under OCP?

Comment 14 David Zager 2021-08-17 19:49:09 UTC
Looks like @bjacot moved it onto RHACM. Assuming that was done because it needs to be verified on RHACM but this update hasn't yet been included in an advisory.

I think this should still be against the AI component under OCP. What do you think, @bjacot?

Comment 16 juhsu 2021-09-23 21:13:31 UTC
Since Crystal mentioned this was fixed in ACM 2.3.1 (i.e., the GA), can the Bugzilla be closed?

Comment 17 Mike Ng 2021-09-23 21:26:22 UTC
G2Bsync comment 925985645 from CrystalChun, Thu, 23 Sep 2021 16:49:56 UTC:
This was already included in the ACM 2.3 GA.
Picked up in https://github.com/open-cluster-management/backlog/issues/13873

Comment 18 Yona First 2021-10-07 09:57:50 UTC
It doesn't look like there's a way to reproduce this downstream.
RHACM has AI bundled in it. Deleting the RHACM namespace requires more steps than just `oc delete ns/rhacm` (see the RHACM documentation: https://github.com/open-cluster-management/rhacm-docs/blob/64223d8f987ed0a1d9e3d886e3da93cec1dd0fb9/install/uninstall.adoc). Deleting RHACM through the appropriate steps removes AI as well. According to @bjacot, who has tried it, this method works.

Comment 19 Red Hat Bugzilla 2023-09-15 01:09:33 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

