Bug 2084732 - A special resource that was created in OCP 4.9 can't be deleted after an upgrade to 4.10
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Special Resource Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.11.0
Assignee: yevgeny shnaidman
QA Contact: Udi Kalifon
URL:
Whiteboard:
Depends On:
Blocks: 2086432
Reported: 2022-05-12 16:18 UTC by Udi Kalifon
Modified: 2022-08-10 11:11 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 2086432 (view as bug list)
Environment:
Last Closed: 2022-08-10 11:11:36 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift special-resource-operator pull 208 | 0 | None | open | Bug 2084732: fix SRO CR finalization in case of an upgrade | 2022-05-15 11:13:48 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 11:11:48 UTC

Description Udi Kalifon 2022-05-12 16:18:55 UTC
Description of problem:
Deleting a special resource (SR) in OCP 4.10 hangs if the SR was created in 4.9 and then went through an upgrade. Running "oc delete sr/simple-kmod" from the CLI never returns, and the resource, the daemonset and pods it created, and its other related objects are never removed. The log shows "Marked to be deleted", and the reconciliation appears to finish successfully:

INFO    reconcile: UPDATE       Reconciling SpecialResource(s) in all Namespaces
INFO    reconcile: UPDATE       Marked to be deleted, reconciling finalizer
INFO    cache   Nodes cached    {"name": "worker-0-0"}
INFO    cache   Nodes cached    {"name": "worker-0-1"}
INFO    cache   Nodes cached    {"name": "worker-0-2"}
INFO    cache   Node list:      {"length": 3}   
INFO    cache   Nodes   {"num": 3}
INFO    warning         OnError: node Conflict Label specialresource.openshift.io/state-simple-kmod-1000 err %!s(<nil>)  
INFO    reconcile: UPDATE       Successfully finalized  {"SpecialResource:": "simple-kmod"}
INFO    status          Reconciling ClusterOperator     
INFO    status          Adding to relatedObjects        {"namespace": "simple-kmod"}
INFO    status          RECONCILE SUCCESS: Reconcile

This happens because the finalizers changed between 4.9 and 4.10: SRO in 4.10 is not aware of the finalizer that 4.9 added, so it never removes it, and the resource stays blocked from deletion.
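
To illustrate the mechanism, here is a minimal Go sketch, not the actual SRO code. The finalizer names are hypothetical placeholders (the real strings used by 4.9 and 4.10 are not quoted in this report), and removeFinalizers and deletable are made-up helpers. It only models the Kubernetes rule that an object marked for deletion is removed once its finalizer list is empty; the change in the linked pull request may look different.

```go
package main

import "fmt"

// Hypothetical finalizer names, for illustration only. The real strings
// used by SRO 4.9 and 4.10 are not given in this report.
const (
	legacyFinalizer  = "example.io/legacy-finalizer"
	currentFinalizer = "example.io/current-finalizer"
)

// removeFinalizers returns a copy of finalizers without any entry in drop.
func removeFinalizers(finalizers []string, drop ...string) []string {
	skip := make(map[string]struct{}, len(drop))
	for _, d := range drop {
		skip[d] = struct{}{}
	}
	kept := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if _, found := skip[f]; !found {
			kept = append(kept, f)
		}
	}
	return kept
}

// deletable models the rule that an object marked for deletion is only
// removed once its finalizer list is empty.
func deletable(finalizers []string) bool {
	return len(finalizers) == 0
}

func main() {
	// A resource created before the upgrade carries only the legacy finalizer.
	upgraded := []string{legacyFinalizer}

	// Removing only the current finalizer leaves the legacy one in place,
	// so the delete call keeps waiting.
	afterNaive := removeFinalizers(upgraded, currentFinalizer)
	fmt.Println(deletable(afterNaive)) // false

	// Removing both the current and the legacy finalizer unblocks deletion.
	afterBoth := removeFinalizers(upgraded, currentFinalizer, legacyFinalizer)
	fmt.Println(deletable(afterBoth)) // true
}
```

The first case matches the log above: finalization reports success, but the leftover finalizer still blocks removal of the object.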


Version-Release number of selected component (if applicable):
OCP 4.9 -> 4.10


How reproducible:
probably 100%


Steps to Reproduce:
1. Install OCP 4.9, and install NFD and SRO from OLM
2. Deploy the simple-kmod example
3. Upgrade to the latest 4.10
4. Run "oc delete sr/simple-kmod"


Actual results:
The SR is never deleted


Expected results:
The SR and all related resources should be removed

Comment 2 Udi Kalifon 2022-06-15 13:16:31 UTC
Verified using a custom-built image of SRO 4.10. Thanks Yevgeny!

Comment 4 errata-xmlrpc 2022-08-10 11:11:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

