Bug 2086432

Summary: A special resource that was created in OCP 4.9 can't be deleted after an upgrade to 4.10
Product: OpenShift Container Platform
Component: Special Resource Operator
Version: 4.10
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Target Release: 4.10.z
Hardware: Unspecified
OS: Unspecified
Reporter: yevgeny shnaidman <yshnaidm>
Assignee: yevgeny shnaidman <yshnaidm>
QA Contact: Udi Kalifon <ukalifon>
CC: bthurber, ukalifon, yshnaidm
Clone Of: 2084732
Bug Depends On: 2084732
Last Closed: 2022-06-28 11:50:26 UTC

Description yevgeny shnaidman 2022-05-16 06:40:22 UTC
+++ This bug was initially created as a clone of Bug #2084732 +++

Description of problem:
Deleting a special resource (SR) in OCP 4.10 hangs if the SR was created in 4.9 and then went through an upgrade. Calling "oc delete sr/simple-kmod" from the CLI never returns, and the resource, the daemonset it created, its pods and so on are never removed. The log shows "Marked to be deleted", and the reconciliation also appears to finish:

INFO    reconcile: UPDATE       Reconciling SpecialResource(s) in all Namespaces
INFO    reconcile: UPDATE       Marked to be deleted, reconciling finalizer
INFO    cache   Nodes cached    {"name": "worker-0-0"}
INFO    cache   Nodes cached    {"name": "worker-0-1"}
INFO    cache   Nodes cached    {"name": "worker-0-2"}
INFO    cache   Node list:      {"length": 3}   
INFO    cache   Nodes   {"num": 3}
INFO    warning         OnError: node Conflict Label specialresource.openshift.io/state-simple-kmod-1000 err %!s(<nil>)  
INFO    reconcile: UPDATE       Successfully finalized  {"SpecialResource:": "simple-kmod"}
INFO    status          Reconciling ClusterOperator     
INFO    status          Adding to relatedObjects        {"namespace": "simple-kmod"}
INFO    status          RECONCILE SUCCESS: Reconcile

This is because the finalizers changed between 4.9 and 4.10. The finalizer set by 4.9 is never removed, since SRO in 4.10 is not aware of it, and Kubernetes does not delete an object until its list of finalizers is empty.
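
The mechanism can be illustrated with a small simulation. This is only a sketch: the report does not name the finalizers, so LEGACY_FINALIZER and CURRENT_FINALIZER are placeholder strings, and the "fixed" case shows one plausible shape of a fix, not necessarily what the actual patch does.

```python
# Hypothetical finalizer names: the report does not give the real strings.
LEGACY_FINALIZER = "example.io/finalizer-4.9"
CURRENT_FINALIZER = "example.io/finalizer-4.10"


def finalize(finalizers, known):
    """Return the finalizers left after the operator removes the ones it knows."""
    return [f for f in finalizers if f not in known]


def can_be_deleted(finalizers):
    """The API server removes an object only once its finalizer list is empty."""
    return len(finalizers) == 0


# An SR created on 4.9 still carries the 4.9 finalizer after the upgrade.
upgraded_sr = [LEGACY_FINALIZER]

# Operator that only knows the 4.10 finalizer: nothing is removed, delete hangs.
stuck = finalize(upgraded_sr, known={CURRENT_FINALIZER})

# Operator that also knows the 4.9 finalizer: the list empties, delete completes.
fixed = finalize(upgraded_sr, known={CURRENT_FINALIZER, LEGACY_FINALIZER})

print(can_be_deleted(stuck), can_be_deleted(fixed))
```

In the first case the operator can log a successful finalization, as in the log above, while the object still carries a finalizer that nothing will remove.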


Version-Release number of selected component (if applicable):
OCP 4.9 -> 4.10


How reproducible:
probably 100%


Steps to Reproduce:
1. Install OCP 4.9, and install NFD and SRO from OLM
2. Deploy the simple-kmod example
3. Upgrade to the latest 4.10
4. oc delete sr/simple-kmod


Actual results:
The SR is never deleted
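
One way to confirm that the SR is stuck on a finalizer, rather than failing for some other reason, is to inspect its metadata. The sketch below assumes the JSON comes from something like "oc get sr/simple-kmod -o json". The helper deletion_state is hypothetical, and the sample object (including the finalizer name) is made up for illustration; only the metadata.deletionTimestamp and metadata.finalizers fields are standard Kubernetes.

```python
import json


def deletion_state(sr_json):
    """Summarize whether an object is marked for deletion but held by finalizers."""
    meta = json.loads(sr_json).get("metadata", {})
    marked = "deletionTimestamp" in meta
    finalizers = meta.get("finalizers") or []
    return {
        "marked_for_deletion": marked,
        # Finalizers only block anything once deletion has been requested.
        "blocking_finalizers": finalizers if marked else [],
    }


# Made-up sample; real input would come from `oc get sr/simple-kmod -o json`.
sample = json.dumps({
    "metadata": {
        "name": "simple-kmod",
        "deletionTimestamp": "2022-05-16T06:40:22Z",
        "finalizers": ["example.io/finalizer-4.9"],  # hypothetical name
    }
})

print(deletion_state(sample))
```

An object that is marked for deletion and still lists a finalizer matches the behaviour described in this report.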


Expected results:
The SR and all related resources should be removed

Comment 3 Udi Kalifon 2022-06-24 13:43:08 UTC
Verified using a custom-built operator which had the fix.

Comment 5 errata-xmlrpc 2022-06-28 11:50:26 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.20 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5172