Bug 2234948

Summary: [4.13 backport] Update client-go library to avoid crash on OCP 4.14
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Travis Nielsen <tnielsen>
Component: rook
Assignee: Subham Rai <srai>
Status: CLOSED ERRATA
QA Contact: Shivam Durgbuns <sdurgbun>
Severity: urgent
Priority: urgent
Version: 4.13
CC: kramdoss, odf-bz-bot, pbalogh, srai
Target Milestone: ---
Target Release: ODF 4.13.3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 4.13.3-2
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-09-27 14:22:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Travis Nielsen 2023-08-25 18:49:49 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

On OCP 4.14 (Kubernetes 1.27+), the k8s.io/client-go dependency must be at version v0.26.4 or higher to avoid pods entering the CrashLoopBackOff state when aggregated discovery is enabled, as also seen for RDR in bug #2228319.

Rook hit this failure upstream a few months ago; see this issue:
https://github.com/rook/rook/issues/12114

The upstream fix for Rook v1.11 is here: https://github.com/rook/rook/pull/12161

The fix is already in Rook for ODF 4.14, but it needs to be backported to 4.13.
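
For context, here is a minimal sketch (not Rook's actual code) of the kind of client-go discovery call involved; the kubeconfig loading and error handling below are illustrative assumptions, and only the client-go version requirement comes from this bug:

    // Minimal sketch (not Rook's actual code) of a client-go discovery request.
    // Discovery requests like this could hit the crash that client-go v0.26.4
    // fixes when the API server (Kubernetes 1.27+, e.g. OCP 4.14) serves
    // aggregated discovery responses.
    package main

    import (
        "fmt"

        "k8s.io/client-go/discovery"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Load a kubeconfig; an operator would normally use the in-cluster config.
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            panic(err)
        }

        dc, err := discovery.NewDiscoveryClientForConfig(cfg)
        if err != nil {
            panic(err)
        }

        // Discovery walk similar to what controllers do at startup.
        groups, _, err := dc.ServerGroupsAndResources()
        if err != nil {
            fmt.Println("discovery error:", err)
            return
        }
        fmt.Println("discovered", len(groups), "API groups")
    }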


Version of all relevant components (if applicable):

ODF 4.13

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Without this fix, the Rook operator crashes when ODF 4.13 runs on OCP 4.14.

Is there any workaround available to the best of your knowledge?

No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

1

Is this issue reproducible?

It will be 100% reproducible if we don't get a fix out before OCP 4.14 is released.


Can this issue be reproduced from the UI?

NA

If this is a regression, please provide more details to justify this:

NA

Steps to Reproduce:
1. Install OCP 4.14 
2. Install ODF 4.13


Actual results:

The Rook operator crashes (CrashLoopBackOff).

Expected results:

The Rook operator runs without crashing.

Additional info:

Comment 3 Travis Nielsen 2023-08-25 18:53:15 UTC
Subham, please look at backporting https://github.com/rook/rook/pull/12161 to downstream release-4.13

Comment 4 Travis Nielsen 2023-08-25 18:56:51 UTC
Or, if there are merge conflicts, perhaps there is a more scoped fix, similar to the one the ocs-operator made: https://github.com/red-hat-storage/ocs-operator/commit/a35a4f970894170a9dadd525e1b590b40b63985a
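
I'm not certain exactly what that ocs-operator commit changes, but one possible scoped approach on client-go v0.26.x is to force the legacy, unaggregated discovery path so the aggregated-discovery handling is never exercised. A rough sketch (hypothetical helper name, in-cluster config assumed):

    // Hypothetical sketch of a scoped workaround (not necessarily what the
    // ocs-operator commit does): force legacy, unaggregated discovery so the
    // aggregated-discovery code path in client-go v0.26.x is never used.
    package main

    import (
        "fmt"

        "k8s.io/client-go/discovery"
        "k8s.io/client-go/rest"
    )

    // newLegacyDiscoveryClient is a hypothetical helper name used for illustration.
    func newLegacyDiscoveryClient() (*discovery.DiscoveryClient, error) {
        cfg, err := rest.InClusterConfig()
        if err != nil {
            return nil, err
        }
        dc, err := discovery.NewDiscoveryClientForConfig(cfg)
        if err != nil {
            return nil, err
        }
        // Field added alongside aggregated discovery support in client-go v0.26:
        // request only the legacy (unaggregated) discovery documents.
        dc.UseLegacyDiscovery = true
        return dc, nil
    }

    func main() {
        dc, err := newLegacyDiscoveryClient()
        if err != nil {
            panic(err)
        }
        groups, _, err := dc.ServerGroupsAndResources()
        if err != nil {
            fmt.Println("discovery error:", err)
            return
        }
        fmt.Println("discovered", len(groups), "API groups")
    }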

Comment 11 Shivam Durgbuns 2023-09-11 10:35:33 UTC
Moving to VERIFIED, as deployment completed without any pods in CrashLoopBackOff.
Job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/29202/

Comment 17 errata-xmlrpc 2023-09-27 14:22:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.13.3 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:5376