Bug 2262087 - MCG fails to initialize the default IBM COS backingstore due to problematic node lookup logic
Summary: MCG fails to initialize the default IBM COS backingstore due to problematic node lookup logic
Keywords:
Status: CLOSED DUPLICATE of bug 2255557
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nimrod Becker
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-01-31 12:50 UTC by Daniel Osypenko
Modified: 2024-03-18 13:22 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-02-07 09:11:14 UTC
Embargoed:



Description Daniel Osypenko 2024-01-31 12:50:11 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

When deploying new clusters on multiple cloud platforms (confirmed on ODF 4.13, 4.14, and 4.15), ocs-storagecluster stays in the Progressing phase and never becomes Ready (over 4 hours).

The StorageCluster remains in the Progressing phase:

```
oc get storageclusters.ocs.openshift.io -A
NAMESPACE           NAME                 AGE    PHASE         EXTERNAL   CREATED AT             VERSION
openshift-storage   ocs-storagecluster   4h1m   Progressing              2024-01-31T08:41:24Z   4.13.7
```
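
For context, the Progressing phase above is simply the StorageCluster CR's .status.phase. Below is a minimal Go sketch of the kind of readiness poll our deployment automation performs, using controller-runtime's unstructured client (the helper name, interval, and timeout handling are illustrative assumptions, not ocs-operator code):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/util/wait"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForStorageClusterReady is a hypothetical helper that polls the
// StorageCluster CR until .status.phase reports "Ready" or the timeout hits.
func waitForStorageClusterReady(ctx context.Context, c client.Client, timeout time.Duration) error {
	sc := &unstructured.Unstructured{}
	sc.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "ocs.openshift.io",
		Version: "v1",
		Kind:    "StorageCluster",
	})
	key := client.ObjectKey{Namespace: "openshift-storage", Name: "ocs-storagecluster"}

	return wait.PollUntilContextTimeout(ctx, 30*time.Second, timeout, true,
		func(ctx context.Context) (bool, error) {
			if err := c.Get(ctx, key, sc); err != nil {
				return false, err
			}
			phase, _, _ := unstructured.NestedString(sc.Object, "status", "phase")
			fmt.Printf("StorageCluster phase: %s\n", phase)
			// In this bug the phase stays "Progressing" for 4+ hours.
			return phase == "Ready", nil
		})
}
```

The ocs-operator log, meanwhile, shows a recurring Reconciler error: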

```
oc logs ocs-operator-868455d4bb-kpl8w -n openshift-storage | grep error

{"level":"error","ts":"2024-01-31T09:21:19Z","msg":"Reconciler error","controller":"storagecluster","controllerGroup":"ocs.openshift.io","controllerKind":"StorageCluster","StorageCluster":{"name":"ocs-storagecluster","namespace":"openshift-storage"},"namespace":"openshift-storage","name":"ocs-storagecluster","reconcileID":"da39bff0-b0fb-4acd-96c1-e439c64fff80","error":"Operation cannot be fulfilled on storageclusters.ocs.openshift.io \"ocs-storagecluster\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
{"level":"error","ts":"2024-01-31T09:30:26Z","msg":"Reconciler error","controller":"storagecluster","controllerGroup":"ocs.openshift.io","controllerKind":"StorageCluster","StorageCluster":{"name":"ocs-storagecluster","namespace":"openshift-storage"},"namespace":"openshift-storage","name":"ocs-storagecluster","reconcileID":"e53ad2b1-c132-438d-b796-86298c180a2e","error":"Operation cannot be fulfilled on storageclusters.ocs.openshift.io \"ocs-storagecluster\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller)

```
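
The "object has been modified" error above is the standard Kubernetes optimistic-concurrency conflict: the operator submitted an update built from a stale resourceVersion. Controllers normally absorb this by re-fetching the object and retrying; here is a minimal sketch of that pattern with client-go's retry.RetryOnConflict (the helper and its mutate callback are hypothetical, not the actual ocs-operator code):

```go
package main

import (
	"context"

	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// updateWithRetry is a hypothetical helper illustrating the usual recovery for
// "the object has been modified; please apply your changes to the latest
// version and try again": re-read the object before every update attempt.
func updateWithRetry(ctx context.Context, c client.Client, key client.ObjectKey, obj client.Object, mutate func()) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Re-fetch so the update carries the current resourceVersion.
		if err := c.Get(ctx, key, obj); err != nil {
			return err
		}
		mutate() // apply the desired change to the freshly read object
		// On a 409 Conflict, RetryOnConflict re-runs this closure with backoff.
		return c.Status().Update(ctx, obj)
	})
}
```

A transient conflict like this is normal during busy reconciles; it is only suspicious here because the StorageCluster never converges afterwards.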

```
oc logs odf-operator-controller-manager-f56b885d5-sh287 -n openshift-storage
Defaulted container "kube-rbac-proxy" out of: kube-rbac-proxy, manager
Flag --logtostderr has been deprecated, will be removed in a future release, see https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components
W0131 08:39:37.215604       1 kube-rbac-proxy.go:156] 
==== Deprecation Warning ======================

Insecure listen address will be removed.
Using --insecure-listen-address won't be possible!

The ability to run kube-rbac-proxy without TLS certificates will be removed.
Not using --tls-cert-file and --tls-private-key-file won't be possible!

For more information, please go to https://github.com/brancz/kube-rbac-proxy/issues/187

===============================================

		
I0131 08:39:37.215763       1 kube-rbac-proxy.go:285] Valid token audiences: 
I0131 08:39:37.215834       1 kube-rbac-proxy.go:383] Generating self signed cert as no cert is provided
I0131 08:39:37.729261       1 kube-rbac-proxy.go:447] Starting TCP socket on 0.0.0.0:8443
I0131 08:39:37.729650       1 kube-rbac-proxy.go:454] Listening securely on 0.0.0.0:8443
```
The storage system reports OK with no warnings or errors, and the cluster is in a working state.

Version of all relevant components (if applicable):
The problem has been seen on multiple versions (4.13, 4.14, 4.15).


OC version:
Client Version: 4.13.4
Kustomize Version: v4.5.7
Server Version: 4.13.0-0.nightly-2024-01-30-181028
Kubernetes Version: v1.26.13+77e61a2
OCS version:
ocs-operator.v4.13.7-rhodf              OpenShift Container Storage   4.13.7-rhodf   ocs-operator.v4.13.6-rhodf              Succeeded
Cluster version
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2024-01-30-181028   True        False         4h14m   Cluster version is 4.13.0-0.nightly-2024-01-30-181028
Rook version:
rook: v4.13.7-0.42f43768ad57d91be47327f83653c05eeb721977
go: go1.19.13
Ceph version:
ceph version 17.2.6-170.el9cp (59bbeb8815ec3aeb3c8bba1e1866f8f6729eb840) quincy (stable)

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1 

Is this issue reproducible?
yes

Can this issue be reproduced from the UI?
no

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy a cluster on one of the supported cloud providers


Actual results:
ocs-storagecluster never becomes Ready

Expected results:
ocs-storagecluster becomes Ready within a reasonable time after deployment

Additional info:

must-gather logs: https://drive.google.com/file/d/1yPNggohjwcb2Ndg_cnzKmgIkROFc7KcY/view?usp=sharing

Comment 7 Daniel Osypenko 2024-02-06 12:02:30 UTC
Retested on IBM Cloud with odf-operator.v4.15.0-134.stable; the issue did not reproduce.

oc get storageclusters.ocs.openshift.io -A
NAMESPACE           NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
openshift-storage   ocs-storagecluster   67m   Ready              2024-02-06T10:49:35Z   4.15.0

oc get noobaa -A
NAMESPACE           NAME     S3-ENDPOINTS                   STS-ENDPOINTS                  IMAGE                                                                                                            PHASE   AGE
openshift-storage   noobaa   ["https://10.240.0.4:30157"]   ["https://10.240.0.4:30181"]   registry.redhat.io/odf4/mcg-core-rhel9@sha256:1d79a2ac176ca6e69c3198d0e35537aaf29373440d214d324d0d433d1473d9a1   Ready   67m

oc get backingstores.noobaa.io -A
NAMESPACE           NAME                           TYPE      PHASE   AGE
openshift-storage   noobaa-default-backing-store   ibm-cos   Ready   67m

Comment 8 Nimrod Becker 2024-02-07 07:58:14 UTC
I agree with comment 6

Comment 10 Mudit Agarwal 2024-02-07 09:11:14 UTC
Please refer to #comment6

Comment 11 Mudit Agarwal 2024-02-07 09:13:02 UTC
Sorry, pressed enter too early. Please see #comment5 and #comment6

*** This bug has been marked as a duplicate of bug 2255557 ***

