2262252 – ocs-storagecluster is in progressing state due to noobaa in configuring state

Summary: ocs-storagecluster is in progressing state due to noobaa in configuring state

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	Multi-Cloud Object Gateway
Sub Component:
Version:	4.15
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	ODF 4.15.0
Assignee:	Nimrod Becker
QA Contact:	Vijay Avuthu
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2024-02-01 11:36 UTC by Vijay Avuthu
Modified:	2024-04-01 08:26 UTC
CC List:	6 users
Fixed In Version:	4.15.0-134
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2024-03-19 15:32:31 UTC
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	noobaa noobaa-operator pull 1299	None	Merged	use injected root CAs also when for AWS and IBM	2024-02-07 09:20:04 UTC
Github	noobaa noobaa-operator pull 1300	None	Merged	[Backport to 5.15] use injected root CAs also when for AWS and IBM	2024-02-07 09:20:04 UTC
Red Hat Product Errata	RHSA-2024:1383	None	None	None	2024-03-19 15:32:34 UTC

Description Vijay Avuthu 2024-02-01 11:36:43 UTC

Description of problem (please be as detailed as possible and provide log
snippets):

Deployment type: AWS IPI KMS THALES 1AZ RHCOS 3M 3W Cluster

ocs-storagecluster is in progressing state due to noobaa in configuring state

Version of all relevant components (if applicable):

openshift installer (4.15.0-0.nightly-2024-01-31-032716)
ocs-registry:4.15.0-126


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1/2

Is this issue reproducible?
Sometimes


Can this issue reproduce from the UI?
Not tried

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install ODF using ocs-ci and check the storagecluster status.


Actual results:

storagecluster status:

  status:
    conditions:
    - lastHeartbeatTime: "2024-01-31T10:50:28Z"
      lastTransitionTime: "2024-01-31T10:50:28Z"
      message: Version check successful
      reason: VersionMatched
      status: "False"
      type: VersionMismatch
    - lastHeartbeatTime: "2024-01-31T10:57:55Z"
      lastTransitionTime: "2024-01-31T10:54:30Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: ReconcileComplete
    - lastHeartbeatTime: "2024-01-31T10:50:28Z"
      lastTransitionTime: "2024-01-31T10:50:28Z"
      message: Initializing StorageCluster
      reason: Init
      status: "False"
      type: Available
    - lastHeartbeatTime: "2024-01-31T10:57:55Z"
      lastTransitionTime: "2024-01-31T10:50:28Z"
      message: Waiting on Nooba instance to finish initialization
      reason: NoobaaInitializing
      status: "True"
      type: Progressing
    - lastHeartbeatTime: "2024-01-31T10:50:28Z"
      lastTransitionTime: "2024-01-31T10:50:28Z"
      message: Initializing StorageCluster
      reason: Init
      status: "False"
      type: Degraded
    - lastHeartbeatTime: "2024-01-31T10:54:53Z"
      lastTransitionTime: "2024-01-31T10:53:53Z"
      message: 'CephCluster is creating: Processing OSD 2 on PVC "ocs-deviceset-0-data-02f922"'
      reason: ClusterStateCreating
      status: "False"
      type: Upgradeable
    currentMonCount: 3
.
.
.
phase: Progressing
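
To make the status above easier to read at a glance, here is a minimal Python sketch that picks out the conditions keeping the StorageCluster out of Ready. The function name `blocking_conditions` and the "healthy" values it compares against are assumptions made for illustration, not taken from the ocs-operator source. The condition values are copied from the output above with timestamps omitted.

```python
def blocking_conditions(conditions):
    """Return the conditions that differ from the assumed healthy state.

    Assumption (not from the operator source): a healthy cluster reports
    Available="True", Progressing="False" and Degraded="False".
    """
    healthy = {"Available": "True", "Progressing": "False", "Degraded": "False"}
    return [
        c for c in conditions
        if c["type"] in healthy and c["status"] != healthy[c["type"]]
    ]


# Values copied from the storagecluster status above (timestamps omitted).
conditions = [
    {"type": "ReconcileComplete", "status": "True", "reason": "ReconcileCompleted"},
    {"type": "Available", "status": "False", "reason": "Init"},
    {"type": "Progressing", "status": "True", "reason": "NoobaaInitializing"},
    {"type": "Degraded", "status": "False", "reason": "Init"},
]

for c in blocking_conditions(conditions):
    print(f'{c["type"]}={c["status"]} ({c["reason"]})')
```

For this report, the sketch singles out Available=False (Init) and Progressing=True (NoobaaInitializing), which matches the summary: the cluster is waiting on NooBaa.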

Expected results:

The storagecluster should be in the Ready state.



Additional info:

noobaa operator log:

2024-01-31T10:55:45.573615495Z time="2024-01-31T10:55:45Z" level=info msg="RPC: Connecting websocket (0xc001ab2de0) &{RPC:0xc0002e5770 Address:wss://noobaa-mgmt.openshift-storage.svc.cluster.local:443/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:9 sema:0} ReconnectDelay:3s cancelPings:<nil>}"
2024-01-31T10:55:45.577204122Z time="2024-01-31T10:55:45Z" level=error msg="RPC: closing connection (0xc001ab2de0) &{RPC:0xc0002e5770 Address:wss://noobaa-mgmt.openshift-storage.svc.cluster.local:443/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:9 sema:0} ReconnectDelay:3s cancelPings:<nil>}"
2024-01-31T10:55:45.577204122Z time="2024-01-31T10:55:45Z" level=warning msg="RPC: RemoveConnection wss://noobaa-mgmt.openshift-storage.svc.cluster.local:443/rpc/ current=0xc001ab2de0 conn=0xc001ab2de0"
2024-01-31T10:55:45.577228797Z time="2024-01-31T10:55:45Z" level=error msg="RPC: Reconnect - got error: failed to WebSocket dial: failed to send handshake request: Get \"https://noobaa-mgmt.openshift-storage.svc.cluster.local:443/rpc/\": dial tcp 172.30.49.45:443: connect: connection refused"
2024-01-31T10:55:45.577228797Z time="2024-01-31T10:55:45Z" level=error msg="⚠️  RPC: auth.read_auth() Call failed: RPC: connection (0xc001ab2de0) already closed &{RPC:0xc0002e5770 Address:wss://noobaa-mgmt.openshift-storage.svc.cluster.local:443/rpc/ State:closed WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:3s cancelPings:<nil>}"


2024-01-31T10:57:33.774421949Z time="2024-01-31T10:57:33Z" level=info msg="creating bucket nb.1706698436028.apps.j-059aikt1c33-t1.qe.rh-ocs.com" sys=openshift-storage/noobaa
2024-01-31T10:57:34.144998977Z time="2024-01-31T10:57:34Z" level=error msg="got error when trying to create bucket nb.1706698436028.apps.j-059aikt1c33-t1.qe.rh-ocs.com. error: RequestError: send request failed\ncaused by: Put \"https://s3.us-east-2.amazonaws.com/nb.1706698436028.apps.j-059aikt1c33-t1.qe.rh-ocs.com\": tls: failed to verify certificate: x509: certificate signed by unknown authority" sys=openshift-storage/noobaa
2024-01-31T10:57:34.144998977Z time="2024-01-31T10:57:34Z" level=info msg="SetPhase: temporary error during phase \"Configuring\"" sys=openshift-storage/noobaa
2024-01-31T10:57:34.145041010Z time="2024-01-31T10:57:34Z" level=warning msg="⏳ Temporary Error: RequestError: send request failed\ncaused by: Put \"https://s3.us-east-2.amazonaws.com/nb.1706698436028.apps.j-059aikt1c33-t1.qe.rh-ocs.com\": tls: failed to verify certificate: x509: certificate signed by unknown authority" sys=openshift-storage/noobaa
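
The two excerpts above show different failures: an early "connection refused" while dialing the noobaa-mgmt service, and a later x509 error while creating the bucket on S3. A rough Python sketch for telling them apart in a must-gather log follows. The labels in `SIGNATURES` and the function name `classify` are hypothetical; only the matched substrings and the sample lines come from the log above.

```python
import re

# Matches the level=... msg="..." part of the operator log lines shown above.
# The msg pattern tolerates backslash-escaped quotes inside the message.
LINE_RE = re.compile(r'level=(?P<level>\w+) msg="(?P<msg>(?:[^"\\]|\\.)*)"')

# Hypothetical labels, keyed by substrings taken from the excerpts above.
SIGNATURES = {
    "connect: connection refused": "mgmt-endpoint-not-ready",
    "x509: certificate signed by unknown authority": "untrusted-ca",
}


def classify(line):
    """Return a label for error/warning lines, or None for anything else."""
    m = LINE_RE.search(line)
    if not m or m.group("level") not in ("error", "warning"):
        return None
    for needle, label in SIGNATURES.items():
        if needle in m.group("msg"):
            return label
    return "other"


# Sample lines copied from the operator log above (leading timestamp dropped).
refused = r'time="2024-01-31T10:55:45Z" level=error msg="RPC: Reconnect - got error: failed to WebSocket dial: failed to send handshake request: Get \"https://noobaa-mgmt.openshift-storage.svc.cluster.local:443/rpc/\": dial tcp 172.30.49.45:443: connect: connection refused"'
set_phase = r'time="2024-01-31T10:57:34Z" level=info msg="SetPhase: temporary error during phase \"Configuring\"" sys=openshift-storage/noobaa'
x509 = r'time="2024-01-31T10:57:34Z" level=warning msg="⏳ Temporary Error: RequestError: send request failed\ncaused by: Put \"https://s3.us-east-2.amazonaws.com/nb.1706698436028.apps.j-059aikt1c33-t1.qe.rh-ocs.com\": tls: failed to verify certificate: x509: certificate signed by unknown authority" sys=openshift-storage/noobaa'

for line in (refused, set_phase, x509):
    print(classify(line))
```

Only the x509 failure is reported with the Configuring phase in this log; the earlier connection refusals occur about two minutes before it.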


must gather logs: https://url.corp.redhat.com/ffd5cef
job: https://url.corp.redhat.com/fdb84a4
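
For readers unfamiliar with the x509 error in the operator log above: it means the TLS client did not trust the CA that signed the certificate presented for s3.us-east-2.amazonaws.com. Going by their titles, the linked noobaa-operator pull requests make the operator use injected root CAs for AWS and IBM as well. The sketch below only illustrates that general idea in Python, using the standard `ssl` module; it is not the operator's code (which is written in Go), and the function name and the bundle path in the usage comment are hypothetical.

```python
import ssl


def build_tls_context(extra_ca_bundle=None):
    """Build a client TLS context that trusts the system CAs and,
    optionally, an additional PEM bundle.

    extra_ca_bundle is a path to a PEM file; the path is an assumption
    supplied by the caller, not a location defined by this bug report.
    """
    # Start from the platform's default trust store with verification on.
    ctx = ssl.create_default_context()
    if extra_ca_bundle:
        # Adds the extra CAs to the existing trust store (does not replace it).
        ctx.load_verify_locations(cafile=extra_ca_bundle)
    return ctx


# Hypothetical usage; the path below is a placeholder:
# ctx = build_tls_context("/path/to/injected-ca-bundle.crt")
ctx = build_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

The design point is that the extra bundle is added on top of the default roots, so public endpoints and endpoints behind a custom CA (for example, a TLS-inspecting proxy) can both be verified.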

Comment 14 Vijay Avuthu 2024-02-13 11:20:48 UTC

Created a new bug for the issue: https://bugzilla.redhat.com/show_bug.cgi?id=2264014

Closing this one as we didn't hit the "tls: failed to verify certificate: x509: certificate signed by unknown authority" error.

Comment 15 errata-xmlrpc 2024-03-19 15:32:31 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383
