Bug 2168814

Summary: rook-ceph-operator is unable to talk with radosgw
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Hector Vido <hvidosil>
Component: ceph
Assignee: Matt Benjamin (redhat) <mbenjamin>
ceph sub component: RGW
QA Contact: Elad <ebenahar>
Status: CLOSED INSUFFICIENT_DATA
Docs Contact:
Severity: urgent
Priority: urgent
CC: bniver, hnallurv, jquinn, jthottan, jverreng, kelwhite, kjosy, lsantann, muagarwa, ocs-bugs, odf-bz-bot, pdhange, sostapov, tasano, tnielsen, vumrao
Version: 4.10
Keywords: Reopened
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-03-16 16:26:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Hector Vido 2023-02-10 05:51:49 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

A bare-metal OpenShift cluster with external Ceph shut down after a high temperature was detected.
Only 4 of the 29 machines stayed on.

Version of all relevant components (if applicable):

OpenShift: 4.10.45
ODF: 4.10.9
Ceph: 16.2.8-85

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Yes, the customer can't write any data to buckets, and buckets are the main storage for this data lake.

Is there any workaround available to the best of your knowledge?

Unfortunately, no.

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

1 - very simple (machines just shut off)

Is this issue reproducible?

Don't know; it started after the outage.

We just turned on the machines.

Can this issue reproduce from the UI?

Yes, we see a 500 error in the Ceph dashboard when we try to edit some buckets.


Actual results:

Can't create buckets from OpenShift, and the operator logs keep complaining about "failed to fetch user" and "nosuchbucket".

Expected results:

Buckets can be created from OpenShift.

Additional info:

Comment 25 Hector Vido 2023-02-10 21:43:29 UTC
Sorry, I said it was not a bug, but it could be.

We didn't change anything on our side, and this Ceph cluster had been working with OpenShift for more or less 4 months.
Maybe something happened and we missed it.

Best,

Comment 26 Travis Nielsen 2023-02-28 15:18:19 UTC
Not sure there is anything to be done here, but moving to the RGW component to confirm whether bucket metadata could somehow have been updated to cause this.