Description of problem (please be as detailed as possible and provide log snippets):
A bare-metal OpenShift cluster with external Ceph shut down after a high temperature was detected. Only 4 of the 29 machines stayed on.

Version of all relevant components (if applicable):
OpenShift: 4.10.45
ODF: 4.10.9
Ceph: 16.2.8-85

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, the customer can't write any data into buckets, and buckets are the main storage for this data lake.

Is there any workaround available to the best of your knowledge?
Unfortunately, no.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1 - very simple (the machines just shut off)

Is this issue reproducible?
Don't know; it started after the outage. We just turned the machines back on.

Can this issue be reproduced from the UI?
Yes, we see a 500 error in the Ceph dashboard when we try to edit some buckets.

Actual results:
Can't create buckets from OpenShift, and the operator logs keep complaining about "failed to fetch user" and "nosuchbucket".

Expected results:
Buckets can be created from OpenShift.

Additional info:
Sorry, I said it was not a bug, but it could be. We didn't change anything on our side, and this Ceph cluster had been working together with OpenShift for more or less 4 months. Maybe something happened and we missed it. Best,
Not sure there is anything to be done here, but moving to the RGW component to confirm whether bucket metadata could somehow have been updated to cause this.