+++ This bug was initially created as a clone of Bug #2183457 +++

Description of problem (please be detailed as possible and provide log snippets):

[RDR] When running the `ceph status` command, we see:

2023-03-31T08:25:31.844+0000 7f8deaffd640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]

Version of all relevant components (if applicable):
OCP version: 4.13.0-0.nightly-2023-03-29-235439
ODF version: 4.13.0-121
Ceph version: ceph version 17.2.5-1342.el9cp (ed07851f2c5b8d3dccadf079402f86a67cb7d3e5) quincy (stable)
ACM version: v2.7.2

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy an RDR cluster with Globalnet.
2. Add spec.network.multiClusterService.Enabled: true to the StorageCluster after ODF deployment (a hedged CLI sketch of steps 2 and 3 follows the Program Management comments below).
3. Check the Ceph status via the toolbox.

Actual results:

$ ceph status
2023-03-31T08:25:31.844+0000 7f8deaffd640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
2023-03-31T08:25:31.844+0000 7f8deb7fe640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
2023-03-31T08:25:37.844+0000 7f8deaffd640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
2023-03-31T08:25:40.843+0000 7f8deaffd640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
[errno 13] RADOS permission denied (error connecting to the cluster)
command terminated with exit code 13

Expected results:
`ceph status` connects to the cluster and reports cluster health without authentication errors.

Additional info:
We have seen this on one of the managed clusters in the RDR setup, but not on the second managed cluster.

--- Additional comment from RHEL Program Management on 2023-03-31 14:02:03 IST ---

This bug, having no release flag set previously, is now set with release flag 'odf-4.13.0' to '?', and so is being proposed to be fixed at the ODF 4.13.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any were previously set while the release flag was missing, have now been reset, since the Acks are to be set against a release flag.

--- Additional comment from RHEL Program Management on 2023-03-31 14:02:03 IST ---

Since this bug has severity set to 'urgent', it is being proposed as a blocker for the currently set release flag. Please resolve ASAP.
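[Editor's note] A minimal sketch of reproduction steps 2 and 3 from the CLI. The namespace "openshift-storage", the StorageCluster name "ocs-storagecluster", the "rook-ceph-tools" label, and the lowercase "enabled" field casing are assumptions based on typical ODF/Rook deployments, not taken from this report; adjust them to match your cluster and CRD version.

```
# Step 2 (sketch): enable the multi-cluster service on the StorageCluster.
# Namespace, resource name, and field casing are assumptions; verify against
# your StorageCluster CRD before applying.
oc patch storagecluster ocs-storagecluster -n openshift-storage --type merge \
  -p '{"spec":{"network":{"multiClusterService":{"enabled":true}}}}'

# Step 3 (sketch): run `ceph status` from the rook-ceph-tools toolbox pod.
TOOLS_POD=$(oc get pod -n openshift-storage -l app=rook-ceph-tools \
  -o jsonpath='{.items[0].metadata.name}')
oc rsh -n openshift-storage "$TOOLS_POD" ceph status
```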
--- Additional comment from Pratik Surve on 2023-03-31 14:19:26 IST ---

Logs: http://rhsqe-repo.lab.eng.blr.redhat.com/ocs4qe/pratik/bz/2183457/mar31/31-03-2023_14-02-09

--- Additional comment from Santosh Pillai on 2023-03-31 21:44:45 IST ---

mon logs:

debug 2023-03-31T09:40:18.696+0000 7f33c486b640  1 mon.b@0(probing) e8 handle_auth_request failed to assign global_id
debug 2023-03-31T09:40:18.755+0000 7f33c486b640  1 mon.b@0(probing) e8 handle_auth_request failed to assign global_id
debug 2023-03-31T09:40:18.853+0000 7f33c486b640  1 mon.b@0(probing) e8 handle_auth_request failed to assign global_id
debug 2023-03-31T09:40:18.897+0000 7f33c486b640  1 mon.b@0(probing) e8 handle_auth_request failed to assign global_id
debug 2023-03-31T09:40:18.956+0000 7f33c486b640  1 mon.b@0(probing) e8 handle_auth_request failed to assign global_id
debug 2023-03-31T09:40:19.055+0000 7f33c486b640  1 mon.b@0(probing) e8 handle_auth_request failed to assign global_id
debug 2023-03-31T09:40:19.298+0000 7f33c486b640  1 mon.b@0(probing) e8 handle_auth_request failed to assign global_id
debug 2023-03-31T09:40:19.357+0000 7f33c486b640  1 mon.b@0(probing) e8 handle_auth_request failed to assign global_id
debug 2023-03-31T09:40:19.449+0000 7f33ca877640 -1 mon.b@0(probing) e8 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
debug 2023-03-31T09:40:19.456+0000 7f33c486b640  1 mon.b@0(probing) e8 handle_auth_request failed to assign global_id
debug 2023-03-31T09:40:20.100+0000 7f33c486b640  1 mon.b@0(probing) e8 handle_auth_request failed to assign global_id
debug 2023-03-31T09:40:20.159+0000 7f33c486b640  1 mon.b@0(probing) e8 handle_auth_request failed to assign global_id
debug 2023-03-31T09:40:20.258+0000 7f33c486b640  1 mon.b@0(probing) e8 handle_auth_request failed to assign global_id

--- Additional comment from Santosh Pillai on 2023-04-04 13:36:19 IST ---

Reinstalling the ODF cluster is the workaround for this. While I investigate what is happening to the mon quorum, this workaround can be used.

--- Additional comment from Santosh Pillai on 2023-04-07 13:22:58 IST ---

Still investigating.

--- Additional comment from Santosh Pillai on 2023-04-07 15:48:02 IST ---

The OSD pod is missing the admin socket file:

```
Normal   Started    3h                   kubelet  Started container osd
Normal   Pulled     3h                   kubelet  Container image "quay.io/rhceph-dev/rhceph@sha256:f916da02f59b8f73ad18eb65310333d1e3cbd1a54678ff50bf27ed9618719b63" already present on machine
Normal   Created    3h                   kubelet  Created container log-collector
Normal   Started    3h                   kubelet  Started container log-collector
Warning  Unhealthy  31s (x1076 over 3h)  kubelet  Startup probe failed: ceph daemon health check failed with the following output:
> admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
```

--- Additional comment from krishnaram Karthick on 2023-04-11 10:18:36 IST ---

Removing the testblocker keyword and adding an automation blocker, for the following reasons:
1) With the workaround of reinstalling ODF on the affected cluster, QE should be able to proceed with the deployment.
2) However, this could be a challenge for automated deployments and testing.
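[Editor's note] A hedged diagnostic sketch for the missing admin socket described above. The pod label, OSD id, and socket directory are assumptions based on typical Rook-Ceph layouts, not details confirmed in this report.

```
# Hedged diagnostic sketch; label, OSD id, and socket path are assumptions,
# not taken from the original report.

# Find an OSD pod (adjust label/namespace to match your cluster).
OSD_POD=$(oc get pod -n openshift-storage -l app=rook-ceph-osd \
  -o jsonpath='{.items[0].metadata.name}')

# The startup probe runs a `ceph daemon` health check against the admin
# socket; listing the socket directory shows whether the .asok file exists.
oc exec -n openshift-storage "$OSD_POD" -c osd -- ls -l /run/ceph

# If the socket is present, the same check the probe performs should succeed
# (replace osd.0 with the pod's actual OSD id).
oc exec -n openshift-storage "$OSD_POD" -c osd -- ceph daemon osd.0 status
```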
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742