Description of problem (please be as detailed as possible and provide log snippets):
The OCS/ODF operator on OpenShift could not connect to external Ceph, failing with "[errno 13] error connecting to the cluster" in the rook container. I have checked access from the OCP cluster to the Ceph cluster, and all of the servers and their ports are reachable from OpenShift. I also validated that the keyring and user have sufficient access to check health from the bastion host, which is part of the OpenShift cluster, and it works fine.

Versions (OpenShift disconnected UPI deployment):
Client Version: 4.6.36
Server Version: 4.6.35
Kubernetes Version: v1.19.0+b00ba52
OCS/ODF version: 4.6.5
Ceph version: 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, it impacts the PoC I am working on. It is a blocker for moving forward.

Is there any workaround available to the best of your knowledge?
Yes. While triaging the issue, I found that the Ceph monitor logs were complaining about "cephx server client.healthchecker: attempt to reclaim global_id 64287 without presenting ticket". A quick search suggested this can be caused by a client version mismatch, so I set "auth_allow_insecure_global_id_reclaim" to true on the Ceph side using `ceph config set mon auth_allow_insecure_global_id_reclaim true` and `ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false`. Once these were set, the rook container in OCS/ODF started connecting and the whole deployment went ahead.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2 - It is not a complex deployment. OpenShift and external Ceph were newly installed with nothing on them; OCS/ODF was installed using the operator, and the storage was then connected as per the documentation.
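For reference, a minimal sketch of the workaround on the Ceph side, i.e. the two `ceph config set` commands quoted above. The revert commands at the end are an assumption (once all clients present tickets correctly, the insecure reclaim should no longer be needed) and were not verified as part of this report:

```shell
# Workaround applied on the external Ceph cluster (run with admin credentials).
# Allows clients that reclaim their global_id without presenting a ticket to
# reconnect; this weakens cephx security and should only be temporary.
ceph config set mon auth_allow_insecure_global_id_reclaim true

# Silence the monitor health warning raised while insecure reclaim is allowed.
ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false

# Assumed revert, once all clients are updated (e.g. OCS/ODF 4.6.6):
# ceph config set mon auth_allow_insecure_global_id_reclaim false
# ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed true
```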
Is this issue reproducible? Yes.
Can this issue be reproduced from the UI? Yes.
If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Install OpenShift 4.6.35 using a disconnected UPI deployment
2. Install the latest Ceph version, 14.2.11-181.el8cp
3. Install OCS/ODF on OpenShift
4. Follow our official documentation to connect OCS/ODF to the external cluster with the JSON generated from the external cluster
5. Observe the rook container log. After a few minutes you should see "[errno 13] error connecting to the cluster"
6. As a workaround, run the following commands on the Ceph cluster:
ceph config set mon auth_allow_insecure_global_id_reclaim true
ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false

Actual results:
OCS/ODF could not connect to the external Ceph cluster.

Expected results:
OCS/ODF connects to the external Ceph cluster and the storage service becomes available in OpenShift.

Additional info:
A possibly related bug is attached to this report.
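Before applying the workaround, the failure mode can be confirmed on the Ceph side; a sketch, assuming admin access on a monitor node (the log path is the default for a mon named `mon.*`, adjust for your deployment):

```shell
# Check the current value of the setting the workaround changes
# (expected to be "true" on unpatched Nautilus before the workaround).
ceph config get mon auth_allow_insecure_global_id_reclaim

# Look for rejected global_id reclaims in the monitor log; this matches
# the "attempt to reclaim global_id ... without presenting ticket" line
# quoted in the description.
grep "attempt to reclaim global_id" /var/log/ceph/ceph-mon.*.log
```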
Please install 4.6.6, which has the fix for this issue. *** This bug has been marked as a duplicate of bug 1974476 ***
4.6.5 is the version that I got when I pulled the operator to the local registry on 5th July. Is it a new release that came out after that?
Yes, it was released just yesterday.