Description of problem (please be detailed as possible and provide log snippets):
========================================================================
To verify Bug 2060273 with the MGR pod respin scenario, restarted the MGR pod on the provider so that it moved to a new node.

As expected with the recent fixes, the mgr endpoint and the cephcluster monitoring endpoint changed to the new node IP on the consumer. PVC creation also works.

However, the cephcluster stays in Connecting state, with no updates seen in the rook-ceph-operator pod.

Had a live session with engineering; raising this bug based on that discussion.

Provider side
==================
date --utc; oc delete pod rook-ceph-mgr-a-6ddbf6bb5-kdx25; oc get pods -o wide|grep mgr
Tue Mar 29 05:15:36 PM UTC 2022
pod "rook-ceph-mgr-a-6ddbf6bb5-kdx25" deleted
rook-ceph-mgr-a-6ddbf6bb5-6g7kl   1/2   Running   0   3s   10.0.180.112   ip-10-0-180-112.us-east-2.compute.internal   <none>   <none>

Consumer side
==================
+++++++++++++++++++++++ cephcluster MGR
  monitoring:
    enabled: true
    externalMgrEndpoints:
    - ip: 10.0.180.112
    externalMgrPrometheusPort: 9283
+++++++++++ endpoint
rook-ceph-mgr-external   10.0.180.112:9283   75m

======= storagecluster ==========
NAME                 AGE   PHASE        EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   85m   Connecting   true       2022-03-29T16:10:08Z

======= storagesystem ==========
NAME                               STORAGE-SYSTEM-KIND                  STORAGE-SYSTEM-NAME
ocs-storagecluster-storagesystem   storagecluster.ocs.openshift.io/v1   ocs-storagecluster
--------------
======= cephcluster ==========
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE        MESSAGE                                             HEALTH      EXTERNAL
ocs-storagecluster-cephcluster                                85m   Connecting   Attempting to connect to an external Ceph cluster   HEALTH_OK   true

Version of all relevant components (if applicable):
====================================================
OCP 4.9.25
ODF 4.10.0-206
Deployer 2.0.0-5

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
===============================================================
No. Only the status is not updated; the cluster is still able to talk to the provider.

Is there any workaround available to the best of your knowledge?
==================================================================
Restart the rook-ceph-operator pod. The cephcluster status then changes to Connected, since the go-routine is triggered.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
===========================================================================
3

Is this issue reproducible?
==============================
Yes

Can this issue be reproduced from the UI?
======================================
NA

If this is a regression, please provide more details to justify this:
=======================================================================
Not sure

Steps to Reproduce:
======================
1. Create an add-on based provider and consumer setup.
2. On the provider, respin the MGR pod and make sure it moves to another node.
3. On the consumer, check the monitoring endpoint and the status of the cephcluster CR.

Actual results:
=================
The cephcluster CR stays in Connecting state. The monitoring endpoint, however, is now updated successfully.

Expected results:
===================
The cephcluster status should be Connected and the storagecluster should be in Ready.

Additional info:
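The workaround above could be scripted roughly as follows. This is only a sketch under assumptions not stated in this report: the consumer namespace is assumed to be openshift-storage, the operator pod is assumed to carry the label app=rook-ceph-operator, and the function names are hypothetical. It reads the cephcluster phase and deletes the operator pod only when the phase is still Connecting, so that the deployment recreates it.

```shell
#!/usr/bin/env bash
# Sketch only. NAMESPACE and the app=rook-ceph-operator label are assumptions;
# adjust them to match the consumer cluster.
NAMESPACE="${NAMESPACE:-openshift-storage}"
CEPHCLUSTER="${CEPHCLUSTER:-ocs-storagecluster-cephcluster}"

# Print the current phase of the cephcluster CR (e.g. Connecting, Connected).
cephcluster_phase() {
  oc get cephcluster "$CEPHCLUSTER" -n "$NAMESPACE" -o jsonpath='{.status.phase}'
}

# Delete the rook-ceph-operator pod only if the CR is stuck in Connecting.
restart_operator_if_stuck() {
  local phase
  phase="$(cephcluster_phase)"
  if [ "$phase" = "Connecting" ]; then
    oc delete pod -n "$NAMESPACE" -l app=rook-ceph-operator
  else
    echo "cephcluster phase is '$phase'; nothing to do"
  fi
}
```

After sourcing the file on a host logged in to the consumer cluster, run restart_operator_if_stuck and then cephcluster_phase again to confirm that the status has moved to Connected.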
*** Bug 2062853 has been marked as a duplicate of this bug. ***