Description of problem (please be as detailed as possible and provide log snippets):
=========================================================================
Created an agent-based provider with the DF offering installed in the fusion-storage namespace. Only 1 OSD came up, and the prepare pods for the other 2 were not created. mon-b was also out of quorum, hence it was failed over and mon-d came up successfully.

A few repeated messages in the rook operator log:

2023-04-21 09:07:21.962692 I | op-osd: restarting watcher for OSD provisioning status ConfigMaps. the watcher closed the channel
2023-04-21 09:07:21.968316 I | op-osd: OSD orchestration status for PVC default-0-data-0p9ck9 is "orchestrating"
2023-04-21 09:07:21.968335 I | op-osd: OSD orchestration status for PVC default-1-data-0zj6vs is "orchestrating"
2023-04-21 09:08:17.905979 I | op-osd: waiting... 0 of 2 OSD prepare jobs have finished processing and 1 of 1 OSDs have been updated
2023-04-21 09:09:17.906212 I | op-osd: waiting... 0 of 2 OSD prepare jobs have finished processing and 1 of 1 OSDs have been updated
2023-04-21 09:10:17.905390 I | op-osd: waiting... 0 of 2 OSD prepare jobs have finished processing and 1 of 1 OSDs have been updated

rook-ceph-operator-6c645c7f58-6vfr6                 1/1   Running     0   6h19m   10.129.2.158   ip-10-0-21-159.us-east-2.compute.internal   <none>   <none>
rook-ceph-osd-0-57df5c5cb4-drcf5                    2/2   Running     0   6h2m    10.0.21.159    ip-10-0-21-159.us-east-2.compute.internal   <none>   <none>
rook-ceph-osd-prepare-default-2-data-0j74zq-l4qzl   0/1   Completed   0   6h2m    10.0.21.159    ip-10-0-21-159.us-east-2.compute.internal   <none>   <none>

>>> Restarted the rook-ceph-operator pod, and the remaining OSD prepare pods and OSDs came up:

rook-ceph-operator-6c645c7f58-bzx78                 1/1   Running     0   14m     10.128.2.85    ip-10-0-15-227.us-east-2.compute.internal   <none>   <none>
rook-ceph-osd-0-57df5c5cb4-drcf5                    2/2   Running     0   7h29m   10.0.21.159    ip-10-0-21-159.us-east-2.compute.internal   <none>   <none>
rook-ceph-osd-1-586d467d49-cwh6z                    2/2   Running     0   14m     10.0.19.7      ip-10-0-19-7.us-east-2.compute.internal     <none>   <none>
rook-ceph-osd-2-5475f5644d-nk5rm                    2/2   Running     0   14m     10.0.15.227    ip-10-0-15-227.us-east-2.compute.internal   <none>   <none>
rook-ceph-osd-prepare-default-0-data-0p9ck9-cmnl5   0/1   Completed   0   14m     10.0.15.227    ip-10-0-15-227.us-east-2.compute.internal   <none>   <none>
rook-ceph-osd-prepare-default-1-data-0zj6vs-kq89d   0/1   Completed   0   14m     10.0.19.7      ip-10-0-19-7.us-east-2.compute.internal     <none>   <none>
rook-ceph-osd-prepare-default-2-data-0j74zq-l4qzl   0/1   Completed   0   7h29m   10.0.21.159    ip-10-0-21-159.us-east-2.compute.internal   <none>   <none>

Version of all relevant components (if applicable):
====================================================
OCP (ROSA) = 4.11.36
ceph image version: "17.2.6-10 quincy"

managed-fusion-agent.v2.0.11              Managed Fusion Agent          2.0.11                                                      Succeeded
observability-operator.v0.0.20            Observability Operator        0.0.20              observability-operator.v0.0.19          Succeeded
ocs-operator.v4.13.0-168.stable           OpenShift Container Storage   4.13.0-168.stable                                           Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0                                                      Succeeded
route-monitor-operator.v0.1.498-e33e391   Route Monitor Operator        0.1.498-e33e391     route-monitor-operator.v0.1.496-7e66488 Succeeded

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
=========================================================================
Yes, unless the workaround is applied.

Is there any workaround available to the best of your knowledge?
======================================================================
>> Workaround: Restarted the rook-ceph-operator pod and the missing OSDs were created.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
=====================================
Observed once so far.

Can this issue be reproduced from the UI?
==========================================
NA

If this is a regression, please provide more details to justify this:
============================================================================
Not sure.

Steps to Reproduce:
============================
1. Create a ROSA 4.11.36 cluster with m5.2xlarge instances for the worker nodes.
2. Install the Fusion aaS agent with build quay.io/resoni/managed-fusion-agent-index:4.13.0-168 (document [1] for reference).
3. Install the DF offering using the managedFusionOffering CR.

[1] - https://docs.google.com/document/d/1Jdx8czlMjbumvilw8nZ6LtvWOMAx3H4TfwoVwiBs0nE/edit#

Actual results:
======================
Only 1 OSD and its prepare pod came up; the other 2 did not.
>> Workaround: Restarted the rook-ceph-operator pod and the missing OSDs were created.

Expected results:
==========================
All 3 OSDs should be up and Running.
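For reference, a minimal sketch of the workaround as oc commands, assuming the Rook operator and OSD pods run in the fusion-storage namespace and carry the standard Rook labels (app=rook-ceph-operator, app=rook-ceph-osd-prepare, app=rook-ceph-osd); the namespace and labels in the managed Fusion deployment may differ:

# Restart the Rook operator by deleting its pod (the Deployment recreates it)
oc -n fusion-storage delete pod -l app=rook-ceph-operator

# Wait for the replacement operator pod to become Ready
oc -n fusion-storage wait --for=condition=Ready pod -l app=rook-ceph-operator --timeout=300s

# Verify that the missing OSD prepare pods complete and all 3 OSDs reach Running
oc -n fusion-storage get pods -l app=rook-ceph-osd-prepare -o wide
oc -n fusion-storage get pods -l app=rook-ceph-osd -o wide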
Per an offline discussion between Subham, Jilju, and Rewant, this is not reproducing. Please reopen if it is hit again.