Description of problem (please be as detailed as possible and provide log snippets):

- For a stretch cluster, an upgrade was done. It appeared to be successful, but it was later noted that the Ceph components were still using the older hotfix images; this was fixed by removing the "reconcileStrategy: ignore" setting.

- Afterwards, the storagecluster went into "Error" state due to the error below:
-----
  - lastHeartbeatTime: "2024-09-28T11:07:32Z"
    lastTransitionTime: "2024-09-28T10:47:01Z"
    message: 'CephCluster error: failed to perform validation before cluster creation:
      expecting exactly three zones for the stretch cluster, but found 2'
    reason: ClusterStateError
    status: "True"
    type: Degraded
-----

- Additionally, the registry is unable to mount the CephFS volumes because it cannot reach the mon service; I suspect this is caused by the mismatched Ceph versions. The rook-ceph-csi-config was missing the mon IPs.

- We tried applying the zone failureDomainKey and failureDomainValue labels to the ODF nodes, but with no effect.

- Below is the config in the storagecluster yaml:
----
    failureDomain: zone
    failureDomainKey: topology.kubernetes.io/zone-principal
    failureDomainValues:
    - "true"
    <snip>
    kmsServerConnection: {}
    nodeTopologies:
      labels:
        kubernetes.io/hostname:
        - <node>-hnfz8
        - <node>-whnrh
        - <node>-9xv56
        - <node>-pgjxm
        topology.kubernetes.io/zone-principal:
        - "true"
---------------

Version of all relevant components (if applicable):
ODF 4.14.10

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Unable to mount volumes, Ceph version mismatch

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
NA

Can this issue be reproduced from the UI?

Actual results:
ODF is unable to detect three zones when they are present.

Expected results:
ODF should detect three zones.

Additional info:
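For reference, a quick way to confirm how many distinct zones are visible on the nodes carrying the ODF storage label (a hedged check using standard oc flags; the label keys are the ones already shown above, no cluster-specific assumptions beyond that):
----
# List only the nodes ODF considers storage nodes, and show their zone labels as extra columns
oc get nodes -l cluster.ocs.openshift.io/openshift-storage= \
  -L topology.kubernetes.io/zone -L topology.kubernetes.io/zone-principal
----
If fewer than three distinct zone values show up here, the operator's "found 2" validation error is consistent with the node labeling rather than the StorageCluster spec.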
Next update:

The arbiter node is missing our label:

Name:               slocp4oat101-dgmwp
Roles:              storage,worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    dynatrace=none
                    env=global
                    kind=storage
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=slocp4oat101-dgmwp
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/storage=
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
                    topology.kubernetes.io/zone=arbiter
                    topology.kubernetes.io/zone-principal=true
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.77.5.107

It needs our label: cluster.ocs.openshift.io/openshift-storage=

  message: 'CephCluster error: failed to perform validation before cluster creation:
    expecting exactly three zones for the stretch cluster, but found 2'
  reason: ClusterStateError

Storagecluster CR:

  nodeTopologies:
    labels:
      kubernetes.io/hostname:
      - slocp4odt100-hnfz8
      - slocp4odt101-whnrh
      - slocp4odt202-9xv56
      - slocp4odt203-pgjxm
      topology.kubernetes.io/zone-principal:
      - "true"
  phase: Error
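A minimal sketch of how the missing label would typically be applied to the arbiter node, assuming that is the intended fix (node name taken from the describe output above; the empty value matches the convention on the other storage nodes):
----
# Add the ODF storage label to the arbiter node so the operator counts its zone
oc label node slocp4oat101-dgmwp cluster.ocs.openshift.io/openshift-storage=""

# Then re-check the StorageCluster status; it should reconcile once three zones are visible
oc get storagecluster -n openshift-storage
----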