Created attachment 2018159 [details]
error in app pod events

Description of problem (please be detailed as possible and provide log snippets):

Upgraded ODF from 4.14 GA to 4.15 and observed the error "unable to get monitor info from DNS SRV with service name: ceph-mon" while creating a Fedora app pod.

Version of all relevant components (if applicable):
odf: 4.15.0-147

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, needed dev [Madhu Rajanna] help to recover the cluster.

Is there any workaround available to the best of your knowledge?
Yes

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue reproduce from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Upgrade ODF from 4.14 to 4.15.
2. Create a Fedora pod in a new project.
3. The Fedora app pod creation fails due to this issue.

Actual results:
App pod creation failed with the error "unable to get monitor info from DNS SRV with service name: ceph-mon".

Expected results:
App pod creation should not fail with "unable to get monitor info from DNS SRV with service name: ceph-mon".

Additional info:
> oc get cephcluster -n openshift-storage -o yaml
...
spec:
  cephVersion:
    image: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9dbd051cfcdb334aad33a536cc115ae1954edaea5f8cb5943ad615f1b41b0226
  cleanupPolicy:
    sanitizeDisks: {}
  continueUpgradeAfterChecksEvenIfNotHealthy: true
  crashCollector: {}
  csi:
    cephfs:
      kernelMountOptions: ms_mode=prefer-crc

> oc get cm rook-ceph-csi-config -o yaml
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"openshift-storage","monitors":["172.30.191.241:3300","172.30.84.6:3300","172.30.226.19:3300"],"namespace":"openshift-storage"},{"clusterID":"5bb69c306a7d011c3e91c3cec112fb7a","monitors":["172.30.191.241:3300","172.30.84.6:3300","172.30.226.19:3300"],"namespace":"openshift-storage","cephFS":{"subvolumeGroup":"csi","kernelMountOptions":"ms_mode=prefer-crc"}}]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-02-22T13:50:52Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: false
    controller: true
    kind: Deployment
    name: rook-ceph-operator
    uid: af2a4d1f-939d-4932-b116-c45f5f0b90c9
  resourceVersion: "722929"

I see that we have kernelMountOptions in the CephCluster CR, but it is not added to the CSI config for all the cluster IDs.

Workaround: delete the CSI configmap and the Rook operator pod:
oc delete cm rook-ceph-csi-config
oc delete po/rook-ceph-operator-7c56874fb6-l8gpw
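The broken state can be checked mechanically. The sketch below (an illustration, not part of the product) parses a copy of the csi-cluster-config-json shown above, trimmed to the relevant fields, and flags any clusterID entry whose cephFS section is missing kernelMountOptions. On a live cluster the JSON would instead come from something like `oc get cm rook-ceph-csi-config -n openshift-storage -o jsonpath='{.data.csi-cluster-config-json}'`.

```python
import json

# csi-cluster-config-json excerpted from the affected cluster above.
# Note the "openshift-storage" entry has no cephFS section at all,
# so kernelMountOptions (ms_mode=prefer-crc) never reaches the CSI driver.
raw = '''[
  {"clusterID": "openshift-storage",
   "monitors": ["172.30.191.241:3300", "172.30.84.6:3300", "172.30.226.19:3300"],
   "namespace": "openshift-storage"},
  {"clusterID": "5bb69c306a7d011c3e91c3cec112fb7a",
   "monitors": ["172.30.191.241:3300", "172.30.84.6:3300", "172.30.226.19:3300"],
   "namespace": "openshift-storage",
   "cephFS": {"subvolumeGroup": "csi", "kernelMountOptions": "ms_mode=prefer-crc"}}
]'''

entries = json.loads(raw)
for entry in entries:
    opts = entry.get("cephFS", {}).get("kernelMountOptions")
    status = opts if opts else "MISSING kernelMountOptions"
    print(f'{entry["clusterID"]}: {status}')
# prints:
# openshift-storage: MISSING kernelMountOptions
# 5bb69c306a7d011c3e91c3cec112fb7a: ms_mode=prefer-crc
```

This matches the observation in the comment: the option is present for one cluster ID but absent for the other, which is why CephFS mounts for that cluster ID fail.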
Verified with build: 4.15.0-157

The kernelMountOptions value is added along with the mon IPs, and no delay was observed. Hence, moving this BZ to Verified state.

Steps followed:
1. Watch the rook-ceph-csi config map in a terminal with "oc get cm rook-ceph-csi-config -w -o yaml".
2. oc delete pod rook-ceph-operator-79bc976c7b-dtlfx
3. oc delete cm rook-ceph-csi-config
4. Monitor the output of "oc get cm rook-ceph-csi-config -w -o yaml"; it looks like below.
5. It is observed that the kernelMountOptions value is added along with the mon IPs.

---
apiVersion: v1
data:
  csi-cluster-config-json: '[]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-03-12T07:43:17Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    uid: 0634e3ee-cf97-456d-b0c8-fe7a2f3f4cdc
  resourceVersion: "882199"
  uid: 8ef236eb-fac3-439f-98bc-2649ba0e1b5c
---
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"5bb69c306a7d011c3e91c3cec112fb7a","monitors":["172.30.2.199:3300","172.30.146.184:3300","172.30.52.210:3300"],"namespace":"openshift-storage","cephFS":{"subvolumeGroup":"csi","kernelMountOptions":"ms_mode=prefer-crc"},"readAffinity":{"enabled":true,"crushLocationLabels":["kubernetes.io/hostname","topology.kubernetes.io/region","topology.kubernetes.io/zone","topology.rook.io/chassis","topology.rook.io/rack","topology.rook.io/row","topology.rook.io/pdu","topology.rook.io/pod","topology.rook.io/room","topology.rook.io/datacenter"]}}]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-03-12T07:43:17Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    uid: 0634e3ee-cf97-456d-b0c8-fe7a2f3f4cdc
  resourceVersion: "882246"
  uid: 8ef236eb-fac3-439f-98bc-2649ba0e1b5c
---
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"5bb69c306a7d011c3e91c3cec112fb7a","monitors":["172.30.146.184:3300","172.30.52.210:3300","172.30.2.199:3300"],"namespace":"openshift-storage","cephFS":{"subvolumeGroup":"csi","kernelMountOptions":"ms_mode=prefer-crc"},"readAffinity":{"enabled":true,"crushLocationLabels":["kubernetes.io/hostname","topology.kubernetes.io/region","topology.kubernetes.io/zone","topology.rook.io/chassis","topology.rook.io/rack","topology.rook.io/row","topology.rook.io/pdu","topology.rook.io/pod","topology.rook.io/room","topology.rook.io/datacenter"]}},{"clusterID":"openshift-storage","monitors":["172.30.146.184:3300","172.30.52.210:3300","172.30.2.199:3300"],"namespace":"openshift-storage","cephFS":{"kernelMountOptions":"ms_mode=prefer-crc"},"readAffinity":{"enabled":true,"crushLocationLabels":["kubernetes.io/hostname","topology.kubernetes.io/region","topology.kubernetes.io/zone","topology.rook.io/chassis","topology.rook.io/rack","topology.rook.io/row","topology.rook.io/pdu","topology.rook.io/pod","topology.rook.io/room","topology.rook.io/datacenter"]}}]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-03-12T07:43:17Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    uid: 0634e3ee-cf97-456d-b0c8-fe7a2f3f4cdc
  resourceVersion: "882281"
  uid: 8ef236eb-fac3-439f-98bc-2649ba0e1b5c
---
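The fixed state can be confirmed with the same kind of check. The sketch below (illustrative only) runs it against a copy of the final csi-cluster-config-json from the watch output above, trimmed to the fields relevant here (monitors and readAffinity elided): after the configmap is recreated, every clusterID entry carries cephFS.kernelMountOptions.

```python
import json

# Trimmed from the final rook-ceph-csi-config shown above: after deleting
# and recreating the configmap, both clusterID entries have a cephFS
# section with kernelMountOptions set.
raw = '''[
  {"clusterID": "5bb69c306a7d011c3e91c3cec112fb7a",
   "namespace": "openshift-storage",
   "cephFS": {"subvolumeGroup": "csi", "kernelMountOptions": "ms_mode=prefer-crc"}},
  {"clusterID": "openshift-storage",
   "namespace": "openshift-storage",
   "cephFS": {"kernelMountOptions": "ms_mode=prefer-crc"}}
]'''

entries = json.loads(raw)
# Verification condition: every entry's cephFS section sets kernelMountOptions.
ok = all(e.get("cephFS", {}).get("kernelMountOptions") == "ms_mode=prefer-crc"
         for e in entries)
print("all clusterIDs have kernelMountOptions:", ok)
# prints: all clusterIDs have kernelMountOptions: True
```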
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383