Bug 2265514
Summary: | "unable to get monitor info from DNS SRV with service name: ceph-mon" error observed while creating fedora app pod | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Nagendra Reddy <nagreddy>
Component: | rook | Assignee: | Rakshith <rar>
Status: | CLOSED ERRATA | QA Contact: | Nagendra Reddy <nagreddy>
Severity: | urgent | Docs Contact: |
Priority: | unspecified | |
Version: | 4.15 | CC: | ebenahar, mrajanna, muagarwa, odf-bz-bot, rar, tnielsen
Target Milestone: | --- | |
Target Release: | ODF 4.15.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | No Doc Update
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2024-03-19 15:33:05 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |

> oc get cephcluster -n openshift-storage -o yaml
...
spec:
  cephVersion:
    image: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9dbd051cfcdb334aad33a536cc115ae1954edaea5f8cb5943ad615f1b41b0226
  cleanupPolicy:
    sanitizeDisks: {}
  continueUpgradeAfterChecksEvenIfNotHealthy: true
  crashCollector: {}
  csi:
    cephfs:
      kernelMountOptions: ms_mode=prefer-crc
oc get cm rook-ceph-csi-config -oyaml
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"openshift-storage","monitors":["172.30.191.241:3300","172.30.84.6:3300","172.30.226.19:3300"],"namespace":"openshift-storage"},{"clusterID":"5bb69c306a7d011c3e91c3cec112fb7a","monitors":["172.30.191.241:3300","172.30.84.6:3300","172.30.226.19:3300"],"namespace":"openshift-storage","cephFS":{"subvolumeGroup":"csi","kernelMountOptions":"ms_mode=prefer-crc"}}]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-02-22T13:50:52Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: false
    controller: true
    kind: Deployment
    name: rook-ceph-operator
    uid: af2a4d1f-939d-4932-b116-c45f5f0b90c9
  resourceVersion: "722929"
I see that we have kernelMountOptions in the CephCluster CR, but it is not added to the CSI config for all of the cluster IDs.
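
For comparison, the entry for the "openshift-storage" clusterID would be expected to look roughly like the following once kernelMountOptions is propagated (a pretty-printed sketch only; the monitor addresses are taken from the ConfigMap above and the field layout mirrors the existing entries, so treat it as illustrative rather than exact operator output):

{
  "clusterID": "openshift-storage",
  "monitors": ["172.30.191.241:3300", "172.30.84.6:3300", "172.30.226.19:3300"],
  "namespace": "openshift-storage",
  "cephFS": {
    "kernelMountOptions": "ms_mode=prefer-crc"
  }
}
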
Workaround:
Delete the CSI ConfigMap and the Rook operator pod:
oc delete cm rook-ceph-csi-config
oc delete po/rook-ceph-operator-7c56874fb6-l8gpw
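
Once the operator pod restarts and regenerates the ConfigMap, the entries can be spot-checked with something like the following (a sketch; it assumes the ConfigMap lives in the openshift-storage namespace, as shown above):

oc get cm rook-ceph-csi-config -n openshift-storage -o jsonpath='{.data.csi-cluster-config-json}'
# expect "kernelMountOptions":"ms_mode=prefer-crc" under "cephFS" for every clusterID entry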

Verified with build: 4.15.0-157

The kernelMountOptions value is added along with the mon IPs, and no delay is observed. Hence, moving this BZ to Verified state.

Steps followed:
1. Watch the rook-ceph-csi-config ConfigMap in a terminal with the command "oc get cm rook-ceph-csi-config -w -o yaml"
2. oc delete pod rook-ceph-operator-79bc976c7b-dtlfx
3. oc delete cm rook-ceph-csi-config
4. Monitor the output of "oc get cm rook-ceph-csi-config -w -o yaml"; it looks like below.
5. It is observed that the kernelMountOptions value is added along with the mon IPs.

---
apiVersion: v1
data:
  csi-cluster-config-json: '[]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-03-12T07:43:17Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    uid: 0634e3ee-cf97-456d-b0c8-fe7a2f3f4cdc
  resourceVersion: "882199"
  uid: 8ef236eb-fac3-439f-98bc-2649ba0e1b5c
---
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"5bb69c306a7d011c3e91c3cec112fb7a","monitors":["172.30.2.199:3300","172.30.146.184:3300","172.30.52.210:3300"],"namespace":"openshift-storage","cephFS":{"subvolumeGroup":"csi","kernelMountOptions":"ms_mode=prefer-crc"},"readAffinity":{"enabled":true,"crushLocationLabels":["kubernetes.io/hostname","topology.kubernetes.io/region","topology.kubernetes.io/zone","topology.rook.io/chassis","topology.rook.io/rack","topology.rook.io/row","topology.rook.io/pdu","topology.rook.io/pod","topology.rook.io/room","topology.rook.io/datacenter"]}}]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-03-12T07:43:17Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    uid: 0634e3ee-cf97-456d-b0c8-fe7a2f3f4cdc
  resourceVersion: "882246"
  uid: 8ef236eb-fac3-439f-98bc-2649ba0e1b5c
---
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"5bb69c306a7d011c3e91c3cec112fb7a","monitors":["172.30.146.184:3300","172.30.52.210:3300","172.30.2.199:3300"],"namespace":"openshift-storage","cephFS":{"subvolumeGroup":"csi","kernelMountOptions":"ms_mode=prefer-crc"},"readAffinity":{"enabled":true,"crushLocationLabels":["kubernetes.io/hostname","topology.kubernetes.io/region","topology.kubernetes.io/zone","topology.rook.io/chassis","topology.rook.io/rack","topology.rook.io/row","topology.rook.io/pdu","topology.rook.io/pod","topology.rook.io/room","topology.rook.io/datacenter"]}},{"clusterID":"openshift-storage","monitors":["172.30.146.184:3300","172.30.52.210:3300","172.30.2.199:3300"],"namespace":"openshift-storage","cephFS":{"kernelMountOptions":"ms_mode=prefer-crc"},"readAffinity":{"enabled":true,"crushLocationLabels":["kubernetes.io/hostname","topology.kubernetes.io/region","topology.kubernetes.io/zone","topology.rook.io/chassis","topology.rook.io/rack","topology.rook.io/row","topology.rook.io/pdu","topology.rook.io/pod","topology.rook.io/room","topology.rook.io/datacenter"]}}]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-03-12T07:43:17Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    uid: 0634e3ee-cf97-456d-b0c8-fe7a2f3f4cdc
  resourceVersion: "882281"
  uid: 8ef236eb-fac3-439f-98bc-2649ba0e1b5c
---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383

Created attachment 2018159 [details]
error in app pod events

Description of problem (please be detailed as possible and provide log snippets):
Upgraded ODF from 4.14 GA to 4.15; observed the "unable to get monitor info from DNS SRV with service name: ceph-mon" error while creating a fedora app pod.

Version of all relevant components (if applicable):
odf: 4.15.0-147

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, took dev [Madhu Rajanna] help to recover the cluster.

Is there any workaround available to the best of your knowledge?
Yes

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Upgrade ODF from 4.14 to 4.15.
2. Create a fedora pod in a new project (a minimal pod/PVC sketch is included at the end of this report).
3. The fedora app pod creation will fail due to this issue.

Actual results:
App pod creation failed due to the "unable to get monitor info from DNS SRV with service name: ceph-mon" issue.

Expected results:
App pod creation should not fail due to "unable to get monitor info from DNS SRV with service name: ceph-mon".

Additional info:
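
For reference, a minimal sketch of the kind of test pod used in step 2 of the reproduction steps. This manifest is an assumption rather than the exact one from the report: the names, image, PVC size, and the ocs-storagecluster-cephfs StorageClass are illustrative defaults for an ODF cluster.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fedora-pvc                              # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: ocs-storagecluster-cephfs   # assumed default ODF CephFS StorageClass
---
apiVersion: v1
kind: Pod
metadata:
  name: fedora-app                              # hypothetical name
spec:
  containers:
  - name: fedora
    image: quay.io/fedora/fedora:latest         # illustrative image
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /mnt/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: fedora-pvc

Creating a pod that mounts a CephFS-backed PVC exercises the mount path where kernelMountOptions applies.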