Bug 2265514

Summary: "unable to get monitor info from DNS SRV with service name: ceph-mon" error observed while creating fedora app pod
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Nagendra Reddy <nagreddy>
Component: rook
Assignee: Rakshith <rar>
Status: CLOSED ERRATA
QA Contact: Nagendra Reddy <nagreddy>
Severity: urgent
Priority: unspecified
Version: 4.15
CC: ebenahar, mrajanna, muagarwa, odf-bz-bot, rar, tnielsen
Target Release: ODF 4.15.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2024-03-19 15:33:05 UTC
Type: Bug

Description Nagendra Reddy 2024-02-22 14:05:16 UTC
Created attachment 2018159 [details]
error in app pod events

Description of problem (please be as detailed as possible and provide log
snippets):
Upgraded ODF from 4.14 GA to 4.15; observed the "unable to get monitor info from DNS SRV with service name: ceph-mon" error while creating a fedora app pod.

Version of all relevant components (if applicable):
odf: 4.15.0-147

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, required dev help (Madhu Rajanna) to recover the cluster.

Is there any workaround available to the best of your knowledge?
Yes

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Upgrade ODF from 4.14 to 4.15.
2. Create a fedora pod in a new project.
3. The fedora app pod creation fails due to this issue.


Actual results:

App pod creation failed with the "unable to get monitor info from DNS SRV with service name: ceph-mon" error.

Expected results:
App pod creation should not fail with the "unable to get monitor info from DNS SRV with service name: ceph-mon" error.

Additional info:

Comment 3 Madhu Rajanna 2024-02-22 14:11:11 UTC
> oc get cephcluster -n openshift-storage -o yaml
...
spec:
    cephVersion:
      image: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9dbd051cfcdb334aad33a536cc115ae1954edaea5f8cb5943ad615f1b41b0226
    cleanupPolicy:
      sanitizeDisks: {}
    continueUpgradeAfterChecksEvenIfNotHealthy: true
    crashCollector: {}
    csi:
      cephfs:
        kernelMountOptions: ms_mode=prefer-crc



oc get cm rook-ceph-csi-config -oyaml
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"openshift-storage","monitors":["172.30.191.241:3300","172.30.84.6:3300","172.30.226.19:3300"],"namespace":"openshift-storage"},{"clusterID":"5bb69c306a7d011c3e91c3cec112fb7a","monitors":["172.30.191.241:3300","172.30.84.6:3300","172.30.226.19:3300"],"namespace":"openshift-storage","cephFS":{"subvolumeGroup":"csi","kernelMountOptions":"ms_mode=prefer-crc"}}]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-02-22T13:50:52Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: false
    controller: true
    kind: Deployment
    name: rook-ceph-operator
    uid: af2a4d1f-939d-4932-b116-c45f5f0b90c9
  resourceVersion: "722929"



I see that we have kernelMountOptions in the CephCluster CR, but it is not added to the CSI config for all of the cluster IDs.

Workaround: delete the CSI ConfigMap and the Rook operator pod:

oc delete cm rook-ceph-csi-config
oc delete po/rook-ceph-operator-7c56874fb6-l8gpw
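The broken state can be spotted programmatically by parsing the csi-cluster-config-json payload from the ConfigMap above. A minimal sketch (the helper name is mine, not part of Rook; the payload is copied from comment 3):

```python
import json

# csi-cluster-config-json from the broken rook-ceph-csi-config ConfigMap above:
# the "openshift-storage" entry has no cephFS section at all.
payload = (
    '[{"clusterID":"openshift-storage",'
    '"monitors":["172.30.191.241:3300","172.30.84.6:3300","172.30.226.19:3300"],'
    '"namespace":"openshift-storage"},'
    '{"clusterID":"5bb69c306a7d011c3e91c3cec112fb7a",'
    '"monitors":["172.30.191.241:3300","172.30.84.6:3300","172.30.226.19:3300"],'
    '"namespace":"openshift-storage",'
    '"cephFS":{"subvolumeGroup":"csi","kernelMountOptions":"ms_mode=prefer-crc"}}]'
)

def missing_mount_options(config_json):
    """Return the clusterIDs whose entry lacks cephFS.kernelMountOptions."""
    entries = json.loads(config_json)
    return [e["clusterID"] for e in entries
            if "kernelMountOptions" not in e.get("cephFS", {})]

print(missing_mount_options(payload))  # → ['openshift-storage']
```

Running this against the live ConfigMap (e.g. via `oc get cm rook-ceph-csi-config -o jsonpath='{.data.csi-cluster-config-json}'`) would show which cluster entries the workaround needs to regenerate.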

Comment 16 Nagendra Reddy 2024-03-12 08:03:24 UTC
Verified with build: 4.15.0-157

The kernelMountOptions value is added along with the mon IPs, and no delay was observed. Hence, moving this BZ to the Verified state.

steps followed:

1. Watch the rook-ceph-csi-config ConfigMap in a terminal with "oc get cm rook-ceph-csi-config -w -o yaml".
2. oc delete pod rook-ceph-operator-79bc976c7b-dtlfx
3. oc delete cm rook-ceph-csi-config
4. Monitor the output of "oc get cm rook-ceph-csi-config -w -o yaml"; it looks like below.
5. It is observed that the kernelMountOptions value is added along with the mon IPs.
---
apiVersion: v1
data:
  csi-cluster-config-json: '[]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-03-12T07:43:17Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    uid: 0634e3ee-cf97-456d-b0c8-fe7a2f3f4cdc
  resourceVersion: "882199"
  uid: 8ef236eb-fac3-439f-98bc-2649ba0e1b5c
---
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"5bb69c306a7d011c3e91c3cec112fb7a","monitors":["172.30.2.199:3300","172.30.146.184:3300","172.30.52.210:3300"],"namespace":"openshift-storage","cephFS":{"subvolumeGroup":"csi","kernelMountOptions":"ms_mode=prefer-crc"},"readAffinity":{"enabled":true,"crushLocationLabels":["kubernetes.io/hostname","topology.kubernetes.io/region","topology.kubernetes.io/zone","topology.rook.io/chassis","topology.rook.io/rack","topology.rook.io/row","topology.rook.io/pdu","topology.rook.io/pod","topology.rook.io/room","topology.rook.io/datacenter"]}}]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-03-12T07:43:17Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    uid: 0634e3ee-cf97-456d-b0c8-fe7a2f3f4cdc
  resourceVersion: "882246"
  uid: 8ef236eb-fac3-439f-98bc-2649ba0e1b5c
---
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"5bb69c306a7d011c3e91c3cec112fb7a","monitors":["172.30.146.184:3300","172.30.52.210:3300","172.30.2.199:3300"],"namespace":"openshift-storage","cephFS":{"subvolumeGroup":"csi","kernelMountOptions":"ms_mode=prefer-crc"},"readAffinity":{"enabled":true,"crushLocationLabels":["kubernetes.io/hostname","topology.kubernetes.io/region","topology.kubernetes.io/zone","topology.rook.io/chassis","topology.rook.io/rack","topology.rook.io/row","topology.rook.io/pdu","topology.rook.io/pod","topology.rook.io/room","topology.rook.io/datacenter"]}},{"clusterID":"openshift-storage","monitors":["172.30.146.184:3300","172.30.52.210:3300","172.30.2.199:3300"],"namespace":"openshift-storage","cephFS":{"kernelMountOptions":"ms_mode=prefer-crc"},"readAffinity":{"enabled":true,"crushLocationLabels":["kubernetes.io/hostname","topology.kubernetes.io/region","topology.kubernetes.io/zone","topology.rook.io/chassis","topology.rook.io/rack","topology.rook.io/row","topology.rook.io/pdu","topology.rook.io/pod","topology.rook.io/room","topology.rook.io/datacenter"]}}]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-03-12T07:43:17Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    uid: 0634e3ee-cf97-456d-b0c8-fe7a2f3f4cdc
  resourceVersion: "882281"
  uid: 8ef236eb-fac3-439f-98bc-2649ba0e1b5c
---
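The final state in the watch output above can be checked the same way: every cluster entry should now carry both the mon addresses and cephFS.kernelMountOptions. A sketch (the payload is an abbreviated copy of the last ConfigMap update above, with the readAffinity fields trimmed for brevity):

```python
import json

# Final csi-cluster-config-json from the watch output above (readAffinity trimmed)
payload = '''[
  {"clusterID": "5bb69c306a7d011c3e91c3cec112fb7a",
   "monitors": ["172.30.146.184:3300", "172.30.52.210:3300", "172.30.2.199:3300"],
   "namespace": "openshift-storage",
   "cephFS": {"subvolumeGroup": "csi", "kernelMountOptions": "ms_mode=prefer-crc"}},
  {"clusterID": "openshift-storage",
   "monitors": ["172.30.146.184:3300", "172.30.52.210:3300", "172.30.2.199:3300"],
   "namespace": "openshift-storage",
   "cephFS": {"kernelMountOptions": "ms_mode=prefer-crc"}}
]'''

entries = json.loads(payload)
# Unlike the pre-fix ConfigMap, both entries now carry the kernel mount options.
ok = all(e.get("monitors") and
         e.get("cephFS", {}).get("kernelMountOptions") == "ms_mode=prefer-crc"
         for e in entries)
print("all entries carry kernelMountOptions:", ok)  # → True
```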

Comment 17 errata-xmlrpc 2024-03-19 15:33:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383