Bug 2265514 - "unable to get monitor info from DNS SRV with service name: ceph-mon" error observed while creating fedora app pod
Summary: "unable to get monitor info from DNS SRV with service name: ceph-mon" error observed while creating fedora app pod
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.15
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.15.0
Assignee: Rakshith
QA Contact: Nagendra Reddy
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-02-22 14:05 UTC by Nagendra Reddy
Modified: 2024-03-19 15:33 UTC
CC: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-03-19 15:33:05 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage rook pull 585 0 None Draft [WIP]BUG 2265514: csi: update CSIDriverOption params during saving cluster config 2024-03-04 15:29:15 UTC
Github rook rook issues 13835 0 None open csi: CSIDriverOptions are not filled when entry in csi cm is created with mon ips 2024-03-12 07:28:32 UTC
Github rook rook pull 13836 0 None Draft csi: update CSIDriverOption params during saving cluster config 2024-03-04 09:42:44 UTC
Red Hat Product Errata RHSA-2024:1383 0 None None None 2024-03-19 15:33:08 UTC

Description Nagendra Reddy 2024-02-22 14:05:16 UTC
Created attachment 2018159
error in app pod events

Description of problem (please be detailed as possible and provide log snippets):
After upgrading ODF from 4.14 GA to 4.15, the "unable to get monitor info from DNS SRV with service name: ceph-mon" error was observed while creating a fedora app pod.

Version of all relevant components (if applicable):
odf: 4.15.0-147

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes; help from dev (Madhu Rajanna) was needed to recover the cluster.

Is there any workaround available to the best of your knowledge?
Yes

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Upgrade ODF from 4.14 to 4.15.
2. Create a fedora pod in a new project (a minimal sketch follows after these steps).
3. The fedora app pod creation will fail due to this issue.
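
For reference, a minimal sketch of step 2 (ocs-storagecluster-cephfs is the usual default ODF CephFS storage class; all other names are illustrative):

---
# hypothetical PVC backed by the ODF CephFS storage class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fedora-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: ocs-storagecluster-cephfs
---
# hypothetical fedora pod mounting the PVC
apiVersion: v1
kind: Pod
metadata:
  name: fedora-pod
spec:
  containers:
  - name: fedora
    image: registry.fedoraproject.org/fedora:latest
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /mnt/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: fedora-pvc
---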


Actual results:

App pod creation failed with the "unable to get monitor info from DNS SRV with service name: ceph-mon" error.
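
The error surfaces in the app pod events (see the attached "error in app pod events"); for example (pod and project names are illustrative):

# events show "unable to get monitor info from DNS SRV with service name: ceph-mon"
oc describe pod fedora-pod -n <project>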

Expected results:
App pod creation should not fail with the "unable to get monitor info from DNS SRV with service name: ceph-mon" error.

Additional info:

Comment 3 Madhu Rajanna 2024-02-22 14:11:11 UTC
> oc get cephcluster -n openshift-storage -o yaml
...
spec:
    cephVersion:
      image: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9dbd051cfcdb334aad33a536cc115ae1954edaea5f8cb5943ad615f1b41b0226
    cleanupPolicy:
      sanitizeDisks: {}
    continueUpgradeAfterChecksEvenIfNotHealthy: true
    crashCollector: {}
    csi:
      cephfs:
        kernelMountOptions: ms_mode=prefer-crc



oc get cm rook-ceph-csi-config -oyaml
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"openshift-storage","monitors":["172.30.191.241:3300","172.30.84.6:3300","172.30.226.19:3300"],"namespace":"openshift-storage"},{"clusterID":"5bb69c306a7d011c3e91c3cec112fb7a","monitors":["172.30.191.241:3300","172.30.84.6:3300","172.30.226.19:3300"],"namespace":"openshift-storage","cephFS":{"subvolumeGroup":"csi","kernelMountOptions":"ms_mode=prefer-crc"}}]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-02-22T13:50:52Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: false
    controller: true
    kind: Deployment
    name: rook-ceph-operator
    uid: af2a4d1f-939d-4932-b116-c45f5f0b90c9
  resourceVersion: "722929"



I see that we have kernelMountOptions in the CephCluster CR, but it is not added to the csi config for all the clusterIDs.
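
A quick way to see which clusterID entries are missing the cephFS options (a sketch, assuming jq is available on the workstation):

oc get cm rook-ceph-csi-config -n openshift-storage -o jsonpath='{.data.csi-cluster-config-json}' \
  | jq '.[] | {clusterID, cephFS}'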

Workaround:
delete the csi configmap and the rook operator pod:

oc delete cm rook-ceph-csi-config
oc delete po/rook-ceph-operator-7c56874fb6-l8gpw
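
After the deletes, the operator Deployment recreates the pod, and the new pod regenerates the configmap with the kernelMountOptions entries. One way to watch this happen (assuming the usual app=rook-ceph-operator label on the operator pod):

oc get pods -n openshift-storage -l app=rook-ceph-operator -w
oc get cm rook-ceph-csi-config -n openshift-storage -w -o yaml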

Comment 16 Nagendra Reddy 2024-03-12 08:03:24 UTC
Verified with build: 4.15.0-157

The kernelMountOptions value is added along with the mon IPs, and no delay was observed. Hence, moving this BZ to the Verified state.

Steps followed:

1. Watch the rook-ceph-csi-config configmap in a terminal with the command "oc get cm rook-ceph-csi-config -w -o yaml".
2. oc delete pod rook-ceph-operator-79bc976c7b-dtlfx
3. oc delete cm rook-ceph-csi-config
4. Monitor the output of "oc get cm rook-ceph-csi-config -w -o yaml"; it looks like below.
5. The kernelMountOptions value is added along with the mon IPs.
---
apiVersion: v1
data:
  csi-cluster-config-json: '[]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-03-12T07:43:17Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    uid: 0634e3ee-cf97-456d-b0c8-fe7a2f3f4cdc
  resourceVersion: "882199"
  uid: 8ef236eb-fac3-439f-98bc-2649ba0e1b5c
---
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"5bb69c306a7d011c3e91c3cec112fb7a","monitors":["172.30.2.199:3300","172.30.146.184:3300","172.30.52.210:3300"],"namespace":"openshift-storage","cephFS":{"subvolumeGroup":"csi","kernelMountOptions":"ms_mode=prefer-crc"},"readAffinity":{"enabled":true,"crushLocationLabels":["kubernetes.io/hostname","topology.kubernetes.io/region","topology.kubernetes.io/zone","topology.rook.io/chassis","topology.rook.io/rack","topology.rook.io/row","topology.rook.io/pdu","topology.rook.io/pod","topology.rook.io/room","topology.rook.io/datacenter"]}}]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-03-12T07:43:17Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    uid: 0634e3ee-cf97-456d-b0c8-fe7a2f3f4cdc
  resourceVersion: "882246"
  uid: 8ef236eb-fac3-439f-98bc-2649ba0e1b5c
---
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"5bb69c306a7d011c3e91c3cec112fb7a","monitors":["172.30.146.184:3300","172.30.52.210:3300","172.30.2.199:3300"],"namespace":"openshift-storage","cephFS":{"subvolumeGroup":"csi","kernelMountOptions":"ms_mode=prefer-crc"},"readAffinity":{"enabled":true,"crushLocationLabels":["kubernetes.io/hostname","topology.kubernetes.io/region","topology.kubernetes.io/zone","topology.rook.io/chassis","topology.rook.io/rack","topology.rook.io/row","topology.rook.io/pdu","topology.rook.io/pod","topology.rook.io/room","topology.rook.io/datacenter"]}},{"clusterID":"openshift-storage","monitors":["172.30.146.184:3300","172.30.52.210:3300","172.30.2.199:3300"],"namespace":"openshift-storage","cephFS":{"kernelMountOptions":"ms_mode=prefer-crc"},"readAffinity":{"enabled":true,"crushLocationLabels":["kubernetes.io/hostname","topology.kubernetes.io/region","topology.kubernetes.io/zone","topology.rook.io/chassis","topology.rook.io/rack","topology.rook.io/row","topology.rook.io/pdu","topology.rook.io/pod","topology.rook.io/room","topology.rook.io/datacenter"]}}]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-03-12T07:43:17Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    uid: 0634e3ee-cf97-456d-b0c8-fe7a2f3f4cdc
  resourceVersion: "882281"
  uid: 8ef236eb-fac3-439f-98bc-2649ba0e1b5c
---
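
A scripted version of the check in step 5 (a sketch, assuming jq is available; exits non-zero if any entry lacks the option):

oc get cm rook-ceph-csi-config -n openshift-storage -o jsonpath='{.data.csi-cluster-config-json}' \
  | jq -e 'all(.[]; .cephFS.kernelMountOptions == "ms_mode=prefer-crc")'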

Comment 17 errata-xmlrpc 2024-03-19 15:33:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383

