Bug 2304238 - failed to create pod on ocs-storagecluster-cephfs volume
Summary: failed to create pod on ocs-storagecluster-cephfs volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.17.0
Assignee: Santosh Pillai
QA Contact: Vijay Avuthu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-08-13 05:50 UTC by Vijay Avuthu
Modified: 2024-10-30 14:30 UTC
CC List: 6 users

Fixed In Version: 4.17.0-77
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-10-30 14:30:51 UTC
Embargoed:




Links
Github red-hat-storage/ocs-operator pull 2747 (open): [DNM] remove upmap-read balancer mode (2024-08-13 08:41:46 UTC)
Github red-hat-storage/ocs-operator pull 2748 (open): Bug 2304238: [release-4.17] remove upmap-read balancer mode (2024-08-13 11:22:58 UTC)
Red Hat Issue Tracker OCSBZM-8856 (2024-08-26 11:14:28 UTC)
Red Hat Product Errata RHSA-2024:8676 (2024-10-30 14:30:55 UTC)

Description Vijay Avuthu 2024-08-13 05:50:34 UTC
Description of problem (please be as detailed as possible and provide log snippets):

Failed to create pod on ocs-storagecluster-cephfs volume


Version of all relevant components (if applicable):
openshift installer (4.17.0-0.nightly-2024-08-09-031511)
ocs-registry:4.17.0-70


Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
Yes

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes ( 2/2 )

Can this issue be reproduced from the UI?
Not tried

If this is a regression, please provide more details to justify this:
Yes

Steps to Reproduce:
1. Install ODF using ocs-ci.
2. Create a pod that uses the storage class "ocs-storagecluster-cephfs" (a hedged example is sketched below).
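
A minimal sketch of step 2, assuming a default ODF install; the PVC name, pod name, and nginx image are illustrative only, not the exact objects used in the test run:

$ cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-test-pvc
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: ocs-storagecluster-cephfs
---
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-test-pod
spec:
  containers:
  - name: web
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /var/lib/www/html
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: cephfs-test-pvc
EOF

# On affected builds the pod stays in ContainerCreating with FailedMount events:
$ oc get pod cephfs-test-pod -w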


Actual results:

$ oc describe pod image-registry-557c89c7b9-8gtdx  -n openshift-image-registry 
Name:                 image-registry-557c89c7b9-8gtdx
Namespace:            openshift-image-registry
Priority:             2000000000
Priority Class Name:  system-cluster-critical
.
.
 Warning  FailedMount             4m39s  kubelet                  MountVolume.MountDevice failed for volume "pvc-99a88fc8-18bb-4c9f-9b17-619803be4721" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph 172.30.211.105:3300,172.30.248.126:3300,172.30.233.144:3300:/volumes/csi/csi-vol-59580dd3-824d-4d3f-b5df-1d2aae13829c/0a9fdeac-ffa6-4f02-b223-be8e0a622b2e /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/f4c977bb751a5bd86b137e853bec376625ce8faccee6b0e3e15c6a5120020a9e/globalmount -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-3526911127,mds_namespace=ocs-storagecluster-cephfilesystem,ms_mode=prefer-crc,read_from_replica=localize,crush_location=host:compute-2|rack:rack2,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon
2024-08-13T05:17:39.311+0000 7f74dea48000 -1 failed for service _ceph-mon._tcp
mount error: no mds (Metadata Server) is up. The cluster might be laggy, or you may not be authorized
  Warning  FailedMount  3m37s  kubelet  MountVolume.MountDevice failed for volume "pvc-99a88fc8-18bb-4c9f-9b17-619803be4721" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph 172.30.211.105:3300,172.30.248.126:3300,172.30.233.144:3300:/volumes/csi/csi-vol-59580dd3-824d-4d3f-b5df-1d2aae13829c/0a9fdeac-ffa6-4f02-b223-be8e0a622b2e /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/f4c977bb751a5bd86b137e853bec376625ce8faccee6b0e3e15c6a5120020a9e/globalmount -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-3461892681,mds_namespace=ocs-storagecluster-cephfilesystem,ms_mode=prefer-crc,read_from_replica=localize,crush_location=host:compute-2|rack:rack2,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon
2024-08-13T05:18:42.272+0000 7f6df8ce3000 -1 failed for service _ceph-mon._tcp
mount error: no mds (Metadata Server) is up. The cluster might be laggy, or you may not be authorized
  Warning  FailedMount  2m36s  kubelet  MountVolume.MountDevice failed for volume "pvc-99a88fc8-18bb-4c9f-9b17-619803be4721" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph 172.30.211.105:3300,172.30.248.126:3300,172.30.233.144:3300:/volumes/csi/csi-vol-59580dd3-824d-4d3f-b5df-1d2aae13829c/0a9fdeac-ffa6-4f02-b223-be8e0a622b2e /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/f4c977bb751a5bd86b137e853bec376625ce8faccee6b0e3e15c6a5120020a9e/globalmount -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-1980131443,mds_namespace=ocs-storagecluster-cephfilesystem,ms_mode=prefer-crc,read_from_replica=localize,crush_location=host:compute-2|rack:rack2,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon
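
The stderr above shows the kernel CephFS mount failing with "no mds (Metadata Server) is up"; per the linked ocs-operator PRs, the fix for this bug was to stop setting the upmap-read balancer mode. As a triage sketch only (assuming the rook-ceph toolbox deployment is enabled in openshift-storage), the relevant cluster state can be inspected with standard Ceph commands; this is not the fix itself:

$ oc -n openshift-storage rsh deploy/rook-ceph-tools
# Overall cluster health, including mon and MDS state
sh-5.1$ ceph status
# Is any MDS active for the ODF filesystem?
sh-5.1$ ceph fs status ocs-storagecluster-cephfilesystem
# Which balancer mode is currently configured?
sh-5.1$ ceph balancer status
# Which feature releases do the connected clients advertise?
sh-5.1$ ceph features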


Expected results:
pod should be running


Additional info:

Tried creating the demo pod (nginx) and hit the same issue.

job: https://url.corp.redhat.com/23da020
must gather:  https://url.corp.redhat.com/8fb8363

Comment 9 Sunil Kumar Acharya 2024-08-26 11:12:44 UTC
Please update the RDT flag/text appropriately.

Comment 12 errata-xmlrpc 2024-10-30 14:30:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676

