Bug 2303177 - after upgrade from 4.15.15 to 4.15.18 image registry pods are stuck at “container creating”
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-driver
Version: 4.16
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.16.2
Assignee: Praveen M
QA Contact: Vijay Avuthu
Reported: 2024-08-06 14:29 UTC by Praveen M
Modified: 2025-01-29 04:25 UTC
CC: 4 users

Fixed In Version: 4.16.2-3
Doc Type: Bug Fix
Doc Text:
Previously, when the label on the node was empty, the mount would fail. With this fix, when the node label is empty, the node is not considered for `crush_location` mount option and as a result persistent volume claim (PVC) mounts successfully.
Last Closed: 2024-09-18 11:57:03 UTC



Links:
- GitHub red-hat-storage/ceph-csi pull 354 (open): "BUG 2303177: util: exclude empty label values for crushlocation map", last updated 2024-08-14 05:58:28 UTC
- Red Hat Issue Tracker OCSBZM-8806, last updated 2024-08-21 07:15:39 UTC
- Red Hat Product Errata RHSA-2024:6755, last updated 2024-09-18 11:57:07 UTC

Description Praveen M 2024-08-06 14:29:14 UTC
This bug was initially created as a copy of Bug #2297265

I am copying this bug because: 



Description of problem (please be as detailed as possible and provide log
snippets):


After upgrading from 4.15.15 to 4.15.18, image registry pods are stuck at "ContainerCreating".

Error in event log:

(combined from similar events): MountVolume.MountDevice failed for volume "pvc-c795a19b-07e7-4594-b998-6f41e537f65d" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph 172.30.176.65:3300,172.30.143.191:3300,172.30.124.83:3300:/volumes/csi/csi-vol-393bd202-3750-4d53-bae1-89916bfed53a/3f03bc85-670b-4fd9-9a71-6e24f2d11f5b /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/3c19db5ad93b22a0a40c7b2207f20aa1122367773bd729ef7050476308d75f49/globalmount -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-1623855236,mds_namespace=ocs-storagecluster-cephfilesystem,ms_mode=prefer-crc,read_from_replica=localize,crush_location=host:srv-ocdev03-demo-smartis-si|zone:,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon 2024-07-05T08:16:22.770+0000 7f9af3ce8ec0 -1 failed for service _ceph-mon._tcp mount error 22 = Invalid argument


NAME                                               READY   STATUS              RESTARTS   AGE     IP             NODE                          NOMINATED NODE   READINESS GATES
image-pruner-28670400-ds9vj                        0/1     Error               0          2d6h    10.130.0.247   srv-ocdev03.demo.smartis.si   <none>           <none>
image-pruner-28671840-jjcmj                        0/1     Error               0          30h     10.130.0.106   srv-ocdev03.demo.smartis.si   <none>           <none>
image-pruner-28673280-4x2kn                        0/1     Error               0          6h19m   10.130.1.191   srv-ocdev03.demo.smartis.si   <none>           <none>
image-registry-7bb46cc7f8-c958t                    0/1     ContainerCreating   0          2d23h   <none>         srv-ocdev03.demo.smartis.si   <none>           <none>
image-registry-7bb46cc7f8-qwpzn                    0/1     ContainerCreating   0          2d23h   <none>         srv-ocdev01.demo.smartis.si   <none>           <none>




Version of all relevant components (if applicable):

ODF 4.15.15 

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

image-registry pods are unable to start



Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
N/A

Is this issue reproducible?
N/A

Can this issue be reproduced from the UI?
N/A

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
Upgrade from 4.15.15 to 4.15.18.


Actual results:
Image registry pods are stuck at "ContainerCreating".


Expected results:
Image registry pods should be Running.


Additional info:
Possibly related BZ:
There was an OpenShift Data Foundation bug [1] on this that was fixed in ODF 4.15.0 [2].
It looks like this should have been fixed in 4.15.0, and the cluster is on v4.15.3.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2265514
[2] https://access.redhat.com/errata/RHSA-2024:1383
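
Note the `crush_location=host:srv-ocdev03-demo-smartis-si|zone:` fragment in the mount args above: the node's zone label is empty, so the option string contains an empty `zone:` value, which the kernel mount rejects with EINVAL (error 22). The linked fix (PR 354, "util: exclude empty label values for crushlocation map") skips labels with empty values. A minimal Go sketch of the idea; the function and names below are illustrative assumptions, not the actual ceph-csi code:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildCrushLocation composes the crush_location mount option value from
// node topology labels. Labels with empty values are excluded (the fix):
// emitting "zone:" with no value makes the kernel reject the mount with
// EINVAL. This is an illustrative sketch, not the real ceph-csi function.
func buildCrushLocation(labels map[string]string) string {
	parts := []string{}
	for key, value := range labels {
		if value == "" {
			continue // skip empty label values (the bug fix)
		}
		parts = append(parts, key+":"+value)
	}
	sort.Strings(parts) // deterministic ordering for the option string
	return strings.Join(parts, "|")
}

func main() {
	labels := map[string]string{
		"host": "srv-ocdev03-demo-smartis-si",
		"zone": "", // empty zone label, as on the failing node
	}
	// Prints: crush_location=host:srv-ocdev03-demo-smartis-si
	fmt.Println("crush_location=" + buildCrushLocation(labels))
}
```

With the empty `zone` label filtered out, the option degrades gracefully to `host:` only instead of producing an invalid empty component.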

Comment 4 Sunil Kumar Acharya 2024-08-29 17:38:37 UTC
Please backport the fix to ODF-4.16 and update the RDT flag/text appropriately.

Comment 11 errata-xmlrpc 2024-09-18 11:57:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.2 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:6755

Comment 13 Red Hat Bugzilla 2025-01-29 04:25:05 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

