Bug 2155402 - Pod and PVC for replica-1 pool in pending state
Summary: Pod and PVC for replica-1 pool in pending state
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.12
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ODF 4.12.0
Assignee: Malay Kumar parida
QA Contact: Martin Bukatovic
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-12-21 06:27 UTC by narayanspg
Modified: 2023-08-09 17:00 UTC
CC: 9 users

Fixed In Version: 4.12.0-156
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-02-08 14:06:28 UTC
Embargoed:


Attachments
PVC events (373.94 KB, image/png)
2022-12-21 06:27 UTC, narayanspg


Links
Github red-hat-storage ocs-operator pull 1902 (open): Set domainLabel as hostname instead of host in topology constrained pool (last updated 2022-12-22 09:47:58 UTC)
Github red-hat-storage ocs-operator pull 1903 (open): Bug 2155402: [release-4.12] set domainLabel to "hostname" instead of "host" in topology constrained pool (last updated 2022-12-23 05:07:57 UTC)

Description narayanspg 2022-12-21 06:27:08 UTC
Created attachment 1933876 [details]
PVC events

Description of problem (please be as detailed as possible and provide log
snippets): Pod and PVC for replica-1 pool stuck in the Pending state

This is for Replica 1 (non-resilient pool, Dev Preview), after creating the non-resilient pool.

When we create a pod with a volume mount that uses a PVC created with the non-resilient StorageClass:
The pod stays in the Pending state forever.
The PVC the pod refers to stays in the Pending state forever.


Feature link:
https://issues.redhat.com/browse/RHSTOR-3283

We followed the steps in this article: https://hackmd.io/lAQfcgT3Qw2idCushZ7_fw#Validating-Metadata-on-RBD-amp-CephFS-Volumes-with-ocs-operator

We earlier hit a StorageCluster stuck issue (BZ https://bugzilla.redhat.com/show_bug.cgi?id=2141265), which has been resolved; this is a new issue.

Version of all relevant components (if applicable):
ODF 4.12 

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
We are not able to continue validating the feature.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
We followed the steps from https://hackmd.io/lAQfcgT3Qw2idCushZ7_fw#Validating-Metadata-on-RBD-amp-CephFS-Volumes-with-ocs-operator
and then created a pod that uses the PVC we created.
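
For reference, the reproduction boils down to a PVC against the non-resilient StorageClass plus a pod that mounts it. Below is a minimal sketch; the PVC name and StorageClass name are taken from the provisioner log in comment 2, while the pod name, image, and other spec details are illustrative, not the exact manifests used.

```yaml
# Hedged sketch of the reproduction manifests.
# Confirmed from the log: PVC "non-resilient-rbd-pvc" in "openshift-storage"
# using StorageClass "ocs-storagecluster-ceph-non-resilient-rbd".
# Pod details below are illustrative assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: non-resilient-rbd-pvc
  namespace: openshift-storage
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ocs-storagecluster-ceph-non-resilient-rbd
---
apiVersion: v1
kind: Pod
metadata:
  name: non-resilient-rbd-pod   # hypothetical name
  namespace: openshift-storage
spec:
  containers:
    - name: app
      image: registry.access.redhat.com/ubi8/ubi-minimal
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: non-resilient-rbd-pvc
```

With the bug present, both objects stay Pending and the PVC events show the ProvisioningFailed error quoted in comment 2.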


Actual results:
The PVC is stuck waiting for the volume to be created.

Expected results:
The PVC should become Bound, and the pod should subsequently reach the Ready state.

Additional info:
We have an OCP cluster environment that can be accessed for debugging if required:
"https://console-openshift-console.apps.rdr-res3.ibm.com"

user: kubeadmin and password: uygnj-mGbhK-3WkC6-8ypFR

Please add the entry below at the end of your hosts file (as a single line).

158.176.146.43 api.rdr-res3.ibm.com console-openshift-console.apps.rdr-res3.ibm.com integrated-oauth-server-openshift-authentication.apps.rdr-res3.ibm.com oauth-openshift.apps.rdr-res3.ibm.com prometheus-k8s-openshift-monitoring.apps.rdr-res3.ibm.com grafana-openshift-monitoring.apps.rdr-res3.ibm.com example.apps.rdr-res3.ibm.com

Comment 2 Malay Kumar parida 2022-12-21 07:12:35 UTC
This is the log message we see in the RBD provisioner pod:

W1221 07:09:36.664740       1 controller.go:934] Retrying syncing claim "cabf9c36-0b32-43b9-9f9b-ad680bccbe3a", failure 705
E1221 07:09:36.664782       1 controller.go:957] error syncing claim "cabf9c36-0b32-43b9-9f9b-ad680bccbe3a": failed to provision volume with StorageClass "ocs-storagecluster-ceph-non-resilient-rbd": rpc error: code = Internal desc = none of the topology constrained pools matched requested topology constraints : pools ([{PoolName:ocs-storagecluster-cephblockpool-worker-0 DataPoolName: DomainSegments:[{DomainLabel:host DomainValue:worker-0}]} {PoolName:ocs-storagecluster-cephblockpool-worker-1 DataPoolName: DomainSegments:[{DomainLabel:host DomainValue:worker-1}]} {PoolName:ocs-storagecluster-cephblockpool-worker-2 DataPoolName: DomainSegments:[{DomainLabel:host DomainValue:worker-2}]}]) requested topology ({Requisite:[segments:<key:"topology.openshift-storage.rbd.csi.ceph.com/hostname" value:"worker-0" >  segments:<key:"topology.openshift-storage.rbd.csi.ceph.com/hostname" value:"worker-1" >  segments:<key:"topology.openshift-storage.rbd.csi.ceph.com/hostname" value:"worker-2" > ] Preferred:[segments:<key:"topology.openshift-storage.rbd.csi.ceph.com/hostname" value:"worker-0" >  segments:<key:"topology.openshift-storage.rbd.csi.ceph.com/hostname" value:"worker-1" >  segments:<key:"topology.openshift-storage.rbd.csi.ceph.com/hostname" value:"worker-2" > ] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0})
I1221 07:09:36.664811       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"openshift-storage", Name:"non-resilient-rbd-pvc", UID:"cabf9c36-0b32-43b9-9f9b-ad680bccbe3a", APIVersion:"v1", ResourceVersion:"3503148", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "ocs-storagecluster-ceph-non-resilient-rbd": rpc error: code = Internal desc = none of the topology constrained pools matched requested topology constraints : pools ([{PoolName:ocs-storagecluster-cephblockpool-worker-0 DataPoolName: DomainSegments:[{DomainLabel:host DomainValue:worker-0}]} {PoolName:ocs-storagecluster-cephblockpool-worker-1 DataPoolName: DomainSegments:[{DomainLabel:host DomainValue:worker-1}]} {PoolName:ocs-storagecluster-cephblockpool-worker-2 DataPoolName: DomainSegments:[{DomainLabel:host DomainValue:worker-2}]}]) requested topology ({Requisite:[segments:<key:"topology.openshift-storage.rbd.csi.ceph.com/hostname" value:"worker-0" >  segments:<key:"topology.openshift-storage.rbd.csi.ceph.com/hostname" value:"worker-1" >  segments:<key:"topology.openshift-storage.rbd.csi.ceph.com/hostname" value:"worker-2" > ] Preferred:[segments:<key:"topology.openshift-storage.rbd.csi.ceph.com/hostname" value:"worker-0" >  segments:<key:"topology.openshift-storage.rbd.csi.ceph.com/hostname" value:"worker-1" >  segments:<key:"topology.openshift-storage.rbd.csi.ceph.com/hostname" value:"worker-2" > ] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0})
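
The root cause is visible in the message: each pool advertises DomainLabel "host", so the provisioner looks for a topology key ending in "/host", while the CSI request carries keys ending in "/hostname", and nothing matches. The linked PRs 1902/1903 fix this by setting domainLabel to "hostname". A minimal Python sketch of the comparison follows; it is illustrative only (the real matching lives in ceph-csi's Go code, and pool_matches is a hypothetical helper):

```python
# Illustrative sketch of why no topology-constrained pool matches.
# The prefix is taken from the log above; pool_matches is hypothetical.
CSI_TOPOLOGY_PREFIX = "topology.openshift-storage.rbd.csi.ceph.com/"

def pool_matches(pool_segments, requested_segments):
    """Return True if every (DomainLabel, DomainValue) pair the pool
    advertises is present in the requested topology segments."""
    for label, value in pool_segments:
        # The pool's short label is joined with the CSI prefix to form
        # the full topology key that must appear in the request.
        if requested_segments.get(CSI_TOPOLOGY_PREFIX + label) != value:
            return False
    return True

# The CSI request keys end in ".../hostname" (see the log above).
requested = {CSI_TOPOLOGY_PREFIX + "hostname": "worker-0"}

# Before the fix: pools advertise DomainLabel "host", so no pool matches.
assert not pool_matches([("host", "worker-0")], requested)
# After the fix (PR 1902): DomainLabel "hostname" lines up with the key.
assert pool_matches([("hostname", "worker-0")], requested)
```

This mirrors why every one of the three per-worker pools was rejected even though the host values ("worker-0", "worker-1", "worker-2") were correct on both sides.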

Comment 3 Madhu Rajanna 2022-12-21 07:15:25 UTC
Please attach an OCS must-gather so we can check what is wrong.

Comment 7 narayanspg 2022-12-22 04:02:56 UTC
As per the latest update, Malay is working on a fix for this issue.

Comment 15 narayanspg 2023-01-10 10:09:27 UTC
The issue is resolved on the latest build we tested. We can close this issue.
Thank you.

Comment 16 narayanspg 2023-01-12 07:15:56 UTC
I am not getting the option to move to a Verified state. only two options are available (ON_QA and CLOSED).
Please mark as Verified.

Comment 17 Harish NV Rao 2023-01-12 07:17:37 UTC
(In reply to narayanspg from comment #16)
> I am not getting the option to move to a Verified state. only two options
> are available (ON_QA and CLOSED).
> Please mark as Verified.
Thanks!
Moving the BZ to the Verified state.

