Bug 2244353 - [OCP Tracker] [RDR][CEPHFS] volsync-rsync-src pod's are stuck in ContainerCreating with msg for volume is not a mountpoint
Keywords:
Status: ON_QA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: distribution
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ODF 4.16.0
Assignee: Madhu Rajanna
QA Contact: Pratik Surve
URL:
Whiteboard:
Duplicates: 2240161
Depends On:
Blocks: 2244409
Reported: 2023-10-16 06:43 UTC by Pratik Surve
Modified: 2024-04-30 05:12 UTC

Fixed In Version: 4.16.0-86
Doc Type: Known Issue
Doc Text:
Missing NodeStageVolume RPC call blocks new pods from going into Running state
The NodeStageVolume RPC call is not issued, which blocks some pods from going into the `Running` state. The new pods remain stuck in `Pending` indefinitely. To work around this issue, scale down all the affected pods at once or reboot the node. After the workaround is applied, all pods should go into the `Running` state.
Clone Of:
Environment:
Last Closed:
Embargoed:



Links
System                  ID         Private  Priority  Status  Summary  Last Updated
------                  --         -------  --------  ------  -------  ------------
Red Hat Issue Tracker   STOR-1473  0        None      None    None     2023-10-31 12:11:55 UTC

Description Pratik Surve 2023-10-16 06:43:00 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

[RDR][CEPHFS] volsync-rsync-src pods are stuck in ContainerCreating; the mount fails with the message that the staging path for the volume "is not a mountpoint".

Events:
  Type     Reason       Age                      From     Message
  ----     ------       ----                     ----     -------
  Warning  FailedMount  7m25s (x1537 over 2d4h)  kubelet  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition
  Warning  FailedMount  107s (x1566 over 2d4h)   kubelet  MountVolume.SetUp failed for volume "pvc-5cadfa46-6038-4a36-bf6e-aebd3a133911" : rpc error: code = Internal desc = staging path /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/91a3d7827df8a706577f16aa678e4fdd181b9ce40d8d9436c21cf5ef6956588b/globalmount for volume 0001-0011-openshift-storage-0000000000000001-03559d4c-9763-49cd-83f6-90d6dae75f1e is not a mountpoint
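
When many pods report this event, it can help to pull the volume name, staging path, and CSI volume handle out of the message programmatically. The following is a minimal sketch, assuming the message has exactly the shape quoted above; the function name `parse_failed_mount` and the regular expression are illustrative and are not part of kubelet or ceph-csi.

```python
import re

# Pattern written against the FailedMount message quoted in this report.
# Other kubelet or ceph-csi versions may word the message differently.
_FAILED_MOUNT = re.compile(
    r'MountVolume\.SetUp failed for volume "(?P<pv>[^"]+)"'
    r'.*?staging path (?P<staging_path>\S+)'
    r' for volume (?P<volume_handle>\S+) is not a mountpoint'
)


def parse_failed_mount(message):
    """Return a dict with pv, staging_path and volume_handle.

    Returns None when the message does not match the expected shape.
    """
    match = _FAILED_MOUNT.search(message)
    return match.groupdict() if match else None
```

Passing the second event message above to `parse_failed_mount` yields the three fields as a dictionary; any other message, such as the "timed out waiting for the condition" event, yields `None`.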



Version of all relevant components (if applicable):

OCP version:- 4.14.0-0.nightly-2023-10-10-084534
ODF version:- 4.14.0-150
CEPH version:- ceph version 17.2.6-146.el9cp (1d01c2b30b5fd39787bb8804707c4b2e52e30137) quincy (stable)
ACM version:- 2.9.0-185
SUBMARINER version:- devel
VOLSYNC version:- volsync-product.v0.8.0
VOLSYNC method:- destinationCopyMethod: Direct


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
yes

Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy an RDR cluster.
2. Run a CephFS workload.
3. Check the pod status after 1-2 days.


Actual results:

volsync-rsync-tls-src-busybox-pvc-20-lbw4w   0/1     ContainerStatusUnknown   1          2d5h
volsync-rsync-tls-src-busybox-pvc-20-z6hp7   0/1     ContainerCreating        0          2d5h


Events:
  Type     Reason       Age                      From     Message
  ----     ------       ----                     ----     -------
  Warning  FailedMount  7m25s (x1537 over 2d4h)  kubelet  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition
  Warning  FailedMount  107s (x1566 over 2d4h)   kubelet  MountVolume.SetUp failed for volume "pvc-5cadfa46-6038-4a36-bf6e-aebd3a133911" : rpc error: code = Internal desc = staging path /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/91a3d7827df8a706577f16aa678e4fdd181b9ce40d8d9436c21cf5ef6956588b/globalmount for volume 0001-0011-openshift-storage-0000000000000001-03559d4c-9763-49cd-83f6-90d6dae75f1e is not a mountpoint
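
To confirm by hand that a staging path really is not mounted on the node, one option is to look it up in the kernel's mount table. The sketch below assumes the text comes from `/proc/self/mountinfo` read in the host mount namespace of the affected node; the helper names are hypothetical, and this is not necessarily how ceph-csi performs its own check.

```python
import os
import re


def _unescape(field):
    """Decode the octal escapes (for example \\040 for a space) used in mountinfo."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def mount_points(mountinfo_text):
    """Collect the mount points listed in text in /proc/<pid>/mountinfo format."""
    points = set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            # The fifth field is the mount point, relative to the process root.
            points.add(_unescape(fields[4]))
    return points


def is_mount_point(path, mountinfo_text):
    """Return True if path appears as a mount point in the given mountinfo text."""
    return os.path.normpath(path) in mount_points(mountinfo_text)
```

On the node, the text can be obtained with `open("/proc/self/mountinfo").read()` and passed together with the `globalmount` path from the event; a `False` result is consistent with the error reported above.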

Expected results:
Pods should not remain stuck in ContainerCreating for this long.

Additional info:

Comment 8 Madhu Rajanna 2023-10-31 12:24:46 UTC
*** Bug 2240161 has been marked as a duplicate of this bug. ***

