Bug 2322671 - [Stretch cluster] RWX storage issue on surviving zone : MountVolume.MountDevice failed [NEEDINFO]
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Santosh Pillai
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-10-30 06:56 UTC by morstad
Modified: 2024-11-04 06:49 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
sapillai: needinfo? (vshankar)
sapillai: needinfo? (lflores)




Links:
Red Hat Issue Tracker OCSBZM-9455 (private: no; priority/status/summary: none; last updated 2024-10-30 06:58:10 UTC)

Description morstad 2024-10-30 06:56:13 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

Stretch cluster testing with a "DR" scenario: the entire application was running on zone-1; we took zone-1 down and monitored recovery of the application to zone-2. The application relocated to zone-2 and a portion of it was up and running in approximately 20 minutes, but several pods requiring RWX storage were stuck in ContainerCreating with a message similar to the one below:


Events:
  Type     Reason       Age    From               Message
  ----     ------       ----   ----               -------
  Normal   Scheduled    3m48s  ibm-cpd-scheduler  Successfully assigned cpd-ins/asset-files-api-5f46c7f599-hvfdz to dahorak-ibmcloud-bwvbz-worker-2-6dzcw
  Warning  FailedMount  38s    kubelet            MountVolume.MountDevice failed for volume "pvc-a45db048-7f1e-4be6-8ca2-2f47ea09046e" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0011-openshift-storage-0000000000000001-a9a70a11-eca5-4377-aad4-7f276bfb1d46 already exists
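
For context, the "an operation with the given Volume ID ... already exists" text comes from Ceph-CSI's per-volume operation lock: a new NodeStageVolume call is aborted while an earlier operation on the same volume ID is still in flight (for example, one blocked against the failed zone). A possible way to inspect this, assuming a default ODF install where the CephFS node plugin runs as the csi-cephfsplugin DaemonSet in openshift-storage; the commands below are an illustrative sketch, not confirmed details of this cluster:

  # Which node still holds the attachment for the affected PV:
  oc get volumeattachment | grep pvc-a45db048-7f1e-4be6-8ca2-2f47ea09046e

  # Look for the blocked operation in the CephFS node plugin logs:
  oc -n openshift-storage logs ds/csi-cephfsplugin -c csi-cephfsplugin --tail=100

  # Last resort (disruptive, sketch only): restarting the plugin pod on the
  # surviving node drops the in-memory lock so the mount can be retried:
  oc -n openshift-storage delete pod -l app=csi-cephfsplugin \
    --field-selector spec.nodeName=dahorak-ibmcloud-bwvbz-worker-2-6dzcw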


Version of all relevant components (if applicable):


Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)? Yes - unable to recover the application on the surviving zone


Is there any workaround available to the best of your knowledge? No


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)? 4 - custom install of application


Is this issue reproducible? Likely; it may be timing based


Can this issue be reproduced from the UI? No


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install an application with multiple pods using RWX storage on zone-1
2. Shut down zone-1 and force delete pods once they enter the Terminating state (a hypothetical helper is sketched after this list)
3. Monitor application relocation to the surviving zone
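
For step 2, a minimal sketch of the force deletion; the cpd-ins namespace is taken from the events above, and everything else is an assumption about a typical setup rather than the exact procedure used here:

  # Force delete application pods stuck in Terminating after zone-1 goes down
  # (namespace from the events above; adjust to the affected application):
  oc get pods -n cpd-ins -o wide | awk '$3 == "Terminating" {print $1}' |
    xargs -r oc delete pod -n cpd-ins --force --grace-period=0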


Actual results:
Pods with RWX storage were NOT able to mount their volumes

Expected results:
Pods with RWX storage are able to mount their volumes

Additional info:

