Bug 1671364 - In app pods, mpath device name is not mapped/created for some blockvolumes in the initiator side
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: kubernetes
Version: ocs-3.11
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: OCS 3.11.z Batch Update 4
Assignee: Humble Chirammal
QA Contact: Sri Vignesh Selvan
URL:
Whiteboard:
Depends On:
Blocks: 1669979
 
Reported: 2019-01-31 13:34 UTC by Sri Vignesh Selvan
Modified: 2020-02-13 05:21 UTC
CC List: 16 users

Fixed In Version: >= atomic-openshift-3.11.123-1
Doc Type: Known Issue
Doc Text:
When an application pod is created with a multipath configuration and a gluster-block PVC, the iSCSI driver waits 10 seconds for storage path discovery. However, device discovery can be delayed when the system is under load or in similar conditions. When only one path is discovered within 10 seconds, that single path is used to mount the volume into the application pod, so failover to a different path does not work even when multiple paths were configured. To work around this issue, delete and reschedule the pod; there is a high probability that the new pod will mount the volume with all the paths.
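For illustration, the workaround amounts to deleting the affected pod and verifying the path count once it is rescheduled. A minimal sketch, assuming the pod is managed by a controller that reschedules it (pod and namespace names are placeholders):

oc delete pod <app-pod> -n <namespace>   # the controller creates a replacement pod
multipath -ll                            # on the node running the new pod: all configured paths should be listed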
Clone Of:
Environment:
Last Closed: 2020-02-13 05:21:52 UTC
Embargoed:
hchiramm: needinfo-
knarra: needinfo-



Comment 4 Prasanna Kumar Kalever 2019-02-07 10:28:06 UTC
Vignesh, ping!!

Comment 7 Prasanna Kumar Kalever 2019-03-28 12:31:19 UTC
IMO, there is nothing that can be done at the gluster-block layer.

Requesting Humble and team to provide some insight.

Requesting Vignesh to attach the sosreports.

Comment 9 RamaKasturi 2019-03-28 14:46:31 UTC
Acked for 3.11.3

Comment 14 Prasanna Kumar Kalever 2019-05-15 05:50:37 UTC
@Humble, @Vignesh

Please see comment 6 for the complete analysis summary.

Thanks!

Comment 23 Humble Chirammal 2019-05-21 10:40:33 UTC
>>>

* Block volumes are working as expected.
* Two things to look at:
1. Increasing the system memory should help device-mapper bring up the mapper device faster; as a result, OCP will get the mapper device.
2. From https://github.com/openshift/origin/pull/21197 (the fix went in as part of BZ#1596021), the wait for the mapper device is still 10 sec (see patch https://github.com/openshift/origin/pull/21197/commits/c2058ab05a79c4fcbf1a0b9bd43ff5cc5ef7d04f#diff-30894ca64f3fc736355610ef191d8bb7R413). But again, IMHO this should be tuned to >30 sec.
>>>

Again, we are on the same page: it is just a resource crunch where device-mapper multipath is not able to discover the paths within 10 seconds (a sketch of that wait loop follows at the end of this comment). But that is how it is coded, and it is working as designed.

If we are looking for a higher timeout, I will have to work with the Kube community and try to convince them.

For now, I don't see any bug. IMO, we either have to document the behaviour or close/verify this bug as working as expected/designed.
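
To make that concrete, below is a minimal Go sketch of such a bounded wait. The helper names and the sysfs lookup are illustrative assumptions, not the exact code in the origin patch linked above; the point is that only the timeout value would need to change to implement the proposed tuning.

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// findMultipathDevice scans /sys/block for a dm-* device whose slaves/
// directory references the given SCSI disk (e.g. "sdb"). Illustrative
// stand-in for the real lookup in the iSCSI volume plugin.
func findMultipathDevice(disk string) (string, bool) {
	dms, _ := filepath.Glob("/sys/block/dm-*")
	for _, dm := range dms {
		if _, err := os.Stat(filepath.Join(dm, "slaves", disk)); err == nil {
			return "/dev/" + filepath.Base(dm), true
		}
	}
	return "", false
}

// waitForMultipathDevice polls until the mapper device appears or the
// timeout expires. 10s is the value under discussion; the proposal is
// to tune this to >30s.
func waitForMultipathDevice(disk string, timeout time.Duration) (string, error) {
	deadline := time.Now().Add(timeout)
	for {
		if dm, ok := findMultipathDevice(disk); ok {
			return dm, nil
		}
		if time.Now().After(deadline) {
			return "", fmt.Errorf("multipath device for %s not found within %v", disk, timeout)
		}
		time.Sleep(500 * time.Millisecond)
	}
}

func main() {
	if dm, err := waitForMultipathDevice("sdb", 10*time.Second); err != nil {
		fmt.Println(err)
	} else {
		fmt.Println("found mapper device:", dm)
	}
}

In this shape of loop, the first successful lookup wins; if multipathd has mapped only one path by then, the pod mounts with that single path, which matches the behaviour described in the Doc Text.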

Comment 24 RamaKasturi 2019-05-22 14:27:20 UTC
(In reply to Humble Chirammal from comment #23)
> >>>
> 
> * Block volumes are working as expected.
> * Two things to look at:
> 1. Increasing the system memory should help device-mapper bring up the
> mapper device faster; as a result, OCP will get the mapper device.

I would like to understand why we would need to increase memory, since we are running the test with the recommended resources; a customer running at this configuration would hit the same issue.

> 2. From https://github.com/openshift/origin/pull/21197 (the fix went in as part of BZ#1596021), the wait for the mapper device is still 10 sec (see patch https://github.com/openshift/origin/pull/21197/commits/c2058ab05a79c4fcbf1a0b9bd43ff5cc5ef7d04f#diff-30894ca64f3fc736355610ef191d8bb7R413). But again, IMHO this should be tuned to >30 sec.
> >>>
> 
> Again, we are on the same page: it is just a resource crunch where device-mapper multipath is not able to discover the paths within 10 seconds. But that is how it is coded, and it is working as designed.
> 
> If we are looking for a higher timeout, I will have to work with the Kube community and try to convince them.
> 
> For now, I don't see any bug. IMO, we either have to document the behaviour or close/verify this bug as working as expected/designed.

Comment 37 Humble Chirammal 2019-07-09 09:53:32 UTC
The fix is available in the latest OCP 3.11 build. Please validate; I am moving this to ON_QA.

