Bug 1671364

Summary: In app pods, mpath device name is not mapped/created for some blockvolumes in the initiator side
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Sri Vignesh Selvan <sselvan>
Component: kubernetes Assignee: Humble Chirammal <hchiramm>
Status: CLOSED CURRENTRELEASE QA Contact: Sri Vignesh Selvan <sselvan>
Severity: medium Docs Contact:
Priority: unspecified    
Version: ocs-3.11 CC: asriram, hchiramm, knarra, kramdoss, madam, pasik, pkarampu, pprakash, prasanna.kalever, puebele, rcyriac, rhs-bugs, rtalur, sselvan, vbellur, xiubli
Target Milestone: --- Keywords: ZStream
Target Release: OCS 3.11.z Batch Update 4 Flags: hchiramm: needinfo-
knarra: needinfo-
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: >= atomic-openshift-3.11.123-1 Doc Type: Known Issue
Doc Text:
When an application pod is created with a multipath configuration and a gluster-block PVC, the iSCSI driver waits 10 seconds for storage path discovery. However, device discovery can be delayed, for example when the system is under load. When only one path is discovered within 10 seconds, that path alone is used to mount the volume in the application pod. This means that failover to a different path does not work even though multiple paths were configured. To work around this issue, delete and reschedule the pod; there is a high probability that the new pod mounts the volume with all the paths.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-02-13 05:21:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1669979    

Comment 4 Prasanna Kumar Kalever 2019-02-07 10:28:06 UTC
vignesh, ping!!

Comment 7 Prasanna Kumar Kalever 2019-03-28 12:31:19 UTC
IMO, there is nothing that can be done at the gluster-block layer.

Request Humble and team to provide some insight.

Request Vignesh to attach the sos-reports.

Comment 9 RamaKasturi 2019-03-28 14:46:31 UTC
Acked for 3.11.3

Comment 14 Prasanna Kumar Kalever 2019-05-15 05:50:37 UTC
@Humble, @Vignesh

Please see comment 6 for the complete analysis summary.

Thanks!

Comment 23 Humble Chirammal 2019-05-21 10:40:33 UTC
>>>

* Block volumes are working as expected.
* Two things to look at:
1. Increasing the system memory should help device-mapper bring up the mapper device faster; as a result, OCP will get the mapper device.
2. From https://github.com/openshift/origin/pull/21197 (the fix went in as part of BZ#1596021), the wait for the mapper device is still 10 sec (see patch https://github.com/openshift/origin/pull/21197/commits/c2058ab05a79c4fcbf1a0b9bd43ff5cc5ef7d04f#diff-30894ca64f3fc736355610ef191d8bb7R413), but again, IMHO, this should be tuned to >30 sec.
>>>

Again, we are on the same page: it's just a resource crunch where device mapper multipath is not able to discover the paths within 10 seconds. But that's how it is coded, and it's working as designed.

If we are looking for a higher timeout, I have to work with the Kube community and try to convince them.

For now, I don't see any BUG. IMO, either we have to document the behaviour or CLOSED/VERIFY this bug as working as expected/designed!
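
To illustrate the timing behaviour discussed in this comment, here is a minimal sketch of a poll-until-timeout loop. It is not the Kubernetes/OpenShift iSCSI code; `wait_for_paths`, `FakeHost`, the device names `sda`/`sdb`, and the arrival times are all hypothetical, chosen only to show how a slow second path can miss a 10-second window but be picked up by a longer one.

```python
import time


def wait_for_paths(discover, expected, timeout_s=10.0, interval_s=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll discover() until it reports `expected` paths or timeout_s elapses.

    Returns the last list of paths seen, which may be shorter than expected.
    """
    deadline = clock() + timeout_s
    paths = discover()
    while len(paths) < expected and clock() < deadline:
        sleep(interval_s)
        paths = discover()
    return paths


class FakeHost:
    """Hypothetical loaded host: each path appears at a fixed time (seconds).

    Uses a simulated clock so the example runs instantly.
    """

    def __init__(self, appear_at):
        self.now = 0.0
        self.appear_at = appear_at  # e.g. {"sda": 2, "sdb": 15}

    def clock(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds

    def discover(self):
        # Only paths whose arrival time has passed are visible.
        return sorted(p for p, t in self.appear_at.items() if t <= self.now)


if __name__ == "__main__":
    for timeout in (10.0, 30.0):
        host = FakeHost({"sda": 2, "sdb": 15})
        found = wait_for_paths(host.discover, expected=2, timeout_s=timeout,
                               clock=host.clock, sleep=host.sleep)
        print(f"timeout={timeout:.0f}s -> paths found: {found}")
```

With the assumed arrival times (2 s and 15 s), the 10-second wait returns a single path, while the 30-second wait returns both. That is the trade-off at issue here: a longer timeout tolerates slow discovery at the cost of a longer worst-case pod start.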

Comment 24 RamaKasturi 2019-05-22 14:27:20 UTC
(In reply to Humble Chirammal from comment #23)
> >>>
> 
> * Block volumes are working as expected.
> * Two things to look at:
> 1. Increasing the system memory should help device-mapper bring up the mapper
> device faster; as a result, OCP will get the mapper device.
I would like to understand why we would need to increase memory, since we are running the test with the recommended resources, and a customer running this configuration would hit the same issue.

> 2. From https://github.com/openshift/origin/pull/21197 (the fix went in as part of
> BZ#1596021), the wait for the mapper device is still 10 sec (see patch
> https://github.com/openshift/origin/pull/21197/commits/c2058ab05a79c4fcbf1a0b9bd43ff5cc5ef7d04f#diff-30894ca64f3fc736355610ef191d8bb7R413),
> but again, IMHO, this should be tuned to >30 sec.
> >>>
> 
> Again, we are on the same page: it's just a resource crunch where device
> mapper multipath is not able to discover the paths within 10 seconds. But
> that's how it is coded, and it's working as designed.
> 
> If we are looking for a higher timeout, I have to work with the Kube community
> and try to convince them.
> 
> For now, I don't see any BUG. IMO, either we have to document the behaviour
> or CLOSED/VERIFY this bug as working as expected/designed!

Comment 37 Humble Chirammal 2019-07-09 09:53:32 UTC
The fix is available in the latest OCP 3.11 build. Please validate; I am moving this to ON_QA.