Bug 1671364
| Summary: | In app pods, mpath device name is not mapped/created for some blockvolumes in the initiator side | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Sri Vignesh Selvan <sselvan> |
| Component: | kubernetes | Assignee: | Humble Chirammal <hchiramm> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Sri Vignesh Selvan <sselvan> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | ocs-3.11 | CC: | asriram, hchiramm, knarra, kramdoss, madam, pasik, pkarampu, pprakash, prasanna.kalever, puebele, rcyriac, rhs-bugs, rtalur, sselvan, vbellur, xiubli |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | OCS 3.11.z Batch Update 4 | Flags: | hchiramm: needinfo-, knarra: needinfo- |
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | >= atomic-openshift-3.11.123-1 | Doc Type: | Known Issue |
| Doc Text: | When an application pod is created with a multipath configuration and a gluster-block PVC, the iSCSI driver waits 10 seconds for storage path discovery. However, device discovery can be delayed when the system is under load or in similar conditions. If only one path is discovered within 10 seconds, that single path is used to mount the volume to the application pod. This means that failover to a different path does not work even when multiple paths are configured. To work around this issue, delete and reschedule the pod; there is a high possibility that the new pod mounts the volume with all the paths. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-02-13 05:21:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1669979 | ||
Comment 4
Prasanna Kumar Kalever
2019-02-07 10:28:06 UTC
IMO, there is nothing that can be done at the gluster-block layer. Request Humble and team to provide some insight. Request Vignesh to attach the sos-reports.

Acked for 3.11.3

@Humble, @Vignesh Please see comment 6 for the complete analysis summary. Thanks!

>>>
* Block volumes are working as expected.
* Two things to look at:
  1. Increasing the system memory should help device-mapper bring up the mapper device faster; as a result, OCP will get the mapper device.
  2. From https://github.com/openshift/origin/pull/21197 (fix went in as part of BZ#1596021), the wait for the mapper device is still 10 sec (see patch https://github.com/openshift/origin/pull/21197/commits/c2058ab05a79c4fcbf1a0b9bd43ff5cc5ef7d04f#diff-30894ca64f3fc736355610ef191d8bb7R413). But again, IMHO this should be tuned to >30 sec.
>>>

Again, we are on the same page that it's just a resource crunch where device-mapper multipath is not able to discover the paths within 10 seconds. But that's how it is coded, and it's working as designed. If we are looking for a higher timeout, I have to work with the Kube community and try to convince them. For now, I don't see any bug. IMO, we either have to document the behaviour or close/verify this bug as working as expected/designed.

(In reply to Humble Chirammal from comment #23)
> >>>
> * Block volumes are working as expected.
> * Two things to look at:
> 1. Increasing the system memory should help device-mapper bring the mapper device faster, as a result OCP will get mapper device

I would like to understand why we would need to increase memory, since we are running the test with the recommended resources, and a customer running at this config would be hitting the same issue.

> 2. From https://github.com/openshift/origin/pull/21197 (Fix went as part of BZ#1596021) the wait for mapper device is still 10 sec (see patch https://github.com/openshift/origin/pull/21197/commits/c2058ab05a79c4fcbf1a0b9bd43ff5cc5ef7d04f#diff-30894ca64f3fc736355610ef191d8bb7R413), But again IMHO this should be tuned to >30 sec.
> >>>
> Again we are on the same page that it's just a resource crunch where device-mapper multipath is not able to discover the paths within 10 seconds. But that's how it is coded and it's working as designed.
> If we are looking for a higher timeout I have to work with the Kube community and try to convince them.
> For now, I don't see any bug. IMO, either we have to document the behaviour or close/verify this bug as working as expected/designed!

The fix is available in the latest OCP 3.11 build. Please validate; I am moving to ON_QA.