vignesh, ping!!
IMO, there is nothing that can be done at the gluster-block layer. Requesting Humble and team to provide some insight, and requesting Vignesh to attach the sos-reports.
Acked for 3.11.3
@Humble, @Vignesh Please see comment 6 for the complete analysis summary. Thanks!
>>>

* Block volumes are working as expected.
* Two things to look at:
  1. Increasing the system memory should help device-mapper bring up the mapper device faster; as a result, OCP will get the mapper device.
  2. From https://github.com/openshift/origin/pull/21197 (the fix went in as part of BZ#1596021), the wait for the mapper device is still 10 sec (see patch https://github.com/openshift/origin/pull/21197/commits/c2058ab05a79c4fcbf1a0b9bd43ff5cc5ef7d04f#diff-30894ca64f3fc736355610ef191d8bb7R413). But again, IMHO this should be tuned to >30 sec.

>>>

Again, we are on the same page that it's just a resource crunch where device-mapper multipath is not able to discover the paths within 10 seconds. But that is how it is coded, and it is working as designed.

If we are looking for a higher timeout, I have to work with the Kube community and try to convince them.

For now, I don't see any BUG. IMO, either we have to document the behaviour or CLOSED/VERIFY this bug as working as expected/designed!
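To make the timeout discussion concrete, here is a minimal sketch of a "wait for the mapper device" loop. It is not the code from the origin PR; the function name `wait_for_device`, its parameters, and the device path in the usage note are all hypothetical and only illustrate why a device that needs about 20 seconds to appear is missed with a 10-second wait and found with a 30-second one.

```python
import os
import time


def wait_for_device(path, timeout=10.0, interval=1.0,
                    exists=os.path.exists, sleep=time.sleep,
                    clock=time.monotonic):
    """Poll until `path` exists or `timeout` seconds have elapsed.

    Hypothetical helper for illustration only. The `exists`, `sleep`
    and `clock` arguments are injectable so the loop can be exercised
    without real devices or real waiting.

    Returns True if the path showed up in time, False otherwise.
    """
    deadline = clock() + timeout
    while True:
        # Check first, so a device that is already present returns at once.
        if exists(path):
            return True
        # Give up once the deadline has been reached.
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_for_device("/dev/mapper/mpatha", timeout=30)` would keep polling for up to 30 seconds (the path here is an assumed example name). Under memory pressure, a longer timeout only hides the slow path discovery; it does not make multipath any faster.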
(In reply to Humble Chirammal from comment #23)
> >>>
>
> * Block volumes are working as expected.
> * Two things to look at:
> 1. Increasing the system memory should help device-mapper bring up the mapper
> device faster; as a result, OCP will get the mapper device.

I would like to understand why we would need to increase memory, since we are running the test with the recommended resources, and a customer running this configuration would hit the same issue.

> 2. From https://github.com/openshift/origin/pull/21197 (the fix went in as part of
> BZ#1596021), the wait for the mapper device is still 10 sec (see patch
> https://github.com/openshift/origin/pull/21197/commits/c2058ab05a79c4fcbf1a0b9bd43ff5cc5ef7d04f#diff-30894ca64f3fc736355610ef191d8bb7R413).
> But again, IMHO this should be tuned to >30 sec.
>
> >>>
>
> Again, we are on the same page that it's just a resource crunch where
> device-mapper multipath is not able to discover the paths within 10 seconds.
> But that is how it is coded, and it is working as designed.
>
> If we are looking for a higher timeout, I have to work with the Kube community
> and try to convince them.
>
> For now, I don't see any BUG. IMO, either we have to document the behaviour
> or CLOSED/VERIFY this bug as working as expected/designed!
The fix is available in the latest OCP 3.11 build. Please validate; I am moving this to ON_QA.