Delay in detecting a successful CSI mount or incorrect report of failed mount, by kubelet for Ceph CSI based volume (due to the recursive fsGroup setting)
With this build, the time taken to attach still seems higher than (or on par with) the time without the fix.
The attach times before and after the fix are as follows:
CSI Provider    File count    Pre-fix time    Post-fix time
csi-ceph-rbd          9500             83s              78s
                     37300            116s             146s
csi-ebs-gp2           9500             85s              79s
                     37300            111s             131s
### Tested with the following versions:
$ oc version
Client Version: v4.2.0-alpha.0-90-g420c3d6
Server Version: 4.2.0-0.ci-2019-11-30-132200
Kubernetes Version: v1.14.6+a9e953f
### On checking the page https://openshift-release.svc.ci.openshift.org/releasestream/4.2.0-0.ci/release/4.2.0-0.ci-2019-11-30-132200 I see the bug listed under the list of fixes.
--------------------------------------------------------------------------------
I had tested the patch earlier by applying it, building my own hyperkube, and using that build in my test cluster. That build showed improvements similar to the data in https://bugzilla.redhat.com/show_bug.cgi?id=1745773#c28
The patch was applied on top of the following commit:
commit ff885f256566a93dc2e42cc40b34bb9a9ca0ffa8
Author: Fabio Bertinatto <fbertina>
Date: Thu Nov 7 10:22:18 2019 +0100
UPSTREAM: 83747: Improve efficiency of csiMountMgr.GetAttributes
commit ccf0c2733f7f5cc76e220c1fe9bea909593512f1
Merge: 0517d42 99af4b7
Author: OpenShift Merge Robot <openshift-merge-robot.github.com>
Date: Mon Nov 4 06:23:03 2019 +0100
Merge pull request #23910 from p0lyn0mial/fix-ns-conditions-integration-test-4-2
Bug 1766365: TestNamespaceCondition integration test fails
--------------------------------------------------------------------------------
I may try again today or tomorrow, picking up the latest CI builds and rerunning my tests, but any clarification on which build first contains the fix would also help in validating it.
Retested with the following versions; the expected performance gains are now seen.
$ oc version
Client Version: v4.2.0-alpha.0-92-g57a203c
Server Version: 4.2.0-0.ci-2019-12-03-064353
Kubernetes Version: v1.14.6+a9e953f
Attach speed-up information:
CSI Provider    File count    Pre-fix time    Post-fix time
csi-ceph-rbd          9500             83s              12s
                     37300            116s              31s
csi-ebs-gp2           9500             85s              13s
                     37300            111s              19s
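
The speed-up implied by the retest figures can be worked out directly from the numbers above. A minimal Python sketch, with the times copied from this report; the names `attach_times`, `speedup`, and `summarize` are illustrative only and are not part of any tool used in the testing:

```python
# Attach times in seconds, copied from the retest table in this report.
# Keys are (CSI provider, file count); values are (pre-fix, post-fix).
attach_times = {
    ("csi-ceph-rbd", 9500): (83, 12),
    ("csi-ceph-rbd", 37300): (116, 31),
    ("csi-ebs-gp2", 9500): (85, 13),
    ("csi-ebs-gp2", 37300): (111, 19),
}


def speedup(pre_s, post_s):
    """Return how many times faster the post-fix attach is."""
    return pre_s / post_s


def summarize(times):
    """Return one (provider, files, pre, post, factor) row per measurement."""
    rows = []
    for (provider, files), (pre, post) in times.items():
        rows.append((provider, files, pre, post, round(speedup(pre, post), 1)))
    return rows


if __name__ == "__main__":
    for provider, files, pre, post, factor in summarize(attach_times):
        print(f"{provider:<14}{files:>7} files  {pre:>4}s -> {post:>3}s  ({factor}x faster)")
```

On these figures the attach is roughly 3.7x to 6.9x faster on the 4.2.0-0.ci-2019-12-03-064353 build, whereas the earlier 2019-11-30 build showed no comparable gain.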
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:4093