Bug 1762658

Summary: Delay in detecting a successful CSI mount or incorrect report of failed mount, by kubelet for Ceph CSI based volume (due to the recursive fsGroup setting)
Product: OpenShift Container Platform
Reporter: Fabio Bertinatto <fbertina>
Component: Storage
Assignee: Fabio Bertinatto <fbertina>
Status: CLOSED ERRATA
QA Contact: Chao Yang <chaoyang>
Severity: medium
Docs Contact:
Priority: high
Version: 4.2.0
CC: aos-bugs, aos-storage-staff, asakala, bbennett, chaoyang, fbertina, hchiramm, jstrunk, kramdoss, mloriedo, mrajanna, srangana, ykaul
Target Milestone: ---
Keywords: Performance
Target Release: 4.2.z
Hardware: Unspecified
OS: Unspecified
Whiteboard: ocs-monkey
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1745773
Environment:
Last Closed: 2019-12-11 22:36:06 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1745773
Bug Blocks:

Comment 9 Shyamsundar 2019-12-03 15:24:52 UTC
The time taken to attach still seems to be on par with (or higher than) the time without the fix.

The attach times before and after the fix are as follows:
CSI Provider   File count   Pre-fix time   Post-fix time
csi-ceph-rbd   9500         83s            78s
csi-ceph-rbd   37300        116s           146s
csi-ebs-gp2    9500         85s            79s
csi-ebs-gp2    37300        111s           131s

### Tested with the following versions:

$ oc version
Client Version: v4.2.0-alpha.0-90-g420c3d6
Server Version: 4.2.0-0.ci-2019-11-30-132200
Kubernetes Version: v1.14.6+a9e953f

### On checking the page https://openshift-release.svc.ci.openshift.org/releasestream/4.2.0-0.ci/release/4.2.0-0.ci-2019-11-30-132200 I see the bug listed under the list of fixes.

--------------------------------------------------------------------------------
I had previously tested the patch by building my own hyperkube with the patch applied and using it in my test cluster. That build showed improvements similar to the data in https://bugzilla.redhat.com/show_bug.cgi?id=1745773#c28

The patch was applied on top of the following commit:
commit ff885f256566a93dc2e42cc40b34bb9a9ca0ffa8
Author: Fabio Bertinatto <fbertina>
Date:   Thu Nov 7 10:22:18 2019 +0100

    UPSTREAM: 83747: Improve efficiency of csiMountMgr.GetAttributes

commit ccf0c2733f7f5cc76e220c1fe9bea909593512f1
Merge: 0517d42 99af4b7
Author: OpenShift Merge Robot <openshift-merge-robot.github.com>
Date:   Mon Nov 4 06:23:03 2019 +0100

    Merge pull request #23910 from p0lyn0mial/fix-ns-conditions-integration-test-4-2
    
    Bug 1766365: TestNamespaceCondition integration test fails

--------------------------------------------------------------------------------

I may try again today or tomorrow, picking up the latest CI builds and rerunning my tests, but any clarification on which build first contains the fix would also help in validating it.

Comment 10 Shyamsundar 2019-12-03 17:33:19 UTC
Retested with the following versions; the expected performance gains are observed.

$ oc version
Client Version: v4.2.0-alpha.0-92-g57a203c
Server Version: 4.2.0-0.ci-2019-12-03-064353
Kubernetes Version: v1.14.6+a9e953f

Attach speed-up information:
CSI Provider   File count   Pre-fix time   Post-fix time
csi-ceph-rbd   9500         83s            12s
csi-ceph-rbd   37300        116s           31s
csi-ebs-gp2    9500         85s            13s
csi-ebs-gp2    37300        111s           19s
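
For reference, the speed-up factors implied by the table above can be worked out with a short script. This is only an illustrative sketch: the names `ATTACH_TIMES` and `speedup` are made up for this example, and the numbers are copied from the table in this comment rather than measured again.

```python
# Attach times in seconds, copied from the table in this comment.
# Keys are (CSI provider, file count); values are (pre-fix, post-fix).
# ATTACH_TIMES and speedup() are hypothetical names used for illustration.
ATTACH_TIMES = {
    ("csi-ceph-rbd", 9500): (83, 12),
    ("csi-ceph-rbd", 37300): (116, 31),
    ("csi-ebs-gp2", 9500): (85, 13),
    ("csi-ebs-gp2", 37300): (111, 19),
}


def speedup(pre_fix_s, post_fix_s):
    """Return how many times faster the post-fix attach is than the pre-fix one."""
    return pre_fix_s / post_fix_s


for (provider, files), (pre, post) in ATTACH_TIMES.items():
    factor = speedup(pre, post)
    print(f"{provider:<13} {files:>6} files: {pre}s -> {post}s ({factor:.1f}x faster)")
```

On these figures the attach time improves by a factor of roughly 3.7 to 6.9, depending on the provider and file count.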

Comment 12 Chao Yang 2019-12-06 01:40:11 UTC
Updating the status based on the above comments.

Comment 14 errata-xmlrpc 2019-12-11 22:36:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:4093