Bug 1854503

Summary: [tracker-rhcs-bug 1848503] cephfs: Provide alternatives to increase the total cephfs subvolume snapshot counts to greater than the current 400 across a Cephfs volume
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Humble Chirammal <hchiramm>
Component: csi-driver
Assignee: Humble Chirammal <hchiramm>
Status: CLOSED ERRATA
QA Contact: Avi Liani <alayani>
Severity: high
Priority: high
Docs Contact:
Version: 4.6
CC: alayani, ceph-eng-bugs, etamir, kramdoss, madam, muagarwa, nberry, ocs-bugs, pdonnell, ratamir, rcyriac, sostapov, srangana, sweil
Target Milestone: ---
Keywords: AutomationBackLog, Reopened, Tracking
Target Release: OCS 4.6.0
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1848503
Environment:
Last Closed: 2020-12-17 06:22:31 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1848503
Bug Blocks:

Comment 4 Humble Chirammal 2020-09-02 16:40:36 UTC
Moving this to ON_QA as the tracker/parent bug has moved to ON_QA.

Comment 6 Avi Liani 2020-10-26 08:33:02 UTC
Running the test a few times gave the same result: only 100 snapshots can be created.
Creation of snapshot number 101 failed; the snapshot was not created and stayed in the following status:

Spec:
  Source:
    Persistent Volume Claim Name:  pvc-test-8302e05ee60644b0a8691fa88e337d3c
  Volume Snapshot Class Name:      ocs-storagecluster-cephfsplugin-snapclass
Status:
  Bound Volume Snapshot Content Name:  snapcontent-753b5e90-462f-49da-b2ea-3df685394060
  Ready To Use:                        false
Events:
  Type    Reason            Age   From                 Message
  ----    ------            ----  ----                 -------
  Normal  CreatingSnapshot  11m   snapshot-controller  Waiting for a snapshot namespace-test-c45840f65bb74f2593db7b9d9b36d349/pvc-snap-101-8302e05ee60644b0a8691fa88e337d3c to be created by the CSI driver.
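
For reference, a minimal reproduction sketch (my reconstruction of the flow, not the exact automation script); the namespace, PVC, and snapshot-class names are taken from the output above, and snapshot.storage.k8s.io/v1beta1 is the snapshot API version available on OCP 4.6:

NS=namespace-test-c45840f65bb74f2593db7b9d9b36d349
PVC=pvc-test-8302e05ee60644b0a8691fa88e337d3c

# Create 101 CephFS snapshots of the same PVC; on this build, snapshot 101
# never becomes ready.
for i in $(seq 1 101); do
  cat <<EOF | oc -n "$NS" apply -f -
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: pvc-snap-$i-8302e05ee60644b0a8691fa88e337d3c
spec:
  volumeSnapshotClassName: ocs-storagecluster-cephfsplugin-snapclass
  source:
    persistentVolumeClaimName: $PVC
EOF
done

# Check readiness; snapshots 1-100 report readyToUse=true, snapshot 101 stays false.
oc -n "$NS" get volumesnapshot -o custom-columns=NAME:.metadata.name,READY:.status.readyToUse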


Tested on versions:

Driver versions
================

        OCP versions
        ==============

                clientVersion:
                  buildDate: "2020-10-08T07:17:21Z"
                  compiler: gc
                  gitCommit: 074039a0a9c137967fba3e667b9849d60e5054d8
                  gitTreeState: clean
                  gitVersion: openshift-clients-4.6.0-202006250705.p0-162-g074039a0a
                  goVersion: go1.15.0
                  major: ""
                  minor: ""
                  platform: linux/amd64
                openshiftVersion: 4.6.0-0.nightly-2020-10-22-034051
                releaseClientVersion: 4.6.0-0.nightly-2020-10-10-041109
                serverVersion:
                  buildDate: "2020-10-08T15:58:07Z"
                  compiler: gc
                  gitCommit: d59ce3486ae3ca3a0c36e5498e56f51594076596
                  gitTreeState: clean
                  gitVersion: v1.19.0+d59ce34
                  goVersion: go1.15.0
                  major: "1"
                  minor: "19"
                  platform: linux/amd64
                
                
                Cluster version:

                NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
                version   4.6.0-0.nightly-2020-10-22-034051   True        False         4d1h    Cluster version is 4.6.0-0.nightly-2020-10-22-034051
                
        OCS versions
        ==============

                NAME                         DISPLAY                       VERSION        REPLACES   PHASE
                ocs-operator.v4.6.0-141.ci   OpenShift Container Storage   4.6.0-141.ci              Succeeded
                
        Rook versions
        ===============

                rook: 4.6-67.afaf3353.release_4.6
                go: go1.15.0
                
        Ceph versions
        ===============

                ceph version 14.2.8-111.el8cp (2e6029d57bc594eceba4751373da6505028c2650) nautilus (stable)
                
        RHCOS versions
        ================

                NAME              STATUS   ROLES    AGE    VERSION           INTERNAL-IP    EXTERNAL-IP    OS-IMAGE                                                       KERNEL-VERSION                     CONTAINER-RUNTIME
                compute-0         Ready    worker   4d1h   v1.19.0+d59ce34   10.1.160.92    10.1.160.92    Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.24.1.el8_2.dt1.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8
                compute-1         Ready    worker   4d1h   v1.19.0+d59ce34   10.1.160.105   10.1.160.105   Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.24.1.el8_2.dt1.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8
                compute-2         Ready    worker   4d1h   v1.19.0+d59ce34   10.1.160.103   10.1.160.103   Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.24.1.el8_2.dt1.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8
                control-plane-0   Ready    master   4d1h   v1.19.0+d59ce34   10.1.160.99    10.1.160.99    Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.24.1.el8_2.dt1.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8
                control-plane-1   Ready    master   4d1h   v1.19.0+d59ce34   10.1.160.36    10.1.160.36    Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.24.1.el8_2.dt1.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8
                control-plane-2   Ready    master   4d1h   v1.19.0+d59ce34   10.1.160.97    10.1.160.97    Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.24.1.el8_2.dt1.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8

Collecting must-gather from this test; I will update when it is ready.

Comment 14 Mudit Agarwal 2020-10-27 11:12:59 UTC
I am inclined towards closing this. I don't think there is anything for QE to verify, because on the OCS side we don't expose any parameter that can manage the snapshot count.
There is already a doc bug created for this (https://bugzilla.redhat.com/show_bug.cgi?id=1891757) and we can document it there.

But again, I am not sure whether we should mention a Ceph-internal parameter in the OCS documentation.

Closing it; please feel free to reopen (with steps to verify) if someone thinks otherwise.

Comment 15 krishnaram Karthick 2020-10-27 11:29:34 UTC
@Humble - Based on the above comments, it appears that the intention of this bug is to allow the creation of 512 snapshots in OCS. In other words, the mds_max_snaps_per_dir value needs to be configured to 512 in OCS out of the box. If so, we still don't have this in. What do you think?

P.S. The ask is to support 512 snapshots (https://issues.redhat.com/browse/KNIP-661), while the default is currently 100.
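
If we do decide to set it, a minimal sketch of what raising the cap would look like from the rook-ceph toolbox pod (my assumption of the procedure, not an agreed implementation; assumes the toolbox is deployed in openshift-storage):

TOOLS=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name | head -n 1)
# Raise the Ceph MDS per-directory snapshot cap (mds_max_snaps_per_dir, default 100) to 512.
oc -n openshift-storage rsh "$TOOLS" ceph config set mds mds_max_snaps_per_dir 512
# Read the value back to confirm.
oc -n openshift-storage rsh "$TOOLS" ceph config get mds mds_max_snaps_per_dir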

Comment 16 Raz Tamir 2020-10-27 15:02:47 UTC
We need @eran's confirmation on this.

Eran, please take a look at comments 14 and 15 and ack/nack limiting CephFS snapshots to 100 only.

Comment 24 Mudit Agarwal 2020-10-28 14:37:30 UTC
Thanks Eran.

Moving back to ON_QA. Please test according to https://bugzilla.redhat.com/show_bug.cgi?id=1854503#c23 and record the result in the doc BZ opened to address the same.

Comment 25 Avi Liani 2020-10-29 06:35:41 UTC
According to https://bugzilla.redhat.com/show_bug.cgi?id=1854503#c23, this BZ is verified.
I ran a test on RBD and successfully created 512 snapshots for RBD.

For CephFS, see https://bugzilla.redhat.com/show_bug.cgi?id=1854503#c6 - the limit is 100 snapshots.

The versions used for the RBD test are the same as in https://bugzilla.redhat.com/show_bug.cgi?id=1854503#c6.

Comment 26 Mudit Agarwal 2020-10-29 09:09:12 UTC
Marking requires_doc_text as '-' because we already have a separate doc BZ to address this.

Comment 28 errata-xmlrpc 2020-12-17 06:22:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605