Bug 1854503 - [tracker-rhcs-bug 1848503] cephfs: Provide alternatives to increase the total cephfs subvolume snapshot counts to greater than the current 400 across a Cephfs volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: csi-driver
Version: 4.6
Hardware: All
OS: All
Severity: high
Priority: high
Target Milestone: ---
Target Release: OCS 4.6.0
Assignee: Humble Chirammal
QA Contact: Avi Liani
URL:
Whiteboard:
Depends On: 1848503
Blocks:
 
Reported: 2020-07-07 15:25 UTC by Humble Chirammal
Modified: 2020-12-17 06:23 UTC
CC: 14 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1848503
Environment:
Last Closed: 2020-12-17 06:22:31 UTC
Embargoed:




Links:
Red Hat Product Errata RHSA-2020:5605 (Last Updated: 2020-12-17 06:23:46 UTC)

Comment 4 Humble Chirammal 2020-09-02 16:40:36 UTC
Moving this to ON_QA as the tracker/parent bug has moved to ON_QA.

Comment 6 Avi Liani 2020-10-26 08:33:02 UTC
Running the test a few times gave the same result: only 100 snapshots can be created.
Creation of snapshot number 101 failed - it was never created and stayed in the following status (a rough sketch of such a snapshot-creation loop is included after the events output below):

Spec:
  Source:
    Persistent Volume Claim Name:  pvc-test-8302e05ee60644b0a8691fa88e337d3c
  Volume Snapshot Class Name:      ocs-storagecluster-cephfsplugin-snapclass
Status:
  Bound Volume Snapshot Content Name:  snapcontent-753b5e90-462f-49da-b2ea-3df685394060
  Ready To Use:                        false
Events:
  Type    Reason            Age   From                 Message
  ----    ------            ----  ----                 -------
  Normal  CreatingSnapshot  11m   snapshot-controller  Waiting for a snapshot namespace-test-c45840f65bb74f2593db7b9d9b36d349/pvc-snap-101-8302e05ee60644b0a8691fa88e337d3c to be created by the CSI driver.
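
For reference, a minimal sketch of the kind of loop that reproduces this, assuming the PVC, namespace, and snapshot class names from the output above and the v1beta1 snapshot API shipped with OCP 4.6 (illustrative only, not the exact test harness used):

#!/usr/bin/env bash
# Create 101 CephFS VolumeSnapshots of one PVC; names are taken from the describe output above.
NS=namespace-test-c45840f65bb74f2593db7b9d9b36d349
PVC=pvc-test-8302e05ee60644b0a8691fa88e337d3c
SNAPCLASS=ocs-storagecluster-cephfsplugin-snapclass
SUFFIX=8302e05ee60644b0a8691fa88e337d3c

for i in $(seq 1 101); do
oc apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: pvc-snap-${i}-${SUFFIX}
  namespace: ${NS}
spec:
  volumeSnapshotClassName: ${SNAPCLASS}
  source:
    persistentVolumeClaimName: ${PVC}
EOF
done

# Snapshots beyond the CephFS per-directory limit never become ready:
oc -n "${NS}" get volumesnapshot -o custom-columns=NAME:.metadata.name,READY:.status.readyToUse

With the default limit, snapshots 1-100 reach readyToUse=true and number 101 stays in the state shown above.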


Tested on versions:

Driver versions
================

        OCP versions
        ==============

                clientVersion:
                  buildDate: "2020-10-08T07:17:21Z"
                  compiler: gc
                  gitCommit: 074039a0a9c137967fba3e667b9849d60e5054d8
                  gitTreeState: clean
                  gitVersion: openshift-clients-4.6.0-202006250705.p0-162-g074039a0a
                  goVersion: go1.15.0
                  major: ""
                  minor: ""
                  platform: linux/amd64
                openshiftVersion: 4.6.0-0.nightly-2020-10-22-034051
                releaseClientVersion: 4.6.0-0.nightly-2020-10-10-041109
                serverVersion:
                  buildDate: "2020-10-08T15:58:07Z"
                  compiler: gc
                  gitCommit: d59ce3486ae3ca3a0c36e5498e56f51594076596
                  gitTreeState: clean
                  gitVersion: v1.19.0+d59ce34
                  goVersion: go1.15.0
                  major: "1"
                  minor: "19"
                  platform: linux/amd64
                
                
                Cluster version:

                NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
                version   4.6.0-0.nightly-2020-10-22-034051   True        False         4d1h    Cluster version is 4.6.0-0.nightly-2020-10-22-034051
                
        OCS versions
        ==============

                NAME                         DISPLAY                       VERSION        REPLACES   PHASE
                ocs-operator.v4.6.0-141.ci   OpenShift Container Storage   4.6.0-141.ci              Succeeded
                
        Rook versions
        ===============

                rook: 4.6-67.afaf3353.release_4.6
                go: go1.15.0
                
        Ceph versions
        ===============

                ceph version 14.2.8-111.el8cp (2e6029d57bc594eceba4751373da6505028c2650) nautilus (stable)
                
        RHCOS versions
        ================

                NAME              STATUS   ROLES    AGE    VERSION           INTERNAL-IP    EXTERNAL-IP    OS-IMAGE                                                       KERNEL-VERSION                     CONTAINER-RUNTIME
                compute-0         Ready    worker   4d1h   v1.19.0+d59ce34   10.1.160.92    10.1.160.92    Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.24.1.el8_2.dt1.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8
                compute-1         Ready    worker   4d1h   v1.19.0+d59ce34   10.1.160.105   10.1.160.105   Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.24.1.el8_2.dt1.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8
                compute-2         Ready    worker   4d1h   v1.19.0+d59ce34   10.1.160.103   10.1.160.103   Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.24.1.el8_2.dt1.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8
                control-plane-0   Ready    master   4d1h   v1.19.0+d59ce34   10.1.160.99    10.1.160.99    Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.24.1.el8_2.dt1.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8
                control-plane-1   Ready    master   4d1h   v1.19.0+d59ce34   10.1.160.36    10.1.160.36    Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.24.1.el8_2.dt1.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8
                control-plane-2   Ready    master   4d1h   v1.19.0+d59ce34   10.1.160.97    10.1.160.97    Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)   4.18.0-193.24.1.el8_2.dt1.x86_64   cri-o://1.19.0-20.rhaos4.6.git97d715e.el8

Collecting must-gather from this test; I will update when it is ready.

Comment 14 Mudit Agarwal 2020-10-27 11:12:59 UTC
I am inclined towards closing this; I don't think there is anything for QE to verify, because on the OCS side we don't expose any parameter that can manage the snapshot count.
There is already a doc bug created for this, https://bugzilla.redhat.com/show_bug.cgi?id=1891757, and we can cover it there.

That said, I am not sure whether we should mention a Ceph-internal parameter in the OCS documentation.

Closing it; please feel free to reopen (with steps to verify) if someone thinks otherwise.

Comment 15 krishnaram Karthick 2020-10-27 11:29:34 UTC
@Humble - Based on the above comments, it appears that the intention of this bug is to allow the creation of 512 snapshots in OCS. IOW, the mds_max_snaps_per_dir value needs to be configured to 512 in OCS out of the box. If so, we still don't have this in. WDYT?

p.s. The ask is to have 512 snapshots (https://issues.redhat.com/browse/KNIP-661), whereas the default is currently 100.
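
For context, mds_max_snaps_per_dir is a Ceph MDS setting rather than anything exposed through OCS. A minimal sketch of raising it by hand from the rook-ceph toolbox (assuming the toolbox deployment is enabled as rook-ceph-tools in openshift-storage; illustrative only, not an officially exposed OCS parameter):

# Open a shell in the toolbox pod
oc -n openshift-storage rsh deploy/rook-ceph-tools

# Inside the toolbox: check the current value, then raise it to 512
ceph config get mds mds_max_snaps_per_dir
ceph config set mds mds_max_snaps_per_dir 512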

Comment 16 Raz Tamir 2020-10-27 15:02:47 UTC
We need @eran's confirmation on this.

Eran, please take a look at comments 14 and 15 and ack/nack limiting CephFS snapshots to 100 only.

Comment 24 Mudit Agarwal 2020-10-28 14:37:30 UTC
Thanks Eran.

Moving back to ON_QA; please test according to https://bugzilla.redhat.com/show_bug.cgi?id=1854503#c23 and record the result in the doc BZ opened to address this.

Comment 25 Avi Liani 2020-10-29 06:35:41 UTC
According to https://bugzilla.redhat.com/show_bug.cgi?id=1854503#c23 this BZ is verified.
I ran a test on RBD and successfully created 512 RBD snapshots.

For CephFS see https://bugzilla.redhat.com/show_bug.cgi?id=1854503#c6 - 100 snapshots is the limit.

The versions used for the RBD test are the same as in https://bugzilla.redhat.com/show_bug.cgi?id=1854503#c6.
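
As a cross-check on the CephFS side, the snapshots of a CSI-provisioned subvolume can be listed from the rook-ceph toolbox. A rough sketch, assuming the default OCS filesystem name ocs-storagecluster-cephfilesystem and the default ceph-csi subvolume group "csi"; the subvolume name is a placeholder:

# List CSI subvolumes, then the snapshots of one of them
ceph fs subvolume ls ocs-storagecluster-cephfilesystem --group_name csi
ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem csi-vol-<uuid> --group_name csi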

Comment 26 Mudit Agarwal 2020-10-29 09:09:12 UTC
Marking requires_doc_text as '-' because we already have a separate doc BZ to address the same.

Comment 28 errata-xmlrpc 2020-12-17 06:22:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605

