Bug 1848503

Summary: cephfs: Provide alternatives to increase the total cephfs subvolume snapshot counts to greater than the current 400 across a Cephfs volume
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Shyamsundar <srangana>
Component: CephFS
Assignee: Yan, Zheng <zyan>
Status: CLOSED ERRATA
QA Contact: Hemanth Kumar <hyelloji>
Severity: urgent
Docs Contact: Aron Gunn <agunn>
Priority: high
Version: 4.1
CC: agunn, ceph-eng-bugs, hyelloji, madam, mchangir, ocs-bugs, pdonnell, rcyriac, sostapov, srangana, sweil, tserlin, vereddy, vpoliset, zyan
Target Milestone: z2
Keywords: Performance
Target Release: 4.1
Hardware: All
OS: All
Whiteboard:
Fixed In Version: ceph-14.2.8-104.el8cp, ceph-14.2.8-104.el7cp
Doc Type: Bug Fix
Doc Text:
.Improved Ceph File System performance as the number of subvolume snapshots increases
Previously, creating more than 400 subvolume snapshots degraded Ceph File System performance by slowing down file system operations. With this release, you can configure subvolumes to support snapshots only at the subvolume root directory, and you can prevent cross-subvolume links and renames. This allows a larger number of subvolume snapshots to be created without degrading Ceph File System performance.
Story Points: ---
Clone Of:
: 1854503 (view as bug list)
Environment:
Last Closed: 2020-09-30 17:25:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1816167, 1854503

Description Shyamsundar 2020-06-18 13:17:17 UTC
The issue was originally discussed here: https://github.com/ceph/ceph-csi/issues/1133

The corresponding Ceph upstream tracker is here: https://tracker.ceph.com/issues/46074

Snippet from the GitHub discussion:
"For cephfs volume, we only create snapshots at volume root. we can disable the special handling for inodes with multiple links. If the special handling is disabled, that can help avoiding the 400 snapshot per-file-system limit"

For subvolumes, the intention is that they are used in isolation from each other, so hard links across subvolumes should not be a concern. Given this, the handling discussed above can be relaxed for subvolumes.
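
To make the relaxed handling concrete, here is a minimal sketch of what this looks like from a client, assuming a CephFS kernel mount at /mnt/cephfs, a hypothetical subvolume path, and a Ceph build that ships the backported ceph.dir.subvolume vxattr (none of these names are taken from this bug):

import os

# Hypothetical subvolume directory as laid out by the mgr volumes module.
SUBVOL_ROOT = "/mnt/cephfs/volumes/csi/csi-vol-example"

# Mark the subtree as a subvolume. Per the upstream discussion, this relaxes
# the special snapshot handling for multiply-linked inodes inside the subtree
# and blocks cross-subvolume hard links and renames.
os.setxattr(SUBVOL_ROOT, "ceph.dir.subvolume", b"1")

# Snapshots are then taken only at the subvolume root, via its .snap directory.
os.mkdir(os.path.join(SUBVOL_ROOT, ".snap", "snap-0001"))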

This bug is filed to track backporting this fix/feature to the Ceph versions that will ship with OCS 4.6.

OCS 4.6 requires a snapshot limit of 512, as per: https://issues.redhat.com/browse/KNIP-661

Comment 3 Yaniv Kaul 2020-07-15 17:27:12 UTC
Can we get QA_ACK?

Comment 8 Hemanth Kumar 2020-09-02 08:33:47 UTC
Hi Shyam,

As part of verifying the fix, I was able to reach 1k snapshots of a subvolume, taken at regular intervals with I/O running. I don't see any issues on the file system so far, and the I/O has been running for at least 2 days now.
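
For reference, a rough sketch of the kind of loop used here to drive up the snapshot count via the mgr volumes CLI; the volume and subvolume names, the snapshot count, and the interval are hypothetical, not taken from the actual test:

import subprocess
import time

VOLUME = "cephfs"              # hypothetical volume name
SUBVOLUME = "csi-vol-example"  # hypothetical subvolume name

for i in range(1000):
    # Create one snapshot of the subvolume through the mgr volumes module.
    subprocess.run(
        ["ceph", "fs", "subvolume", "snapshot", "create",
         VOLUME, SUBVOLUME, f"snap-{i:04d}"],
        check=True,
    )
    time.sleep(60)  # regular interval while the I/O workload keeps running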
So, do we still have any limit on the number of snapshots? If there is one, I can try to reach the maximum and see if we hit any issues.

Comment 19 Hemanth Kumar 2020-09-09 08:44:47 UTC
Hi Shyam, 

I have tested creating snapshots on the subvolume path instead of an arbitrary directory within it, and it all works fine. No issues seen.
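
A sketch of the two cases exercised, assuming a CephFS kernel mount at /mnt/cephfs and hypothetical paths; the expected rejection of the nested case follows from the doc text above, which restricts snapshots to the subvolume root:

import os

subvol_root = "/mnt/cephfs/volumes/csi/csi-vol-example"  # hypothetical
nested_dir = os.path.join(subvol_root, "data", "dir1")   # hypothetical

# Snapshot at the subvolume root: expected to succeed.
os.mkdir(os.path.join(subvol_root, ".snap", "snap-root"))

# Snapshot on an arbitrary directory nested inside the subvolume: expected to
# be rejected once the root-only snapshot restriction is in effect.
try:
    os.mkdir(os.path.join(nested_dir, ".snap", "snap-nested"))
except PermissionError as err:
    print("nested snapshot rejected as expected:", err)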

Moving to verified.

Comment 26 errata-xmlrpc 2020-09-30 17:25:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.1 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4144

Comment 27 Yaniv Kaul 2020-10-26 14:28:15 UTC
(In reply to Hemanth Kumar from comment #19)
> Hi Shyam, 
> 
> I have tested creating snapshots on the subvolume path instead of an
> arbitrary directory within it, and it all works fine. No issues seen.
> 
> Moving to verified.

Was it tested at scale? How many snaps?