Bug 2227807

Summary: snap-schedule: allow retention spec to specify max number of snaps to retain
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Milind Changire <mchangir>
Component: CephFS
Assignee: Milind Changire <mchangir>
Status: CLOSED ERRATA
QA Contact: sumr
Severity: high
Docs Contact: Disha Walvekar <dwalveka>
Priority: unspecified
Version: 6.1
CC: ceph-eng-bugs, cephqe-warriors, dwalveka, hyelloji, tserlin, vshankar
Target Release: 6.1z3   
Fixed In Version: ceph-17.2.6-155.el9cp
Doc Type: Release Note
Doc Text:
The snap-schedule module now supports a new retention spec to retain a user-defined number of snapshots, e.g. 50n. In this example, the user has specified that at most 50 snapshots are to be retained, irrespective of the snapshot creation cadence. Since snapshot pruning happens after a new snapshot is created, the actual number of snapshots retained is one less than the specified maximum; in this case, 49 snapshots are retained, leaving a margin of one snapshot that can be created on the file system on the next iteration without breaching the system-configured limit of mds_max_snaps_per_dir. Users should be mindful of the mds_max_snaps_per_dir configuration and the snapshot scheduling limits to avoid unintentional deactivation of snapshot schedules, which happens when the file system returns a "Too many links" error once the mds_max_snaps_per_dir limit is breached.
Clone Of: 2227806
Last Closed: 2023-12-12 13:55:39 UTC
Bug Blocks: 2247624    

Description Milind Changire 2023-07-31 14:48:37 UTC
+++ This bug was initially created as a clone of Bug #2227806 +++

Description of problem:
Along with daily, weekly, monthly, and yearly snaps, users also need a way to specify the maximum number of snaps to retain, in case they find MAX_SNAPS_PER_PATH (50) insufficient for their purpose.

e.g. the new snap-schedule with the retention spec could be: /PATH 1d1m 75n

where a maximum of 75 snaps from the 1d1m snap-schedule will be retained. If the number of snaps ('n') is not specified in the retention spec, the default of MAX_SNAPS_PER_PATH (50) applies.
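
For illustration, applying a count-based retention spec with the snap-schedule module could look like the following sketch (path, schedule, and count are examples only; syntax per the upstream snap-schedule module commands):

  # schedule a snapshot every hour on the volume root
  ceph fs snap-schedule add / 1h
  # retain at most 75 snapshots, irrespective of cadence
  ceph fs snap-schedule retention add / n 75
  # confirm the schedule and its retention settings
  ceph fs snap-schedule status /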

NOTE: the maximum number of snaps possible is also a function of the system-wide config named mds_max_snaps_per_dir, which currently defaults to 100.

So, if the number of snaps that need to be created for a path/dir exceeds 100, the user would first need to raise the value of mds_max_snaps_per_dir before extending the snap-schedule retention spec beyond 100.

NOTE: Since mds_max_snaps_per_dir is a run-time, system-wide setting, any change to that config immediately affects existing snap-schedule retention specs.
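
As a sketch of that sequence (the values are illustrative; mds_max_snaps_per_dir is the config named above):

  # raise the system-wide per-directory snapshot cap first
  ceph config set mds mds_max_snaps_per_dir 150
  # only then extend the count-based retention spec past 100
  ceph fs snap-schedule retention add / n 150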




Comment 1 RHEL Program Management 2023-07-31 14:48:48 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 10 sumr 2023-11-09 17:42:46 UTC
QA Test Plan:

1. Create a snap-schedule with schedule value 1m1h1d and retention spec 10n, and apply it to the cephfs volume path /.
2. Run IO continuously.
3. Verify 10 snaps are retained after 11 mins (a spot-check example follows this list).
4. Set the retention spec to 60n and verify 60 snaps are retained after 65 mins.
5. Set the retention spec to 101n and verify an error is generated for a count greater than the mds_max_snaps_per_dir default value of 100.
6. Modify mds_max_snaps_per_dir to 102, apply retention spec 102n, and verify 102 snaps are retained after 105 mins.
7. Remove the retention spec and verify 50 snaps are retained after 55 mins.
8. Modify the max_snaps_per_path value to 60 and verify 60 snaps are retained after the next 62 mins.
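
A possible spot check for the retained-count steps above (the mount point is an assumption; this presumes the scheduled path is mounted at /mnt/cephfs):

  # count snapshots under the scheduled path's .snap directory
  ls /mnt/cephfs/.snap | wc -l
  # cross-check the schedule's retention spec and state
  ceph fs snap-schedule status / -f json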

Hi Milind,

Please review the QA test plan and share your inputs.

Comment 20 sumr 2023-11-15 07:06:35 UTC
Milind,

Noting the improvements that need to be addressed for the end user's convenience. As discussed, we will have a different BZ to track these improvements.

If the retention count set by the user is higher than the system default (100, from mds_max_snaps_per_dir), there should be a command error stating:
"Value is higher than the system limit. For the required retention count to work, first modify 'mds_max_snaps_per_dir' to the desired value using 'ceph config set mds mds_max_snaps_per_dir <desired_value>', and then apply the retention count in the snapshot policy."

Also, we need to highlight in the error message or in the snapshot-schedule documentation that the number of snapshots created will be one less than the desired value if the count is > 100.

Because currently we hit an MGR exception in the above scenario, causing the snap-schedule to be deactivated. As there was no error message when the retention count was set > 100, the user may not know the snap-schedule has been deactivated, thus losing snapshots.
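
For example, the deactivation can be spotted and reversed with the module's own commands (illustrative):

  # the status output includes an "active" flag per schedule
  ceph fs snap-schedule status / -f json
  # re-enable the schedule once the limits have been corrected
  ceph fs snap-schedule activate /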

Marking this BZ as verified since the other test steps worked as expected, and raising a new BZ to track the improvements.

Comment 24 errata-xmlrpc 2023-12-12 13:55:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:7740