Bug 2227807 - snap-schedule: allow retention spec to specify max number of snaps to retain
Summary: snap-schedule: allow retention spec to specify max number of snaps to retain
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 6.1z3
Assignee: Milind Changire
QA Contact: sumr
Docs Contact: Disha Walvekar
URL:
Whiteboard:
Depends On:
Blocks: 2247624
Reported: 2023-07-31 14:48 UTC by Milind Changire
Modified: 2023-12-12 13:55 UTC
CC: 6 users

Fixed In Version: ceph-17.2.6-155.el9cp
Doc Type: Release Note
Doc Text:
The snap-schedule module now supports a new retention spec that retains a user-defined number of snapshots, e.g. 50n. In this example, the user has asked to retain at most 50 snapshots, irrespective of the snapshot creation cadence. Because snapshot pruning happens after a new snapshot is created, the actual number of snapshots retained is one less than the maximum specified; in this case 49 snapshots are retained, leaving a margin of one snapshot that can be created on the file system on the next iteration without breaching the system-configured mds_max_snaps_per_dir limit. Users should be mindful of the mds_max_snaps_per_dir configuration and the snapshot scheduling limits to avoid unintentional deactivation of snapshot schedules: if the mds_max_snaps_per_dir limit is breached, the file system returns a "Too many links" error and the schedule is deactivated.
Clone Of: 2227806
Environment:
Last Closed: 2023-12-12 13:55:39 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 59582 0 None None None 2023-11-08 10:12:48 UTC
Red Hat Issue Tracker RHCEPH-7111 0 None None None 2023-07-31 14:49:53 UTC
Red Hat Product Errata RHSA-2023:7740 0 None None None 2023-12-12 13:55:43 UTC

Description Milind Changire 2023-07-31 14:48:37 UTC
+++ This bug was initially created as a clone of Bug #2227806 +++

Description of problem:
Along with daily, weekly, monthly and yearly snaps, users also need a way to specify the maximum number of snaps to retain when they feel that the default MAX_SNAPS_PER_PATH (50) is insufficient for their purpose.

e.g. the new snap-schedule with the retention spec could be: /PATH 1d1m 75n

where a maximum of 75 snaps from the 1d1m snap-schedule will be retained. If a number of snaps ('n') is not specified in the retention spec, then the default of MAX_SNAPS_PER_PATH (50) should apply.
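
For illustration only, a minimal sketch of how the proposed spec might be driven through the existing snap-schedule CLI (the add/retention/status subcommands exist upstream; the exact argument form of the count spec, shown here as 75n, is an assumption):

  # create a schedule on the path
  ceph fs snap-schedule add /PATH 1h
  # attach the proposed count-based retention spec (keep at most 75 snapshots)
  ceph fs snap-schedule retention add /PATH 75n
  # inspect the schedule and its retention settings
  ceph fs snap-schedule status /PATH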

NOTE: the max number of snaps possible is also a function of the system-wide config named mds_max_snaps_per_dir, which currently defaults to 100.

So, if the number of snaps to be created for a path/dir exceeds 100, a user would first need to raise mds_max_snaps_per_dir before updating the snap-schedule retention spec beyond 100.

NOTE: Since mds_max_snaps_per_dir is a run-time system-wide spec, any change to that config will immediately affect existing snap-schedule retention specs.
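
For example (the values are illustrative and the standard ceph config syntax is assumed), the sequence would be roughly:

  # check the current per-directory snapshot cap (defaults to 100)
  ceph config get mds mds_max_snaps_per_dir
  # raise the cap first ...
  ceph config set mds mds_max_snaps_per_dir 150
  # ... and only then push the retention count beyond 100
  ceph fs snap-schedule retention add /PATH 120n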



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 RHEL Program Management 2023-07-31 14:48:48 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 10 sumr 2023-11-09 17:42:46 UTC
QA Test Plan (an illustrative command sketch follows the steps):

1. Create a snap-schedule with the value 1m1h1d and a retention spec of 10n, and apply it to the CephFS volume path /
2. Run IO continuously
3. Verify 10 snaps are retained after 11 mins
4. Set the retention spec to 60n and verify 60 snaps are retained after 65 mins.
5. Set the retention spec to 101n and verify an error is generated for a count greater than the mds_max_snaps_per_dir default value of 100.
6. Modify the mds_max_snaps_per_dir config value to 102, apply a retention spec of 102n, and verify 102 snaps are retained after 105 mins.
7. Remove the retention spec and verify the default of 50 snaps is retained after 55 mins.
8. Modify the MAX_SNAPS_PER_PATH value to 60 and verify 60 snaps are retained after the next 62 mins.
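
For reference, a minimal sketch of how steps 1-4 could be driven, assuming the upstream snap-schedule CLI (add / retention add / status), one schedule added per period, and a client mount at /mnt/cephfs; the period and retention strings are taken verbatim from the plan:

  # step 1: schedules on the volume root plus a count-based retention spec
  ceph fs snap-schedule add / 1m
  ceph fs snap-schedule add / 1h
  ceph fs snap-schedule add / 1d
  ceph fs snap-schedule retention add / 10n
  # steps 3-4: check the schedule state and count the retained snapshots
  ceph fs snap-schedule status /
  ls /mnt/cephfs/.snap | wc -l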

Hi Milind,

Please review QA Test plan and share your inputs.

Comment 20 sumr 2023-11-15 07:06:35 UTC
Milind,

Noting the improvements that need to be addressed for the end user's convenience. As discussed, we will have a different BZ to track these improvements.

If the retention count being set by the user is higher than the system default (100 -> mds_max_snaps_per_dir), there should be a command error stating:
"Value is higher than the system limit; for the required retention count to work, the 'mds_max_snaps_per_dir' value first needs to be set to the desired value using 'ceph config set mds mds_max_snaps_per_dir <desired_value>', and then the retention count can be applied in the snapshot policy."

Also, we need to highlight within the error message or in the snap-schedule documentation that the number of snapshots retained will be 1 less than the desired value if the count is > 100.

Currently we get an MGR exception in the above scenario, causing the snap-schedule to be deactivated. As there is no error message when the retention count is set > 100, the user may not know the snap-schedule has been deactivated, and thus loses snapshots.
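
For reference, one way to spot such a deactivation, assuming the snap-schedule status output (JSON that is expected to include an "active" flag):

  # a deactivated schedule is expected to report "active": false
  ceph fs snap-schedule status /
  # it can be re-enabled once the limits are sorted out
  ceph fs snap-schedule activate /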

Marking this BZ as verified, as the other test steps worked as expected, and raising a new BZ to track the improvements.

Comment 24 errata-xmlrpc 2023-12-12 13:55:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:7740

