.`check_past_interval_bounds` now uses the `max_oldest_map` to calculate the start interval
Previously, the oldest OSDMap used to calculate the past interval bounds was the OSD's local `oldest_map`, rather than the `max_oldest_map` received from other peers. A specific OSD's `oldest_map` can lag behind the `max_oldest_map` across all peers for some time. As a result, an assert could be triggered in `check_past_interval_bounds`.
With this fix, `check_past_interval_bounds` uses the `max_oldest_map` (renamed to `cluster_osdmap_trim_lower_bound`) to calculate the start interval. In addition, the option `osd_skip_check_past_interval_bounds` is introduced to allow OSDs to recover from this issue after applying the fix.
Description of problem:
After a power failure on one Ceph node, the node was brought back up, and its OSDs now fail to start with the abort message: ceph_abort_msg("past_interval start interval mismatch").
Upstream tracker: https://tracker.ceph.com/issues/49689
Gist: https://gist.github.com/Matan-B/ca564b6789f6ae6fc2ebc6a5b7e2aa69
Version-Release number of selected component (if applicable):
RHCS 5.3
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: Red Hat Ceph Storage 5.3 Security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2024:0745