Bug 2247183 - [RFE] multisite sync observability: tracking sync deltas over time (in Prometheus)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW-Multisite
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 7.1
Assignee: Casey Bodley
QA Contact: Chaithra
URL:
Whiteboard:
Duplicates: 2061627
Depends On:
Blocks: 2276340
Reported: 2023-10-31 07:14 UTC by daniel parkes
Modified: 2024-10-12 04:25 UTC (History)

Fixed In Version: ceph-18.2.1-151.el9cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-06-13 14:22:42 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 2085451 | 1 | None | None | None | 2024-12-19 04:25:01 UTC
Red Hat Issue Tracker | RHCEPH-7826 | 0 | None | None | None | 2023-10-31 07:15:12 UTC
Red Hat Product Errata | RHSA-2024:3925 | 0 | None | None | None | 2024-06-13 14:22:50 UTC

Description daniel parkes 2023-10-31 07:14:25 UTC
Description of problem:

Currently, there is no easy way for an administrator to check the sync replication status between zones.  

Goal:

multisite sync observability: tracking sync deltas over time

Our proposed feature will increase the observability of RGW multisite sync operations by giving administrators real-time information about replication health between zones. With it, an admin can assess whether the pending sync replication work is converging as expected or diverging. If the backlog diverges and grows beyond a certain threshold, an alert configured in Prometheus Alertmanager can fire a warning.
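
To make "converging or diverging" concrete, here is a minimal sketch and not part of the proposed implementation. It assumes the pending sync work is sampled as a lag in seconds at known times, fits a least-squares slope to the samples, and compares the latest lag with a threshold. The function names, the sample layout and the threshold are all hypothetical.

```python
from statistics import mean


def lag_slope(samples):
    """Least-squares slope of lag over time.

    samples: list of (unix_time_seconds, lag_seconds) pairs (assumed layout).
    Returns lag seconds gained per second of wall-clock time.
    """
    times = [t for t, _ in samples]
    lags = [lag for _, lag in samples]
    t_bar, lag_bar = mean(times), mean(lags)
    denom = sum((t - t_bar) ** 2 for t in times)
    if denom == 0:
        raise ValueError("need samples taken at two or more distinct times")
    return sum((t - t_bar) * (lag - lag_bar) for t, lag in samples) / denom


def classify(samples, threshold_seconds):
    """Return (trend, over_threshold) for a window of samples.

    trend is "diverging" when the fitted slope is positive, otherwise
    "converging" (a flat line is treated as converging in this sketch).
    over_threshold is True when the most recent lag exceeds the threshold.
    """
    trend = "diverging" if lag_slope(samples) > 0 else "converging"
    return trend, samples[-1][1] > threshold_seconds
```

In a real deployment the same check would more likely be written as a Prometheus alerting rule than as client-side code; the Python version only spells out the arithmetic.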

To present this information to the user, we will use Prometheus to gather the data and build a Grafana dashboard from it. Each data point will represent the oldest incremental change not applied, as reported by the sync status command, so the graph shows how that value evolves over time.
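
As an illustration of how a data point could be derived from the sync status command, the sketch below extracts the "oldest incremental change not applied" timestamp from command output and converts it to a lag in seconds. The sample text, the zone name, and the exact timestamp layout (ISO-like, fractional seconds, +0000 offset) are assumptions for illustration; real output may differ between releases, and the helper names are hypothetical.

```python
import re
from datetime import datetime, timezone

# ASSUMPTION: illustrative excerpt, not captured from a real cluster.
SAMPLE = """\
      data sync source: zone-b
                        syncing
                        data is behind on 2 shards
                        oldest incremental change not applied: 2023-10-31T07:00:00.000000+0000 [5]
"""

# ASSUMPTION: the timestamp is always reported in UTC with a +0000 offset.
PATTERN = re.compile(
    r"oldest incremental change not applied:\s*"
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?\+0000"
)


def oldest_pending_change(text):
    """Return the first matching timestamp as an aware datetime, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def lag_seconds(text, now):
    """Seconds between `now` and the oldest pending change.

    Returns 0.0 when no such line is present. This sketch treats that case
    as "caught up"; it cannot tell it apart from an unrecognised format.
    """
    oldest = oldest_pending_change(text)
    if oldest is None:
        return 0.0
    return max(0.0, (now - oldest).total_seconds())
```

For example, `lag_seconds(SAMPLE, datetime(2023, 10, 31, 7, 14, 25, tzinfo=timezone.utc))` gives 865.0, that is 14 minutes and 25 seconds of lag.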

The Grafana dashboard will display a slope to help us assess whether the pending sync deltas are shrinking or growing over time. All the zones replicated in the zone group will send their 'deltas' to Prometheus via the node-exporter.
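
One way to hand such values to the node-exporter is its textfile collector, which reads `*.prom` files from the directory given by `--collector.textfile.directory`. The sketch below renders per-zone lag values in the Prometheus text exposition format and writes the file atomically so the exporter never sees a partial file. The metric name, the `source_zone` label and the helper names are hypothetical, and the fix that shipped for this bug may expose the data differently.

```python
import os
import tempfile

# ASSUMPTION: hypothetical metric name, not one defined by Ceph.
METRIC = "rgw_sync_oldest_pending_change_age_seconds"


def escape_label(value):
    """Escape backslash, double quote and newline for a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(lags_by_zone):
    """Render a {zone_name: lag_seconds} mapping as exposition-format text."""
    lines = [
        f"# HELP {METRIC} Age of the oldest incremental change not yet applied.",
        f"# TYPE {METRIC} gauge",
    ]
    for zone, lag in sorted(lags_by_zone.items()):
        label = escape_label(zone)
        lines.append(f'{METRIC}{{source_zone="{label}"}} {float(lag)}')
    return "\n".join(lines) + "\n"


def write_atomically(path, content):
    """Write to a temp file in the same directory, then rename over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; let the exporter read it
    os.replace(tmp_path, path)
```

A periodic job on each zone could call `write_atomically(path, render({...}))` with a path inside the textfile directory; Grafana can then graph the gauge and its trend over time.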

Ideally, further down the line, we will be able to do similar work to expose per-bucket sync information in Prometheus. That would match the granularity of the bucket sync policy, which gives the user a way to enable or disable sync for a bucket through the S3 API.

Comment 38 Tejas 2024-05-22 07:14:28 UTC
*** Bug 2061627 has been marked as a duplicate of this bug. ***

Comment 46 errata-xmlrpc 2024-06-13 14:22:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925

Comment 47 Red Hat Bugzilla 2024-10-12 04:25:12 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

