Bug 2247183

Summary: [RFE] multisite sync observability: tracking sync deltas over time (in Prometheus)
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: daniel parkes <dparkes>
Component: RGW-Multisite
Assignee: Casey Bodley <cbodley>
Status: CLOSED ERRATA
QA Contact: Chaithra <ckulal>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 7.1
CC: aasharma, amaredia, anbehl, athakkar, cbodley, ceph-eng-bugs, cephqe-warriors, ckulal, jcaratza, mbenjamin, mkasturi, racpatel, smanjara, tserlin, vereddy
Target Milestone: ---
Keywords: FutureFeature
Target Release: 7.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-18.2.1-151.el9cp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-06-13 14:22:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2276340

Description daniel parkes 2023-10-31 07:14:25 UTC
Description of problem:

Currently, there is no easy way for an administrator to check the sync replication status between zones.  

Goal:

multisite sync observability: tracking sync deltas over time

Our proposed feature will increase the observability of RGW multisite sync operations. It will give administrators real-time information about the replication health between zones, so they can assess whether the pending sync replication work is converging as expected or diverging. If the backlog diverges and grows beyond a configured threshold, an alerting rule can fire a warning through Alertmanager.
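
A minimal sketch of the threshold check, assuming a hypothetical backlog series sampled as (timestamp, seconds-behind) pairs sorted by time. It reports a warning only when the backlog has stayed above the threshold for a whole hold period, which roughly mirrors the `for` duration of a Prometheus alerting rule (Prometheus evaluates the rule; Alertmanager routes the resulting notification). The names `Sample`, `should_alert`, `threshold` and `hold` are illustrative only, not part of Ceph or Prometheus.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample: not a Ceph or Prometheus type."""
    timestamp: float        # Unix time, seconds
    backlog_seconds: float  # how far behind sync is at that moment


def should_alert(samples: Sequence[Sample], threshold: float, hold: float) -> bool:
    """True if the backlog has been above `threshold` continuously for at
    least `hold` seconds, ending at the most recent sample.

    Assumes `samples` is sorted by timestamp.
    """
    if not samples:
        return False
    latest = samples[-1]
    if latest.backlog_seconds <= threshold:
        return False
    # Walk backwards to find where the current above-threshold run started.
    run_start = latest.timestamp
    for sample in reversed(samples):
        if sample.backlog_seconds <= threshold:
            break
        run_start = sample.timestamp
    return latest.timestamp - run_start >= hold
```

The hold period keeps a short burst of writes from raising a warning; only a backlog that stays high does.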

To present this information to the user, we will use Prometheus to gather the data and a Grafana dashboard to graph it over time. Each data point will represent the oldest incremental change not yet applied, as reported by the sync status command.
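
A rough sketch of how a per-zone data point could be exported through node-exporter's textfile collector, which reads `*.prom` files written in the Prometheus text exposition format from the directory given by `--collector.textfile.directory`. The metric name `rgw_multisite_sync_oldest_change_age_seconds`, the `source_zone` label and all helper functions are hypothetical. Obtaining the oldest-change timestamp from the sync status command is out of scope; the sketch takes it as an input.

```python
import os
import tempfile
from datetime import datetime, timezone
from typing import Dict, Optional

# Hypothetical metric name; not an existing Ceph metric.
METRIC = "rgw_multisite_sync_oldest_change_age_seconds"


def oldest_change_age(oldest_change: datetime, now: Optional[datetime] = None) -> float:
    """Seconds elapsed since the oldest incremental change not yet applied.

    Clamped at 0 so small clock skew never yields a negative age.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (now - oldest_change).total_seconds())


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline for the text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_textfile(ages_by_source_zone: Dict[str, float]) -> str:
    """Render one gauge sample per source zone, sorted by zone name."""
    lines = [
        f"# HELP {METRIC} Age of the oldest incremental change not yet applied.",
        f"# TYPE {METRIC} gauge",
    ]
    for zone in sorted(ages_by_source_zone):
        label = escape_label_value(zone)
        lines.append(f'{METRIC}{{source_zone="{label}"}} {ages_by_source_zone[zone]}')
    return "\n".join(lines) + "\n"


def write_atomically(path: str, text: str) -> None:
    """Write to a temp file in the same directory, then rename it over `path`,
    so the collector never reads a half-written file. The temp file ends in
    .tmp, which the collector ignores because it only reads *.prom files."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Each zone would run something like this periodically and write its file into the textfile directory; Prometheus then scrapes the value through node-exporter like any other gauge.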

The Grafana dashboard will display the slope of this series to help us assess whether the pending sync deltas are shrinking or growing over time. The ‘deltas’ will be sent to Prometheus via node-exporter from every zone replicated in the zonegroup.
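
To make the slope concrete: a least-squares line through recent samples gives the backlog's rate of change, positive when the pending work grows (diverging) and negative when it shrinks (converging). Inside Prometheus, PromQL's `deriv()` computes the per-second derivative of a gauge using simple linear regression, for example `deriv(rgw_multisite_sync_oldest_change_age_seconds[1h])`, where the metric name is hypothetical. The Python below only sketches that computation; `backlog_slope`, `trend` and `tolerance` are illustrative names.

```python
from typing import Sequence, Tuple


def backlog_slope(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of (unix_seconds, backlog) samples,
    in backlog units per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    # Ordinary least squares: covariance(t, v) divided by variance(t).
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one timestamp")
    return cov / var


def trend(slope: float, tolerance: float = 0.0) -> str:
    """Classify a slope; the hypothetical `tolerance` absorbs sampling noise."""
    if slope > tolerance:
        return "diverging"
    if slope < -tolerance:
        return "converging"
    return "steady"
```

If the series is the age of the oldest unapplied change, a completely stalled sync grows by one second per second, giving a slope of about 1. A slope between 0 and 1 would then indicate that sync is making some progress but still falling further behind.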

Further down the line, we would ideally extend this work to expose per-bucket sync information in Prometheus, matching the granularity of the bucket sync policy, which lets users enable or disable bucket sync through the S3 API.

Comment 38 Tejas 2024-05-22 07:14:28 UTC
*** Bug 2061627 has been marked as a duplicate of this bug. ***

Comment 46 errata-xmlrpc 2024-06-13 14:22:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925

Comment 47 Red Hat Bugzilla 2024-10-12 04:25:12 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days