2146546 – [RFE] Refactor RBD mirror metrics to use new labeled performance counter

Summary: [RFE] Refactor RBD mirror metrics to use new labeled performance counter

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	RBD-Mirror
Sub Component:
Version:	6.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	6.1
Assignee:	Divyansh Kamboj
QA Contact:	Sunil Angadi
Docs Contact:	Akash Raj
URL:
Whiteboard:
Duplicates (1):	2145593
Depends On:
Blocks:	2025815 2145593 2146544 2192813

Reported:	2022-11-23 17:52 UTC by Juan Miguel Olmo
Modified:	2023-10-20 04:25 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	Enhancement
Doc Text:	.Renaming and adding performance counters is supported With this release, performance counters in the `rbd-mirror` daemon are renamed and added to improve clarity and provide detailed information about snapshot synchronization between source and destination clusters. The existing snapshot-based and journal-based performance counters are renamed, new performance counters are added, and counters now use labels instead of the image specification in counter names.
Clone Of:
Environment:
Last Closed:	2023-06-15 09:16:25 UTC
Embargoed:
Dependent Products:
Flags:	sangadi: needinfo+ sangadi: needinfo+

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHCEPH-5680	None	None	None	2022-11-23 18:01:31 UTC
Red Hat Issue Tracker	RHCEPH-5724	None	None	None	2023-01-25 11:25:10 UTC
Red Hat Issue Tracker	RHSTOR-3886	None	None	None	2022-11-23 17:52:35 UTC
Red Hat Product Errata	RHSA-2023:3623	None	None	None	2023-06-15 09:17:10 UTC

Internal Links: 2025815

Description Juan Miguel Olmo 2022-11-23 17:52:35 UTC

As a monitoring user of the RBD mirror daemon, I need perf counters in which context attributes are not part of the performance counter name. This will allow these performance counters to be transformed into Prometheus metrics using the usual format.

Example:

rbd_mirror_snapshot_image_<image_spec>snapshots*

to:

ceph_rbd_mirror_snapshot_image_snapshot_*(pool: px, image: imgx)
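To illustrate the target shape, here is a minimal sketch of rendering one labeled counter as a line in the Prometheus text exposition format. The helper names and the concrete metric name `ceph_rbd_mirror_snapshot_image_snapshots` are assumptions made for this example; this is not the exporter's actual code.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prometheus_line(name: str, labels: dict, value) -> str:
    """Render `name{label="value",...} value` with labels in sorted order."""
    label_text = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{label_text}}} {value}"


# Hypothetical metric name and sample value; the pool/image labels follow the
# example in the description above.
line = to_prometheus_line(
    "ceph_rbd_mirror_snapshot_image_snapshots",
    {"pool": "px", "image": "imgx"},
    106,
)
print(line)
```

Because the pool and image travel as labels, the metric name stays the same for every image, which is what lets a single Prometheus query cover all of them.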

Comment 1 RHEL Program Management 2022-11-23 17:52:47 UTC

Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 13 Divyansh Kamboj 2023-04-18 10:25:30 UTC

Can you try running the admin socket command from inside the directory, instead of providing the path? It looks like your path is longer than the system-defined limit.
Also make sure that you're running the command on the secondary cluster to see the values.

Comment 15 Divyansh Kamboj 2023-04-18 11:04:09 UTC

I think the socket file path length is an OS limit; I don't think we need to file a BZ for that.
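For reference, a rough way to check whether a socket path would hit that limit. This sketch assumes Linux, where `sun_path` in `sockaddr_un` is 108 bytes including the terminating NUL; other systems use different sizes, and the function name and example paths are hypothetical.

```python
import os

# Assumption: Linux, where sockaddr_un.sun_path holds 108 bytes including
# the terminating NUL byte.
SUN_PATH_MAX = 108


def socket_path_fits(path: str) -> bool:
    """Return True if the encoded path plus a NUL byte fits in sun_path."""
    return len(os.fsencode(path)) < SUN_PATH_MAX


# Hypothetical paths, for illustration only.
short_path = "client.rbd-mirror.asok"
long_path = "/" + "/".join(["some-long-directory-name"] * 6) + "/client.asok"

print(socket_path_fits(short_path))
print(socket_path_fits(long_path))
```

Running the command from inside the socket's directory keeps the path relative and short, which is why it avoids the limit.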

Comment 18 Divyansh Kamboj 2023-04-26 09:13:02 UTC

> Is this overall pool (Total number of pools) mirror usage for snapshot based?

These values are, as far as I know, agnostic of pools, as this is the global data for all the snapshots that the daemon has handled. `perf dump` will not show any labels.

> If we get these metrics using pool wise, wouldn't it be more helpful?

Right now there are only two levels at which snapshot-based mirroring reports perf counters: one is the global daemon level, the other is per image (with the labels). If we want pool-based data, we can compile it from the per-image counters.
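A minimal sketch of compiling pool-level totals from the per-image counters. It assumes the per-image entries have already been collected into a list of dicts, each with a `labels` object (`pool`, `namespace`, `image`) and a `counters` object; the function name and the sample values are hypothetical.

```python
from collections import defaultdict


def per_pool_totals(image_entries, counter="snapshots"):
    """Sum one plain integer counter across all images, grouped by pool label."""
    totals = defaultdict(int)
    for entry in image_entries:
        totals[entry["labels"]["pool"]] += entry["counters"][counter]
    return dict(totals)


# Hypothetical per-image entries, for illustration only.
entries = [
    {"labels": {"pool": "data", "namespace": "", "image": "image1"},
     "counters": {"snapshots": 106, "sync_bytes": 524288000}},
    {"labels": {"pool": "data", "namespace": "", "image": "image2"},
     "counters": {"snapshots": 4, "sync_bytes": 1048576}},
    {"labels": {"pool": "backup", "namespace": "", "image": "image3"},
     "counters": {"snapshots": 10, "sync_bytes": 2097152}},
]

totals = per_pool_totals(entries)
print(totals)
```

This only works for plain additive counters such as `snapshots` or `sync_bytes`; averaged counters would need their sum and count combined separately.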

> why these metrics are not available from primary?

These are not available on the primary, as the snapshots on the primary get handled by the rbd-mirror daemon on the secondary cluster. Thus we are only able to generate counters on the secondary side. This data also gets written to the mirror image status, so we can observe it on the primary cluster using that command.

> Labels field is empty here, what value we should expect here and which scenario?

The counter dump command you've posted is for rbd_mirror_snapshot, which holds the counters for all the snapshots being processed by that daemon (global). The labels are on the per-image counter `rbd_mirror_snapshot_image`, which will look something like this:

```
    "rbd_mirror_snapshot_image": {
        "labels": {
            "image": "image1",
            "namespace": "",
            "pool": "data"
        },
        "counters": {
            "snapshots": 106,
            "sync_time": {
                "avgcount": 106,
                "sum": 8.157710400,
                "avgtime": 0.076959532
            },
            "sync_bytes": 524288000,
            "remote_timestamp": 1682500200.777647685,
            "local_timestamp": 1682500200.777647685,
            "last_sync_time": 0.003713542,
            "last_sync_bytes": 0
        }
    }
```

Can you check whether that counter is working for you? You can use the mirror image status command to verify that the values are correct for the images.
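As a quick sanity check on a dumped entry, the sketch below tests whether `avgtime` equals `sum / avgcount` within a small tolerance. That relationship is inferred from the sample numbers and is treated as an assumption here; the function name is hypothetical, and the data is a trimmed copy of the sample entry.

```python
import json
import math

# Trimmed copy of the sample per-image entry shown in this comment.
sample = json.loads("""
{
  "labels": {"image": "image1", "namespace": "", "pool": "data"},
  "counters": {
    "snapshots": 106,
    "sync_time": {"avgcount": 106, "sum": 8.157710400, "avgtime": 0.076959532}
  }
}
""")


def sync_time_is_consistent(entry, abs_tol=1e-6):
    """Check that avgtime matches sum / avgcount within abs_tol seconds."""
    sync_time = entry["counters"]["sync_time"]
    if sync_time["avgcount"] == 0:
        return sync_time["avgtime"] == 0
    expected = sync_time["sum"] / sync_time["avgcount"]
    return math.isclose(expected, sync_time["avgtime"], abs_tol=abs_tol)


consistent = sync_time_is_consistent(sample)
print(consistent)
```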

> both are having same values, looks like an issue for me.

I'm not sure I see the same values in the output you've posted. Can you point out where the values are the same? I might've missed it.

Comment 22 Ilya Dryomov 2023-05-16 16:06:36 UTC

*** Bug 2145593 has been marked as a duplicate of this bug. ***

Comment 24 errata-xmlrpc 2023-06-15 09:16:25 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3623

Comment 25 Red Hat Bugzilla 2023-10-20 04:25:04 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
