Bug 2256899

Summary: Duplicate metrics in ocs-metrics-exporter
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Divyansh Kamboj <dkamboj>
Component: ceph-monitoring Assignee: Divyansh Kamboj <dkamboj>
Status: CLOSED ERRATA QA Contact: Filip Balák <fbalak>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.14 CC: ebenahar, muagarwa, nthomas, odf-bz-bot
Target Milestone: ---   
Target Release: ODF 4.16.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.15.0-123 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-07-17 13:11:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Divyansh Kamboj 2024-01-05 04:50:45 UTC
A lot of metrics exported by ocs-metrics-exporter are also provided by the ceph-exporter.

e.g.:

ocs_rbd_mirror_image_primary_snapshot_timestamp and ceph_rbd_mirror_snapshot_image_local_timestamp provide the same information

Comment 6 Filip Balák 2024-02-06 12:35:58 UTC
After enabling rbd mirroring with command:

oc patch StorageCluster ocs-storagecluster -n openshift-storage --type json --patch '[{"op": "replace", "path": "/spec/mirroring", "value": {"enabled": true}}]'
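
Quoting a JSON patch by hand in a shell is error-prone. As a minimal sketch, the same patch can be serialized with Python's json module and printed as a shell-safe command. The helper name build_patch_args is hypothetical; the resource name, namespace, and patch path are the ones used in this comment.

```python
import json
import shlex


def build_patch_args(enabled=True):
    """Return the argv list for the `oc patch` call (hypothetical helper)."""
    # JSON Patch document that replaces /spec/mirroring on the StorageCluster.
    patch = [
        {"op": "replace", "path": "/spec/mirroring", "value": {"enabled": enabled}}
    ]
    return [
        "oc", "patch", "StorageCluster", "ocs-storagecluster",
        "-n", "openshift-storage",
        "--type", "json",
        "--patch", json.dumps(patch),
    ]


if __name__ == "__main__":
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(build_patch_args()))
```

Passing the list directly to subprocess.run would avoid shell quoting altogether; printing it is only meant for copy and paste.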

I see new metrics being added, but I don't see the required metrics listed in the PR description (https://github.com/red-hat-storage/ocs-operator/pull/2380):

ceph_rbd_mirror_snapshot_image_local_timestamp
ceph_rbd_mirror_snapshot_image_remote_timestamp
ceph_rbd_mirror_snapshot_image_last_sync_bytes

Were they changed in ODF 4.15? How can I reproduce the system state in which those metrics are available?

Tested with odf 4.15.0-134
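
One way to make this check repeatable is to scan metrics text in the Prometheus exposition format for the three names. The sketch below assumes the text has already been saved from a metrics endpoint (how it is fetched is left out), and the sample labels and values are made up for illustration. The function names are hypothetical.

```python
EXPECTED_METRICS = (
    "ceph_rbd_mirror_snapshot_image_local_timestamp",
    "ceph_rbd_mirror_snapshot_image_remote_timestamp",
    "ceph_rbd_mirror_snapshot_image_last_sync_bytes",
)


def metric_names(exposition_text):
    """Return the set of metric names that have at least one sample line."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is "name{labels} value" or "name value".
        names.add(line.split("{", 1)[0].split()[0])
    return names


def missing_metrics(exposition_text, expected=EXPECTED_METRICS):
    """Return the expected metric names that have no sample in the text."""
    present = metric_names(exposition_text)
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Made-up sample: only one of the three metrics is present.
    sample = (
        "# TYPE ceph_rbd_mirror_snapshot_image_local_timestamp gauge\n"
        'ceph_rbd_mirror_snapshot_image_local_timestamp{image="img1"} 1709546952\n'
        "ceph_health_status 0\n"
    )
    print(missing_metrics(sample))
```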

Comment 7 Filip Balák 2024-02-20 18:16:06 UTC
Metrics ceph_rbd_mirror_snapshot_image_local_timestamp, ceph_rbd_mirror_snapshot_image_remote_timestamp, and ceph_rbd_mirror_snapshot_image_last_sync_bytes are not available after rbd mirroring is enabled. --> ASSIGNED

Tested with odf 4.15.0-146

Comment 8 Divyansh Kamboj 2024-02-21 12:58:34 UTC
The metrics only show up when images are created and start syncing with the other cluster. Moving it back to QA after discussion with Filip.

Comment 9 Filip Balák 2024-03-04 10:18:34 UTC
Metrics ceph_rbd_mirror_snapshot_image_local_timestamp, ceph_rbd_mirror_snapshot_image_remote_timestamp, and ceph_rbd_mirror_snapshot_image_last_sync_bytes are not available after rbd mirroring is enabled and syncing between 2 clusters starts in a Regional DR setup. --> ASSIGNED

Workloads were run on the synced clusters, but those metrics never appeared. If this is not sufficient to reproduce the issue, please provide a reproducer.

Tested with odf 4.15.0-150

$ oc get cephblockpool ocs-storagecluster-cephblockpool -n openshift-storage -o json
{
    "apiVersion": "ceph.rook.io/v1",
    "kind": "CephBlockPool",
    "metadata": {
        "creationTimestamp": "2024-03-01T12:07:44Z",
        "finalizers": [
            "cephblockpool.ceph.rook.io"
        ],
        "generation": 2,
        "name": "ocs-storagecluster-cephblockpool",
        "namespace": "openshift-storage",
        "ownerReferences": [
            {
                "apiVersion": "ocs.openshift.io/v1",
                "blockOwnerDeletion": true,
                "controller": true,
                "kind": "StorageCluster",
                "name": "ocs-storagecluster",
                "uid": "79fc34f3-e3bf-450e-8375-9b654f04c58b"
            }
        ],
        "resourceVersion": "3653985",
        "uid": "77caacc8-30ff-494c-8daf-996797da41c5"
    },
    "spec": {
        "enableRBDStats": true,
        "erasureCoded": {
            "codingChunks": 0,
            "dataChunks": 0
        },
        "failureDomain": "rack",
        "mirroring": {
            "enabled": true,
            "mode": "image",
            "peers": {
                "secretNames": [
                    "d2ea70d369f4dbd0ba3120d8033d016ec672f3a"
                ]
            }
        },
        "quotas": {},
        "replicated": {
            "replicasPerFailureDomain": 1,
            "size": 3,
            "targetSizeRatio": 0.49
        },
        "statusCheck": {
            "mirror": {}
        }
    },
    "status": {
        "info": {
            "rbdMirrorBootstrapPeerSecretName": "pool-peer-token-ocs-storagecluster-cephblockpool"
        },
        "mirroringInfo": {
            "lastChanged": "2024-03-04T10:09:12Z",
            "lastChecked": "2024-03-04T10:10:12Z",
            "mode": "image",
            "peers": [
                {
                    "client_name": "client.rbd-mirror-peer",
                    "direction": "rx-tx",
                    "mirror_uuid": "8277bd89-9f38-4356-a9cd-482a473eba2b",
                    "site_name": "4ee435a6-8a04-4b1c-9fc8-a131945a0f18",
                    "uuid": "d96d875e-2636-4d4c-befe-5235e6254060"
                }
            ],
            "site_name": "06eb0bad-f9b7-4c40-ba60-a2e87418dbf5"
        },
        "mirroringStatus": {
            "lastChecked": "2024-03-04T10:10:12Z",
            "summary": {
                "daemon_health": "OK",
                "health": "OK",
                "image_health": "OK",
                "states": {
                    "replaying": 1
                }
            }
        },
        "observedGeneration": 2,
        "phase": "Ready",
        "snapshotScheduleStatus": {}
    }
}
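
For repeated checks, the mirroring-related fields can be pulled out of that JSON output programmatically. The sketch below is only an illustration: mirroring_summary is a hypothetical helper, it reads just the fields visible in the output above, and a non-zero "replaying" count does not by itself establish that the ceph_rbd_mirror_snapshot_image_* metrics are being exported.

```python
import json


def mirroring_summary(pool_json):
    """Extract mirroring fields from `oc get cephblockpool ... -o json` output."""
    pool = json.loads(pool_json)
    spec_mirroring = pool.get("spec", {}).get("mirroring", {})
    summary = (
        pool.get("status", {}).get("mirroringStatus", {}).get("summary", {})
    )
    states = summary.get("states") or {}
    return {
        "enabled": bool(spec_mirroring.get("enabled", False)),
        "mode": spec_mirroring.get("mode"),
        "health": summary.get("health"),
        "replaying_images": int(states.get("replaying", 0)),
    }


if __name__ == "__main__":
    # Trimmed copy of the CephBlockPool output shown above.
    trimmed = json.dumps({
        "spec": {"mirroring": {"enabled": True, "mode": "image"}},
        "status": {
            "mirroringStatus": {
                "summary": {"health": "OK", "states": {"replaying": 1}}
            }
        },
    })
    print(mirroring_summary(trimmed))
```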

Comment 10 Mudit Agarwal 2024-03-04 10:28:38 UTC
Not a 4.15.0 blocker

Comment 15 Divyansh Kamboj 2024-04-03 10:51:15 UTC
@fbalak I tested it on a 4.15 cluster and could see the values. I will try again to see whether I hit the issue you're facing.

Comment 17 Filip Balák 2024-05-14 10:07:05 UTC
In ODF 4.16.0-96, the metrics from the listed PRs are available and the metrics that should have been removed are no longer available, as expected. Marking as VERIFIED, as this was taken out of 4.15 and is targeted for the 4.16 release.

Comment 20 errata-xmlrpc 2024-07-17 13:11:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591