Bug 2080894

Summary: Metrics with label values that change during runtime should not expose the old values
Product: OpenShift Container Platform Reporter: Martin Kennelly <mkennell>
Component: Networking Assignee: Martin Kennelly <mkennell>
Networking sub component: ovn-kubernetes QA Contact: Anurag saxena <anusaxen>
Status: CLOSED WONTFIX Docs Contact:
Severity: low    
Priority: unspecified    
Version: 4.11   
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2080895 (view as bug list) Environment:
Last Closed: 2024-04-30 18:04:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2080895    

Description Martin Kennelly 2022-05-02 09:45:26 UTC
Description of problem:
We use labelled (vector) metrics to export info. If a
label value changes, we still export the series with the
old label value alongside the new one. This confuses users
because they have no way of determining which labelled
metric is the latest.

We must reset each vector metric before updating it to
ensure we export only the current state. Because our
label values can be updated dynamically during runtime,
series for old states, which have already been scraped,
otherwise stay exposed next to the current one.
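
A minimal sketch of the reset-before-update idea, in Python. LabelledGauge is a
hypothetical stand-in written only for this illustration; it is not the metrics
library used by ovnkube-node, and update_role and reset_first are assumed names.
Only the metric name, the label name and the two label values come from this
report.

```python
class LabelledGauge:
    """Tiny stand-in for a labelled (vector) gauge; not a real client library."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self._series = {}  # tuple of label values -> sample value

    def set(self, label_values, value):
        label_values = tuple(label_values)
        if len(label_values) != len(self.label_names):
            raise ValueError("wrong number of label values")
        self._series[label_values] = value

    def reset(self):
        """Drop every series so stale label values stop being exported."""
        self._series.clear()

    def expose(self):
        """Render the current series in a Prometheus-like text form."""
        lines = []
        for values, sample in sorted(self._series.items()):
            labels = ",".join(
                f'{k}="{v}"' for k, v in zip(self.label_names, values)
            )
            lines.append(f"{self.name}{{{labels}}} {sample}")
        return lines


def update_role(gauge, role, reset_first):
    """Record the current role; optionally clear old series first."""
    if reset_first:
        gauge.reset()
    gauge.set([role], 1)


# Without a reset, the old label value stays exposed after the role changes.
stale = LabelledGauge("ovn_db_cluster_server_role", ["server_role"])
update_role(stale, "leader", reset_first=False)
update_role(stale, "cluster follower", reset_first=False)

# With a reset before each update, only the current state is exposed.
fresh = LabelledGauge("ovn_db_cluster_server_role", ["server_role"])
update_role(fresh, "leader", reset_first=True)
update_role(fresh, "cluster follower", reset_first=True)

if __name__ == "__main__":
    print("\n".join(stale.expose()))
    print("\n".join(fresh.expose()))
```

In this sketch the first gauge ends up with two series and the second with one.
Resetting and then setting is not atomic, so a real implementation would likely
need to make sure a scrape cannot observe the empty state in between.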

This is occurring in the OVN metrics exported by
ovnkube-node.

Follow on work is needed to fix OVS metrics that are
used by ovs-exporter.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Get OVN Northbound database leader (use cluster/status command from ovn-nbctl)
2. Log in to the ovnkube-node container on that node.
3. Curl the metrics endpoint and grep for ovn_db_cluster_server_role. Note the server_role label is leader.
4. On the same pod, repeatedly kill only the nbdb container process until the ovn_db_cluster_server_role metric changes its 'server_role' label from leader to 'cluster follower'. Do not kill the whole pod. Exec into the container each time and send a SIGTERM to the nbdb process.
5. The metric ovn_db_cluster_server_role now has two entries: it still exposes 'server_role' as leader but also as 'cluster follower'.

Note: it may be very awkward to get the northbound database to switch from leader to 'cluster follower'. If you can freeze the process instead of killing it, you may have a better chance of the role changing.

Actual results:
There are two values for ovn_db_cluster_server_role

Expected results:
There should only be one value or entry for ovn_db_cluster_server_role
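
The expected result can be checked mechanically against the output of the curl
in step 3. The following is a rough sketch only: stale_series and
series_by_metric are hypothetical helper names, SAMPLE is an illustrative input
built from the label values named in this report (not captured output), and the
regular expression is a simplification that does not handle a '}' inside a
label value.

```python
import re
from collections import defaultdict

# Matches "<metric name>{<labels>} <value>"; the label block is optional.
SERIES_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+\S+')


def series_by_metric(exposition_text):
    """Group the label sets found in the scraped text by metric name."""
    series = defaultdict(list)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        match = SERIES_RE.match(line)
        if match:
            series[match.group(1)].append(match.group(2) or "")
    return series


def stale_series(exposition_text, metric_name):
    """Return all label sets for metric_name if more than one is exposed."""
    labels = series_by_metric(exposition_text)[metric_name]
    return labels if len(labels) > 1 else []


# Illustrative input only, not output captured from a cluster.
SAMPLE = '''\
# illustrative sample
ovn_db_cluster_server_role{server_role="leader"} 1
ovn_db_cluster_server_role{server_role="cluster follower"} 1
'''

if __name__ == "__main__":
    for labels in stale_series(SAMPLE, "ovn_db_cluster_server_role"):
        print("exposed:", labels)
```

An empty list from stale_series means the metric has at most one entry, which is
the expected behaviour described above.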

Additional info:
Contact Nadia Pinaeva for further information if you cannot reproduce. I may be on PTO.

Comment 1 Rory Thrasher 2024-04-30 18:04:53 UTC
OCP is no longer using Bugzilla and this bug appears to have been left in an orphaned state. If the bug is still relevant, please open a new issue in the OCPBUGS Jira project: https://issues.redhat.com/projects/OCPBUGS/summary