
Bug 2012426

Summary:	ThanosSidecarBucketOperationsFailed/ThanosSidecarUnhealthy alerts don't have namespace label
Product:	OpenShift Container Platform	Reporter:	Junqi Zhao <juzhao>
Component:	Monitoring	Assignee:	Arunprasad Rajkumar <arajkuma>
Status:	CLOSED ERRATA	QA Contact:	Junqi Zhao <juzhao>
Severity:	low	Docs Contact:
Priority:	low
Version:	4.9	CC:	amuller, anpicker, aos-bugs, arajkuma, erooth
Target Milestone:	---
Target Release:	4.10.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2022-03-10 16:18:42 UTC	Type:	Bug

Description Junqi Zhao 2021-10-09 09:10:26 UTC

Description of problem:
While reviewing the 4.9 release notes, https://github.com/openshift/openshift-docs/pull/37264, I found that the ThanosSidecarBucketOperationsFailed/ThanosSidecarUnhealthy alerts don't have a namespace label. The current rules are:
*************************************
      - alert: ThanosSidecarBucketOperationsFailed
        annotations:
          description: Thanos Sidecar {{$labels.instance}} bucket operations are failing
          summary: Thanos Sidecar bucket operations are failing
        expr: |
          sum by (job, instance) (rate(thanos_objstore_bucket_operation_failures_total{job=~"prometheus-(k8s|user-workload)-thanos-sidecar"}[5m])) > 0
        for: 1h
        labels:
          severity: warning
      - alert: ThanosSidecarUnhealthy
        annotations:
          description: Thanos Sidecar {{$labels.instance}} is unhealthy for more than
            {{$value}} seconds.
          summary: Thanos Sidecar is unhealthy.
        expr: |
          time() - max by (job, instance) (thanos_sidecar_last_heartbeat_success_time_seconds{job=~"prometheus-(k8s|user-workload)-thanos-sidecar"}) >= 240
        for: 1h
        labels:
          severity: warning
*************************************
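For example, querying the expr for ThanosSidecarUnhealthy (without the threshold):
time() - max by (job, instance) (thanos_sidecar_last_heartbeat_success_time_seconds{job=~"prometheus-(k8s|user-workload)-thanos-sidecar"})
The result does not include the namespace label, because "max by (job, instance)" keeps only the job and instance labels: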
{instance="10.129.2.10:10902", job="prometheus-k8s-thanos-sidecar"}  12.650763988494873
{instance="10.131.0.11:10902", job="prometheus-k8s-thanos-sidecar"}  15.16017460823059

We could add the namespace label to the by clause of the expr, that is:
time() - max by (job, instance, namespace) (thanos_sidecar_last_heartbeat_success_time_seconds{job=~"prometheus-(k8s|user-workload)-thanos-sidecar"})
Result:
{instance="10.129.2.10:10902", job="prometheus-k8s-thanos-sidecar", namespace="openshift-monitoring"}  38.67030143737793
{instance="10.131.0.11:10902", job="prometheus-k8s-thanos-sidecar", namespace="openshift-monitoring"}  41.178159952163696

The same applies to the ThanosSidecarBucketOperationsFailed alert, whose expr aggregates with "sum by (job, instance)".
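
A minimal sketch of what both rules might look like with the suggestion applied, assuming namespace is simply added to the aggregation clause and nothing else changes. This only illustrates the proposal; it is not necessarily the change that was eventually merged.
*************************************
      # Sketch only: "namespace" added to the by clause, rest of the rule unchanged.
      - alert: ThanosSidecarBucketOperationsFailed
        annotations:
          description: Thanos Sidecar {{$labels.instance}} bucket operations are failing
          summary: Thanos Sidecar bucket operations are failing
        expr: |
          sum by (job, instance, namespace) (rate(thanos_objstore_bucket_operation_failures_total{job=~"prometheus-(k8s|user-workload)-thanos-sidecar"}[5m])) > 0
        for: 1h
        labels:
          severity: warning
      # Sketch only: same change applied to the heartbeat rule.
      - alert: ThanosSidecarUnhealthy
        annotations:
          description: Thanos Sidecar {{$labels.instance}} is unhealthy for more than
            {{$value}} seconds.
          summary: Thanos Sidecar is unhealthy.
        expr: |
          time() - max by (job, instance, namespace) (thanos_sidecar_last_heartbeat_success_time_seconds{job=~"prometheus-(k8s|user-workload)-thanos-sidecar"}) >= 240
        for: 1h
        labels:
          severity: warning
*************************************
Labels that survive the aggregation are attached to the fired alert, so with this change the alerts would carry namespace="openshift-monitoring" (or the user-workload namespace) in addition to job and instance.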

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-10-08-093633

How reproducible:
always

Steps to Reproduce:
1. See the description.

Actual results:
ThanosSidecarBucketOperationsFailed/ThanosSidecarUnhealthy alerts don't have namespace label

Expected results:
ThanosSidecarBucketOperationsFailed/ThanosSidecarUnhealthy alerts have namespace label

Additional info:

Comment 11 errata-xmlrpc 2022-03-10 16:18:42 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056