Bug 2012426 - ThanosSidecarBucketOperationsFailed/ThanosSidecarUnhealthy alerts don't have namespace label
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.10.0
Assignee: Arunprasad Rajkumar
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-10-09 09:10 UTC by Junqi Zhao
Modified: 2022-03-10 16:19 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:18:42 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 1524 | 0 | None | open | Bug 2012426: Add namespace label for all thanos alerts | 2021-12-22 09:51:38 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:19:10 UTC

Description Junqi Zhao 2021-10-09 09:10:26 UTC
Description of problem:
While reviewing the 4.9 release notes (https://github.com/openshift/openshift-docs/pull/37264), I found that the ThanosSidecarBucketOperationsFailed and ThanosSidecarUnhealthy alerts do not have a namespace label. The current rules are:
*************************************
      - alert: ThanosSidecarBucketOperationsFailed
        annotations:
          description: Thanos Sidecar {{$labels.instance}} bucket operations are failing
          summary: Thanos Sidecar bucket operations are failing
        expr: |
          sum by (job, instance) (rate(thanos_objstore_bucket_operation_failures_total{job=~"prometheus-(k8s|user-workload)-thanos-sidecar"}[5m])) > 0
        for: 1h
        labels:
          severity: warning
      - alert: ThanosSidecarUnhealthy
        annotations:
          description: Thanos Sidecar {{$labels.instance}} is unhealthy for more than
            {{$value}} seconds.
          summary: Thanos Sidecar is unhealthy.
        expr: |
          time() - max by (job, instance) (thanos_sidecar_last_heartbeat_success_time_seconds{job=~"prometheus-(k8s|user-workload)-thanos-sidecar"}) >= 240
        for: 1h
        labels:
          severity: warning
*************************************
For example, evaluating the expression used by ThanosSidecarUnhealthy:
time() - max by (job, instance) (thanos_sidecar_last_heartbeat_success_time_seconds{job=~"prometheus-(k8s|user-workload)-thanos-sidecar"})
returns results that do not include the namespace label, because the by (job, instance) aggregation keeps only those two labels:
{instance="10.129.2.10:10902", job="prometheus-k8s-thanos-sidecar"}  12.650763988494873
{instance="10.131.0.11:10902", job="prometheus-k8s-thanos-sidecar"}  15.16017460823059

We could add the namespace label to the by clause of the expression, that is:
time() - max by (job, instance, namespace) (thanos_sidecar_last_heartbeat_success_time_seconds{job=~"prometheus-(k8s|user-workload)-thanos-sidecar"})
The result then includes the namespace label:
{instance="10.129.2.10:10902", job="prometheus-k8s-thanos-sidecar", namespace="openshift-monitoring"}  38.67030143737793
{instance="10.131.0.11:10902", job="prometheus-k8s-thanos-sidecar", namespace="openshift-monitoring"}  41.178159952163696

The same applies to the ThanosSidecarBucketOperationsFailed alert.
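
A minimal sketch of the two rules with the proposed change applied, assuming nothing else in the rules changes. It only illustrates the suggestion in this report and is not necessarily the exact change merged in the linked pull request:

```yaml
- alert: ThanosSidecarBucketOperationsFailed
  annotations:
    description: Thanos Sidecar {{$labels.instance}} bucket operations are failing
    summary: Thanos Sidecar bucket operations are failing
  # namespace is added to the grouping labels so it is kept by the aggregation
  expr: |
    sum by (job, instance, namespace) (rate(thanos_objstore_bucket_operation_failures_total{job=~"prometheus-(k8s|user-workload)-thanos-sidecar"}[5m])) > 0
  for: 1h
  labels:
    severity: warning
- alert: ThanosSidecarUnhealthy
  annotations:
    description: Thanos Sidecar {{$labels.instance}} is unhealthy for more than
      {{$value}} seconds.
    summary: Thanos Sidecar is unhealthy.
  # same change: group by namespace in addition to job and instance
  expr: |
    time() - max by (job, instance, namespace) (thanos_sidecar_last_heartbeat_success_time_seconds{job=~"prometheus-(k8s|user-workload)-thanos-sidecar"}) >= 240
  for: 1h
  labels:
    severity: warning
```

Because an alert inherits the labels of the series its expression returns, keeping namespace in the by clause should be enough for the fired alerts to carry it; no static namespace label is assumed here.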

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-10-08-093633

How reproducible:
always

Steps to Reproduce:
1. See the description above.

Actual results:
The ThanosSidecarBucketOperationsFailed and ThanosSidecarUnhealthy alerts do not have a namespace label.

Expected results:
The ThanosSidecarBucketOperationsFailed and ThanosSidecarUnhealthy alerts have a namespace label.

Additional info:

Comment 11 errata-xmlrpc 2022-03-10 16:18:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

