Bug 1800489 - KubePersistentVolumeFullInFourDays is triggered multiple times on ElasticSearch storage
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 3.11.0
Hardware: All
OS: All
Priority: medium
Severity: low
Target Release: 3.11.z
Assignee: Paul Gier
QA Contact: Junqi Zhao
Reported: 2020-02-07 09:11 UTC by Franck Grosjean
Modified: 2023-10-06 19:09 UTC
CC List: 12 users

Last Closed: 2020-06-17 20:21:25 UTC


Links:
- GitHub openshift cluster-monitoring-operator pull 797 (closed): "Bug 1800489: Patch kube pv storage alert release 3.11", last updated 2021-02-04 10:34:59 UTC
- Red Hat Product Errata RHBA-2020:2477, last updated 2020-06-17 20:21:43 UTC

Description Franck Grosjean 2020-02-07 09:11:35 UTC
Description of problem:

Our cluster keeps firing the KubePersistentVolumeFullInFourDays alert repeatedly.
These are false positives caused by the alert's sensitivity to short-term spikes in storage usage.
This is similar to the behaviour described in https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/262

It appears to have been fixed upstream, and in OCP 4.x, by a change to the alert definition (the "for" clause):
https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/288

- alert: KubePersistentVolumeFullInFourDays
  annotations:
    message: Based on recent sampling, the persistent volume claimed by {{ $labels.persistentvolumeclaim
      }} in namespace {{ $labels.namespace }} is expected to fill up within four
      days. Currently {{ $value }} bytes are available.
  expr: |
    kubelet_volume_stats_available_bytes{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kubelet"} and predict_linear(kubelet_volume_stats_available_bytes{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kubelet"}[6h], 4 * 24 * 3600) < 0
  for: 5m
  labels:
    severity: critical
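
A hedged sketch of the kind of change being requested: the rule quoted above with only its `for` clause lengthened. `predict_linear` fits a linear regression to the last 6h of `kubelet_volume_stats_available_bytes` and extrapolates it four days ahead, so a short write burst can briefly tilt that line below zero; with `for: 5m`, five minutes of such a prediction is enough to fire. A longer `for` keeps the alert pending until the expression has been true at every evaluation for the whole duration. The `1h` value is an assumption chosen for illustration only; the actual 4.x rule linked below and the 3.11 errata fix may differ in other respects, so compare against those rules.yaml files before using it.

```yaml
# Hypothetical sketch: same expression as the rule above, only "for" changed.
# The 1h duration is an illustrative assumption, not the shipped value.
- alert: KubePersistentVolumeFullInFourDays
  annotations:
    message: Based on recent sampling, the persistent volume claimed by {{ $labels.persistentvolumeclaim }} in namespace {{ $labels.namespace }} is expected to fill up within four days. Currently {{ $value }} bytes are available.
  expr: |
    kubelet_volume_stats_available_bytes{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kubelet"}
      and
    predict_linear(kubelet_volume_stats_available_bytes{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kubelet"}[6h], 4 * 24 * 3600) < 0
  for: 1h  # prediction must stay below zero for a full hour before firing
  labels:
    severity: critical
```

The trade-off is latency: a volume that really is filling fast is reported up to an hour later than with `for: 5m`, which is usually acceptable for an alert whose horizon is four days.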

    
OCP 3.11 rules: https://github.com/openshift/cluster-monitoring-operator/blob/release-3.11/assets/prometheus-k8s/rules.yaml
OCP 4.3 rules: https://github.com/openshift/cluster-monitoring-operator/blob/release-4.3/assets/prometheus-k8s/rules.yaml

Is it possible to backport the 4.x fix to 3.11?

Version-Release number of selected component (if applicable):
3.11.x

How reproducible:
Monitor application storage whose usage behaves as described in
https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/262


Actual results:
KubePersistentVolumeFullInFourDays fires and then resolves on its own, because the alert is too sensitive to short-term changes in usage.

Expected results:
KubePersistentVolumeFullInFourDays should not fire on short storage usage spikes, since it is a long-term alert.

Additional info:

Comment 2 Lili Cosic 2020-02-07 10:28:03 UTC
Yes, it's possible to backport, but I can't promise it will be ASAP. Assigning to Serg.

Comment 3 Sergiusz Urbaniak 2020-03-06 09:20:13 UTC
*** Bug 1810838 has been marked as a duplicate of this bug. ***

Comment 4 Pawel Krupa 2020-03-06 09:25:49 UTC
This alert was improved in 4.1 [1] and we don't have plans to backport it.

[1]: https://github.com/openshift/cluster-monitoring-operator/blob/release-4.1/assets/prometheus-k8s/rules.yaml#L777-L789

Comment 6 Franck Grosjean 2020-04-10 07:41:17 UTC
Hello,

Is there an option to plan a backport of this alert to 3.11?

Comment 16 Paul Gier 2020-05-22 20:07:49 UTC
Targeting this to 4.2.x, and then we can create additional bugs for backporting to the other versions.

Comment 25 errata-xmlrpc 2020-06-17 20:21:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2477

