Bug 1800489

Summary: KubePersistentVolumeFullInFourDays is triggered multiple times on ElasticSearch storage
Product: OpenShift Container Platform
Reporter: Franck Grosjean <fgrosjea>
Component: Monitoring
Assignee: Paul Gier <pgier>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: low
Docs Contact:
Priority: medium
Version: 3.11.0
CC: akuriyan, alegrand, anpicker, cvogel, dahernan, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania, vwalek
Target Milestone: ---
Keywords: Reopened
Target Release: 3.11.z
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-06-17 20:21:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Franck Grosjean 2020-02-07 09:11:35 UTC
Description of problem:

Our cluster repeatedly fires the KubePersistentVolumeFullInFourDays alert.
These are false positives caused by the alert's sensitivity.
The behaviour is similar to https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/262

It seems to be fixed upstream and in OCP 4.x by a modification to the alert definition (the "for" clause):
https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/288

- alert: KubePersistentVolumeFullInFourDays
  annotations:
    message: Based on recent sampling, the persistent volume claimed by {{ $labels.persistentvolumeclaim
      }} in namespace {{ $labels.namespace }} is expected to fill up within four
      days. Currently {{ $value }} bytes are available.
  expr: |
    kubelet_volume_stats_available_bytes{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kubelet"} and predict_linear(kubelet_volume_stats_available_bytes{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kubelet"}[6h], 4 * 24 * 3600) < 0
  for: 5m
  labels:
    severity: critical

    
OCP 3.11 - https://github.com/openshift/cluster-monitoring-operator/blob/release-3.11/assets/prometheus-k8s/rules.yaml
OCP 4.3  - https://github.com/openshift/cluster-monitoring-operator/blob/release-4.3/assets/prometheus-k8s/rules.yaml

Is it possible to backport this change from 4.x to 3.11?

Version-Release number of selected component (if applicable):
3.11.x

How reproducible:
Monitor application storage that behaves like the case described here:
https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/262


Actual results:
KubePersistentVolumeFullInFourDays fires and then resolves on its own because of the alert's sensitivity

Expected results:
KubePersistentVolumeFullInFourDays should ignore short-lived storage peaks, since it is a long-term alert
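
As a rough illustration of how the "for" clause filters short-lived conditions, the sketch below models the pending-to-firing transition: the expression has to be true at every evaluation for the whole "for" duration before the alert fires. The 30-second evaluation interval, the 20-minute burst, and the one-hour "for" value are hypothetical and are not taken from any shipped rule.

```python
def firing_times(condition_by_eval, eval_interval_s, for_s):
    """Return the evaluation timestamps (in seconds) at which the alert
    would be firing, given the per-evaluation truth of the expression."""
    active_since = None
    firing = []
    for i, condition in enumerate(condition_by_eval):
        t = i * eval_interval_s
        if not condition:
            # Any false evaluation resets the pending state.
            active_since = None
            continue
        if active_since is None:
            active_since = t
        if t - active_since >= for_s:
            firing.append(t)
    return firing


EVAL_INTERVAL_S = 30  # assumed rule evaluation interval

# Hypothetical burst: the expression is true for 20 minutes, then false.
burst = [True] * 40 + [False] * 10

with_for_5m = firing_times(burst, EVAL_INTERVAL_S, for_s=5 * 60)
with_for_1h = firing_times(burst, EVAL_INTERVAL_S, for_s=60 * 60)

print(f"for: 5m -> firing at {len(with_for_5m)} evaluations")
print(f"for: 1h -> firing at {len(with_for_1h)} evaluations")
```

With these assumed numbers, the 20-minute burst fires under "for: 5m" but never leaves the pending state under the longer duration.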

Additional info:

Comment 2 Lili Cosic 2020-02-07 10:28:03 UTC
Yes, it's possible to backport, but I can't promise it will be ASAP. Assigning to Serg.

Comment 3 Sergiusz Urbaniak 2020-03-06 09:20:13 UTC
*** Bug 1810838 has been marked as a duplicate of this bug. ***

Comment 4 Pawel Krupa 2020-03-06 09:25:49 UTC
This alert was improved in 4.1 [1] and we have no plans to backport it.

[1]: https://github.com/openshift/cluster-monitoring-operator/blob/release-4.1/assets/prometheus-k8s/rules.yaml#L777-L789

Comment 6 Franck Grosjean 2020-04-10 07:41:17 UTC
Hello,

Is there an option to plan a backport of this alert to 3.11?

Comment 16 Paul Gier 2020-05-22 20:07:49 UTC
Targeting this to 4.2.x; we can then create additional bugs for backporting to the other versions.

Comment 25 errata-xmlrpc 2020-06-17 20:21:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2477