Bug 1800489

Summary:	KubePersistentVolumeFullInFourDays is triggered multiple times on ElasticSearch storage
Product:	OpenShift Container Platform	Reporter:	Franck Grosjean <fgrosjea>
Component:	Monitoring	Assignee:	Paul Gier <pgier>
Status:	CLOSED ERRATA	QA Contact:	Junqi Zhao <juzhao>
Severity:	low	Docs Contact:
Priority:	medium
Version:	3.11.0	CC:	akuriyan, alegrand, anpicker, cvogel, dahernan, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania, vwalek
Target Milestone:	---	Keywords:	Reopened
Target Release:	3.11.z
Hardware:	All
OS:	All
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-06-17 20:21:25 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Franck Grosjean 2020-02-07 09:11:35 UTC

Description of problem:

Our cluster keeps firing the KubePersistentVolumeFullInFourDays alert repeatedly.
This is a false positive caused by the alert's sensitivity.
The behaviour is similar to https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/262

It seems to be fixed upstream and in OCP 4.x by a change to the alert definition (the "for" clause):
https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/288

- alert: KubePersistentVolumeFullInFourDays
  annotations:
    message: Based on recent sampling, the persistent volume claimed by {{ $labels.persistentvolumeclaim
      }} in namespace {{ $labels.namespace }} is expected to fill up within four
      days. Currently {{ $value }} bytes are available.
  expr: |
    kubelet_volume_stats_available_bytes{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kubelet"} and predict_linear(kubelet_volume_stats_available_bytes{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kubelet"}[6h], 4 * 24 * 3600) < 0
  for: 5m
  labels:
    severity: critical
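
For illustration only (not taken from the rules file): a minimal Python sketch of why the expression above can fire on a short storage peak. It assumes predict_linear behaves like an ordinary least-squares fit over the 6h window, extrapolated 4 days past the last sample. The function below is a simplified re-implementation, and the sample values (10 GiB free, a 2 GiB burst during the last hour, a 5-minute scrape interval) are invented.

```python
def predict_linear(samples, horizon_s):
    """Fit a least-squares line through (timestamp, value) samples and
    evaluate it horizon_s seconds after the last sample.
    Simplified stand-in for the PromQL function of the same name."""
    n = len(samples)
    t_eval = samples[-1][0]
    xs = [t - t_eval for t, _ in samples]   # seconds relative to "now"
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x     # fitted value at "now"
    return intercept + slope * horizon_s

GIB = 1024 ** 3
STEP = 300                 # hypothetical scrape interval: 5 minutes
WINDOW = 6 * 3600          # [6h] range from the rule
HORIZON = 4 * 24 * 3600    # 4 days, as in the rule

def window(available_at):
    """Sample a function of time over one 6h window."""
    return [(t, available_at(t)) for t in range(0, WINDOW + 1, STEP)]

# Volume with a constant 10 GiB available.
steady = window(lambda t: 10 * GIB)

# Same volume, but a burst consumes 2 GiB during the last hour of the window.
def burst_fn(t):
    if t <= 5 * 3600:
        return 10 * GIB
    return 10 * GIB - 2 * GIB * (t - 5 * 3600) / 3600

burst = window(burst_fn)

print("steady window triggers:", predict_linear(steady, HORIZON) < 0)
print("burst window triggers: ", predict_linear(burst, HORIZON) < 0)
```

In this sketch the volume still has 8 GiB available at the end of the window, yet the extrapolated line drops below zero within four days, so the condition holds until the burst moves out of the 6h window or the space is released.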

    
OCP 3.11 - https://github.com/openshift/cluster-monitoring-operator/blob/release-3.11/assets/prometheus-k8s/rules.yaml
OCP 4.3  - https://github.com/openshift/cluster-monitoring-operator/blob/release-4.3/assets/prometheus-k8s/rules.yaml

Is it possible to backport the fix from 4.x to 3.11?

Version-Release number of selected component (if applicable):
3.11.x

How reproducible:
Monitor application storage that behaves like the case described here:
https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/262


Actual results:
KubePersistentVolumeFullInFourDays fires and then resolves by itself because the alert is too sensitive.

Expected results:
KubePersistentVolumeFullInFourDays should not fire on short-lived storage peaks, since it is a long-term alert.
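
For illustration only: a small Python sketch of how the "for" clause affects this, assuming an alert only moves from pending to firing once its expression has been true continuously for the configured duration. The function name, the 1-minute evaluation interval, the 20-minute false positive and the 1h alternative are all hypothetical and are not the values used by any release.

```python
def firing_states(condition, eval_interval_s, for_s):
    """For each rule evaluation, report whether the alert is firing.
    The alert stays pending until the condition has been true
    continuously for at least for_s seconds."""
    states = []
    active_since = None
    for i, is_true in enumerate(condition):
        now = i * eval_interval_s
        if not is_true:
            active_since = None      # condition cleared: reset the timer
            states.append(False)
            continue
        if active_since is None:
            active_since = now       # condition just became true
        states.append(now - active_since >= for_s)
    return states

# Hypothetical timeline: expression is true for 20 minutes, false otherwise.
condition = [False] * 10 + [True] * 20 + [False] * 10

short_for = firing_states(condition, eval_interval_s=60, for_s=5 * 60)
long_for = firing_states(condition, eval_interval_s=60, for_s=60 * 60)

print("fires with for: 5m ->", any(short_for))
print("fires with for: 1h ->", any(long_for))
```

With these made-up numbers the 5m setting turns the 20-minute peak into a firing alert, while a longer hold time would keep it pending until the expression clears.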

Additional info:

Comment 2 Lili Cosic 2020-02-07 10:28:03 UTC

Yes, it's possible to backport, but I can't promise it will be ASAP. Assigning to Serg.

Comment 3 Sergiusz Urbaniak 2020-03-06 09:20:13 UTC

*** Bug 1810838 has been marked as a duplicate of this bug. ***

Comment 4 Pawel Krupa 2020-03-06 09:25:49 UTC

This alert was improved in 4.1 [1] and we don't have plans for backport.

[1]: https://github.com/openshift/cluster-monitoring-operator/blob/release-4.1/assets/prometheus-k8s/rules.yaml#L777-L789

Comment 6 Franck Grosjean 2020-04-10 07:41:17 UTC

Hello,

Is there an option to plan a backport of this alert to 3.11?

Comment 16 Paul Gier 2020-05-22 20:07:49 UTC

Targeting this to 4.2.x; we can then create additional bugs for backporting to the other versions.

Comment 25 errata-xmlrpc 2020-06-17 20:21:25 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2477