1800489 – KubePersistentVolumeFullInFourDays is triggered multiple times on ElasticSearch storage

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	3.11.0
Hardware:	All
OS:	All
Priority:	medium
Severity:	low
Target Milestone:	---
Target Release:	3.11.z
Assignee:	Paul Gier
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-02-07 09:11 UTC by Franck Grosjean
Modified:	2023-10-06 19:09 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-06-17 20:21:25 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-monitoring-operator pull 797	0	None	closed	Bug 1800489: Patch kube pv storage alert release 3.11	2021-02-04 10:34:59 UTC
Red Hat Product Errata	RHBA-2020:2477	0	None	None	None	2020-06-17 20:21:43 UTC

Description Franck Grosjean 2020-02-07 09:11:35 UTC

Description of problem:

Our cluster keeps firing the KubePersistentVolumeFullInFourDays alert.
This is a false positive caused by the alert's sensitivity.
The behaviour is similar to https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/262

It seems to be fixed upstream and in OCP 4.x by a change to the alert definition (the "for" clause):
https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/288

- alert: KubePersistentVolumeFullInFourDays
  annotations:
    message: Based on recent sampling, the persistent volume claimed by {{ $labels.persistentvolumeclaim
      }} in namespace {{ $labels.namespace }} is expected to fill up within four
      days. Currently {{ $value }} bytes are available.
  expr: |
    kubelet_volume_stats_available_bytes{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kubelet"} and predict_linear(kubelet_volume_stats_available_bytes{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kubelet"}[6h], 4 * 24 * 3600) < 0
  for: 5m
  labels:
    severity: critical
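
To make the false positive concrete, here is a minimal Python sketch of the extrapolation that predict_linear performs in the rule above: a least-squares line fitted over the samples in the range window, then extended by the given horizon. It is an illustration, not Prometheus's implementation. As a simplification it extrapolates from the last sample's timestamp rather than from the rule evaluation time, and the sample values (a volume that loses 1 GiB per hour for six hours while still 44 GiB free) are hypothetical.

```python
from typing import Sequence, Tuple


def predict_linear(samples: Sequence[Tuple[float, float]], horizon_s: float) -> float:
    """Fit a least-squares line to (timestamp_s, value) samples and return
    the value predicted horizon_s seconds after the last sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    t_end = samples[-1][0]
    # Shift timestamps so that x = 0 is the last sample (simplification).
    xs = [t - t_end for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("samples must span more than one timestamp")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept + slope * horizon_s


# Hypothetical 6h window, one sample every 5 minutes, values in GiB:
# available space drops from 50 GiB to 44 GiB during a temporary burst.
window = [(float(t), 50.0 - t / 3600.0) for t in range(0, 6 * 3600 + 1, 300)]
four_days_s = 4 * 24 * 3600
predicted_gib = predict_linear(window, four_days_s)
would_match_expr = predicted_gib < 0
```

With these made-up numbers the extrapolated value is negative, so the expression matches even though the volume still has plenty of free space; if the burst is later cleaned up, the slope flattens and the alert resolves by itself.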

    
OCP 3.11 - https://github.com/openshift/cluster-monitoring-operator/blob/release-3.11/assets/prometheus-k8s/rules.yaml
OCP 4.3  - https://github.com/openshift/cluster-monitoring-operator/blob/release-4.3/assets/prometheus-k8s/rules.yaml

Is it possible to backport the fix from 4.x to 3.11?

Version-Release number of selected component (if applicable):
3.11.x

How reproducible:
Monitor application storage whose usage behaves like the case described here:
https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/262


Actual results:
KubePersistentVolumeFullInFourDays fires and then resolves on its own because the alert is too sensitive

Expected results:
KubePersistentVolumeFullInFourDays should not fire on short-lived storage peaks, since it is a long-term alert
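
As a rough illustration of how the "for" clause filters short peaks, the Python sketch below models the pending-then-firing behaviour of an alerting rule: the alert fires only once the expression has matched continuously for at least the "for" duration, and any non-matching evaluation resets it. The evaluation interval, the ten-minute peak, and the one-hour "for" value are hypothetical numbers chosen for the example, not values taken from any OCP release.

```python
from typing import List, Sequence


def firing_states(condition: Sequence[bool], interval_s: int, for_s: int) -> List[bool]:
    """Return, per evaluation, whether the alert is firing.

    condition[i] is whether the alert expression matched at evaluation i;
    evaluations are interval_s seconds apart.
    """
    states: List[bool] = []
    active_since = None  # time at which the condition first became true
    for i, matched in enumerate(condition):
        now = i * interval_s
        if not matched:
            active_since = None  # a single miss resets the pending state
            states.append(False)
            continue
        if active_since is None:
            active_since = now
        states.append(now - active_since >= for_s)
    return states


# Hypothetical peak: the expression matches for 11 evaluations (10 minutes
# at a 60s interval), then stops matching.
peak = [True] * 11 + [False] * 5
fires_with_5m = any(firing_states(peak, interval_s=60, for_s=5 * 60))
fires_with_1h = any(firing_states(peak, interval_s=60, for_s=60 * 60))
```

In this sketch the ten-minute peak is enough to fire with a five-minute "for" but never fires with a one-hour "for", which is the kind of damping a long-term alert needs.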

Additional info:

Comment 2 Lili Cosic 2020-02-07 10:28:03 UTC

Yes, it's possible to backport, but I can't promise it will be ASAP. Assigning to Serg.

Comment 3 Sergiusz Urbaniak 2020-03-06 09:20:13 UTC

*** Bug 1810838 has been marked as a duplicate of this bug. ***

Comment 4 Pawel Krupa 2020-03-06 09:25:49 UTC

This alert was improved in 4.1 [1] and we don't have plans for backport.

[1]: https://github.com/openshift/cluster-monitoring-operator/blob/release-4.1/assets/prometheus-k8s/rules.yaml#L777-L789

Comment 6 Franck Grosjean 2020-04-10 07:41:17 UTC

Hello,

Is there an option to plan a backport of this alert to 3.11?

Comment 16 Paul Gier 2020-05-22 20:07:49 UTC

Targeting this to 4.2.x, and then we can create additional bugs for backporting to the other versions.

Comment 25 errata-xmlrpc 2020-06-17 20:21:25 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2477