Bug 1810838

Summary: [3.11] - alert KubePersistentVolumeFullInFourDays is firing even though there is enough storage
Product: OpenShift Container Platform Reporter: Vladislav Walek <vwalek>
Component: Monitoring Assignee: Pawel Krupa <pkrupa>
Status: CLOSED CURRENTRELEASE QA Contact: Junqi Zhao <juzhao>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.11.0 CC: alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania
Target Milestone: --- Keywords: Reopened
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-04-02 10:46:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Vladislav Walek 2020-03-06 01:17:33 UTC
Description of problem:

I have a customer whose cluster fires the alert "KubePersistentVolumeFullInFourDays" for the logging-es-0 PVC, predicting that the volume will fill up within four days.

However, running the query in Prometheus shows that there is enough storage.

The graph in the attachments shows a brief spike during which the value drops below 0 (the alert condition); however, it stabilizes at the correct value after a couple of minutes.
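
For illustration, here is a minimal Python sketch of how a linear extrapolation (the idea behind PromQL's `predict_linear`) can briefly go negative after a short drop in available space. It is not the exact rule shipped in 3.11: the one-hour window, the one-minute sample interval, the 2 GB drop, and all sample values are hypothetical, and `predict_linear` here is a local helper rather than a Prometheus API call.

```python
def predict_linear(samples, horizon_seconds):
    """Fit a least-squares line through (timestamp, value) samples and
    extrapolate it horizon_seconds past the last sample."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return intercept + slope * (last_t + horizon_seconds)


FOUR_DAYS = 4 * 24 * 3600  # prediction horizon in seconds

# Hypothetical data: one sample per minute for an hour, about 48.4 GB free.
steady = [(60 * i, 48.4e9) for i in range(60)]

# Same series, but the last five samples are 2 GB lower (a short burst of writes).
dipping = [(t, v - 2e9 if t >= 55 * 60 else v) for t, v in steady]

steady_prediction = predict_linear(steady, FOUR_DAYS)
dipping_prediction = predict_linear(dipping, FOUR_DAYS)

print(f"steady:  {steady_prediction:.3e} bytes predicted free in four days")
print(f"dipping: {dipping_prediction:.3e} bytes predicted free in four days")
```

With these made-up numbers the flat series predicts plenty of free space, while the series with the short drop extrapolates to a negative value even though tens of gigabytes are still available. Once the drop is no longer at the edge of the window, the fitted slope flattens again, which would be consistent with the value stabilizing after a few minutes.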

Labels
alertname = KubePersistentVolumeFullInFourDays
cluster = <cluster>
endpoint = https-metrics
instance = <node-ip>:10250
job = kubelet
namespace = openshift-logging
persistentvolumeclaim = logging-es-0
prometheus = openshift-monitoring/k8s
service = kubelet
severity = critical
Annotations
message = Based on recent sampling, the persistent volume claimed by logging-es-0 in namespace openshift-logging is expected to fill up within four days. Currently 4.838066176e+10 bytes are available.


Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.11

How reproducible:
n/a

Additional info:
The problem could be related to bug https://bugzilla.redhat.com/1809375, where the Prometheus pods restart constantly.

Comment 6 Pawel Krupa 2020-04-02 10:46:28 UTC
This is a prediction-based alert that should have warning severity. It has no direct impact on a running cluster, and there is a separate `KubePersistentVolumeUsageCritical` alert that requires a fast reaction when it fires.
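
As a rough sketch of the difference, a usage-based check looks only at how much space is free right now, not at a trend. The snippet below is illustrative only: `usage_is_critical`, the 3% free-space threshold, and the 100 GB capacity are assumptions, not values taken from this bug or from the shipped rule; the available bytes come from the alert message in the description.

```python
def usage_is_critical(available_bytes, capacity_bytes, min_free_ratio=0.03):
    """Return True when the free fraction of the volume is below min_free_ratio.

    min_free_ratio is a hypothetical threshold chosen for illustration.
    """
    return available_bytes / capacity_bytes < min_free_ratio


# Available bytes as reported in the alert message above.
available = 4.838066176e10
# Hypothetical capacity; the real PVC size is not stated in this report.
capacity = 100e9

critical_now = usage_is_critical(available, capacity)
print("usage critical:", critical_now)
```

Under these assumptions the volume is nowhere near a usage-based critical state, so only the prediction is raising a concern.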

KubePersistentVolumeFullInFourDays has been greatly improved in the latest OpenShift versions, and we currently have no plans to backport the change.