Description of problem
======================
Right after installation of an internal LSO StorageCluster, I see that there are 12 KubePersistentVolumeFillingUp alerts firing (one for each LSO device).

Version-Release number of selected component
============================================
OCP 4.8.0-0.nightly-2021-06-16-020345
LSO 4.8.0-202106102328
OCS 4.8.0-418.ci

How reproducible
================
4/4

Steps to Reproduce
==================
1. Install OCP on vSphere, with 3 master and 6 worker nodes, with 2 local storage devices per worker node (for LSO).
2. Install LSO and OCS operators.
3. Use the "Create Storage Cluster" wizard in the OCP Console to start setup of a Storage Cluster in "Internal - Attached devices" mode.
4. When installation of the storage cluster finishes, check firing alerts.

Actual results
==============
In my case (with 12 LSO devices), I see 12 KubePersistentVolumeFillingUp alerts firing (one for each local PV). Looking into one such alert I see:

> KubePersistentVolumeFillingUp Critical
> The PersistentVolume claimed by ocs-deviceset-ocs-1-data-3ldm77 in Namespace openshift-storage is only 0% free.

See also screenshot #1.

Expected results
================
KubePersistentVolumeFillingUp alerts are not firing.

Additional info
===============
This wasn't happening in previous OCS releases => flagging as a regression. OCS is using the local PVs for OSDs, and it imho doesn't make sense to evaluate storage utilization on this level.

See also must-gather tarball attached in a comment below.

The alert expression is:

```
kubelet_volume_stats_available_bytes{job="kubelet",metrics_path="/metrics",namespace=~"(openshift-.*|kube-.*|default|logging)"}
  /
kubelet_volume_stats_capacity_bytes{job="kubelet",metrics_path="/metrics",namespace=~"(openshift-.*|kube-.*|default|logging)"}
  < 0.03
```
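To illustrate why every local PV trips this expression: for a `volumeMode: Block` PVC the kubelet reports `kubelet_volume_stats_available_bytes` as 0 while capacity is the real device size, so the ratio is always 0, i.e. "0% free". A minimal Python sketch of the arithmetic (the sample values are hypothetical, not taken from the must-gather):

```python
# Hypothetical kubelet volume stats for one LSO-backed Block-mode PVC.
# For volumeMode: Block the kubelet cannot report filesystem usage,
# so available and used are 0 and only capacity is meaningful.
capacity_bytes = 100 * 1024**3   # 100 GiB device (illustrative)
available_bytes = 0              # always 0 for Block volumes
used_bytes = 0                   # always 0 for Block volumes

# The alert expression: available / capacity < 0.03
free_ratio = available_bytes / capacity_bytes
alert_fires = free_ratio < 0.03

print(f"{free_ratio:.0%} free, alert fires: {alert_fires}")
# → 0% free, alert fires: True
```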
Created attachment 1791781 [details] screenshot #1: ocs dashboard with list of firing alerts, see KubePersistentVolumeFillingUp there
Created attachment 1794347 [details] screenshot of gp2 PVC with 0% free after creation

This is not only a false alert for LSO PVCs; it also fires for PVCs created on gp2. Maybe the subject of this BZ should be adjusted? Any PVC with "volumeMode: Block" would be affected by this.
Niels/Anmol, what are the next steps here? OCP 4.8 is already in the Freeze stage and, if this is a blocker, we might have to target it for OCP 4.8.1 so that it can be shipped before OCS 4.8 releases.
I think the approach should be to adjust the alerting rules. These alerts (KubePersistentVolumeFillingUp) should not fire for `volumeMode: Block` volumes, as usage/free cannot be detected for them (both are reported as 0; only capacity is valid).
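One way such an adjustment can work (sketched below with made-up numbers, not the actual patch): adding an `and kubelet_volume_stats_used_bytes > 0` condition to the expression drops the Block-volume series, for which the kubelet reports both used and available as 0, while a genuinely nearly-full filesystem PV still alerts:

```python
# Illustrative series: a Block-mode LSO PV and a nearly full filesystem PV.
# Tuples are (available, used, capacity) in bytes; values are hypothetical.
volumes = {
    "local-block-pv": (0, 0, 100 * 1024**3),                   # Block: stats are 0
    "full-fs-pv": (1 * 1024**3, 99 * 1024**3, 100 * 1024**3),  # genuinely 1% free
}

def old_rule(avail, used, cap):
    # Original expression: fires on any volume reporting < 3% free.
    return avail / cap < 0.03

def new_rule(avail, used, cap):
    # Adjusted expression: additionally require used > 0, which is
    # never true for volumeMode: Block volumes.
    return avail / cap < 0.03 and used > 0

for name, (avail, used, cap) in volumes.items():
    print(name, old_rule(avail, used, cap), new_rule(avail, used, cap))
# → local-block-pv True False
# → full-fs-pv True True
```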
Reproduced with the steps from comment #c15.
Changes from upstream kubernetes-mixin to CMO will be synced through https://github.com/openshift/cluster-monitoring-operator/pull/1269
Tested with payload 4.9.0-0.nightly-2021-07-15-015134.

Following #c15, no KubePersistentVolumeFillingUp alerts are triggered.

Checked that the alert rules changed:

# oc -n openshift-monitoring get cm prometheus-k8s-rulefiles-0 -oyaml | grep KubePersistentVolumeFillingUp -A20

```
- alert: KubePersistentVolumeFillingUp
  annotations:
    description: The PersistentVolume claimed by {{ $labels.persistentvolumeclaim }}
      in Namespace {{ $labels.namespace }} is only {{ $value | humanizePercentage }} free.
    summary: PersistentVolume is filling up.
  expr: |
    (
      kubelet_volume_stats_available_bytes{namespace=~"(openshift-.*|kube-.*|default)",job="kubelet", metrics_path="/metrics"}
        /
      kubelet_volume_stats_capacity_bytes{namespace=~"(openshift-.*|kube-.*|default)",job="kubelet", metrics_path="/metrics"}
    ) < 0.03
    and
    kubelet_volume_stats_used_bytes{namespace=~"(openshift-.*|kube-.*|default)",job="kubelet", metrics_path="/metrics"} > 0
  for: 1m
  labels:
    severity: critical
- alert: KubePersistentVolumeFillingUp
  annotations:
    description: Based on recent sampling, the PersistentVolume claimed by {{ $labels.persistentvolumeclaim }}
      in Namespace {{ $labels.namespace }} is expected to fill up within four days.
      Currently {{ $value | humanizePercentage }} is available.
    summary: PersistentVolume is filling up.
  expr: |
    (
      kubelet_volume_stats_available_bytes{namespace=~"(openshift-.*|kube-.*|default)",job="kubelet", metrics_path="/metrics"}
        /
      kubelet_volume_stats_capacity_bytes{namespace=~"(openshift-.*|kube-.*|default)",job="kubelet", metrics_path="/metrics"}
    ) < 0.15
    and
    kubelet_volume_stats_used_bytes{namespace=~"(openshift-.*|kube-.*|default)",job="kubelet", metrics_path="/metrics"} > 0
    and
    predict_linear(kubelet_volume_stats_available_bytes{namespace=~"(openshift-.*|kube-.*|default)",job="kubelet", metrics_path="/metrics"}[6h], 4 * 24 * 3600) < 0
  for: 1h
  labels:
    severity: warning
```
*** Bug 1986917 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days