1955469 – [4.7] Disable collection of node_mountstats_nfs metrics in node_exporter

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.7.z
Assignee:	Filip Petkovski
QA Contact:	hongyan li
Docs Contact:
URL:
Whiteboard:
Depends On:	1955467
Blocks:	1954016 1955471

Reported:	2021-04-30 08:02 UTC by Simon Pasquier
Modified:	2021-05-24 17:15 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The mountstats collector of the node-exporter daemonset was enabled. Consequence: Nodes with NFS mount points exposed high-cardinality mountstats metrics, which caused high memory usage in Prometheus. Fix: The mountstats collector is disabled. Result: node-exporter no longer exposes the mountstats metrics, which reduces the number of metrics stored by Prometheus.
Clone Of:	1955467
Clones:	1955471
Environment:
Last Closed:	2021-05-24 17:14:41 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-monitoring-operator pull 1154	0	None	open	Bug 1955469: remove node_mountstats_nfs_* metrics	2021-05-10 08:18:14 UTC
Red Hat Product Errata	RHSA-2021:1561	0	None	None	None	2021-05-24 17:15:15 UTC

Description Simon Pasquier 2021-04-30 08:02:17 UTC

+++ This bug was initially created as a clone of Bug #1955467 +++

Description of problem:
We've identified that on some clusters, the node_mountstats_nfs_* metrics account for more than half of the total metrics stored in Prometheus.

These metrics aren't actually used anywhere (neither in rules nor in dashboards), and storing them in Prometheus significantly increases memory usage on clusters whose nodes are configured with NFS.


Version-Release number of selected component (if applicable):
4.6

How reproducible:
Always

Steps to Reproduce:

Check the definition of the node-exporter daemonset:
oc describe -n openshift-monitoring daemonset node-exporter

Actual results:
The '--collector.mountstats' flag is listed in the node-exporter container's argument list.

Expected results:
The '--collector.mountstats' flag isn't set.
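
The check above can be scripted. The following is only a minimal sketch: the helper name has_mountstats_flag is hypothetical (not part of oc or node_exporter), and it assumes the daemonset description lists one argument per line, as 'oc describe' prints it.

```shell
# Hypothetical helper: reads a daemonset description on stdin and
# exits 0 if the '--collector.mountstats' flag is present, 1 otherwise.
# The flag must be a whole whitespace-delimited argument, so
# '--no-collector.mountstats' does not count as a match.
has_mountstats_flag() {
  grep -qE -e '(^|[[:space:]])--collector\.mountstats([[:space:]]|$)'
}

# Usage against a live cluster (assumes a logged-in 'oc' session):
#   oc describe -n openshift-monitoring daemonset node-exporter \
#     | has_mountstats_flag && echo "flag set" || echo "flag not set"
```

On an affected cluster the helper should report the flag as set; once the fix is applied it should report it as not set.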

Additional info:
The mountstats collector was enabled in [1] following a customer
request for enhancement. Looking at the history, however, the customer
was actually asking for the kubelet_volume_* metrics, which weren't
supported by their storage provider at that time (this has since been
fixed [2]). The mountstats metrics don't fill the same need and are
superfluous.

[1] https://github.com/openshift/cluster-monitoring-operator/pull/409
[2] https://github.com/NetApp/trident/issues/134

Comment 1 hongyan li 2021-05-12 07:04:21 UTC

Tested with the PR. The '--collector.mountstats' flag does not appear in the argument list shown below:

oc describe -n openshift-monitoring daemonset node-exporter
...
  Containers:
   node-exporter:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1c2a456aa6dc253f47f67d1aeb55b0781173a36b78e33a794cd1644c40dbd852
    Port:       <none>
    Host Port:  <none>
    Args:
      --web.listen-address=127.0.0.1:9100
      --path.sysfs=/host/sys
      --path.rootfs=/host/root
      --no-collector.wifi
      --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
      --collector.netclass.ignored-devices=^(veth.*)$
      --collector.netdev.device-exclude=^(veth.*)$
      --collector.cpu.info
      --collector.textfile.directory=/var/node_exporter/textfile
...

Comment 2 hongyan li 2021-05-13 06:38:56 UTC

$ token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e node_mountstats_nfs_

no results
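
To get a number rather than an empty grep, the matching metric names can be counted. This is only a sketch: the helper name count_nfs_mountstats_names is hypothetical, and it assumes the response has the usual shape of the Prometheus label values API, i.e. a JSON object whose "data" array holds the metric names as quoted strings. It uses grep instead of jq so that it has no extra dependency.

```shell
# Hypothetical helper: reads the JSON returned by
# /api/v1/label/__name__/values on stdin and prints how many
# metric names start with 'node_mountstats_nfs_'.
count_nfs_mountstats_names() {
  # grep -o prints each quoted match on its own line; wc -l counts them.
  # tr strips the padding that some wc implementations add.
  grep -o '"node_mountstats_nfs_[^"]*"' | wc -l | tr -d '[:space:]'
}

# Usage (same request as above, with $token already set):
#   oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
#     curl -sk -H "Authorization: Bearer $token" \
#     'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' \
#     | count_nfs_mountstats_names
```

A result of 0 corresponds to the "no results" outcome above.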

Comment 3 hongyan li 2021-05-13 13:49:02 UTC

Created an NFS storage class and a PVC based on it, then checked that no metric name has the prefix node_mountstats_nfs_.

Comment 7 errata-xmlrpc 2021-05-24 17:14:41 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.12 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1561
