Bug 1955467

Summary: Disable collection of node_mountstats_nfs metrics in node_exporter
Product: OpenShift Container Platform
Reporter: Simon Pasquier <spasquie>
Component: Monitoring
Assignee: Simon Pasquier <spasquie>
Status: CLOSED ERRATA
QA Contact: hongyan li <hongyli>
Severity: high
Docs Contact:
Priority: high
Version: 4.6
CC: alegrand, anpicker, dkulkarn, erooth, kakkoyun, lcosic, pkrupa
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The mountstats collector of the node-exporter daemonset was enabled.
Consequence: Nodes with NFS mount points exposed high-cardinality mountstats metrics, which caused high memory usage in Prometheus.
Fix: The mountstats collector is disabled.
Result: node-exporter no longer exposes the mountstats metrics, which reduces the number of metrics stored by Prometheus.
Story Points: ---
Clone Of:
: 1955469 (view as bug list)
Environment:
Last Closed: 2021-07-27 23:05:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1951052, 1955469    

Description Simon Pasquier 2021-04-30 07:58:30 UTC
Description of problem:
We've identified that on some clusters, the node_mountstats_nfs_* metrics account for more than half of the total metrics stored in Prometheus.

These metrics aren't actually used anywhere (neither in rules nor in dashboards), and storing them in Prometheus significantly increases memory usage on clusters that have nodes configured with NFS.
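
One rough way to quantify this share on a given cluster is to compare two series counts. The sketch below is not taken from the report: the helper name nfs_series_share is hypothetical, and the PromQL selectors in the comments are an assumption about how the two counts could be obtained.

```shell
#!/bin/sh
# Hypothetical helper: given the number of node_mountstats_nfs_* series ($1)
# and the total number of series ($2), print the share as a percentage
# with one decimal place.
nfs_series_share() {
    awk -v nfs="$1" -v total="$2" 'BEGIN {
        # Guard against an empty or invalid total to avoid dividing by zero.
        if (total <= 0) { print "0.0"; exit }
        printf "%.1f\n", 100 * nfs / total
    }'
}

# The two counts could come from instant queries such as (PromQL):
#   count({__name__=~"node_mountstats_nfs_.+"})
#   count({__name__=~".+"})
# Example call with made-up counts:
#   nfs_series_share 600000 1000000
```

A result above 50 would match the situation described above.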


Version-Release number of selected component (if applicable):
4.6

How reproducible:
Always

Steps to Reproduce:

Check the definition of the node-exporter daemonset:
oc describe -n openshift-monitoring daemonset node-exporter

Actual results:
The '--collector.mountstats' flag is listed in the node-exporter container's argument list.

Expected results:
The '--collector.mountstats' flag isn't set.
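
The check above can also be scripted. This is a minimal sketch, not part of the fix: the helper name has_mountstats_flag is hypothetical, and the commented usage assumes a logged-in oc session and that jq is installed.

```shell
#!/bin/sh
# Hypothetical helper: reads container arguments on stdin, one per line,
# and exits 0 only if the mountstats collector is explicitly enabled.
# -x matches whole lines, so --no-collector.mountstats does not count;
# -F treats the pattern as a fixed string rather than a regex.
has_mountstats_flag() {
    grep -qxF -e '--collector.mountstats'
}

# Usage against a live cluster (assumption: oc is logged in, jq is available):
#   oc get -n openshift-monitoring daemonset node-exporter -o json \
#     | jq -r '.spec.template.spec.containers[].args[]?' \
#     | has_mountstats_flag \
#     && echo "mountstats collector enabled" \
#     || echo "mountstats collector not set"
```

On a cluster with the fix, the helper is expected to exit non-zero.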

Additional info:
The mountstats collector was enabled in [1] following a customer
request for enhancement. Looking at the history, however, the customer
was asking for the kubelet_volume_* metrics, which weren't supported by
their storage provider at the time (this has since been fixed [2]). The
mountstats metrics don't fill the same need and are superfluous.

[1] https://github.com/openshift/cluster-monitoring-operator/pull/409
[2] https://github.com/NetApp/trident/issues/134

Comment 2 hongyan li 2021-05-06 06:46:24 UTC
Tested with payload 4.8.0-0.nightly-2021-05-06-003426

oc describe -n openshift-monitoring daemonset node-exporter
...
  Containers:
   node-exporter:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1c2a456aa6dc253f47f67d1aeb55b0781173a36b78e33a794cd1644c40dbd852
    Port:       <none>
    Host Port:  <none>
    Args:
      --web.listen-address=127.0.0.1:9100
      --path.sysfs=/host/sys
      --path.rootfs=/host/root
      --no-collector.wifi
      --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
      --collector.netclass.ignored-devices=^(veth.*)$
      --collector.netdev.device-exclude=^(veth.*)$
      --collector.cpu.info
      --collector.textfile.directory=/var/node_exporter/textfile
...

Comment 3 hongyan li 2021-05-13 05:43:10 UTC
#token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
#oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e node_mountstats_nfs_

no result
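
The same verification can be turned into a pass/fail check that does not depend on reading grep output by eye. This is a sketch under assumptions: the helper name count_nfs_mountstats_names is hypothetical, and it expects the JSON body returned by the label values endpoint used above on stdin.

```shell
#!/bin/sh
# Hypothetical helper: reads the JSON response of
# /api/v1/label/__name__/values on stdin and prints how many metric names
# start with node_mountstats_nfs_.
count_nfs_mountstats_names() {
    # grep -o emits one line per quoted metric name that matches;
    # grep -c . counts those lines. "|| true" keeps the exit status 0
    # when the count is zero, since grep exits 1 on no match.
    grep -o '"node_mountstats_nfs_[^"]*"' | grep -c . || true
}

# Usage (assumption: $token is set as in the command above):
#   oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
#     curl -sk -H "Authorization: Bearer $token" \
#     'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' \
#     | count_nfs_mountstats_names
```

A printed count of 0 corresponds to the "no result" outcome reported here.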

Comment 4 hongyan li 2021-05-13 09:43:17 UTC
Created an NFS storage class (sc) and configured a PV with it, then checked that no node_mountstats_nfs_* metrics are present.

Comment 7 errata-xmlrpc 2021-07-27 23:05:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438