Bug 1955471

Summary: [4.6] Disable collection of node_mountstats_nfs metrics in node_exporter
Product: OpenShift Container Platform
Reporter: Simon Pasquier <spasquie>
Component: Monitoring
Assignee: Arunprasad Rajkumar <arajkuma>
Status: CLOSED ERRATA
QA Contact: hongyan li <hongyli>
Severity: high
Docs Contact:
Priority: high
Version: 4.6
CC: alegrand, anpicker, erooth, hongyli, juzhao, kakkoyun, lcosic, pkrupa
Target Milestone: ---
Keywords: EasyFix
Target Release: 4.6.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The mountstats collector of the node-exporter daemonset was enabled.
Consequence: Nodes with NFS mount points exposed high-cardinality mountstats metrics, which caused high memory usage in Prometheus.
Fix: The mountstats collector is disabled.
Result: node-exporter no longer exposes the mountstats metrics, which reduces the number of metrics stored by Prometheus.
Story Points: ---
Clone Of: 1955469
Environment:
Last Closed: 2021-06-01 12:10:08 UTC
Type: ---
Bug Depends On: 1955469    
Bug Blocks: 1955452    

Description Simon Pasquier 2021-04-30 08:03:54 UTC
+++ This bug was initially created as a clone of Bug #1955469 +++

+++ This bug was initially created as a clone of Bug #1955467 +++

Description of problem:
We've identified that on some clusters, the node_mountstats_nfs_* metrics account for more than half of the total metrics stored in Prometheus.

These metrics aren't actually used anywhere (neither in rules nor in dashboards), and storing them in Prometheus significantly increases memory usage on clusters whose nodes are configured with NFS.


Version-Release number of selected component (if applicable):
4.6

How reproducible:
Always

Steps to Reproduce:

Check the definition of the node-exporter daemonset:
oc describe -n openshift-monitoring daemonset node-exporter

Actual results:
The '--collector.mountstats' flag is listed in the node-exporter container's argument list.

Expected results:
The '--collector.mountstats' flag isn't set.
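
The check above can also be scripted. What follows is only a minimal sketch and not part of the original report: has_mountstats_flag is a hypothetical helper that reads container arguments on standard input, one per line, and succeeds when '--collector.mountstats' is among them. The usage comment assumes that oc and jq are installed and that the container is named "node-exporter".

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): reads container
# arguments on stdin, one per line, and returns 0 if the mountstats
# collector flag is present, non-zero otherwise.
has_mountstats_flag() {
    # -x matches whole lines only, -F treats the pattern as a fixed string,
    # -e stops grep from parsing the leading dashes as an option.
    grep -q -x -F -e '--collector.mountstats'
}

# Example usage against a live cluster (assumes oc and jq are available and
# that the container is named "node-exporter"):
#
#   oc -n openshift-monitoring get daemonset node-exporter -o json \
#     | jq -r '.spec.template.spec.containers[]
#              | select(.name == "node-exporter") | .args[]' \
#     | has_mountstats_flag && echo "mountstats collector is enabled"
```

Matching whole lines keeps an argument such as '--no-collector.mountstats' from being reported as the enabling flag.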

Additional info:
The mountstats collector was enabled in [1] following a customer
request for enhancement. Looking at the history, however, the customer
was actually asking for the kubelet_volume_* metrics, which their
storage provider did not support at the time (this has since been
fixed [2]). The mountstats metrics don't fill the same need and are
superfluous.

[1] https://github.com/openshift/cluster-monitoring-operator/pull/409
[2] https://github.com/NetApp/trident/issues/134

Comment 2 Junqi Zhao 2021-05-21 03:47:58 UTC
Created an NFS storage class and a PVC based on it, then checked that no metric name has the prefix node_mountstats_nfs_.
# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep node_mountstats_nfs
no result

# oc -n openshift-monitoring get ds node-exporter -oyaml | grep "\--collector"
        - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
        - --collector.cpu.info
        - --collector.textfile.directory=/var/node_exporter/textfile
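
The metric-name check in this comment could be turned into a count rather than a visual inspection of grep output. This is a minimal sketch under stated assumptions, not what QA actually ran: count_mountstats_nfs_names is a hypothetical helper that reads metric names on standard input, one per line, and prints how many start with node_mountstats_nfs_. The usage comment assumes the label values response carries the names in its "data" array and that jq is installed.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): reads metric names on
# stdin, one per line, and prints how many begin with node_mountstats_nfs_.
count_mountstats_nfs_names() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so "|| true" keeps the function's status at 0.
    grep -c '^node_mountstats_nfs_' || true
}

# Example usage (assumes $token is set as in the comment above and that jq
# is available to unpack the "data" array of the API response):
#
#   oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
#     curl -sk -H "Authorization: Bearer $token" \
#     'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' \
#     | jq -r '.data[]' | count_mountstats_nfs_names
```

A printed count of 0 corresponds to the "no result" outcome recorded above.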

Comment 7 errata-xmlrpc 2021-06-01 12:10:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.31 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2100