Bug 1870261 - 4.5.5 node-exporter pods show "failed to parse mountstats: invalid NFS per-operations stats"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: low
Target Milestone: ---
Target Release: 4.6.0
Assignee: Pawel Krupa
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1890466
Reported: 2020-08-19 15:34 UTC by Caden Marchese
Modified: 2023-12-15 18:56 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1890466 (view as bug list)
Environment:
Last Closed: 2020-10-27 16:29:12 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 5362041 0 None None None 2020-09-01 09:34:32 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:29:27 UTC

Description Caden Marchese 2020-08-19 15:34:50 UTC
Description of problem:
Upon an upgrade from 4.4.16 to 4.5.5, some node-exporter pods began to show the following log message:

time="2020-08-17T19:37:33Z" level=error msg="ERROR: mountstats collector failed after 0.000539s: failed to parse mountstats: invalid NFS per-operations stats: [NULL: 1 1 0 44 24 2 0 3 0]" source="collector.go:132"

Version-Release number of selected component (if applicable):
4.5.5

How reproducible:
Unclear. I could not reproduce this on my own 4.5 cluster, where node-exporter started and listened successfully, so I am not sure whether it depends on the upgrade from 4.4 and/or on some custom resource:

time="2020-08-10T17:14:06Z" level=info msg="Build context (go=go1.13.4, user=root@19c35050d8da, date=20200724-06:51:03)" source="node_exporter.go:157"
time="2020-08-10T17:14:06Z" level=info msg="Enabled collectors:" source="node_exporter.go:97"
time="2020-08-10T17:14:06Z" level=info msg=" - arp" source="node_exporter.go:104"
time="2020-08-10T17:14:06Z" level=info msg=" - bcache" source="node_exporter.go:104"
time="2020-08-10T17:14:06Z" level=info msg=" - bonding" 
...
time="2020-08-10T17:14:06Z" level=info msg=" - zfs" source="node_exporter.go:104"
time="2020-08-10T17:14:06Z" level=info msg="Listening on 127.0.0.1:9100" source="node_exporter.go:170"

Steps to Reproduce:
1. Upgrade 4.4.16 to 4.5.5
2. Observe errors in node-exporter

Actual results:
$ oc logs node-exporter-2b2bj -c node-exporter | tail
time="2020-08-17T19:37:33Z" level=error msg="ERROR: mountstats collector failed after 0.000539s: failed to parse mountstats: invalid NFS per-operations stats: [NULL: 1 1 0 44 24 2 0 3 0]" source="collector.go:132"

Expected results:
$ oc logs node-exporter-2b2bj -c node-exporter | tail
time="2020-08-10T17:14:06Z" level=info msg="Listening on 127.0.0.1:9100" source="node_exporter.go:170"

Additional info:
The following two issues were submitted upstream:

[0] https://github.com/prometheus/node_exporter/issues/1583
[1] https://github.com/prometheus/procfs/issues/275

Comment 2 Sergiusz Urbaniak 2020-08-20 16:54:41 UTC
This has been fixed upstream in https://github.com/prometheus/procfs/pull/276 and vendored into node_exporter in v1.0.1, which will be available in OpenShift 4.6.

Comment 4 Pawel Krupa 2020-09-01 10:47:29 UTC
A single-patch update is not easy to do here: backporting the fix to 4.5 would mean updating node_exporter to a newer version, which would also bring in new features. For that reason, please open an RFE.

Comment 11 Pawel Krupa 2020-10-02 13:30:39 UTC
node_exporter is a crucial component of the monitoring stack, providing data for alerts, dashboards, and telemetry. Unfortunately, version 1.0 changes many metrics, so an in-place upgrade from v0.18 has a high potential of breaking other parts of the stack. Taking this into account, we recommend upgrading to OpenShift 4.6, where everything has already been tested.

Comment 17 Pawel Krupa 2020-10-22 10:23:53 UTC
Because of how our internal backporting process works, this bug now needs verification that everything works correctly in 4.6. The backport to 4.5 is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1890466.


For QE:

With the latest node_exporter included in 4.6, this bug should not occur. On 4.5 it can be reproduced by mounting any NFS volume on an OpenShift host and checking the node_exporter logs for that host. A kernel version of 5.3 or higher is also required for reproduction.
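
For illustration only, here is a minimal Go sketch of a per-operation stats parser that tolerates the extra column. It is not the upstream procfs patch. It assumes the older parser expected exactly eight counters after the operation name, whereas the line in the error above ("NULL: 1 1 0 44 24 2 0 3 0") carries nine. The names opStats, parseOpLine and minCounters are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minCounters is the number of counters assumed to be present on every
// per-operation line (an assumption for this sketch).
const minCounters = 8

// opStats is a hypothetical container for one per-operation line.
type opStats struct {
	Name     string
	Counters []uint64 // the first minCounters values, in file order
	Extra    uint64   // ninth value when present (assumed to be an error count)
	HasExtra bool
}

// parseOpLine accepts lines with at least minCounters counters instead of
// requiring an exact field count, so an additional trailing column does
// not make the whole parse fail.
func parseOpLine(line string) (opStats, error) {
	fields := strings.Fields(line)
	if len(fields) < 1+minCounters {
		return opStats{}, fmt.Errorf("invalid NFS per-operations stats: %v", fields)
	}
	nums := make([]uint64, 0, len(fields)-1)
	for _, f := range fields[1:] {
		n, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return opStats{}, fmt.Errorf("invalid counter %q: %w", f, err)
		}
		nums = append(nums, n)
	}
	st := opStats{
		Name:     strings.TrimSuffix(fields[0], ":"),
		Counters: nums[:minCounters],
	}
	if len(nums) > minCounters {
		st.Extra = nums[minCounters]
		st.HasExtra = true
	}
	return st, nil
}

func main() {
	st, err := parseOpLine("NULL: 1 1 0 44 24 2 0 3 0")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s: %d counters, extra column present: %v\n", st.Name, len(st.Counters), st.HasExtra)
}
```

The design choice is to check a minimum field count rather than an exact one, so lines from older and newer kernels both parse.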

Comment 20 Junqi Zhao 2020-10-23 06:02:30 UTC
tested with 4.5.0-0.nightly-2020-10-20-022340 and attached NFS PVs, issue is reproduced
# oc -n openshift-monitoring logs -c node-exporter node-exporter-2zbvp | grep "failed to parse mountstats: invalid NFS per-operations stats" | tail -n 4
time="2020-10-23T06:00:28Z" level=error msg="ERROR: mountstats collector failed after 0.001108s: failed to parse mountstats: invalid NFS per-operations stats: [NULL: 1 1 0 44 24 1 0 2 0]" source="collector.go:132"
time="2020-10-23T06:00:31Z" level=error msg="ERROR: mountstats collector failed after 0.000795s: failed to parse mountstats: invalid NFS per-operations stats: [NULL: 1 1 0 44 24 1 0 2 0]" source="collector.go:132"
time="2020-10-23T06:00:43Z" level=error msg="ERROR: mountstats collector failed after 0.000780s: failed to parse mountstats: invalid NFS per-operations stats: [NULL: 1 1 0 44 24 1 0 2 0]" source="collector.go:132"
time="2020-10-23T06:00:46Z" level=error msg="ERROR: mountstats collector failed after 0.000836s: failed to parse mountstats: invalid NFS per-operations stats: [NULL: 1 1 0 44 24 1 0 2 0]" source="collector.go:132"

Comment 21 Junqi Zhao 2020-10-23 06:04:02 UTC
(In reply to Junqi Zhao from comment #20)
> tested with 4.5.0-0.nightly-2020-10-20-022340 and attached NFS PVs, issue is
> reproduced
node_exporter (version=0.18.1)

Comment 22 Junqi Zhao 2020-10-23 07:39:22 UTC
Bound NFS PVs on 4.5.0-0.nightly-2020-10-20-022340 and upgraded to 4.6.0-0.nightly-2020-10-22-034051 (node_exporter version=1.0.1); there are no "failed to parse mountstats: invalid NFS per-operations stats" errors. Example:
# oc -n openshift-monitoring logs -c node-exporter node-exporter-6qpr9 | grep "invalid NFS per-operations stats"
no result
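
The same check can be scripted. The following Go sketch counts the log lines that contain the error text, assuming the logs have already been captured as a string (for example from the oc logs command above). The function name countMountstatsErrors is hypothetical, and the sample lines are copied from this report.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// mountstatsErr is the substring searched for, taken from the error in this bug.
const mountstatsErr = "invalid NFS per-operations stats"

// countMountstatsErrors returns how many lines of logs contain the error text.
func countMountstatsErrors(logs string) int {
	count := 0
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		if strings.Contains(sc.Text(), mountstatsErr) {
			count++
		}
	}
	return count
}

func main() {
	// Sample input: two error lines and one info line from this report.
	sample := `time="2020-10-23T06:00:28Z" level=error msg="ERROR: mountstats collector failed after 0.001108s: failed to parse mountstats: invalid NFS per-operations stats: [NULL: 1 1 0 44 24 1 0 2 0]" source="collector.go:132"
time="2020-10-23T06:00:31Z" level=error msg="ERROR: mountstats collector failed after 0.000795s: failed to parse mountstats: invalid NFS per-operations stats: [NULL: 1 1 0 44 24 1 0 2 0]" source="collector.go:132"
time="2020-08-10T17:14:06Z" level=info msg="Listening on 127.0.0.1:9100" source="node_exporter.go:170"`
	fmt.Println("matching lines:", countMountstatsErrors(sample))
}
```

A count of zero on a host with NFS mounts corresponds to the "no result" outcome above.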

Comment 24 errata-xmlrpc 2020-10-27 16:29:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

