Bug 1632350 - [starter-ca-central-1] NodeDiskRunningFull reports wrong mount?
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Frederic Branczyk
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-09-24 15:42 UTC by Justin Pierce
Modified: 2019-06-04 10:40 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:40:35 UTC
Target Upstream Version:


Attachments
alert in UI (65.14 KB, image/png)
2018-09-24 15:42 UTC, Justin Pierce
[6d] listing for mountpoint (16.64 KB, text/plain)
2018-09-24 15:45 UTC, Justin Pierce
listing showing actual < 0 mountpoint (4.87 KB, text/plain)
2018-09-24 15:48 UTC, Justin Pierce


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:40:45 UTC

Description Justin Pierce 2018-09-24 15:42:32 UTC
Created attachment 1486444 [details]
alert in UI

Description of problem:

Receiving alerts on this cluster:
`device tmpfs on node 172.31.17.171:9100 is running full within the next 24 hours (mounted at /host/root/run/user/0)`

But checking this mount on the node shows it is completely empty:
[root@ip-172-31-17-171 ~]# df -h | grep /run/user
tmpfs                                   3.2G     0  3.2G   0% /run/user/0
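For context, NodeDiskRunningFull is a linear-extrapolation alert. The exact expression shipped with 3.11 is not quoted in this report, but the rule has roughly this shape (metric name and label selectors below are assumptions, not the verbatim shipped rule):

```promql
# Sketch of the alert's general shape: fire if a linear fit over
# recent samples predicts free space dropping below zero within 24h.
predict_linear(node_filesystem_free{job="node-exporter"}[6h], 24 * 60 * 60) < 0
```

With the tmpfs mount sitting at 0% usage, a below-zero prediction for it would be surprising, which is why the alert text naming this mount looks wrong.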


Version-Release number of selected component (if applicable):
oc v3.11.0-0.21.0
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

How reproducible:
The alert fires continuously in the cluster's current steady state.

Actual results:
Alert is presently firing.

Expected results:
The alert should not fire: there is no danger of this partition filling, so the prediction appears inaccurate.

Additional info:

Comment 1 Justin Pierce 2018-09-24 15:45:08 UTC
Created attachment 1486445 [details]
[6d] listing for mountpoint

Comment 2 Justin Pierce 2018-09-24 15:48:09 UTC
Created attachment 1486459 [details]
listing showing actual < 0 mountpoint

The mountpoint actually predicted to go below zero seems to be:

{device="/dev/mapper/rootvg-var_log",endpoint="https",fstype="xfs",instance="172.31.17.171:9100",job="node-exporter",mountpoint="/host/root/var/log",namespace="openshift-monitoring",pod="node-exporter-gs6rv",service="node-exporter"}	-145849006.69107854
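A negative value here means the linear fit projects that filesystem dropping below zero free bytes within the prediction window, i.e. /host/root/var/log is the series driving the alert, not the tmpfs mount named in the alert text. A listing like the attached one can be reproduced with a query along these lines (the 6d range window is an assumption taken from the "[6d]" attachment title, and the metric may be named node_filesystem_free or node_filesystem_free_bytes depending on the node-exporter version):

```promql
# Per-mountpoint 24h free-space prediction for this node.
predict_linear(node_filesystem_free{instance="172.31.17.171:9100"}[6d], 24 * 60 * 60)
```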

Comment 3 Frederic Branczyk 2019-02-22 16:48:08 UTC
https://github.com/openshift/cluster-monitoring-operator/pull/173 pulled in the changes to appropriately ignore a device such as the one reported here. This will land in 4.0.
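That change brings the rules in line with later behavior, where ephemeral filesystems are excluded from the disk-full prediction. The exact exclusion list is an assumption here, but the corrected rule is of roughly this form:

```promql
# Exclude tmpfs-style ephemeral filesystems from the prediction
# (selector values are illustrative, not the exact shipped rule).
predict_linear(node_filesystem_free{job="node-exporter",fstype!~"tmpfs|rootfs"}[6h], 24 * 60 * 60) < 0
```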

Comment 5 Junqi Zhao 2019-03-08 07:33:51 UTC
Issue is fixed in 4.0.0-0.nightly-2019-03-06-074438.

Comment 8 errata-xmlrpc 2019-06-04 10:40:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

