Bug 2114515 - Getting critical NodeFilesystemAlmostOutOfSpace alert for 4K tmpfs
Summary: Getting critical NodeFilesystemAlmostOutOfSpace alert for 4K tmpfs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Severity: low
Priority: low
Target Milestone: ---
Target Release: 4.13.0
Assignee: Jan Fajerski
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-08-02 19:36 UTC by John McMeeking
Modified: 2023-05-17 22:47 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-17 22:46:56 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-monitoring-operator pull 1854 0 None open Bug 2114515: jsonnet: ignore `/var/lib/ibmc-s3fs/` mountpoints 2022-12-21 11:19:52 UTC
Red Hat Product Errata RHSA-2023:1326 0 None None None 2023-05-17 22:47:05 UTC

Description John McMeeking 2022-08-02 19:36:22 UTC
Description of problem:

The IBM Cloud COS (S3) driver creates a memory-mapped tmpfs of the exact size required to hold a password - 4 KB.  This results in a critical NodeFilesystemAlmostOutOfSpace alert because the filesystem is full, and customers have to create a silence to suppress it.


Version-Release number of selected component (if applicable):
4.6 and higher


How reproducible:
Always


Steps to Reproduce:
1.
2.
3.

Actual results:

This alert: Filesystem on tmpfs at 53.13.174.22 has only 0.00% available space left.

Due to this filesystem:
 tmpfs 4.0K 4.0K 0 100% /var/lib/ibmc-s3fs/97efabc827b4c933d1b5d3df035a95b50bb07da92bfbe817b79c42aa3e0484ec


Expected results:

No alert.


Additional info:

Could the NodeFilesystemAlmostOutOfSpace alert query ignore filesystems (or maybe just tmpfs) below some size? I think the expectation is that the tmpfs are created with no size specified, which defaults to a maximum size of 50% of physical memory. A 4K tmpfs is clearly small enough that 100% full is probably unavoidable.
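
To make the size-threshold idea concrete, here is a minimal Python sketch of the check it implies. It is not the alert rule itself: the function name, the 1 MiB minimum size, and the 3% free-space threshold are all assumptions chosen for illustration, and the report does not propose specific values.

```python
# Hypothetical minimum size; filesystems smaller than this are never alerted on.
MIN_SIZE_BYTES = 1024 * 1024  # 1 MiB, an assumed value


def should_alert(size_bytes, avail_bytes,
                 min_avail_percent=3.0, min_size_bytes=MIN_SIZE_BYTES):
    """Return True if the filesystem is nearly full and large enough to matter.

    min_avail_percent is an assumed threshold, not the one used by the real alert.
    """
    # Tiny filesystems (such as a 4K tmpfs) are expected to be 100% full.
    if size_bytes < min_size_bytes:
        return False
    avail_percent = 100.0 * avail_bytes / size_bytes
    return avail_percent < min_avail_percent
```

With these assumed values, a full 4K tmpfs is skipped while a full multi-gigabyte filesystem still triggers the check.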

Comment 1 Jan Fajerski 2022-08-03 09:01:54 UTC
@jmcmeek.com Thanks for the report, we're trying to figure out what the best exclusion criterion would be. Can you paste the /proc/mounts or the output of the mount command for this file system here please?

Comment 2 John McMeeking 2022-08-04 15:52:40 UTC
@jfajersk Is this what you wanted?

sh-4.4# cat /proc/mounts | grep s3fs
tmpfs /var/lib/ibmc-s3fs/99ad9dbdeaf708f1ae4818365b393c8c77f83236baf7bf851e66f79de9900615 tmpfs rw,seclabel,relatime,size=4k 0 0
s3fs /var/data/kubelet/pods/25dec7c1-a572-4667-b5e4-836783c48815/volumes/ibm~ibmc-s3fs/pvc-5b3c4dad-9dd1-4097-a1ce-f38f5a09aae7 fuse.s3fs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0

Comment 4 Jan Fajerski 2022-08-09 13:28:49 UTC
Thanks, that's the info I was after. I have a pretty good idea how to improve the alert now and will propose a solution upstream.

Comment 5 Jan Fajerski 2022-08-11 11:18:00 UTC
I proposed a change to the alert generation upstream: https://github.com/prometheus/node_exporter/pull/2446

This would allow us to ignore tmpfs instances under /var/lib/ibmc-s3fs/ for these alerts, while keeping alerts for other tmpfs instances intact.
In telemeter, the majority of alerts are related to /var/lib/ibmc-s3fs/, but there are alerts for /run and /var as well, so we want to keep those alerts.
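
As a rough illustration of the exclusion logic only (the actual change lives in the alert's jsonnet definition, not in Python), the sketch below filters /proc/mounts-style lines by mountpoint prefix. The function name and the sample /run line are hypothetical; the ibmc-s3fs line is the one from Comment 2.

```python
import re

# Mountpoints to ignore, matching the prefix discussed in this bug.
EXCLUDED_MOUNTPOINT = re.compile(r"^/var/lib/ibmc-s3fs/")


def alertable_tmpfs(mounts_text):
    """Return tmpfs mountpoints that should still be considered for alerts."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype = fields[:3]
        if fstype != "tmpfs":
            continue
        if EXCLUDED_MOUNTPOINT.match(mountpoint):
            continue  # ibmc-s3fs credential tmpfs: always full by design
        result.append(mountpoint)
    return result


# The /run entry is a made-up example; the second line is from Comment 2.
SAMPLE = """\
tmpfs /run tmpfs rw,seclabel,nosuid,nodev,mode=755 0 0
tmpfs /var/lib/ibmc-s3fs/99ad9dbdeaf708f1ae4818365b393c8c77f83236baf7bf851e66f79de9900615 tmpfs rw,seclabel,relatime,size=4k 0 0
"""

print(alertable_tmpfs(SAMPLE))
```

Filtering on the mountpoint rather than on the filesystem type is what keeps tmpfs mounts such as /run in scope.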

Comment 8 John McMeeking 2022-08-15 15:38:35 UTC
Thanks! An ibmc-s3fs specific solution is fine. Hopefully we (or someone else) won't create another one of these.

Comment 14 errata-xmlrpc 2023-05-17 22:46:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.13.0 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:1326

