Bug 2114515

Summary: Getting critical NodeFilesystemAlmostOutOfSpace alert for 4K tmpfs
Product: OpenShift Container Platform
Component: Monitoring
Reporter: John McMeeking <jmcmeek>
Assignee: Jan Fajerski <jfajersk>
QA Contact: Junqi Zhao <juzhao>
Status: CLOSED ERRATA
Severity: low
Priority: low
Version: 4.7
CC: anpicker, jfajersk, jmarcal
Target Milestone: ---
Target Release: 4.13.0
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2023-05-17 22:46:56 UTC

Description John McMeeking 2022-08-02 19:36:22 UTC
Description of problem:

The IBM Cloud COS (S3) driver creates a memory-mapped tmpfs of the exact size required to hold a password - 4 KB. Because that filesystem is permanently full, it triggers a critical NodeFilesystemAlmostOutOfSpace alert, and customers have to create a silence.


Version-Release number of selected component (if applicable):
4.6 and higher


How reproducible:
Always


Steps to Reproduce:
1.
2.
3.

Actual results:

This alert: Filesystem on tmpfs at 53.13.174.22 has only 0.00% available space left.

Due to this filesystem:
 tmpfs 4.0K 4.0K 0 100% /var/lib/ibmc-s3fs/97efabc827b4c933d1b5d3df035a95b50bb07da92bfbe817b79c42aa3e0484ec


Expected results:

No alert.


Additional info:

Could the NodeFilesystemAlmostOutOfSpace alert query ignore filesystems (or maybe just tmpfs) below some size? I think the expectation is that tmpfs filesystems are created with no size specified, which defaults to a maximum size of 50% of physical memory. A 4K tmpfs is clearly small enough that 100% full is probably unavoidable.
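As a sketch of the size-floor idea: the alert compares available to total bytes per filesystem, so adding a minimum-size condition on `node_filesystem_size_bytes` would skip tiny mounts. The exact rule shipped by the cluster monitoring stack may differ; the 1 MiB threshold and label selectors below are illustrative assumptions, not the actual shipped rule.

```yaml
# Hypothetical variant of the NodeFilesystemAlmostOutOfSpace rule that
# ignores filesystems smaller than 1 MiB (threshold chosen for illustration).
- alert: NodeFilesystemAlmostOutOfSpace
  expr: |
    (
      node_filesystem_avail_bytes{job="node-exporter",fstype!=""}
        / node_filesystem_size_bytes{job="node-exporter",fstype!=""} * 100 < 3
    and
      node_filesystem_size_bytes{job="node-exporter",fstype!=""} > 1024 * 1024
    and
      node_filesystem_readonly{job="node-exporter",fstype!=""} == 0
    )
  for: 30m
  labels:
    severity: critical
```

A drawback of a pure size cutoff is picking a threshold that never hides a legitimately small-but-important mount, which is presumably why a mountpoint-based exclusion was pursued instead.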

Comment 1 Jan Fajerski 2022-08-03 09:01:54 UTC
@jmcmeek.com Thanks for the report. We're trying to figure out what the best exclusion criterion would be. Can you paste the /proc/mounts entry or the output of the mount command for this filesystem here, please?

Comment 2 John McMeeking 2022-08-04 15:52:40 UTC
@jfajersk Is this what you wanted?

sh-4.4# cat /proc/mounts | grep s3fs
tmpfs /var/lib/ibmc-s3fs/99ad9dbdeaf708f1ae4818365b393c8c77f83236baf7bf851e66f79de9900615 tmpfs rw,seclabel,relatime,size=4k 0 0
s3fs /var/data/kubelet/pods/25dec7c1-a572-4667-b5e4-836783c48815/volumes/ibm~ibmc-s3fs/pvc-5b3c4dad-9dd1-4097-a1ce-f38f5a09aae7 fuse.s3fs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0

Comment 4 Jan Fajerski 2022-08-09 13:28:49 UTC
Thanks, that's the info I was after. I have a pretty good idea how to improve the alert now and will propose a solution upstream.

Comment 5 Jan Fajerski 2022-08-11 11:18:00 UTC
I proposed a change to the alert generation upstream: https://github.com/prometheus/node_exporter/pull/2446

This would allow us to ignore tmpfs instances under /var/lib/ibmc-s3fs/ for these alerts, while keeping alerts for other tmpfs instances intact.
In telemeter, the majority of alerts are related to /var/lib/ibmc-s3fs/, but there are alerts for /run and /var as well, so we want to keep those.
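Concretely, a mountpoint-based exclusion along these lines could be plumbed into the alert expression. This is a sketch only; the merged upstream change may wire the exclusion differently, and the rule text below is an assumption:

```yaml
# Sketch: exclude the IBM COS password tmpfs mounts by mountpoint,
# keeping /run, /var, and other tmpfs instances alerting as before.
- alert: NodeFilesystemAlmostOutOfSpace
  expr: |
    (
      node_filesystem_avail_bytes{job="node-exporter",fstype!="",mountpoint!~"/var/lib/ibmc-s3fs/.*"}
        / node_filesystem_size_bytes{job="node-exporter",fstype!="",mountpoint!~"/var/lib/ibmc-s3fs/.*"} * 100 < 3
    and
      node_filesystem_readonly{job="node-exporter",fstype!="",mountpoint!~"/var/lib/ibmc-s3fs/.*"} == 0
    )
  for: 30m
  labels:
    severity: critical
```

Filtering by mountpoint rather than by size keeps the exclusion narrowly scoped to the known offender, so a genuinely full tmpfs elsewhere still fires.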

Comment 8 John McMeeking 2022-08-15 15:38:35 UTC
Thanks! An ibmc-s3fs specific solution is fine. Hopefully we (or someone else) won't create another one of these.

Comment 14 errata-xmlrpc 2023-05-17 22:46:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.13.0 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:1326