Bug 2114515

Summary: Getting critical NodeFilesystemAlmostOutOfSpace alert for 4K tmpfs
Product: OpenShift Container Platform
Component: Monitoring
Reporter: John McMeeking <jmcmeek>
Assignee: Jan Fajerski <jfajersk>
QA Contact: Junqi Zhao <juzhao>
Status: CLOSED ERRATA
Severity: low
Priority: low
Version: 4.7
CC: anpicker, jfajersk, jmarcal
Target Milestone: ---
Target Release: 4.13.0
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2023-05-17 22:46:56 UTC

Description John McMeeking 2022-08-02 19:36:22 UTC
Description of problem:

The IBM Cloud COS (S3) driver creates a memory-mapped tmpfs of the exact size required to hold a password - 4 KB. Because that filesystem is permanently full, it triggers a critical NodeFilesystemAlmostOutOfSpace alert, and customers have to create a silence.


Version-Release number of selected component (if applicable):
4.6 and higher


How reproducible:
Always


Steps to Reproduce:
1.
2.
3.

Actual results:

This alert: Filesystem on tmpfs at 53.13.174.22 has only 0.00% available space left.

Due to this filesystem:
 tmpfs 4.0K 4.0K 0 100% /var/lib/ibmc-s3fs/97efabc827b4c933d1b5d3df035a95b50bb07da92bfbe817b79c42aa3e0484ec


Expected results:

No alert.


Additional info:

Could the NodeFilesystemAlmostOutOfSpace alert query ignore filesystems (or maybe just tmpfs) below some size? I think the expectation is that tmpfs filesystems are created with no size specified, which defaults to a maximum size of 50% of physical memory. A 4K tmpfs is clearly small enough that 100% full is probably unavoidable.
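As a sketch of the size-floor idea: the alert compares available to total bytes per filesystem, so adding a minimum-size condition on `node_filesystem_size_bytes` would skip tiny mounts. The exact rule shipped by the cluster monitoring stack may differ; the 1 MiB threshold and label selectors below are illustrative assumptions, not the actual shipped rule.

```yaml
# Hypothetical variant of the NodeFilesystemAlmostOutOfSpace rule that
# ignores filesystems smaller than 1 MiB (threshold chosen for illustration).
- alert: NodeFilesystemAlmostOutOfSpace
  expr: |
    (
      node_filesystem_avail_bytes{job="node-exporter",fstype!=""}
        / node_filesystem_size_bytes{job="node-exporter",fstype!=""} * 100 < 3
    and
      node_filesystem_size_bytes{job="node-exporter",fstype!=""} > 1024 * 1024
    and
      node_filesystem_readonly{job="node-exporter",fstype!=""} == 0
    )
  for: 30m
  labels:
    severity: critical
```

A drawback of a pure size cutoff is picking a threshold that never hides a legitimately small-but-important mount, which is presumably why a mountpoint-based exclusion was pursued instead.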

Comment 1 Jan Fajerski 2022-08-03 09:01:54 UTC
@jmcmeek.com Thanks for the report. We're trying to figure out what the best exclusion criterion would be. Can you paste the /proc/mounts entry or the output of the mount command for this filesystem here, please?

Comment 2 John McMeeking 2022-08-04 15:52:40 UTC
@jfajersk Is this what you wanted?

sh-4.4# cat /proc/mounts | grep s3fs
tmpfs /var/lib/ibmc-s3fs/99ad9dbdeaf708f1ae4818365b393c8c77f83236baf7bf851e66f79de9900615 tmpfs rw,seclabel,relatime,size=4k 0 0
s3fs /var/data/kubelet/pods/25dec7c1-a572-4667-b5e4-836783c48815/volumes/ibm~ibmc-s3fs/pvc-5b3c4dad-9dd1-4097-a1ce-f38f5a09aae7 fuse.s3fs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0

Comment 4 Jan Fajerski 2022-08-09 13:28:49 UTC
Thanks, that's the info I was after. I have a pretty good idea how to improve the alert now and will propose a solution upstream.

Comment 5 Jan Fajerski 2022-08-11 11:18:00 UTC
I proposed a change to the alert generation upstream: https://github.com/prometheus/node_exporter/pull/2446

This would allow us to ignore tmpfs instances under /var/lib/ibmc-s3fs/ for these alerts, while keeping alerts for other tmpfs instances intact.
In telemeter, the majority of alerts are related to /var/lib/ibmc-s3fs/, but there are alerts for /run and /var as well, so we want to keep those.
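Concretely, a mountpoint-based exclusion along these lines could be plumbed into the alert expression. This is a sketch only; the merged upstream change may wire the exclusion differently, and the rule text below is an assumption:

```yaml
# Sketch: exclude the IBM COS password tmpfs mounts by mountpoint,
# keeping /run, /var, and other tmpfs instances alerting as before.
- alert: NodeFilesystemAlmostOutOfSpace
  expr: |
    (
      node_filesystem_avail_bytes{job="node-exporter",fstype!="",mountpoint!~"/var/lib/ibmc-s3fs/.*"}
        / node_filesystem_size_bytes{job="node-exporter",fstype!="",mountpoint!~"/var/lib/ibmc-s3fs/.*"} * 100 < 3
    and
      node_filesystem_readonly{job="node-exporter",fstype!="",mountpoint!~"/var/lib/ibmc-s3fs/.*"} == 0
    )
  for: 30m
  labels:
    severity: critical
```

Filtering by mountpoint rather than by size keeps the exclusion narrowly scoped to the known offender, so a genuinely full tmpfs elsewhere still fires.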

Comment 8 John McMeeking 2022-08-15 15:38:35 UTC
Thanks! An ibmc-s3fs specific solution is fine. Hopefully we (or someone else) won't create another one of these.

Comment 14 errata-xmlrpc 2023-05-17 22:46:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.13.0 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:1326