Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1735661

Summary: Pods with many log lines will report "failed to create fsnotify watcher: too many open files"
Product: OpenShift Container Platform
Reporter: Elvir Kuric <ekuric>
Component: Machine Config Operator
Assignee: Kirsten Garrison <kgarriso>
Status: CLOSED ERRATA
QA Contact: Micah Abbott <miabbott>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.1.z
CC: akamra, ccoleman, jmencak, kgarriso, mnguyen
Target Milestone: ---
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1745016 (view as bug list)
Environment:
Last Closed: 2019-10-16 06:34:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1745016

Description Elvir Kuric 2019-08-01 08:25:08 UTC
Description of problem:

Pods with many log lines will report "failed to create fsnotify watcher: too many open files" 

How reproducible:
always 


Steps to Reproduce:
1. Run a workload inside a pod that produces a large amount of log output.


Actual results:
oc logs -f app_pod 

will not work and will only show "failed to create fsnotify watcher: too many open files" 

Expected results:

logs to be visible 


Additional info:
I have seen
->  https://bugzilla.redhat.com/show_bug.cgi?id=1605153
and applying the following on the node where the pod is scheduled

-- 
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
-- 
made the logs visible again.

Maybe we should increase these values for "worker" nodes in OCP v4.x too.
I am not sure whether this should go to the "installer" or the "tuned" component.
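
(For testing, a minimal sketch of applying these values directly on the node, assuming root access; the file path below is just an example:

--
# /etc/sysctl.d/90-inotify.conf (example path)
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
--

then load it with "sysctl --system", or apply immediately with "sysctl -w fs.inotify.max_user_instances=8192 fs.inotify.max_user_watches=524288".)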

Comment 1 Antonio Murdaca 2019-08-13 15:48:58 UTC
What's the outcome of this? Ship a default for tuned from the MCO to set fs.inotify.max_*?

Comment 2 Clayton Coleman 2019-08-13 16:03:30 UTC
What did we set in 3.11?  I'm pretty sure we set this via Ansible before when setting up nodes; we may just have missed this config.

Comment 3 Kirsten Garrison 2019-08-13 23:58:16 UTC
I found these in the ansible repo (both master and 3.11):

https://github.com/openshift/openshift-ansible/pull/9204/files#diff-850a6b6759bd940bf13399b9766c6393

https://github.com/openshift/openshift-ansible/commit/85967b82241bc952a732c174f5cdc622b17f37ba

I'll work on a PR for this in MCO.
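
For illustration only (not the actual PR), a MachineConfig that ships these sysctls to workers could look roughly like this; the object name, file path and Ignition version below are assumptions:

--
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-fs-inotify            # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - filesystem: root
        path: /etc/sysctl.d/inotify.conf
        mode: 0644
        contents:
          # URL-encoded "fs.inotify.max_user_watches=524288\nfs.inotify.max_user_instances=8192\n"
          source: data:,fs.inotify.max_user_watches%3D524288%0Afs.inotify.max_user_instances%3D8192%0A
--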

Comment 4 Kirsten Garrison 2019-08-15 17:38:47 UTC
I think this bug needs to be 4.2 and then we can backport to z-stream...?

Comment 5 Jiří Mencák 2019-08-19 20:39:44 UTC
Why is this bug against MCO and not the Node Tuning Operator?  NTO already sets "fs.inotify.max_user_watches" https://github.com/openshift/cluster-node-tuning-operator/blob/master/assets/tuned/default-cr-tuned.yaml#L65 and I believe adding sysctls to MCO will only add to configuration sprawl.  Will MCO set the sysctls on traditional RHEL hosts too?
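
(For context, a sysctl setting in a tuned profile is just a [sysctl] entry, roughly:
--
[sysctl]
fs.inotify.max_user_watches=524288
--
)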

Comment 6 Kirsten Garrison 2019-08-19 21:39:20 UTC
I was told by Clayton that this belonged in MCO since it is a fundamental part of nodes and kubelet and that NTO wasn't responsible for the fundamental behavior of the default node.

cc: Clayton could you clarify MCO vs NTO so we can all be on the same page? Should this stay in MCO? Thanks!

Comment 9 Michael Nguyen 2019-08-30 16:18:30 UTC
Verified in 4.2.0-0.nightly-2019-08-28-083236


- Fill up the log with this small app
oc run myapp --image docker.io/mnguyenrh/output:latest

- Let it run for 10 minutes

- Check the logs with follow flag
oc logs -f pod/myapp_pod_name
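
To double-check that the sysctls actually landed on the node (a rough check; substitute a real worker node name):

oc debug node/<worker-node> -- chroot /host sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches

The followed logs should stream without the "failed to create fsnotify watcher: too many open files" error.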

Comment 10 errata-xmlrpc 2019-10-16 06:34:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

Comment 11 Red Hat Bugzilla 2023-09-14 05:37:26 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days