Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1735661

Summary: Pods with many log lines will report "failed to create fsnotify watcher: too many open files"
Product: OpenShift Container Platform
Reporter: Elvir Kuric <ekuric>
Component: Machine Config Operator
Assignee: Kirsten Garrison <kgarriso>
Status: CLOSED ERRATA
QA Contact: Micah Abbott <miabbott>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.1.z
CC: akamra, ccoleman, jmencak, kgarriso, mnguyen
Target Milestone: ---
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1745016 (view as bug list)
Environment:
Last Closed: 2019-10-16 06:34:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1745016

Description Elvir Kuric 2019-08-01 08:25:08 UTC
Description of problem:

Pods with many log lines will report "failed to create fsnotify watcher: too many open files" 

How reproducible:
always 


Steps to Reproduce:
1. Run a workload inside a pod that produces a large amount of log output.


Actual results:
oc logs -f app_pod 

will not work and will only show "failed to create fsnotify watcher: too many open files" 

Expected results:

logs to be visible 


Additional info:
I have seen
->  https://bugzilla.redhat.com/show_bug.cgi?id=1605153
and applying the following on the node where the pod is scheduled

-- 
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
-- 
made the logs visible again.

Maybe we should increase these values for "worker" nodes in OCP v4.x too.
I am not sure whether this should go to the "installer" or the "tuned" component.
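
(For testing, a minimal sketch of applying these values directly on the node, assuming root access; the file path below is just an example:

--
# /etc/sysctl.d/90-inotify.conf (example path)
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
--

then load it with "sysctl --system", or apply immediately with "sysctl -w fs.inotify.max_user_instances=8192 fs.inotify.max_user_watches=524288".)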

Comment 1 Antonio Murdaca 2019-08-13 15:48:58 UTC
What's the outcome of this? Ship a default for tuned from the MCO to set fs.inotify.max_*?

Comment 2 Clayton Coleman 2019-08-13 16:03:30 UTC
What did we set in 3.11?  I'm pretty sure we set this via Ansible before when setting up nodes; we may just have missed this config.

Comment 3 Kirsten Garrison 2019-08-13 23:58:16 UTC
I found these in the ansible repo (both master and 3.11):

https://github.com/openshift/openshift-ansible/pull/9204/files#diff-850a6b6759bd940bf13399b9766c6393

https://github.com/openshift/openshift-ansible/commit/85967b82241bc952a732c174f5cdc622b17f37ba

I'll work on a PR for this in MCO.
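
For illustration only (not the actual PR), a MachineConfig that ships these sysctls to workers could look roughly like this; the object name, file path and Ignition version below are assumptions:

--
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-fs-inotify            # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - filesystem: root
        path: /etc/sysctl.d/inotify.conf
        mode: 0644
        contents:
          # URL-encoded "fs.inotify.max_user_watches=524288\nfs.inotify.max_user_instances=8192\n"
          source: data:,fs.inotify.max_user_watches%3D524288%0Afs.inotify.max_user_instances%3D8192%0A
--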

Comment 4 Kirsten Garrison 2019-08-15 17:38:47 UTC
I think this bug needs to be 4.2 and then we can backport to z-stream...?

Comment 5 Jiří Mencák 2019-08-19 20:39:44 UTC
Why is this bug against MCO and not the Node Tuning Operator?  NTO already sets "fs.inotify.max_user_watches" https://github.com/openshift/cluster-node-tuning-operator/blob/master/assets/tuned/default-cr-tuned.yaml#L65 and I believe adding sysctls to MCO will only add to configuration sprawl.  Will MCO set the sysctls on traditional RHEL hosts too?
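
(For context, a sysctl setting in a tuned profile is just a [sysctl] entry, roughly:
--
[sysctl]
fs.inotify.max_user_watches=524288
--
)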

Comment 6 Kirsten Garrison 2019-08-19 21:39:20 UTC
I was told by Clayton that this belonged in MCO since it is a fundamental part of nodes and kubelet and that NTO wasn't responsible for the fundamental behavior of the default node.

cc: Clayton could you clarify MCO vs NTO so we can all be on the same page? Should this stay in MCO? Thanks!

Comment 9 Michael Nguyen 2019-08-30 16:18:30 UTC
Verified in 4.2.0-0.nightly-2019-08-28-083236


- Fill up the log with this small app
oc run myapp --image docker.io/mnguyenrh/output:latest

- Let it run for 10 minutes

- Check the logs with follow flag
oc logs -f pod/myapp_pod_name
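
To double-check that the sysctls actually landed on the node (a rough check; substitute a real worker node name):

oc debug node/<worker-node> -- chroot /host sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches

The followed logs should stream without the "failed to create fsnotify watcher: too many open files" error.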

Comment 10 errata-xmlrpc 2019-10-16 06:34:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

Comment 11 Red Hat Bugzilla 2023-09-14 05:37:26 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days