Bug 1735661 - Pods with many log lines will report "failed to create fsnotify watcher: too many open files"
Summary: Pods with many log lines will report "failed to create fsnotify watcher: too many open files"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.2.0
Assignee: Kirsten Garrison
QA Contact: Micah Abbott
URL:
Whiteboard:
Depends On:
Blocks: 1745016
 
Reported: 2019-08-01 08:25 UTC by Elvir Kuric
Modified: 2023-09-14 05:37 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned to: 1745016
Environment:
Last Closed: 2019-10-16 06:34:15 UTC
Target Upstream Version:
Embargoed:




Links
- GitHub: openshift/machine-config-operator pull 1063 (closed) - Bug 1735661: add template with default inotify.max_users settings (last updated 2021-02-04 02:06:13 UTC)
- Red Hat Product Errata: RHBA-2019:2922 (last updated 2019-10-16 06:34:31 UTC)

Description Elvir Kuric 2019-08-01 08:25:08 UTC
Description of problem:

Pods with many log lines will report "failed to create fsnotify watcher: too many open files" 

How reproducible:
always 


Steps to Reproduce:
1. Run a workload inside a pod that produces massive log output.


Actual results:
oc logs -f app_pod 

will not work and will only show "failed to create fsnotify watcher: too many open files" 

Expected results:

logs to be visible 


Additional info:
I have seen
->  https://bugzilla.redhat.com/show_bug.cgi?id=1605153
and applying the following on the node where the pod is scheduled

-- 
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
-- 
made the logs visible again.

Maybe we should increase these values for "worker" nodes in OCP v4.x too.
I am not sure whether this should go to the "installer" or the "tuned" component.
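For reference, a rough sketch of applying that workaround by hand on the affected node; the node name and the sysctl.d file name are illustrative, and the values are the ones quoted above:

--
# Open a debug shell on the node running the pod (node name is a placeholder).
oc debug node/<worker-node>
chroot /host

# Persist the inotify limits quoted above and load them immediately.
cat <<'EOF' > /etc/sysctl.d/99-inotify.conf
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
EOF
sysctl -p /etc/sysctl.d/99-inotify.conf
--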

Comment 1 Antonio Murdaca 2019-08-13 15:48:58 UTC
What's the outcome of this? Ship a default for tuned from the MCO to set fs.inotify.max_*?

Comment 2 Clayton Coleman 2019-08-13 16:03:30 UTC
What did we set in 3.11?  I'm pretty sure we set this via Ansible before when setting up nodes; we may just have missed this config.

Comment 3 Kirsten Garrison 2019-08-13 23:58:16 UTC
I found these in the ansible repo (both master and 3.11):

https://github.com/openshift/openshift-ansible/pull/9204/files#diff-850a6b6759bd940bf13399b9766c6393

https://github.com/openshift/openshift-ansible/commit/85967b82241bc952a732c174f5cdc622b17f37ba

I'll work on a PR for this in MCO.
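Until that PR lands, a minimal sketch of shipping the same sysctls cluster-wide through a MachineConfig; the object name, role label, file path, and values are illustrative (taken from this report), not the contents of the eventual PR:

--
# Apply a MachineConfig that drops a sysctl.d file on every worker node.
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-inotify-limits
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - path: /etc/sysctl.d/99-inotify.conf
        filesystem: root
        mode: 420  # 0644 in octal
        contents:
          # URL-encoded "fs.inotify.max_user_instances = 8192" and
          # "fs.inotify.max_user_watches = 524288", one per line.
          source: data:,fs.inotify.max_user_instances%20%3D%208192%0Afs.inotify.max_user_watches%20%3D%20524288%0A
EOF
--

Note that applying a MachineConfig triggers a rolling drain and reboot of the targeted pool.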

Comment 4 Kirsten Garrison 2019-08-15 17:38:47 UTC
I think this bug needs to be 4.2 and then we can backport to z-stream...?

Comment 5 Jiří Mencák 2019-08-19 20:39:44 UTC
Why is this bug against MCO and not the Node Tuning Operator?  NTO already sets "fs.inotify.max_user_watches" (https://github.com/openshift/cluster-node-tuning-operator/blob/master/assets/tuned/default-cr-tuned.yaml#L65), and I believe adding sysctls to MCO will only add to configuration sprawl.  Will MCO set the sysctls on traditional RHEL hosts too?

Comment 6 Kirsten Garrison 2019-08-19 21:39:20 UTC
I was told by Clayton that this belonged in MCO since it is a fundamental part of nodes and kubelet and that NTO wasn't responsible for the fundamental behavior of the default node.

cc: Clayton could you clarify MCO vs NTO so we can all be on the same page? Should this stay in MCO? Thanks!

Comment 9 Michael Nguyen 2019-08-30 16:18:30 UTC
Verified in 4.2.0-0.nightly-2019-08-28-083236


- Fill up the log with this small app
oc run myapp --image docker.io/mnguyenrh/output:latest

- Let it run for 10 minutes

- Check the logs with the follow flag
oc logs -f pod/myapp_pod_name
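A possible way to double-check the node-level limits alongside the steps above; node and pod names are placeholders:

--
# Confirm the inotify limits on the worker node running the pod.
oc debug node/<worker-node> -- chroot /host sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches

# The follow should keep streaming instead of failing with the fsnotify error.
oc logs -f pod/<myapp-pod-name>
--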

Comment 10 errata-xmlrpc 2019-10-16 06:34:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

Comment 11 Red Hat Bugzilla 2023-09-14 05:37:26 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

