Description of problem:
Pods that produce many log lines report "failed to create fsnotify watcher: too many open files".

How reproducible:
Always

Steps to Reproduce:
1. Run a load inside a pod that produces massive log output.

Actual results:
oc logs -f app_pod does not work and only shows "failed to create fsnotify watcher: too many open files".

Expected results:
Logs are visible.

Additional info:
I have seen https://bugzilla.redhat.com/show_bug.cgi?id=1605153, and applying the following on the node where the pod is assigned

--
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
--

made the logs visible again.

Maybe we should increase these values for "worker" nodes in OCP v4.x too. I am not sure whether this should go to the "installer" or the "tuned" component.
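To check whether a node is still below the two values quoted above, something like the following sketch may help. It only reads the standard Linux files under /proc/sys/fs/inotify and compares them with the numbers from this report. The function names and the REPORTED_LIMITS table are made up for illustration, and the numbers are the workaround values from this report, not confirmed product defaults.

```python
from pathlib import Path

# Values quoted in this report as the workaround; assumed targets,
# not confirmed defaults for any OCP release.
REPORTED_LIMITS = {
    "max_user_instances": 8192,
    "max_user_watches": 524288,
}


def read_inotify_limits(base="/proc/sys/fs/inotify"):
    """Return {name: int} for each limit file that exists under base."""
    limits = {}
    for name in REPORTED_LIMITS:
        path = Path(base) / name
        if path.is_file():
            limits[name] = int(path.read_text().strip())
    return limits


def limits_below_target(current, target=REPORTED_LIMITS):
    """Return {name: (current, target)} for limits that are missing or too low."""
    return {
        name: (current.get(name), wanted)
        for name, wanted in target.items()
        if current.get(name) is None or current[name] < wanted
    }


if __name__ == "__main__":
    # Print only the limits that are lower than the values from the report.
    for name, (have, want) in limits_below_target(read_inotify_limits()).items():
        print(f"fs.inotify.{name}: current={have}, reported fix={want}")
```

Run on the node itself (for example from a debug shell), this prints one line per limit that is lower than the value from the report and nothing if both are already at or above it.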
What's the outcome of this? Ship a default for tuned from the MCO to set fs.inotify.max_*?
What did we set in 3.11? I'm pretty sure we set this via Ansible before when setting up nodes, we may just have missed this config.
I found these in the ansible repo (both master and 3.11):
https://github.com/openshift/openshift-ansible/pull/9204/files#diff-850a6b6759bd940bf13399b9766c6393
https://github.com/openshift/openshift-ansible/commit/85967b82241bc952a732c174f5cdc622b17f37ba

I'll work on a PR for this in MCO.
I think this bug needs to be 4.2 and then we can backport to z-stream...?
Why is this bug against MCO and not the Node Tuning Operator? NTO already sets "fs.inotify.max_user_watches":
https://github.com/openshift/cluster-node-tuning-operator/blob/master/assets/tuned/default-cr-tuned.yaml#L65
I believe adding sysctls to MCO will only add to configuration sprawl. Will MCO set the sysctls on traditional RHEL hosts too?
I was told by Clayton that this belongs in MCO since it is a fundamental part of nodes and the kubelet, and that NTO isn't responsible for the fundamental behavior of the default node. cc: Clayton, could you clarify MCO vs NTO so we can all be on the same page? Should this stay in MCO? Thanks!
Verified in 4.2.0-0.nightly-2019-08-28-083236

- Fill up the log with this small app:
  oc run myapp --image docker.io/mnguyenrh/output:latest
- Let it run for 10 minutes.
- Check the logs with the follow flag:
  oc logs -f pod/myapp_pod_name
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days