Bug 1589656

Summary: [3.6] inotify resources exhausted : possible leak in cAdvisor
Product: OpenShift Container Platform
Component: Node
Reporter: Takayoshi Kimura <tkimura>
Assignee: Avesh Agarwal <avagarwa>
QA Contact: DeShuai Ma <dma>
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 3.6.0
Target Release: 3.6.z
Target Milestone: ---
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2018-08-09 22:10:03 UTC
CC: acomabon, amurdaca, aos-bugs, avagarwa, cpatters, decarr, jokerman, mmccomas, sjenning, smunilla, wjiang, wmeng

Description Takayoshi Kimura 2018-06-11 05:49:15 UTC
Description of problem:

The following errors appear during atomic-openshift-node startup, and the service keeps crashing:

> atomic-openshift-node[17197]: E0607 22:24:14.775652   17197 manager.go:263] Registration of the raw container factory failed: inotify_init: too many open files
> atomic-openshift-node[17197]: F0607 22:24:14.775792   17197 kubelet.go:1238] Failed to start cAdvisor inotify_init: too many open files

Similar issues reported upstream:

inotify resources exhausted : possible leak in cAdvisor #10421
https://github.com/kubernetes/kubernetes/issues/10421

Failed to start cAdvisor inotify_init: too many open files #58081
https://github.com/kubernetes/kubernetes/issues/58081

inotify: fix memory leak in Watcher
https://github.com/golang/exp/commit/292a51b8d262487dab23a588950e8052d63d9113#diff-913c4cd2428ce8671839b1afb2699b25
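Not part of the original report: a generic diagnostic sketch for confirming which process is exhausting inotify resources. Each open inotify instance shows up as an "anon_inode:inotify" entry under /proc/<pid>/fd, so counting those per process points at the likely leaker (run as root to see all processes):

```shell
# Count open inotify instances per process; the top consumers are the
# likely leakers. Unreadable /proc entries are silently skipped.
for pid in /proc/[0-9]*; do
  n=$(ls -l "$pid/fd" 2>/dev/null | grep -c inotify)
  [ "$n" -gt 0 ] && printf '%s\t%s\t%s\n' "$n" "${pid##*/}" "$(cat "$pid/comm" 2>/dev/null)"
done | sort -rn | head
```

If the node process (or cAdvisor inside it) keeps climbing in this list over time, that is consistent with the leak described above.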


Version-Release number of selected component (if applicable):

atomic-openshift-3.6.173.0.113-1.git.0.65fb9fb.el7.x86_64


How reproducible:

Always in the customer environment


Steps to Reproduce:
1.
2.
3.

Actual results:

atomic-openshift-node crashes with "too many open files"


Expected results:

No crash


Additional info:

Comment 1 Takayoshi Kimura 2018-06-11 05:52:25 UTC
A known workaround is to increase the /proc/sys/fs/inotify/max_user_watches value. The upstream issue report confirms that setting the following kernel parameter in /etc/sysctl.conf works:

  fs.inotify.max_user_watches=1048576

inotify resources exhausted : possible leak in cAdvisor #10421
https://github.com/kubernetes/kubernetes/issues/10421
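Not in the original comment: a sketch of how this workaround is typically inspected and applied. The 1048576 value is the one given above; the write and the /etc/sysctl.conf edit require root, so they are shown as commented commands:

```shell
# Inspect the current inotify limits (defaults can be low, e.g. 8192 watches)
cat /proc/sys/fs/inotify/max_user_watches
cat /proc/sys/fs/inotify/max_user_instances

# As root, raise the watch limit immediately (lost on reboot):
#   sysctl -w fs.inotify.max_user_watches=1048576
# And persist it across reboots:
#   echo 'fs.inotify.max_user_watches=1048576' >> /etc/sysctl.conf
#   sysctl -p
```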

Comment 10 Avesh Agarwal 2018-07-11 12:21:14 UTC
https://github.com/openshift/openshift-ansible/pull/9021 has been merged on the master branch.

Comment 17 weiwei jiang 2018-08-03 09:15:37 UTC
Checked with:
# openshift version 
openshift v3.6.173.0.128
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

and cannot reproduce the issue, so moving to VERIFIED.

Comment 19 errata-xmlrpc 2018-08-09 22:10:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2339