Bug 1766224 - erl_child_setup spams close() with large file descriptor limit
Summary: erl_child_setup spams close() with large file descriptor limit
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: erlang
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: John Eckersberg
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-10-28 15:34 UTC by John Eckersberg
Modified: 2020-09-30 19:15 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-30 19:15:03 UTC
Target Upstream Version:
Embargoed:


Description John Eckersberg 2019-10-28 15:34:41 UTC
Description of problem:
Every time the Erlang VM is invoked, which happens repeatedly via rabbitmqctl during health checks, an erl_child_setup process is started like this:

[root@controller-0 ~]# ps -ef | grep erl_child_setup
42439      17144   16985  0 Oct15 ?        00:01:19 erl_child_setup 65536

The 65536 argument is supplied by the VM and reflects the maximum number of file descriptors.

Then the erl_child_setup process closes all available fds in a loop:

https://github.com/erlang/otp/blob/736601dd0316bd7bc6060cd4fd0379473f6db682/erts/emulator/sys/unix/erl_child_setup.c#L428-L441

Because Linux does not have closefrom(), this invokes close() on basically every possible fd up to the limit, whether or not it is open.
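
To illustrate the difference between the two strategies, here is a minimal C sketch. It is not the code from erl_child_setup.c and not necessarily what any upstream patch does. The function names close_range_brute_force and close_open_fds_from are hypothetical, and walking /proc/self/fd is just one common Linux-specific way to find the descriptors that are actually open.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Brute force: one close() per possible descriptor in [first_fd, max_fd),
 * open or not. Returns how many of those calls failed with EBADF. */
int close_range_brute_force(int first_fd, int max_fd)
{
    assert(first_fd >= 0);
    int ebadf = 0;
    for (int fd = first_fd; fd < max_fd; fd++) {
        if (close(fd) < 0 && errno == EBADF)
            ebadf++;
    }
    return ebadf;
}

/* Linux-specific alternative (assumes /proc is mounted): list /proc/self/fd
 * and close only descriptors >= first_fd that really exist.
 * Returns the number of descriptors closed, or -1 if the directory
 * cannot be opened. */
int close_open_fds_from(int first_fd)
{
    assert(first_fd >= 0);
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int dir_fd = dirfd(dir);   /* the descriptor used for the listing itself */
    int closed = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(entry->d_name, &end, 10);
        if (end == entry->d_name || *end != '\0')
            continue;          /* not a number: skips "." and ".." */
        if (fd < first_fd || fd == dir_fd)
            continue;          /* keep low fds and the listing fd */
        if (close((int)fd) == 0)
            closed++;
    }
    closedir(dir);
    return closed;
}
```

With a limit of 65536, the first function issues one system call per possible descriptor, while the second issues roughly one per open descriptor plus the directory reads.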


Version-Release number of selected component (if applicable):
()[root@controller-0 /]# rpm -q erlang-erts
erlang-erts-20.3.8.22-1.el8ost.x86_64

How reproducible:
Always

Steps to Reproduce:
[root@controller-0 ~]# ulimit -n 65536
[root@controller-0 ~]# strace -f -e trace=close erl -noshell -eval 'init:stop().' 2>&1 | grep EBADF | wc -l
65525


Actual results:
erl_child_setup calls close() on tens of thousands of file descriptors that are not open (65525 EBADF results with a limit of 65536)

Expected results:
should only close() descriptors that are actually open

Additional info:
Comparison of VM launch time by differing fd limits (under strace to time children as well):

[root@controller-0 ~]# ulimit -n 1024
[root@controller-0 ~]# time strace -qq -f -e trace=none erl -noshell -eval 'init:stop().'
[pid 720062] --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=720061, si_uid=0} ---
[pid 720062] +++ killed by SIGUSR1 +++
[pid 719999] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=720061, si_uid=0, si_status=0, si_utime=0, si_stime=1} ---

real    0m2.103s
user    0m0.548s
sys     0m0.600s
[root@controller-0 ~]# ulimit -n 65536
[root@controller-0 ~]# time strace -qq -f -e trace=none erl -noshell -eval 'init:stop().'
[pid 722323] --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=722322, si_uid=0} ---
[pid 722323] +++ killed by SIGUSR1 +++
[pid 722025] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=722322, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---

real    0m5.774s
user    0m1.014s
sys     0m2.279s

Comment 1 John Eckersberg 2019-10-28 15:56:31 UTC
I have a patch for this, will link here when I post it upstream.

Comment 2 John Eckersberg 2019-10-29 16:21:47 UTC
(In reply to John Eckersberg from comment #1)
> I have a patch for this, will link here when I post it upstream.

https://github.com/erlang/otp/pull/2438

Comment 3 John Eckersberg 2019-11-04 15:19:00 UTC
(In reply to John Eckersberg from comment #2)
> (In reply to John Eckersberg from comment #1)
> > I have a patch for this, will link here when I post it upstream.
> 
> https://github.com/erlang/otp/pull/2438

Merged upstream

Comment 4 stchen 2020-09-30 19:15:03 UTC
Closing as EOL; OSP 15 has been retired as of Sept 19.

