Bug 906468

Summary: Deadlock in glibc between fork and malloc
Product: Red Hat Enterprise Linux 7
Reporter: alex_fiddler
Component: glibc
Assignee: Florian Weimer <fweimer>
Status: CLOSED ERRATA
QA Contact: Sergey Kolosov <skolosov>
Severity: unspecified
Priority: unspecified
Version: 7.0
CC: af, codonell, cww, fweimer, goddlgiddl, jistone, jwalter, jwright, mcermak, mnewsome, pfrankli, skolosov
Target Milestone: rc
Keywords: Patch
Target Release: 7.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glibc-2.17-162.el7
Doc Type: Bug Fix
Doc Text:
Cause: glibc malloc arena locks, an internal malloc lock, and an internal stdio stream management lock could be acquired in different orders by multiple threads. Consequence: Concurrent calls to fork, malloc, and fflush (NULL) could deadlock, resulting in a hanging process. Fix: glibc now acquires these locks in a consistent order. Result: The deadlock and process hang are eliminated.
Clone Of:
: 1332917 (view as bug list)
Last Closed: 2017-08-01 18:06:55 UTC
Type: Bug
Bug Depends On: 1332917
Bug Blocks: 1298243, 1390370

Description alex_fiddler 2013-01-31 17:00:08 UTC
I ran into a deadlock in glibc where fork deadlocks with malloc.  The problem shows up after a few hours of running the program under heavy load; heavy load here means fork is performed ~100 times/sec.

I am running on a 16-core x86-64 (32 hardware threads), with plenty of RAM and disk space.  Linux 2.6.32-220.el6.x86_64.

The root of the problem appears to be in glibc's iogetdelim.c.  The function _IO_getdelim calls _IO_acquire_lock (fp) and then calls malloc.  malloc then attempts to acquire the lock on the heap (arena), which causes the deadlock, since the forking mechanism holds that lock until the fork is complete.

An easy patch appears to be possible in iogetdelim.c, in the function _IO_getdelim: perform the call to malloc before _IO_acquire_lock (fp).  With that nested locking removed, the problem would be solved.
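
For illustration, a minimal sketch of that locking pattern, written with the public stdio/POSIX APIs rather than glibc internals (the function name below is hypothetical and the structure is simplified; this is not the actual iogetdelim.c source):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Simplified sketch of the _IO_getdelim ordering described above: the
   stream lock is taken first, and the line buffer is allocated while it
   is held, so malloc's arena lock nests inside the file lock. */
ssize_t getdelim_sketch (char **lineptr, size_t *n, int delim, FILE *fp)
{
  ssize_t result = -1;
  (void) delim;                    /* read loop elided in this sketch */

  flockfile (fp);                  /* plays the role of _IO_acquire_lock (fp) */
  if (*lineptr == NULL || *n == 0)
    {
      *n = 120;
      *lineptr = malloc (*n);      /* arena lock acquired under the file lock */
      if (*lineptr == NULL)
        goto out;
    }
  /* ... read characters into *lineptr, growing it with realloc (also
     under the file lock) until the delimiter is seen ... */
out:
  funlockfile (fp);
  return result;
}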



I am attaching below my detailed analysis of the problem from the core dump of the deadlocked process.  I edited out the symbols which are not relevant to the problem at hand.



Thread 58

Thread 58 (Thread 0x7f0f3fdfd700 (LWP 13994)):
Reading in symbols for ../nptl/sysdeps/unix/sysv/linux/x86_64/fork.c...done.
Reading in symbols for iopopen.c...done.
Reading in symbols for src/LuaPluginSessionStart.cpp...done.
#0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x00007f0ff1395354 in _L_lock_2167 () at genops.c:598
#2  0x00007f0ff13945ff in *__GI__IO_list_lock () at genops.c:1299
#3  0x00007f0ff13cbc47 in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/fork.c:117
#4  0x00007f0ff1388977 in _IO_new_proc_open (fp=0x7f0eac0394b0, command=0x7f0eac055a38 "ls /etc/opt/*.conf | grep -v appconf.conf", mode=<optimized out>) at iopopen.c:191
#5  0x00007f0ff1388c69 in _IO_new_popen (command=0x7f0eac055a38 "ls /etc/opt/*.conf | grep -v appconf.conf", mode=0x7f0ff35e54d1 "r") at iopopen.c:312
#6  0x00007f0ff35de160 in io_popen () from /opt/...liblua.so
#7  0x00007f0ff35ce581 in luaD_precall () from /opt/.../liblua.so
#8  0x00007f0ff35d9239 in luaV_execute () from /opt/.../liblua.so
#9  0x00007f0ff35cea4d in luaD_call () from /opt/.../liblua.so
#10 0x00007f0ff35ce0f7 in luaD_rawrunprotected () from /opt/.../liblua.so
#11 0x00007f0ff35ce172 in luaD_pcall () from /opt/.../liblua.so
#12 0x00007f0ff35c9b91 in lua_pcall () from /opt/.../liblua.so
#13 0x00007f0ffd2551c0 in ...
#14 0x00007f0ffd252bb3 in ...
#15 0x00007f0ffd25c3aa in ...
#16 0x00007f0ffd2585df in ...
#17 0x00007f0ffcfa2d0c in ...
#18 0x00007f0ffcfa34c1 in ...
#19 0x00007f0ffcf6e4e5 in ...
#20 0x00007f0ffcf6f74e in ...
#21 0x00007f0ffcf5760e in ...
#22 0x00007f0ffcf5ab57 in ...
#23 0x00007f0ffcf5ae65 in ...
#24 0x0000000000408c4d in ...
#25 0x00007f0ff4ee0184 in g_thread_create_proxy (data=0x7f0fd4002cd0) at gthread.c:635
#26 0x00007f0ff319d7f1 in start_thread (arg=0x7f0f3fdfd700) at pthread_create.c:301
#27 0x00007f0ff140670d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Stuck in fork waiting for list_all_lock in genops.c:1299

(gdb) frame 2
#2  0x00007f0ff13945ff in *__GI__IO_list_lock () at genops.c:1299
1299      _IO_lock_lock (list_all_lock);
(gdb) print list_all_lock
$2 = {lock = 2, cnt = 1, owner = 0x7f0f3fafa700}

Thread 0x7f0f3fafa700 == gdb thread 65

Thread 65 (Thread 0x7f0f3fafa700 (LWP 14019)):
#0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x00007f0ff1395287 in _L_lock_1239 () at genops.c:598
#2  0x00007f0ff1393f0b in _IO_flush_all_lockp (do_lock=1) at genops.c:840
#3  0x00007f0ff35de155 in io_popen () from /opt/.../liblua.so
#4  0x00007f0ff35ce581 in luaD_precall () from /opt/.../liblua.so
#5  0x00007f0ff35d9239 in luaV_execute () from /opt/.../liblua.so
#6  0x00007f0ff35cea4d in luaD_call () from /opt/.../liblua.so
#7  0x00007f0ff35ce0f7 in luaD_rawrunprotected () from /opt/.../liblua.so
#8  0x00007f0ff35ce172 in luaD_pcall () from /opt/.../liblua.so
#9  0x00007f0ff35c9b91 in lua_pcall () from /opt/.../liblua.so
#10 0x00007f0ffd2551c0 in ...
#11 0x00007f0ffd252bb3 in ...
#12 0x00007f0ffd25de88 in ...
#13 0x00007f0ffd25843f in ...
#14 0x00007f0ffcfc4d7d in ...
#15 0x00007f0ffcfc57ed in ...
#16 0x00007f0ffcf6d394 in ...
#17 0x00007f0ffcf6f74e in ...
#18 0x00007f0ffcf5760e in ...
#19 0x00007f0ffcf5ab57 in ...
#20 0x00007f0ffcf5ae65 in ...
#21 0x0000000000408c4d in ...
#22 0x00007f0ff4ee0184 in g_thread_create_proxy (data=0x7f0fd4003330) at gthread.c:635
#23 0x00007f0ff319d7f1 in start_thread (arg=0x7f0f3fafa700) at pthread_create.c:301
#24 0x00007f0ff140670d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Inspecting code in frame 2, we see the following code:


#ifdef _IO_MTSAFE_IO
  _IO_cleanup_region_start_noarg (flush_cleanup);
  if (do_lock)
    _IO_lock_lock (list_all_lock);
#endif

  last_stamp = _IO_list_all_stamp;
  fp = (_IO_FILE *) INTUSE(_IO_list_all);
  while (fp != NULL)
    {
      run_fp = fp;
      if (do_lock)
	_IO_flockfile (fp);
	

Before getting to _IO_flockfile (fp), the thread calls _IO_lock_lock (list_all_lock).  

(gdb) print list_all_lock
$4 = {lock = 2, cnt = 1, owner = 0x7f0f3fafa700}


This confirms that list_all_lock is taken by thread 65.

Now let's find out who owns fp's lock:

(gdb) print *fp
$6 = {_flags = -72539000, _IO_read_ptr = 0x0, _IO_read_end = 0x0, _IO_read_base = 0x0, _IO_write_base = 0x0, _IO_write_ptr = 0x0, _IO_write_end = 0x0, _IO_buf_base = 0x0, _IO_buf_end = 0x0, _IO_save_base = 0x0, 
  _IO_backup_base = 0x0, _IO_save_end = 0x0, _markers = 0x0, _chain = 0x7f0eb0063a70, _fileno = 42, _flags2 = 0, _old_offset = 577, _cur_column = 0, _vtable_offset = 2 '\002', 
  _shortbuf =  <incomplete sequence \334>, _lock = 0x7f0edc05bf70, _offset = -1, _codecvt = 0x666e496e6f002d25, _wide_data = 0x7f0edc05bf80, _freeres_list = 0x0, _freeres_buf = 0x7f0edc026ca0, 
  _freeres_size = 139701797257336, _mode = 0, _unused2 = "\"byt%-%O:%M:%S%F\000\060\067\060"}

(gdb) print *(fp._lock)
$8 = {lock = 2, cnt = 1, owner = 0x7f0fac24f700}

Thread 0x7f0fac24f700 == gdb thread 71

Comment 2 Carlos O'Donell 2013-01-31 20:34:03 UTC
Alex,

Thanks for the report, and I'm sorry to hear you're having a deadlock issue.

Can you please confirm that the situation you see is as follows?

Thread A:

* Calls fork() which runs atfork handlers, including locking the arenas via the implementation's malloc atfork handler ptmalloc_lock_all() (ensures malloc is in a consistent state across the fork)...
* ... Tries to take the IO lock via _IO_list_lock() 
-> _IO_lock_lock(list_all_lock);

Thread B:

* Likely calls _IO_flush_all() 
-> _IO_flush_all_lockp (do_lock=1) 
-> _IO_lock_lock (list_all_lock); ...
* ... Which then attempts to take all open file locks and flush those files.

Thread C:

* Likely calls _IO_getdelim which takes the file lock and then tries to call realloc which requires the arena lock.

So in summary:
Thread A waits on thread B for the list_all_lock lock.
Thread B waits on thread C for the fp file lock.
Thread C waits on thread A for the arena lock.
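
A minimal reproducer sketch of this cycle (hypothetical, not the upstream test case; being timing-dependent, it may have to run for a long while before it wedges on an affected glibc):

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static void *forker (void *arg)
{
  for (;;)
    {
      pid_t pid = fork ();          /* atfork handler locks the arenas,
                                       then fork takes list_all_lock */
      if (pid == 0)
        _exit (0);
      if (pid > 0)
        waitpid (pid, NULL, 0);
    }
  return NULL;
}

static void *flusher (void *arg)
{
  for (;;)
    fflush (NULL);                  /* takes list_all_lock, then each
                                       stream's file lock */
  return NULL;
}

static void *reader (void *arg)
{
  for (;;)
    {
      FILE *fp = fopen ("/etc/hostname", "r");
      if (fp == NULL)
        continue;
      char *line = NULL;
      size_t len = 0;
      getline (&line, &len, fp);    /* takes the file lock, then the
                                       arena lock via malloc/realloc */
      free (line);
      fclose (fp);
    }
  return NULL;
}

int main (void)
{
  pthread_t t1, t2, t3;
  pthread_create (&t1, NULL, forker, NULL);
  pthread_create (&t2, NULL, flusher, NULL);
  pthread_create (&t3, NULL, reader, NULL);
  pthread_join (t1, NULL);          /* the loops never return */
  return 0;
}

Built with gcc -pthread, each thread exercises one edge of the wait cycle described above.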

Please note that your suggested solution is not correct and only makes the deadlock window smaller. In general you would have to avoid *all* uses of malloc within critical sections that hold file locks, and that is not possible without serious performance penalties.

One possible fix is to have the malloc pthread_atfork handlers take the IO locks and wait for all IO to stop *before* locking all of the memory arenas. This would make the locking look like this:

Thread A:

* Calls fork() which runs atfork handlers:
- _IO_lock_lock (list_all_lock).
- Lock malloc arenas.

Thread B:

* Calls _IO_flush_all():
- _IO_lock_lock (list_all_lock).
- Lock all open files one by one for flushing.

Thread C:

* Calls _IO_getdelim():
- Takes file lock.
- Locks malloc arena to allocate memory.

In this case A and B acquire list_all_lock (via _IO_lock_lock) before locking the arenas, and that ensures that other threads are either able to make forward progress or are blocked, but not deadlocked.
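
A minimal sketch of that ordering, modeled with plain pthread mutexes standing in for glibc's internal list_all_lock and arena locks (the names and helpers below are placeholders, not the real glibc code):

#include <pthread.h>

/* Stand-ins for glibc's internal locks. */
static pthread_mutex_t io_list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t arena_lock   = PTHREAD_MUTEX_INITIALIZER;

/* fork() "prepare" handler: take the stdio list lock first, then the
   allocator lock, matching the order used by fflush(NULL) (list lock,
   then stream locks) and getdelim (stream lock, then arena lock). */
static void fork_prepare (void)
{
  pthread_mutex_lock (&io_list_lock);
  pthread_mutex_lock (&arena_lock);
}

/* Parent and child release in reverse acquisition order.  (In real
   glibc the child reinitializes its locks instead of unlocking.) */
static void fork_parent (void)
{
  pthread_mutex_unlock (&arena_lock);
  pthread_mutex_unlock (&io_list_lock);
}

static void fork_child (void)
{
  pthread_mutex_unlock (&arena_lock);
  pthread_mutex_unlock (&io_list_lock);
}

/* Registered once, e.g. at library initialization: */
void install_fork_handlers (void)
{
  pthread_atfork (fork_prepare, fork_parent, fork_child);
}

The key property is a single global acquisition order (list lock, then stream locks, then arena locks), so no cycle can form.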

This is a more difficult fix since it requires refactoring the malloc pthread_atfork handlers into more generic code which does two things:
* Puts a stop to all IO.
* Puts a stop to all allocations.
... and arranges for this handler to run last.

Changing the code in this way could potentially delay the fork() indefinitely, since the forking thread must wait for the IO lock. However, the alternative is a deadlock. If the application ensures no IO is in progress, it adds no more latency than normal.

Comments?

Cheers,
Carlos.

Comment 3 Carlos O'Donell 2013-01-31 21:06:54 UTC
Note that the workaround is to ensure that all of your threads have *stopped* doing IO before issuing the fork. We understand that this might not be possible in all cases. However, I thought I should mention it since you might have control over the application.
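
For applications that do have that control, one possible shape for such a scheme is a process-wide reader-writer gate: worker threads hold it for reading around stream IO, and the forking thread takes it for writing before calling fork(). This is purely an application-level sketch with hypothetical names, not something glibc provides:

#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static pthread_rwlock_t io_fork_gate = PTHREAD_RWLOCK_INITIALIZER;

/* Worker threads wrap their stream IO in a read lock; many may run
   concurrently. */
void do_stream_io (FILE *fp, const char *msg)
{
  pthread_rwlock_rdlock (&io_fork_gate);
  fprintf (fp, "%s\n", msg);
  pthread_rwlock_unlock (&io_fork_gate);
}

/* The forking thread takes the write lock, waiting for in-flight IO to
   drain before fork().  As the reporter notes below, this serializes
   fork against all stream IO and may be too costly with many threads. */
pid_t guarded_fork (void)
{
  pthread_rwlock_wrlock (&io_fork_gate);
  pid_t pid = fork ();
  pthread_rwlock_unlock (&io_fork_gate);
  return pid;
}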

Comment 5 alex_fiddler 2013-01-31 22:33:56 UTC
Carlos,
Your summary of the problem is accurate, except that the thread is stuck on malloc (iogetdelim.c:68), not realloc (iogetdelim.c:108).  I am attaching below the stack backtrace of the thread that is performing _IO_getdelim (I am sorry I missed it in the original problem description).


Thread 71 (Thread 0x7f0fac24f700 (LWP 14026)):

#0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x00007f0ff139cae8 in _L_lock_9164 () at malloc.c:3503
#2  0x00007f0ff139a482 in *__GI___libc_malloc (bytes=139701797257248) at malloc.c:3657
#3  0x00007f0ff1387f59 in _IO_getdelim (lineptr=0x7f0fac24e260, n=0x7f0fac24e268, delimiter=10, fp=0x7f0edc05be90) at iogetdelim.c:68
#4  0x00007f0ff538a909 in getline (__stream=<optimized out>, __n=<optimized out>, __lineptr=<optimized out>) at /usr/include/bits/stdio.h:114
#5  load_1 (ms=0x7f0edc000950, action=<optimized out>, fn=<optimized out>, errs=<optimized out>, marray=<optimized out>, marraycount=<optimized out>) at apprentice.c:690
#6  0x00007f0ff538c7fa in apprentice_load (ms=0x7f0edc000950, magicp=<optimized out>, nmagicp=<optimized out>, fn=0x7f0edc05c980 "/opt/.../magic", action=0) at apprentice.c:825
#7  0x00007f0ff538d729 in apprentice_1 (mlist=<optimized out>, action=<optimized out>, fn=<optimized out>, ms=<optimized out>) at apprentice.c:288
#8  file_apprentice (ms=<optimized out>, fn=<optimized out>, action=<optimized out>) at apprentice.c:385
#9  0x00007f0ff53896d8 in magic_load (ms=0x7f0edc000020, magicfile=<optimized out>) at magic.c:311
#10 0x00007f0fe88da09a in ...
#11 0x00007f0fe88da555 in ...
#12 0x00007f0ff9f0f28f in ...
#13 0x00007f0ffd75dcf7 in ...
#14 0x00007f0ffcf7b1f1 in ...
#15 0x00007f0ffcf7bff8 in ...
#16 0x00007f0ffcf51638 in ...
#17 0x00007f0ffcf55bb5 in ...
#18 0x00007f0ffcf5af9e in ...
#19 0x0000000000408c4d in ...
#20 0x00007f0ff4ee0184 in g_thread_create_proxy (data=0x7f0fd40086d0) at gthread.c:635
#21 0x00007f0ff319d7f1 in start_thread (arg=0x7f0fac24f700) at pthread_create.c:301
#22 0x00007f0ff140670d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115



You're right that moving the malloc call (iogetdelim.c:68) out of the section holding the fp file lock only delays the problem, since we can still hit realloc (iogetdelim.c:108) while holding the fp lock.

Reversing the order of the locks as you suggested makes sense, since this would cover more cases.

Regarding your suggestion for the workaround where all the threads stop IO before forking, in our case this is simply not feasible.  We can have up to 1000 threads in our app, and synchronizing all IO and forking would kill the performance.

Comment 6 Carlos O'Donell 2013-01-31 22:57:37 UTC
(In reply to comment #5)
> Carlos,
> Your summary of the problem is accurate, except that the thread is stuck on
> malloc (iogetdelim.c:68), not realloc (iogetdelim.c:108).  I am attaching
> below the stack backtrace of the thread that is performing _IO_getdelim (I
> am sorry I missed it in the original problem description).
 
Thanks for the quick followup.

I'm glad you agree with the summary.

Note that there are two paths in _IO_getdelim, one with malloc and the other with realloc; both attempt to take the memory arena lock, which was already locked by the malloc pthread_atfork handler.
 
> You're right that moving the malloc call (iogetdelim.c:68) out of the section
> holding the fp file lock only delays the problem, since we can still hit
> realloc (iogetdelim.c:108) while holding the fp lock.

Correct.

> Regarding your suggestion for the workaround where all the threads stop IO
> before forking, in our case this is simply not feasible.  We can have up to
> 1000 threads in our app, and synchronizing all IO and forking would kill the
> performance.

The other alternative is to do unbuffered IO and recompile lua to stop calling fflush(NULL) before popen().

I should have clarified that this is stream-related IO using FILE pointers, not plain IO using file descriptors.
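
As a concrete illustration of the unbuffered-IO part of that alternative (a sketch only; the helper name is hypothetical, and it does not remove lua's fflush(NULL) call, which would still need the rebuild mentioned above):

#include <stdio.h>

/* Open a stream and disable stdio buffering on it, so fflush(NULL) has
   no buffered data to write out for it (it will still briefly take the
   stream lock while walking the list).  This shortens the time spent
   under the stream locks but does not, by itself, remove the
   lock-ordering problem. */
static FILE *open_unbuffered (const char *path, const char *mode)
{
  FILE *fp = fopen (path, mode);
  if (fp != NULL)
    setvbuf (fp, NULL, _IONBF, 0);   /* _IONBF: unbuffered */
  return fp;
}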

Please note that you already incur a synchronization penalty by forking. The forking thread takes the libio list_all_lock, and that blocks *all* other threads from performing any libio operations that could make the stream IO state inconsistent across the fork. Generally this is limited to operations like "flush all streams" (walks the master file list), "open a FILE" (adds to the master file list), etc.

Cheers,
Carlos.

Comment 7 Carlos O'Donell 2013-02-04 21:28:04 UTC
I've had a couple of upstream conversations regarding this deadlock, and the fix is going to be too invasive and risky for RHEL 6.5, so I'm moving this to RHEL 7.0, where we'll have a chance to do the appropriate validation and verification.

If the priority of this issue changes for you, please update this issue.

Comment 13 Albert Flügel 2016-01-01 22:48:58 UTC
I think I ran into the same or a similar problem with a program of mine that was working up to and including Fedora 22. Now, on Fedora 23, it gets stuck after a fraction of a second. It is a service that accepts TCP connections in threads and then, depending on the request, may fork and run another program.
On Red Hat 5 and 6 it has been working for years, and now it is broken by some change. On my Fedora 23 system, glibc and pthreads come from the package glibc-2.22-6.fc23.i686. The machine is a single-processor Pentium; it happens even there, so I think something really bad has been introduced. When stuck, the program has around 40 threads. I see two threads with this fragment of a stack dump:
# gstack 26558
Thread 1 (Thread 0xa8be6b40 (LWP 26558)):
#0  0xb7795bc8 in __kernel_vsyscall ()
#1  0xb73d2392 in __lll_lock_wait_private () from /lib/libc.so.6
#2  0xb733d723 in ptmalloc_lock_all () from /lib/libc.so.6
#3  0xb7384eef in fork () from /lib/libc.so.6
# gstack 25500
Thread 1 (Thread 0xb67e3b40 (LWP 25500)):
#0  0xb7795bc8 in __kernel_vsyscall ()
#1  0xb73d2392 in __lll_lock_wait_private () from /lib/libc.so.6
#2  0xb733d6f3 in ptmalloc_lock_all () from /lib/libc.so.6
#3  0xb7384eef in fork () from /lib/libc.so.6
and ALL threads of the process are either in __lll_lock_wait or in __lll_lock_wait_private. I don't see any thread that is still holding the lock and doing something else.
This cannot be sane. Frankly, I don't know whether this is exactly the same thing as reported in this bug. However, once again (I have lost count of how often) over the years I update RPMs, something in my programs (in this case emu, used in the openvenus monitoring) breaks, and I have no idea how long I will have to wait until it is fixed again. Worse, in this case I have no immediate idea how to work around it.

Comment 14 Carlos O'Donell 2016-01-04 22:02:51 UTC
(In reply to Albert Flügel from comment #13)
> I think I ran into the same or a similar problem with a program of mine
> that was working up to and including Fedora 22. Now, on Fedora 23, it gets
> stuck after a fraction of a second. It is a service that accepts TCP
> connections in threads and then, depending on the request, may fork and
> run another program.
> On Red Hat 5 and 6 it has been working for years, and now it is broken by
> some change. On my Fedora 23 system, glibc and pthreads come from the
> package glibc-2.22-6.fc23.i686. The machine is a single-processor Pentium;
> it happens even there, so I think something really bad has been introduced.
> When stuck, the program has around 40 threads. I see two threads with this

This deadlock and others exist in RHEL 5, RHEL 6, and likely RHEL 7. To some extent we can fix this in RHEL 6 and RHEL 7. However, RHEL 5 is out of scope, given that the release is in Production Phase 3.

We will look into these changes for rhel-7.3 to see what we can do.

Comment 16 Florian Weimer 2016-01-05 16:45:06 UTC
Rough test case posted upstream.

Comment 18 Florian Weimer 2016-02-10 21:50:14 UTC
Patch posted upstream for review:

https://sourceware.org/ml/libc-alpha/2016-02/msg00267.html

Comment 19 Florian Weimer 2016-04-14 08:26:18 UTC
Upstream commit:

commit 29d794863cd6e03115d3670707cc873a9965ba92
Author: Florian Weimer <fweimer>
Date:   Thu Apr 14 09:17:02 2016 +0200

    malloc: Run fork handler as late as possible [BZ #19431]

Comment 34 errata-xmlrpc 2017-08-01 18:06:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1916