Bug 1276753 - malloc: arena free list can become cyclic, increasing contention
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: glibc
Version: 7.1
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 7.3
Assigned To: Florian Weimer
QA Contact: Arjun Shankar
Docs Contact: Marc Muehlfeld
Keywords: Patch
Duplicates: 1297423 1330623
Depends On:
Blocks: 1203710 1230910 1297579 1313485 1213541 1284959 1364088
Reported: 2015-10-30 14:10 EDT by Paulo Andrade
Modified: 2016-11-03 04:27 EDT
CC List: 28 users

See Also:
Fixed In Version: glibc-2.17-156.el7
Doc Type: Bug Fix
Doc Text:
Core C library (glibc) enhanced to increase *malloc()* scalability
A defect in the implementation of the *malloc()* function could result in unnecessary serialization of memory allocation requests across threads. This update fixes the bug and substantially increases the concurrent throughput of allocation requests for applications that frequently create and destroy threads.
Story Points: ---
Clone Of: 1264189
Environment:
Last Closed: 2016-11-03 04:27:05 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
IBM Linux Technology Center | 133482 | None | None | None | 2016-01-11 10:08 EST
Sourceware | 19048 | None | None | None | Never
Sourceware | 19182 | None | None | None | 2016-01-11 10:10 EST
Sourceware | 19243 | None | None | None | 2016-01-11 10:11 EST
Sourceware | 20370 | None | None | None | 2016-07-14 10:53 EDT
Red Hat Product Errata | RHSA-2016:2573 | normal | SHIPPED_LIVE | Low: glibc security, bug fix, and enhancement update | 2016-11-03 08:05:56 EDT

Comment 14 Carlos O'Donell 2016-01-11 10:08:40 EST
I'm opening this bug up more publicly and including IBM.

We plan to deliver a fix for this bug in rhel-7.3.

This bug was created to track the backport of the patches to fix upstream sourceware bug 19048.

The fixes to be backported are as follows:
---
commit 1bd5483e104c8bde6e61dc5e3f8a848bc861872d
Author: Florian Weimer <fweimer@redhat.com>
Date:   Tue Dec 29 20:32:35 2015 +0100

    malloc: Test various special cases related to allocation failures
    
    This test case exercises unusual code paths in allocation functions,
    related to allocation failures.  Specifically, the test can reveal
    the following bugs:
    
    (a) calloc returns non-zero memory on fallback to sysmalloc.
    (b) calloc can self-deadlock because it fails to release
        the arena lock on certain allocation failures.
    (c) pvalloc can dereference a NULL arena pointer.
    
    (a) and (b) appear specific to a faulty downstream backport.
    (c) was fixed as part of commit 10ad46bc6526edc5c7afcc57112da96917ff3629.
    
    The test for (a) was inspired by a reproducer supplied by Jeff Layton.
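Case (a) above concerns calloc handing back memory that is not zeroed. A minimal sketch of that kind of check follows, assuming a small request that the allocator can satisfy. The function name `calloc_returns_zeroed` is hypothetical, and the sketch does not drive the allocation-failure fallback path that the upstream test exercises; it only shows the property being asserted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of glibc or its test suite.
   Returns 1 if calloc returned zeroed memory, 0 if a non-zero byte was
   found, and -1 if an allocation failed.  Assumes count * size does not
   overflow and is non-zero.  */
int
calloc_returns_zeroed (size_t count, size_t size)
{
  size_t total = count * size;

  /* Dirty a chunk and free it so that a later request may reuse it.  */
  unsigned char *dirty = malloc (total);
  if (dirty == NULL)
    return -1;
  memset (dirty, 0xA5, total);
  free (dirty);

  unsigned char *p = calloc (count, size);
  if (p == NULL)
    return -1;

  /* Read through a volatile pointer so the check is not optimized away.  */
  const volatile unsigned char *q = p;
  int zeroed = 1;
  for (size_t i = 0; i < total; ++i)
    if (q[i] != 0)
      {
        zeroed = 0;
        break;
      }

  free (p);
  return zeroed;
}
```

On a correct allocator, `calloc_returns_zeroed (64, 16)` is expected to return 1.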
---
commit 7962541a32eff5597bc4207e781cfac8d1bb0d87
Author: Florian Weimer <fweimer@redhat.com>
Date:   Wed Dec 23 17:23:33 2015 +0100

    malloc: Update comment for list_lock
---
commit 90c400bd4904b0240a148f0b357a5cbc36179239
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon Dec 21 16:42:46 2015 +0100

    malloc: Fix list_lock/arena lock deadlock [BZ #19182]
    
        * malloc/arena.c (list_lock): Document lock ordering requirements.
        (free_list_lock): New lock.
        (ptmalloc_lock_all): Comment on free_list_lock.
        (ptmalloc_unlock_all2): Reinitialize free_list_lock.
        (detach_arena): Update comment.  free_list_lock is now needed.
        (_int_new_arena): Use free_list_lock around detach_arena call.
        Acquire arena lock after list_lock.  Add comment, including FIXME
        about incorrect synchronization.
        (get_free_list): Switch to free_list_lock.
        (reused_arena): Acquire free_list_lock around detach_arena call
        and attached threads counter update.  Add two FIXMEs about
        incorrect synchronization.
        (arena_thread_freeres): Switch to free_list_lock.
        * malloc/malloc.c (struct malloc_state): Update comments to
        mention free_list_lock.
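The lock ordering requirement described in this commit can be illustrated with a small rank checker. This is a sketch only: the `ranked_lock` type, the rank values, and the `*_model` variables are hypothetical stand-ins, and treating free_list_lock as the innermost lock is an assumption not stated in the ChangeLog above. The ChangeLog itself only confirms that the arena lock is acquired after list_lock.

```c
#include <assert.h>
#include <pthread.h>

/* Ranks encode the assumed acquisition order: lower ranks are taken first.  */
enum lock_rank
{
  RANK_NONE = 0,
  RANK_LIST = 1,        /* Stand-in for list_lock.  */
  RANK_ARENA = 2,       /* Stand-in for a per-arena mutex.  */
  RANK_FREE_LIST = 3    /* Stand-in for free_list_lock (assumed innermost).  */
};

struct ranked_lock
{
  pthread_mutex_t mutex;
  enum lock_rank rank;
};

struct ranked_lock list_lock_model = { PTHREAD_MUTEX_INITIALIZER, RANK_LIST };
struct ranked_lock arena_lock_model = { PTHREAD_MUTEX_INITIALIZER, RANK_ARENA };
struct ranked_lock free_list_lock_model
  = { PTHREAD_MUTEX_INITIALIZER, RANK_FREE_LIST };

/* Highest rank currently held by the calling thread.  */
static __thread enum lock_rank held_rank = RANK_NONE;

/* Acquire L unless doing so would violate the ordering.  Returns 0 on
   success and -1 if the request is rejected (the lock is not taken).  */
int
ranked_acquire (struct ranked_lock *l, enum lock_rank *saved)
{
  if (l->rank <= held_rank)
    return -1;
  if (pthread_mutex_lock (&l->mutex) != 0)
    return -1;
  *saved = held_rank;
  held_rank = l->rank;
  return 0;
}

/* Release L; locks must be released in reverse acquisition order.  */
void
ranked_release (struct ranked_lock *l, enum lock_rank saved)
{
  held_rank = saved;
  pthread_mutex_unlock (&l->mutex);
}

/* list lock, then arena lock, then free-list lock: returns 0.  */
int
demo_documented_order (void)
{
  enum lock_rank s1, s2, s3;
  int result = -1;
  if (ranked_acquire (&list_lock_model, &s1) == 0)
    {
      if (ranked_acquire (&arena_lock_model, &s2) == 0)
        {
          if (ranked_acquire (&free_list_lock_model, &s3) == 0)
            {
              result = 0;
              ranked_release (&free_list_lock_model, s3);
            }
          ranked_release (&arena_lock_model, s2);
        }
      ranked_release (&list_lock_model, s1);
    }
  return result;
}

/* Arena lock first, then list lock: the second request is rejected (-1).  */
int
demo_inverted_order (void)
{
  enum lock_rank s1, s2;
  int result = -1;
  if (ranked_acquire (&arena_lock_model, &s1) == 0)
    {
      result = ranked_acquire (&list_lock_model, &s2);
      if (result == 0)
        ranked_release (&list_lock_model, s2);
      ranked_release (&arena_lock_model, s1);
    }
  return result;
}
```

Two threads that take the same pair of locks in opposite orders can each hold one lock while waiting for the other. A fixed order, checked here at acquisition time, rules that interleaving out.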
---
commit 3da825ce483903e3a881a016113b3e59fd4041de
Author: Florian Weimer <fweimer@redhat.com>
Date:   Wed Dec 16 12:39:48 2015 +0100

    malloc: Fix attached thread reference count handling [BZ #19243]
    
    reused_arena can increase the attached thread count of arenas on the
    free list.  This means that the assertion that the reference count is
    zero is incorrect.  In this case, the reference count initialization
    is incorrect as well and could cause arenas to be put on the free
    list too early (while they still have attached threads).
    
        * malloc/arena.c (get_free_list): Remove assert and adjust
        reference count handling.  Add comment about reused_arena
        interaction.
        (reused_arena): Add comments about get_free_list interaction.
        * malloc/tst-malloc-thread-exit.c: New file.
        * malloc/Makefile (tests): Add tst-malloc-thread-exit.
        (tst-malloc-thread-exit): Link against libpthread.
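The reference count problem described in this commit message can be modeled in a few lines. The sketch below is not glibc code: `struct arena`, its `on_free_list` flag, and all function names are hypothetical, and locking is omitted. It assumes an arena that is still on the free list but already has one attached thread, which is the situation the message attributes to reused_arena.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an arena; not glibc's struct malloc_state.  */
struct arena
{
  size_t attached_threads;   /* Threads currently using the arena.  */
  int on_free_list;          /* Nonzero while linked into the free list.  */
};

/* Behavior described as incorrect: taking an arena from the free list
   overwrites the counter and loses a thread attached in the meantime.  */
void
attach_from_free_list_overwrite (struct arena *a)
{
  a->on_free_list = 0;
  a->attached_threads = 1;
}

/* Adjusted behavior: count the new thread in addition to existing ones.  */
void
attach_from_free_list_increment (struct arena *a)
{
  a->on_free_list = 0;
  ++a->attached_threads;
}

/* Thread exit: drop one reference; the arena returns to the free list
   only when no thread remains attached.  */
static void
detach_on_thread_exit (struct arena *a)
{
  assert (a->attached_threads > 0);
  if (--a->attached_threads == 0)
    a->on_free_list = 1;
}

/* Returns 1 if the arena lands on the free list while a second thread
   is still using it, 0 otherwise.  */
int
freed_too_early (void (*attach) (struct arena *))
{
  /* On the free list, yet one thread is already attached.  */
  struct arena a = { 1, 1 };
  attach (&a);                  /* A second thread takes it from the list.  */
  detach_on_thread_exit (&a);   /* One of the two threads exits.  */
  return a.on_free_list;
}
```

With the overwriting variant the counter reads 1 while two threads are attached, so the first exit puts the arena back on the free list too early. With the incrementing variant the arena stays off the list until both threads have exited.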
---
commit 400e12265d99964f8445bb6d717321eb73152cc5
Author: Florian Weimer <fweimer@redhat.com>
Date:   Tue Nov 24 16:37:15 2015 +0100

    Replace MUTEX_INITIALIZER with _LIBC_LOCK_INITIALIZER in generic code
    
        * sysdeps/mach/hurd/libc-lock.h (_LIBC_LOCK_INITIALIZER): Define.
        (__libc_lock_define_initialized): Use it.
        * sysdeps/nptl/libc-lockP.h (_LIBC_LOCK_INITIALIZER): Define.
        * malloc/arena.c (list_lock): Use _LIBC_LOCK_INITIALIZER.
        * malloc/malloc.c (main_arena): Likewise.
        * sysdeps/generic/malloc-machine.h (MUTEX_INITIALIZER): Remove.
        * sysdeps/nptl/malloc-machine.h (MUTEX_INITIALIZER): Remove.
---
commit a62719ba90e2fa1728890ae7dc8df9e32a622e7b
Author: Florian Weimer <fweimer@redhat.com>
Date:   Wed Oct 28 19:32:46 2015 +0100

    malloc: Prevent arena free_list from turning cyclic [BZ #19048]
    
        [BZ# 19048]
        * malloc/malloc.c (struct malloc_state): Update comment.  Add
        attached_threads member.
        (main_arena): Initialize attached_threads.
        * malloc/arena.c (list_lock): Update comment.
        (ptmalloc_lock_all, ptmalloc_unlock_all): Likewise.
        (ptmalloc_unlock_all2): Reinitialize arena reference counts.
        (detach_arena): New function.
        (_int_new_arena): Initialize arena reference count and detach
        replaced arena.
        (get_free_list, reused_arena): Update reference count and detach
        replaced arena.
        (arena_thread_freeres): Update arena reference count and only put
        unreferenced arenas on the free list.
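How a singly linked free list turns cyclic, and how a reference count prevents it, can be shown with a simplified model. This is a sketch under stated assumptions: `struct arena` here is a hypothetical stand-in with only a `next_free` link and an `attached_threads` counter, locking is omitted, and the function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an arena; only the free-list link and the
   reference count are modeled.  */
struct arena
{
  struct arena *next_free;     /* Link in the singly linked free list.  */
  size_t attached_threads;     /* Number of threads using this arena.  */
};

static struct arena *free_list;

/* Without a reference count: every exiting thread pushes its arena.  */
static void
thread_exit_unchecked (struct arena *a)
{
  a->next_free = free_list;
  free_list = a;
}

/* With a reference count: push only when the last thread detaches.  */
static void
thread_exit_refcounted (struct arena *a)
{
  assert (a->attached_threads > 0);
  if (--a->attached_threads == 0)
    {
      a->next_free = free_list;
      free_list = a;
    }
}

/* Returns 1 if the free list contains a cycle (Floyd's algorithm).  */
static int
free_list_is_cyclic (void)
{
  struct arena *slow = free_list;
  struct arena *fast = free_list;
  while (fast != NULL && fast->next_free != NULL)
    {
      slow = slow->next_free;
      fast = fast->next_free->next_free;
      if (slow == fast)
        return 1;
    }
  return 0;
}

/* Two threads share one arena and both exit.  Returns 1 if the free
   list ends up cyclic.  */
int
demo_unchecked_push (void)
{
  struct arena shared = { NULL, 2 };
  free_list = NULL;
  thread_exit_unchecked (&shared);
  thread_exit_unchecked (&shared);   /* shared.next_free now points to shared.  */
  int cyclic = free_list_is_cyclic ();
  free_list = NULL;                  /* Do not keep a pointer to a local.  */
  return cyclic;
}

int
demo_refcounted_push (void)
{
  struct arena shared = { NULL, 2 };
  free_list = NULL;
  thread_exit_refcounted (&shared);  /* Count drops to 1; no push.  */
  thread_exit_refcounted (&shared);  /* Count drops to 0; single push.  */
  int cyclic = free_list_is_cyclic ();
  free_list = NULL;
  return cyclic;
}
```

In this model the second unchecked push makes the node its own successor, so a walk over the list never terminates. The reference-counted variant pushes the arena exactly once.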

---
commit 6782806d8f6664d87d17bb30f8ce4e0c7c931e17
Author: Florian Weimer <fweimer@redhat.com>
Date:   Sat Oct 17 12:06:48 2015 +0200

    malloc: Rewrite with explicit TLS access using __thread
---
Comment 15 Carlos O'Donell 2016-01-11 10:11:04 EST
*** Bug 1297423 has been marked as a duplicate of this bug. ***
Comment 24 Florian Weimer 2016-04-27 13:31:32 EDT
Note: I will make this bug public soon so that others can comment if they feel so inclined.
Comment 25 Florian Weimer 2016-04-27 13:33:59 EDT
*** Bug 1330623 has been marked as a duplicate of this bug. ***
Comment 26 Sumeet Keswani 2016-04-27 13:39:01 EDT
In our case (vertica database server) once the arena freelist goes circular, 
it affects the application moving forward independent of concurrency (as the application becomes sick).

This causes significant performance degradation in high concurrency situations.

We tested the efficacy of the patch posted on sourceware internally by (re)building glibc and the patch was stable and improved performance under concurrent load.
A couple customers have tested it not just for stability (it is) but also for performance (it helps).
Comment 27 David Linden 2016-04-27 13:50:08 EDT
What is the target release date for 7.3, at which point glibc-2.17-131.el7 will be available?

What are the prospects for publishing glibc-2.17-131.el7 as an update prior to 7.3?

What are the prospects of backporting this to the glibc-2.12 stream for RHEL6?
Comment 28 Florian Weimer 2016-04-27 13:53:02 EDT
(In reply to Sumeet Keswani from comment #26)
> In our case (vertica database server) once the arena freelist goes circular, 
> it affects the application moving forward independent of concurrency (as the
> application becomes sick).
> 
> This causes significant performance degradation in high concurrency
> situations.
> 
> We tested the efficacy of the patch posted on sourceware internally by
> (re)building glibc and the patch was stable and improved performance under
> concurrent load.
> A couple customers have tested it not just for stability (it is) but also
> for performance (it helps).

You should see a similar performance improvement on Red Hat Enterprise Linux 6.8 Beta, where we fixed this issue as bug 1264189 (currently private).
Comment 30 Sumeet Keswani 2016-06-01 13:44:47 EDT
Is this fix included in glibc-2.12-1.192?

It does not show up in this advisory (RHBA-2016:0834-1):
https://rhn.redhat.com/errata/RHBA-2016-0834.html

How can users on RHEL 6.X get this fix?
Comment 31 Sumeet Keswani 2016-06-01 13:47:39 EDT
Can I get access to BZ 1264189?
Comment 32 Joseph Kachuck 2016-06-01 13:55:55 EDT
Hello,
I have requested HPE access to BZ 1264189.
Please note this BZ was closed with errata:
https://rhn.redhat.com/errata/RHBA-2016-0834.html

Thank You
Joe Kachuck
Comment 33 Sumeet Keswani 2016-06-01 14:04:04 EDT
It's not listed in the errata (RHBA-2016:0834-1), hence I was not certain how to point users to that for a fix.
Comment 34 Florian Weimer 2016-06-01 14:23:44 EDT
(In reply to Sumeet Keswani from comment #33)
> It's not listed in the errata (RHBA-2016:0834-1), hence I was not certain how
> to point users to that for a fix.

This bug was fixed with RHBA-2016:0834-1 for Red Hat Enterprise Linux 6.8, under bug 1264189.  That bug largely consists of private comments and is not very illuminating to external parties as a result.
Comment 35 Arjun Shankar 2016-07-14 08:02:42 EDT
The upstream bug (https://sourceware.org/bugzilla/show_bug.cgi?id=19048) had a test.c and a check-free_list.sh script attached to it. Running the script against a running instance of the test exposes the bug on ppc64 and s390x (not sure why not on other architectures) even on the patched glibc. So this needs to be looked at again. Florian's doing that right now.
Comment 41 Georg Markgraf 2016-09-09 04:44:36 EDT
Arjun, is this verified on both architectures, ppc64 and s390x?
Comment 42 Martin Cermak 2016-09-09 11:20:35 EDT
(In reply to Georg Markgraf from comment #41)
> Arjun, is this verified on both architectures, ppc64 and s390x?

Georg, yes, the verification happened on all the rhel-7 supported architectures, including ppc64 and s390x.
Comment 43 IBM Bug Proxy 2016-09-12 13:31:12 EDT
------- Comment From MSTRUBEL@de.ibm.com 2016-09-12 13:27 EDT-------
I ran the arena tests (from glibc bugzilla) and did straces with and without TRIM option enabled using old & new glibc version. It seems the patches are effective.

Thank you very much for taking care of this issue,

I think that this BZ can be closed now.

best regards
Matthias Strubel
Comment 44 Florian Weimer 2016-09-12 13:35:06 EDT
(In reply to IBM Bug Proxy from comment #43)
> ------- Comment From MSTRUBEL@de.ibm.com 2016-09-12 13:27 EDT-------
> I ran the arena tests (from glibc bugzilla) and did straces with and without
> TRIM option enabled using old & new glibc version. It seems the patches are
> effective.
> 
> Thank you very much for taking care of this issue,

Thank you for the additional testing.

> I think, that this BZ can be closed now.

This bug will be closed automatically once we ship the update as part of Red Hat Enterprise Linux 7.3.
Comment 46 errata-xmlrpc 2016-11-03 04:27:05 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2573.html
