Bug 1430223 - In some conditions, tcmalloc memalign will segfault
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: gperftools
Version: 26
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Tom "spot" Callaway
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2017-03-08 04:39 UTC by wibrown@redhat.com
Modified: 2018-07-26 20:25 UTC

Last Closed: 2018-04-20 22:12:19 UTC
Type: Bug

Description wibrown@redhat.com 2017-03-08 04:39:22 UTC
Description of problem:

The same code works with glibc posix_memalign. When tcmalloc is linked in to override the weak posix_memalign symbol, the following segmentation fault occurs in some conditions (generally during a stress test). I can reproduce this 100% of the time. It may be due to an incorrect check in the freelist under high memalign/free pressure.

Thread 22 "lt-benchmark_pa" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff3429700 (LWP 15025)]
tcmalloc::SLL_Pop (list=0x6fda48) at src/linked_list.h:59
59	  *list = SLL_Next(*list);
(gdb) bt
#0  tcmalloc::SLL_Pop (list=0x6fda48) at src/linked_list.h:59
#1  tcmalloc::ThreadCache::FreeList::Pop (this=<optimized out>) at src/thread_cache.h:212
#2  tcmalloc::ThreadCache::Allocate (cl=<optimized out>, size=<optimized out>, this=<optimized out>) at src/thread_cache.h:371
#3  (anonymous namespace)::do_memalign (align=align@entry=64, size=<optimized out>, size@entry=80) at src/tcmalloc.cc:1462
#4  0x00007ffff7778669 in (anonymous namespace)::do_memalign_or_cpp_memalign (size=80, align=64) at src/tcmalloc.cc:1131
#5  tc_posix_memalign (result_ptr=result_ptr@entry=0x7ffff3428be0, align=align@entry=64, size=size@entry=80) at src/tcmalloc.cc:1781
#6  0x00007ffff79baa8f in sds_memalign (size=size@entry=80, alignment=alignment@entry=64) at /home/william/development/389ds/ds/src/libsds/sds/core/utils.c:110
#7  0x00007ffff79bfd20 in sds_bptree_txn_create (binst=binst@entry=0xe3c100) at /home/william/development/389ds/ds/src/libsds/sds/bpt_cow/txn.c:31
#8  0x00007ffff79bfecd in sds_bptree_cow_wrtxn_begin (binst=0xe3c100, btxn=0x7ffff3428c78) at /home/william/development/389ds/ds/src/libsds/sds/bpt_cow/txn.c:292
#9  0x00000000004022c4 in bptree_cow_write_begin (inst=<optimized out>, write_txn=<optimized out>) at /home/william/development/389ds/ds/src/libsds/test/benchmark_parwrap.c:141
#10 0x0000000000402865 in batch_insert (info=0x7fffffffe1c0) at /home/william/development/389ds/ds/src/libsds/test/benchmark_par.c:181
#11 0x0000000000402e3b in bench_thread_batch (arg=0x7fffffffe1c0) at /home/william/development/389ds/ds/src/libsds/test/benchmark_par.c:232
#12 0x00007ffff7122e8b in _pt_root (arg=0xe3e240) at ../../../nspr/pr/src/pthreads/ptthread.c:216
#13 0x00007ffff66e136d in start_thread (arg=0x7ffff3429700) at pthread_create.c:456
#14 0x00007ffff6c235bf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
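
For context on frame #0: tcmalloc's free lists are intrusive, meaning a free block stores the pointer to the next free block in its own first word, and a pop loads that word to advance the list head. The following is a minimal sketch in C of that kind of list, not the gperftools source. All names in it (sll_push, sll_pop, head_after_pop, the pool array) are hypothetical. It simulates a write into a block after the block has been placed on the free list, and shows that the next pop leaves the list head holding the overwritten bytes. A later pop would then dereference that value, which is one way a fault like the one at linked_list.h:59 could arise; it is not a diagnosis of this bug.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of an intrusive singly linked free list.
 * The "next" pointer lives in the first word of each free block. */

static void *sll_next(void *block)
{
    void *next;
    memcpy(&next, block, sizeof next);   /* read the block's first word */
    return next;
}

static void sll_push(void **list, void *block)
{
    memcpy(block, list, sizeof *list);   /* block's first word = old head */
    *list = block;
}

static void *sll_pop(void **list)
{
    void *result = *list;
    assert(result != NULL);
    *list = sll_next(result);            /* new head comes from inside the block */
    return result;
}

/* Pushes two blocks, optionally overwrites the most recently "freed"
 * one (a simulated write-after-free), pops once, and returns the new
 * list head. The returned head is never dereferenced here. */
void *head_after_pop(int stale_write)
{
    static _Alignas(void *) unsigned char pool[2][64];
    void *list = NULL;

    sll_push(&list, pool[0]);
    sll_push(&list, pool[1]);            /* list: pool[1] -> pool[0] -> NULL */

    if (stale_write)
        memset(pool[1], 0x41, sizeof(void *));  /* clobbers the stored next pointer */

    (void)sll_pop(&list);                /* hands out pool[1] */
    return list;                         /* pool[0] if intact, 0x4141... if clobbered */
}
```

With stale_write set to 0 the head after the pop is the remaining block; with it set to 1 the head is the bogus 0x41-filled value, and the crash would only surface on a subsequent pop, possibly in a different thread's call path from the one that did the bad write.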

Comment 2 Paolo Bonzini 2017-03-08 16:10:47 UTC
This backtrace seems to come from gperftools 2.5 (Fedora), not RHEL.

Comment 3 Florian Weimer 2017-03-14 12:52:39 UTC
(In reply to wibrown from comment #0)
> Description of problem:
> 
> This works with glibc posix_memalign. When linking tcmalloc to override the
> weak posix_memalign symbol, in some conditions (generally during a stress
> test) the following segmentation fault is experienced. I can reproduce this
> 100% of the time. This may be due to an incorrect check in the freelist
> under high memalign/free pressure. 

Do you have a small, obviously correct test case which exposes this problem?

The stronger barrier in glibc malloc could hide a concurrency bug in the application, so this is not necessarily a tcmalloc issue.
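
As an illustration of the kind of small test case being asked for, here is a minimal sketch in C, assuming POSIX threads. It is not the reporter's benchmark and has not been confirmed to trigger the crash. The alignment (64) and size (80) are taken from frame #3 of the backtrace; the batch size, the function names (worker, run_stress) and the struct are hypothetical. Each thread allocates and frees only its own blocks, so the program shares no heap data between threads and any fault would point at the allocator rather than at application-level races.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALIGNMENT  64   /* align value seen in frame #3 */
#define ALLOC_SIZE 80   /* size value seen in frame #3 */
#define BATCH      128  /* hypothetical: live blocks per thread per round */

struct worker_arg {
    long rounds;
    long failures;
};

/* Allocates a batch of aligned blocks, touches them, then frees them all. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;
    void *batch[BATCH];

    for (long r = 0; r < arg->rounds; r++) {
        int n = 0;
        while (n < BATCH) {
            void *ptr = NULL;
            if (posix_memalign(&ptr, ALIGNMENT, ALLOC_SIZE) != 0) {
                arg->failures++;         /* allocation failed */
                break;
            }
            if ((uintptr_t)ptr % ALIGNMENT != 0) {
                arg->failures++;         /* allocator returned a misaligned block */
                free(ptr);
                break;
            }
            memset(ptr, 0xA5, ALLOC_SIZE);
            batch[n++] = ptr;
        }
        while (n > 0)
            free(batch[--n]);
    }
    return NULL;
}

/* Runs nthreads workers; returns the total failure count, or -1 if setup fails. */
long run_stress(int nthreads, long rounds)
{
    assert(nthreads > 0);
    pthread_t *tids = calloc((size_t)nthreads, sizeof *tids);
    struct worker_arg *args = calloc((size_t)nthreads, sizeof *args);
    long failures = 0;
    int started = 0;

    if (tids == NULL || args == NULL) {
        free(tids);
        free(args);
        return -1;
    }
    for (; started < nthreads; started++) {
        args[started].rounds = rounds;
        if (pthread_create(&tids[started], NULL, worker, &args[started]) != 0) {
            failures++;                  /* count the thread that failed to start */
            break;
        }
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        failures += args[i].failures;
    }
    free(tids);
    free(args);
    return failures;
}
```

Calling run_stress(22, 100000) from main and building once against plain glibc and once with gperftools' libtcmalloc linked in would give a like-for-like comparison. If this program stays clean under tcmalloc while the original benchmark crashes, that would lend weight to the application-side explanation.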

Comment 4 Brad Hubbard 2017-09-22 01:44:03 UTC
Possibly (remotely?) related to bz1494309?

Comment 5 mreynolds 2018-04-20 22:12:19 UTC
Closing bug: 389-ds-base is no longer going to use tcmalloc because gperftools is going away.

Comment 6 Tom "spot" Callaway 2018-04-23 17:43:09 UTC
gperftools is going away? Upstream is still making releases, perhaps you mean "will not be in future RHEL builds"?

Comment 7 mreynolds 2018-04-23 17:55:32 UTC
(In reply to Tom "spot" Callaway from comment #6)
> gperftools is going away? Upstream is still making releases, perhaps you
> mean "will not be in future RHEL builds"?

My understanding is that it "is" going away.  The glibc team is no longer maintaining it (or is going to stop maintaining it very soon), and we've all been told to stop using it:

https://bugzilla.redhat.com/show_bug.cgi?id=1496871
https://bugzilla.redhat.com/show_bug.cgi?id=1496872

These are RHEL bugs, but if no one is maintaining it and no one is supporting it, then it's dead IMO.  I could be wrong, this is just my understanding of the situation.  Anyway in 389-ds-base we plan to bundle jemalloc starting in F28.

Comment 8 Tom "spot" Callaway 2018-04-23 17:57:58 UTC
There were commits upstream 10 days ago:
https://github.com/gperftools/gperftools/commits/master

and an RC release in March:
https://github.com/gperftools/gperftools/releases

I was not under the impression that glibc was _ever_ the maintainer for this.

Comment 9 Tom "spot" Callaway 2018-04-23 18:01:24 UTC
Replying to myself, but it appears that the request is to use the new per-thread local cache implementation within glibc instead of tcmalloc. This is different from "gperftools is dead/going away".

Comment 10 mreynolds 2018-04-23 18:59:51 UTC
I apologize for the bad terminology.  I misunderstood what was really happening.  It looks like the only reason behind all of this was that gperftools was being dropped from RHEL.

Comment 11 Florian Weimer 2018-04-24 08:45:45 UTC
(In reply to mreynolds from comment #7)
> (In reply to Tom "spot" Callaway from comment #6)
> > gperftools is going away? Upstream is still making releases, perhaps you
> > mean "will not be in future RHEL builds"?
> 
> My understanding is that it "is" going away.  The glibc team is no longer
> maintaining it (or is going to stop maintaining it very soon), and we've all
> been told to stop using it:

The glibc team does not maintain the alternative allocators, and never has.  We simply do not have expertise in this area.

> https://bugzilla.redhat.com/show_bug.cgi?id=1496871
> https://bugzilla.redhat.com/show_bug.cgi?id=1496872
> 
> These are RHEL bugs, but if no one is maintaining it and no one is
> supporting it, then it's dead IMO.

For Fedora, we are fine as long as there is an active upstream.  I'm not sure if we have expertise in Fedora to diagnose and fix issues in tcmalloc, but even if the expertise is missing, we can still ship it as a leaf package in case someone wants to self-support, or use it for upstream projects where it is the default.

The challenge for Fedora is our selection of architectures, and some upstream projects might unconditionally enable allocators for all architectures, but have only tested their usage on common ones such as x86-64.

