Bug 1109309 - gperftools self-deadlock on ARM
Summary: gperftools self-deadlock on ARM
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: gperftools
Version: 22
Hardware: armhfp
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Tom "spot" Callaway
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ARMTracker
 
Reported: 2014-06-13 15:42 UTC by Jerry James
Modified: 2016-07-19 19:02 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-19 19:02:12 UTC



Description Jerry James 2014-06-13 15:42:36 UTC
Description of problem:
The cvc4 package is currently failing to build on ARM because one of the tests locks up (builds/armv7hl-redhat-linux-gnueabi/default-proof/test/unit/expr/.libs/lt-expr_public, for future reference).  I have obtained a backtrace of the locked up process, and it hasn't even managed to run main() yet.  The problem appears to be a self-deadlock inside the gperftools code.  There is one thread.  The backtrace is:

#0  0x40ec34b8 in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x40bc44d8 in base::internal::SpinLockDelay (
    w=w@entry=0x40c32d3c <tcmalloc::Static::pageheap_lock_>, value=1074814640, 
    loop=loop@entry=8951898) at src/base/spinlock_linux-inl.h:90
#2  0x40bc4348 in SpinLock::SlowLock (
    this=this@entry=0x40c32d3c <tcmalloc::Static::pageheap_lock_>)
    at src/base/spinlock.cc:133
#3  0x40bf8830 in Lock (this=<optimized out>) at src/base/spinlock.h:71
#4  SpinLockHolder (l=<optimized out>, this=<synthetic pointer>)
    at src/base/spinlock.h:136
#5  tcmalloc::ThreadCache::InitModule () at src/thread_cache.cc:315
#6  0x40c07f5c in GetCache () at src/thread_cache.h:420
#7  do_malloc_no_errno (size=360) at src/tcmalloc.cc:1112
#8  do_malloc (size=360) at src/tcmalloc.cc:1119
#9  do_malloc_or_cpp_alloc (size=360) at src/tcmalloc.cc:1039
#10 tc_malloc (size=size@entry=360) at src/tcmalloc.cc:1578
#11 0x40e7ed94 in __fopen_internal (
    filename=0x40100310 "/lib/libprofiler.so.0", mode=0x4108d79c "r", is32=1)
    at iofopen.c:73
#12 0x40e7ee28 in _IO_new_fopen (filename=<optimized out>, 
    mode=<optimized out>) at iofopen.c:103
#13 0x4108b0a0 in load_debug_frame (file=0x4108b0a0 <load_debug_frame+80> "", 
    buf=0xbe8f1384, buf@entry=0xfffffff4, bufsize=0xbe8f1388, 
    bufsize@entry=0xfffffff8, is_local=1) at dwarf/Gfind_proc_info-lsb.c:111
#14 0x4108bbb8 in locate_debug_info (as=0x0, addr=3197047696, 
    addr@entry=1086073191, 
    dlname=dlname@entry=0x40100310 "/lib/libprofiler.so.0", start=1091150128, 
    start@entry=1091093252, end=1086146488, end@entry=1074791184)
    at dwarf/Gfind_proc_info-lsb.c:309
#15 0x4108bd24 in _ULarm_dwarf_find_debug_frame (found=0, 
    di_debug=di_debug@entry=0xbe8f256c, ip=ip@entry=1086073191, 
    segbase=1086046208, obj_name=0x40100310 "/lib/libprofiler.so.0", 
    start=start@entry=1086046208, end=end@entry=1086146488)
    at dwarf/Gfind_proc_info-lsb.c:389
#16 0x4108c304 in _ULarm_dwarf_callback (info=0xbe8f3664, 
    size=<optimized out>, ptr=<optimized out>)
    at dwarf/Gfind_proc_info-lsb.c:704
#17 0x40f3f0b8 in __GI___dl_iterate_phdr (callback=0x0, data=0xbe8f252c, 
    data@entry=0x0) at dl-iteratephdr.c:76
#18 0x410883c8 in _ULarm_find_proc_info (as=0x4109a130 <local_addr_space>, 
    ip=ip@entry=1086073191, pi=pi@entry=0xbe8f3664, 
    need_unwind_info=need_unwind_info@entry=1, arg=0xbe8f33fc)
    at arm/Gex_tables.c:523
#19 0x410899d4 in fetch_proc_info (c=c@entry=0xbe8f343c, ip=1086073191, 
    need_unwind_info=need_unwind_info@entry=1) at dwarf/Gparser.c:422
#20 0x4108a9dc in _ULarm_dwarf_find_save_locs (c=c@entry=0xbe8f343c)
    at dwarf/Gparser.c:863
#21 0x4108b014 in _ULarm_dwarf_step (c=c@entry=0xbe8f343c) at dwarf/Gstep.c:34
#22 0x41087628 in _ULarm_step (cursor=0xbe8f343c) at arm/Gstep.c:183
#23 0x40bc2990 in GetStackTrace_libunwind (result=0x170a008, max_depth=30, 
    skip_count=<optimized out>) at src/stacktrace_libunwind-inl.h:118
#24 0x40bc31ac in GetStackTrace (result=result@entry=0x170a008, 
    max_depth=max_depth@entry=30, skip_count=skip_count@entry=3)
    at src/stacktrace.cc:232
#25 0x40bf5848 in RecordGrowth (growth=1048576) at src/page_heap.cc:584
#26 tcmalloc::PageHeap::GrowHeap (this=0x40bd733c, n=<optimized out>)
    at src/page_heap.cc:610
#27 0x40bf5bdc in tcmalloc::PageHeap::New (this=0x174a000, n=n@entry=1)
    at src/page_heap.cc:156
#28 0x40bf4484 in tcmalloc::CentralFreeList::Populate (
    this=this@entry=0x40c26098 <tcmalloc::Static::central_cache_+640>)
    at src/central_freelist.cc:329
#29 0x40bf46d0 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe (
    this=0x40c26098 <tcmalloc::Static::central_cache_+640>, N=1, 
    start=0xbe8f764c, end=0xbe8f7650) at src/central_freelist.cc:284
#30 0x40bf4780 in tcmalloc::CentralFreeList::RemoveRange (
    this=0x40c26098 <tcmalloc::Static::central_cache_+640>, start=0xbe8f764c, 
    start@entry=0xbe8f7644, end=0xbe8f7650, end@entry=0xbe8f7648, N=1)
    at src/central_freelist.cc:264
#31 0x40bf7b08 in tcmalloc::ThreadCache::FetchFromCentralCache (
    this=this@entry=0x176a6e8, cl=1, byte_size=byte_size@entry=8)
    at src/thread_cache.cc:166
#32 0x40c07e8c in Allocate (cl=<optimized out>, size=<optimized out>, 
    this=<optimized out>) at src/thread_cache.h:365
#33 do_malloc_small (size=<optimized out>, heap=<optimized out>)
    at src/tcmalloc.cc:1103
#34 do_malloc_no_errno (size=1) at src/tcmalloc.cc:1112
#35 do_malloc (size=1) at src/tcmalloc.cc:1119
#36 do_malloc_or_cpp_alloc (size=1) at src/tcmalloc.cc:1039
#37 tc_malloc (size=size@entry=1) at src/tcmalloc.cc:1578
#38 0x40be9204 in TCMallocGuard::TCMallocGuard (
    this=0x40c1fbc8 <module_enter_exit_hook>) at src/tcmalloc.cc:921
#39 0x40be5c74 in __static_initialization_and_destruction_0 (__initialize_p=1, 
    __priority=65535) at src/tcmalloc.cc:953
#40 _GLOBAL__sub_I_tcmalloc.cc(void) () at src/tcmalloc.cc:1742
#41 0x400ed7f0 in call_init (l=<optimized out>, argc=argc@entry=1, 
    argv=argv@entry=0xbe8f7754, env=env@entry=0xbe8f775c) at dl-init.c:76
#42 0x400ed94c in call_init (env=<optimized out>, argv=<optimized out>, 
    argc=<optimized out>, l=<optimized out>) at dl-init.c:34
#43 _dl_init (main_map=0x40106908, argc=1, argv=0xbe8f7754, env=0xbe8f775c)
    at dl-init.c:124
#44 0x400dcbc4 in _dl_start_user () from /lib/ld-linux-armhf.so.3

In frame 28, we are at src/central_freelist.cc line 329.  On line 328, we acquired Static::pageheap_lock().  In frame 5, we are at src/thread_cache.cc line 315, attempting to acquire the same lock.  Frames 0 to 4 are the result of periodically waking up to see if the lock is available.  (Also, note that in frame 1, we get to the nanosleep call because "have_futex" is false; is that right?  Are futexes unavailable on ARM?)
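
The hang in frames 0 to 5 is a self-deadlock: the spin lock is not recursive, so the single thread waits for a release that only it could perform. As a rough illustration only, here is a minimal sketch of a debug-style spin lock that remembers its owner, so that a same-thread re-acquisition is reported instead of spinning forever. `OwnedSpinLock` and its `Result` values are hypothetical names for this sketch; this is not gperftools' `SpinLock` class.

```cpp
#include <atomic>
#include <thread>

// Hypothetical debug spin lock (NOT gperftools' SpinLock). It records which
// thread holds it so that a re-acquisition by the holder can be detected.
class OwnedSpinLock {
 public:
  enum class Result { kAcquired, kSelfDeadlock };

  Result Lock() {
    const std::thread::id me = std::this_thread::get_id();
    // If the lock is held and we are the recorded owner, spinning can never
    // succeed: report it instead (this is the frame 28 -> frame 5 situation).
    if (held_.load(std::memory_order_acquire) &&
        owner_.load(std::memory_order_relaxed) == me) {
      return Result::kSelfDeadlock;
    }
    // Otherwise wait for another thread to release it.
    while (held_.exchange(true, std::memory_order_acquire)) {
      std::this_thread::yield();  // a real lock might sleep here instead
    }
    owner_.store(me, std::memory_order_relaxed);
    return Result::kAcquired;
  }

  void Unlock() {
    // Clear the owner before releasing so a stale id is never observed.
    owner_.store(std::thread::id(), std::memory_order_relaxed);
    held_.store(false, std::memory_order_release);
  }

 private:
  std::atomic<bool> held_{false};
  std::atomic<std::thread::id> owner_{std::thread::id()};
};
```

Calling `Lock()` twice on the same thread returns `kAcquired` and then `kSelfDeadlock`; with a check like this in a debug build, frame 5 would have produced an immediate diagnostic rather than a process that never reaches main().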

Version-Release number of selected component (if applicable):
gperftools-libs-2.2-2.fc21.armv7hl
libunwind-1.1-6.fc21.armv7hl

How reproducible:
Always (so far).

Steps to Reproduce:
1. Try to build the cvc4 package on ARM Rawhide.

Actual results:
One of the tests locks up before main() even starts.

Expected results:
The test should at least reach main().  Anything after that is my problem. :-)

Additional info:

Comment 1 Jerry James 2014-06-13 15:57:40 UTC
After staring at that backtrace for a bit, I think I see what happened.  The gperftools code wants to record the construction of the page heap (frame 27), so it calls into a backend-specific stack trace generator (frame 23), which happens to be libunwind in this case.  The libunwind code then walks the stack to gather its information.  It decides it really needs to look at /lib/libprofiler.so.0 and so calls fopen (frame 11) ... and fopen now calls malloc() (frame 10).

So the bug here is that while the malloc() infrastructure is still setting itself up, it makes a call into libunwind that makes a call into glibc that calls malloc().
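
One general way to break this kind of cycle is to do any work that might call malloc() before taking the allocator's lock, instead of while holding it. The sketch below only illustrates that ordering under stated assumptions: `heap_locked`, `NestedMalloc`, `CaptureStack`, `GrowHeapUnderLock` and `GrowHeapCaptureFirst` are hypothetical names, and `CaptureStack` stands in for libunwind reaching fopen() and then malloc(). It is not how gperftools is actually structured.

```cpp
#include <atomic>

// Hypothetical non-recursive lock standing in for pageheap_lock_.
std::atomic<bool> heap_locked{false};

bool TryLockHeap() {
  return !heap_locked.exchange(true, std::memory_order_acquire);
}
void LockHeap() {
  while (!TryLockHeap()) {
    // spin (the real allocator would eventually sleep, as in frames 0-4)
  }
}
void UnlockHeap() { heap_locked.store(false, std::memory_order_release); }

// Stand-in for the malloc() reached via fopen(): it needs the heap lock.
// Returns false where the real allocator would spin forever.
bool NestedMalloc() {
  if (!TryLockHeap()) return false;
  UnlockHeap();
  return true;
}

// Stand-in for the unwinder: gathering a trace ends up calling malloc().
bool CaptureStack() { return NestedMalloc(); }

// Problematic order (as in frames 23-28): the unwinder runs under the lock.
bool GrowHeapUnderLock() {
  LockHeap();
  const bool nested_ok = CaptureStack();  // nested malloc cannot get the lock
  UnlockHeap();
  return nested_ok;
}

// Reordered: capture first, then hold the lock only to update shared state.
bool GrowHeapCaptureFirst() {
  const bool nested_ok = CaptureStack();  // lock is free, nested malloc works
  LockHeap();
  // ... record the already-captured trace and grow the heap here ...
  UnlockHeap();
  return nested_ok;
}
```

In this sketch `GrowHeapUnderLock()` returns false (the point where the real code hangs) and `GrowHeapCaptureFirst()` returns true. It is not a drop-in fix: in the backtrace above the lock is already taken in CentralFreeList::Populate (frame 28), so the stack capture would have to move out of that critical section as well, and the captured frames would need storage that does not itself require the lock.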

Comment 2 Jaroslav Reznik 2015-03-03 16:01:48 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

Comment 3 Peter Robinson 2015-04-21 23:30:27 UTC
(In reply to Jerry James from comment #1)
> After staring at that backtrace for a bit, I think I see what happened.  The
> gperftools code wants to record the construction of the page heap (frame
> 27), so it calls into a backend-specific stack trace generator (frame 23),
> which happens to be libunwind in this case.  The libunwind code then walks
> the stack to gather its information.  It decides it really needs to look at
> /lib/libprofiler.so.0 and so calls fopen (frame 11) ... and fopen now calls
> malloc() (frame 10).
> 
> So the bug here is that while the malloc() infrastructure is still setting
> itself up, it makes a call into libunwind that makes a call into glibc that
> calls malloc().

cvc4 appears to have built in F-22, so is this still a problem with gperftools 2.4 on ARMv7?

Comment 4 Jerry James 2015-04-22 03:16:04 UTC
Since cvc4 can be built either with or without gperftools, I chose to build without in order to avoid this bug.  I can try building with again at any time to see if the bug is still present.

Comment 5 Peter Robinson 2015-04-22 12:59:34 UTC
A scratch build with it enabled succeeded, but it looks like support for it was disabled overall for other reasons.

http://koji.fedoraproject.org/koji/taskinfo?taskID=9536750

Comment 6 Jerry James 2015-04-22 16:26:13 UTC
That scratch build did not include gperftools support.  The build log says:

checking whether to link in google perftools libraries... no (user didn't request it)

You'll need to add --with-google-perftools to the %configure line.

I disabled it everywhere, both because of this bug and because something is broken on i386.  I was getting wrong answers out of cvc4 on i386.  I tried running under valgrind to see if we had some kind of memory corruption going on, and sure enough, valgrind reported a large number of memory errors, all with backtraces that involved gperftools functions.  Just for kicks, I tried building without gperftools support ... and the memory errors and wrong answers all went away.  Now valgrind doesn't report any problems, and cvc4 gives correct answers.  So either gperftools is broken (in a different way) on i386, or cvc4 is using gperftools incorrectly somehow.  In any case, I decided to build cvc4 without gperftools support to avoid the ARM and i386 problems.  Perhaps strangely, cvc4 + gperftools seems to work just fine on x86_64.

Comment 7 Fedora End Of Life 2016-07-19 19:02:12 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

