Bug 1112610 - eu-stack: "Callback returned failure" for seemingly OK shared libraries
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: elfutils
Version: 20
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Mark Wielaard
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-06-24 10:30 UTC by Martin Milata
Modified: 2014-09-25 10:41 UTC (History)

Fixed In Version: elfutils-0.160-1.fc19
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-07-28 19:04:10 UTC


Attachments:
Do not rely on link_map.l_addr. (1.34 KB, patch)
2014-06-26 18:05 UTC, Jan Kratochvil

Description Martin Milata 2014-06-24 10:30:14 UTC
Description of problem:
Some functions (e.g. dwfl_module_getelf) return DWFL_E_CB even though the related library seems to be alright. It can be reproduced by running eu-stack on the provided coredump.

Version-Release number of selected component (if applicable):
elfutils-0.158-4.fc20.x86_64

How reproducible:
Hopefully always.

Steps to Reproduce:
1. Obtain F20 system.
2. # yum install koji mock
3. # usermod -a -G mock user
4. # su - user
5. $ wget http://.../ccpp-callback-failure.tar.xz
6. $ wget http://.../setup-retracing.py
   (I'll provide the actual locations in the next comment.)
7. $ tar xJvf ccpp-callback-failure.tar.xz
8. $ ./setup-retracing.py ccpp-callback-failure/
   (This will download the RPMs that contain libraries referenced
    by the core dump from koji and unpack them to fedora-20-x86_64
    mock chroot.)
9. $ mock -r fedora-20-x86_64 --copyin ccpp-callback-failure /tmp/ccpp-callback-failure
10. $ mock -r fedora-20-x86_64 --shell
11. <mock># cd /tmp/ccpp-callback-failure && eu-stack --core coredump

Actual results:
PID 1664 - core
(...)
TID 1700:
#0  0x00000031df00bd20 pthread_cond_wait@@GLIBC_2.3.2
#1  0x00000031f9c236b0 PR_WaitCondVar
#2  0x00000031e65470de
eu-stack: dwfl_thread_getframes tid 1700 at 0x31e65470dd in libmozjs-17.0.so: Callback returned failure
TID 1699:
#0  0x00000031df00bd20 pthread_cond_wait@@GLIBC_2.3.2
#1  0x00000031f9c236b0 PR_WaitCondVar
#2  0x00000031e64b0700
eu-stack: dwfl_thread_getframes tid 1699 at 0x31e64b06ff in libmozjs-17.0.so: Callback returned failure
(...)

Expected results:
Complete stack traces for the affected threads.

Additional info:
Running GDB on the core seems to produce a complete stack trace.
libmozjs-17.0.so is present and has the same build ID as the one recorded in the core.
It's likely that the bug is not related to unwinding as we get the same error when calling dwfl_module_getelf.

Comment 2 Jan Kratochvil 2014-06-26 18:05:06 UTC
Created attachment 912508
Do not rely on link_map.l_addr.

Problem 1:
$ eu-stack --core /tmp/ccpp-callback-failure/coredump
#18 0x00007fc661daece8 meta_run
#19 0x0000000000402131
eu-stack: dwfl_thread_getframes tid 1664 at 0x402130 in [exe]: Callback returned failure

Problem 1 is in how ABRT arranges the retracing chroot and/or how ABRT calls eu-stack.  This is not a bug in elfutils.

Solution 1.A:
$ eu-stack -e /usr/bin/gnome-shell --core /tmp/ccpp-callback-failure/coredump
[...]
#18 0x00007fc661daece8 meta_run
#19 0x0000000000402131 main
[...]

Solution 1.B:
$ gdb /tmp/ccpp-callback-failure/coredump
Missing separate debuginfo for the main executable file
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/a3/93a342ffc386364d06e61da0b06c1e1f972eb4
$ yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/a3/93a342ffc386364d06e61da0b06c1e1f972eb4
$ eu-stack --core /tmp/ccpp-callback-failure/coredump
#18 0x00007fc661daece8 meta_run
#19 0x0000000000402131 main
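
The path that GDB suggests above follows the usual build-ID layout: the first two hex digits of the build ID name a directory under /usr/lib/debug/.build-id/ and the remaining digits name the entry inside it. A minimal sketch of that mapping in C follows. The function name build_id_path is hypothetical (it is not part of elfutils or GDB), and the prefix is assumed to be the Fedora default seen in the message.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not an elfutils or GDB function.
   Formats "/usr/lib/debug/.build-id/XX/REST" for a hex build ID,
   where XX is the first two hex digits and REST is the remainder.
   Returns 0 on success, -1 on malformed input or a too-small buffer.  */
int
build_id_path (const char *hex, char *out, size_t outlen)
{
  assert (hex != NULL && out != NULL);

  size_t n = strlen (hex);
  if (n < 3)
    return -1;			/* Need a directory part and a file part.  */
  for (size_t i = 0; i < n; i++)
    if (!isxdigit ((unsigned char) hex[i]))
      return -1;

  /* Assumed prefix: the default debuginfo location on Fedora.  */
  int w = snprintf (out, outlen, "/usr/lib/debug/.build-id/%.2s/%s",
		    hex, hex + 2);
  return (w < 0 || (size_t) w >= outlen) ? -1 : 0;
}
```

Passing the build ID of the main executable from this core (a393a342ffc386364d06e61da0b06c1e1f972eb4) yields the path shown in the GDB hint.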


Problem 2:
TID 1700:
#0  0x00000031df00bd20 pthread_cond_wait@@GLIBC_2.3.2
#1  0x00000031f9c236b0 PR_WaitCondVar
#2  0x00000031e65470de
eu-stack: dwfl_thread_getframes tid 1700 at 0x31e65470dd in libmozjs-17.0.so: Callback returned failure
TID 1699:
#0  0x00000031df00bd20 pthread_cond_wait@@GLIBC_2.3.2
#1  0x00000031f9c236b0 PR_WaitCondVar
#2  0x00000031e64b0700
eu-stack: dwfl_thread_getframes tid 1699 at 0x31e64b06ff in libmozjs-17.0.so: Callback returned failure

Solution: Attached l_addr.patch.
It needs a testcase, after which I will post it to the elfutils mailing list.  The problem was that the user who dumped the core file had prelinked DSOs, whereas the DSOs downloaded from the RPMs are not prelinked.  This is certainly an elfutils bug (in my code, although not in the unwinder but in the reader of the DSOs' r_debug list).
Output with the patch applied:
TID 1700:
#0  0x00000031df00bd20 pthread_cond_wait@@GLIBC_2.3.2
#1  0x00000031f9c236b0 PR_WaitCondVar
#2  0x00000031e65470de js::SourceCompressorThread::threadLoop()
[...]
TID 1699:
#0  0x00000031df00bd20 pthread_cond_wait@@GLIBC_2.3.2
#1  0x00000031f9c236b0 PR_WaitCondVar
#2  0x00000031e64b0700 js::GCHelperThread::threadLoop()
[...]
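
Why a prelink mismatch breaks l_addr can be sketched with a little address arithmetic. link_map.l_addr is the offset between the addresses in the file that the dumped process had mapped and its addresses in memory, so for a prelinked DSO it can be 0; applied to a non-prelinked copy of the same DSO, it places the module at the wrong address. The sketch below assumes the bias is instead derived from link_map.l_ld (the runtime address of the dynamic section) and the PT_DYNAMIC p_vaddr of the file on disk. That is one way to avoid trusting l_addr and is not necessarily what the attached patch does. The function names and all numbers except the frame address are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not an elfutils function.
   Load bias = runtime address of the dynamic section (link_map.l_ld)
   minus the PT_DYNAMIC p_vaddr of the ELF file found on disk.  */
uint64_t
bias_from_l_ld (uint64_t l_ld, uint64_t file_dyn_vaddr)
{
  return l_ld - file_dyn_vaddr;
}

/* True if runtime address PC lies inside the file's address range
   [file_start, file_end) once that range is shifted by BIAS.
   Unsigned wraparound makes PC < BIAS fall outside the range.  */
bool
pc_in_module (uint64_t bias, uint64_t file_start, uint64_t file_end,
	      uint64_t pc)
{
  assert (file_start <= file_end);
  uint64_t file_addr = pc - bias;
  return file_addr >= file_start && file_addr < file_end;
}
```

For illustration, take l_ld = 0x31e67a2d40 and a non-prelinked file whose PT_DYNAMIC p_vaddr is 0x3a2d40 and whose address range is [0, 0x400000). The bias computed from l_ld is 0x31e6400000, and the frame address 0x31e65470de from TID 1700 falls inside the module. With the prelinked l_addr of 0 used as the bias, the same address falls outside it.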

Comment 3 Jan Kratochvil 2014-07-28 19:04:10 UTC
The fix for Problem 2 is now checked in upstream:
  475849fdb25265706772905b856cd7028c566a71

Regarding the NEEDINFO:
Jakub Filak has asked for a new RPM release with this fix, IIUC for all Fedora releases.

Comment 4 Mark Wielaard 2014-07-28 22:08:20 UTC
I backported the fix to rawhide elfutils-0.159-8.fc22. If that works for you, we can push that version to older Fedora releases too.

Comment 5 Mark Wielaard 2014-09-01 12:05:13 UTC
elfutils 0.160 has been released and is included in Fedora f21/rawhide. As said in comment #4, let us know if that works for you; then we can see whether to backport it to earlier Fedora releases. Thanks.

Comment 6 Martin Milata 2014-09-02 16:12:35 UTC
Verified that 0.160 works fine - thanks for the quick fix! Backport to earlier versions will be much appreciated.

Comment 7 Fedora Update System 2014-09-08 15:05:46 UTC
elfutils-0.160-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/elfutils-0.160-1.fc20

Comment 8 Fedora Update System 2014-09-08 15:08:55 UTC
elfutils-0.160-1.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/elfutils-0.160-1.fc19

Comment 9 Fedora Update System 2014-09-19 10:04:25 UTC
elfutils-0.160-1.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 10 Fedora Update System 2014-09-25 10:41:58 UTC
elfutils-0.160-1.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

