Bug 1112610

Summary: eu-stack: "Callback returned failure" for seemingly OK shared libraries
Product: [Fedora] Fedora
Reporter: Martin Milata <mmilata>
Component: elfutils
Assignee: Mark Wielaard <mjw>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: unspecified
Version: 20
CC: aoliva, fche, jakub, jan.kratochvil, jfilak, mjw, mjw, mmilata, pmachata, roland
Hardware: x86_64   
OS: Linux   
Fixed In Version: elfutils-0.160-1.fc19
Doc Type: Bug Fix
Last Closed: 2014-07-28 19:04:10 UTC
Type: Bug
Attachments:
Do not rely on link_map.l_addr. (flags: none)

Description Martin Milata 2014-06-24 10:30:14 UTC
Description of problem:
Some libdwfl functions (e.g. dwfl_module_getelf) fail with DWFL_E_CB ("Callback returned failure") even though the library in question appears to be intact. This can be reproduced by running eu-stack on the provided core dump.

Version-Release number of selected component (if applicable):
elfutils-0.158-4.fc20.x86_64

How reproducible:
Hopefully always.

Steps to Reproduce:
1. Obtain F20 system.
2. # yum install koji mock
3. # usermod -a -G mock user
4. # su - user
5. $ wget http://.../ccpp-callback-failure.tar.xz
6. $ wget http://.../setup-retracing.py
   (I'll provide the actual locations in the next comment.)
7. $ tar xJvf ccpp-callback-failure.tar.xz
8. $ ./setup-retracing.py ccpp-callback-failure/
   (This downloads from koji the RPMs containing the libraries
    referenced by the core dump and unpacks them into the
    fedora-20-x86_64 mock chroot.)
9. $ mock -r fedora-20-x86_64 --copyin ccpp-callback-failure /tmp/ccpp-callback-failure
10. $ mock -r fedora-20-x86_64 --shell
11. <mock># cd /tmp/ccpp-callback-failure && eu-stack --core coredump

Actual results:
PID 1664 - core
(...)
TID 1700:
#0  0x00000031df00bd20 pthread_cond_wait@@GLIBC_2.3.2
#1  0x00000031f9c236b0 PR_WaitCondVar
#2  0x00000031e65470de
eu-stack: dwfl_thread_getframes tid 1700 at 0x31e65470dd in libmozjs-17.0.so: Callback returned failure
TID 1699:
#0  0x00000031df00bd20 pthread_cond_wait@@GLIBC_2.3.2
#1  0x00000031f9c236b0 PR_WaitCondVar
#2  0x00000031e64b0700
eu-stack: dwfl_thread_getframes tid 1699 at 0x31e64b06ff in libmozjs-17.0.so: Callback returned failure
(...)

Expected results:
Complete stack traces for the affected threads.

Additional info:
Running GDB on the core seems to produce a complete stack trace.
libmozjs-17.0.so is present and has the same build ID as the one recorded in the core.
The bug is probably not related to unwinding, as we get the same error when calling dwfl_module_getelf.

Comment 2 Jan Kratochvil 2014-06-26 18:05:06 UTC
Created attachment 912508 [details]
Do not rely on link_map.l_addr.

Problem 1:
$ eu-stack --core /tmp/ccpp-callback-failure/coredump
#18 0x00007fc661daece8 meta_run
#19 0x0000000000402131
eu-stack: dwfl_thread_getframes tid 1664 at 0x402130 in [exe]: Callback returned failure

Problem 1 is caused by how ABRT sets up the retracing chroot and/or how ABRT calls eu-stack.  It is not a bug in elfutils.

Solution 1.A:
$ eu-stack -e /usr/bin/gnome-shell --core /tmp/ccpp-callback-failure/coredump
[...]
#18 0x00007fc661daece8 meta_run
#19 0x0000000000402131 main
[...]

Solution 1.B:
$ gdb /tmp/ccpp-callback-failure/coredump
Missing separate debuginfo for the main executable file
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/a3/93a342ffc386364d06e61da0b06c1e1f972eb4
$ yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/a3/93a342ffc386364d06e61da0b06c1e1f972eb4
$ eu-stack --core /tmp/ccpp-callback-failure/coredump
#18 0x00007fc661daece8 meta_run
#19 0x0000000000402131 main
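
For reference, the path in the gdb hint above follows from the build ID itself: the first two hex digits name a directory under /usr/lib/debug/.build-id/ and the remaining digits name the entry. A minimal Python sketch of that mapping follows; the helper name build_id_path is made up for illustration and is not part of gdb or elfutils.

```python
import posixpath

BUILD_ID_ROOT = "/usr/lib/debug/.build-id"


def build_id_path(build_id_hex, root=BUILD_ID_ROOT):
    """Return the .build-id path for a hex build ID (hypothetical helper)."""
    build_id_hex = build_id_hex.lower()
    if len(build_id_hex) < 3 or any(c not in "0123456789abcdef" for c in build_id_hex):
        raise ValueError("not a usable hex build ID: %r" % build_id_hex)
    # The first byte (two hex digits) names the directory, the rest the entry.
    return posixpath.join(root, build_id_hex[:2], build_id_hex[2:])


# Build ID of the main executable, as reported by gdb in this bug.
print(build_id_path("a393a342ffc386364d06e61da0b06c1e1f972eb4"))
```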


Problem 2:
TID 1700:
#0  0x00000031df00bd20 pthread_cond_wait@@GLIBC_2.3.2
#1  0x00000031f9c236b0 PR_WaitCondVar
#2  0x00000031e65470de
eu-stack: dwfl_thread_getframes tid 1700 at 0x31e65470dd in libmozjs-17.0.so: Callback returned failure
TID 1699:
#0  0x00000031df00bd20 pthread_cond_wait@@GLIBC_2.3.2
#1  0x00000031f9c236b0 PR_WaitCondVar
#2  0x00000031e64b0700
eu-stack: dwfl_thread_getframes tid 1699 at 0x31e64b06ff in libmozjs-17.0.so: Callback returned failure

Solution: Attached l_addr.patch.
It still needs a testcase; then I will post it to the elfutils mailing list.  The problem was that the user who dumped the core file had prelinked DSOs, while the DSOs downloaded from the RPMs are not prelinked.  This is certainly an elfutils bug (in my code, though not in the unwinder but in the r_debug reader for DSOs).  With the patch applied:
TID 1700:
#0  0x00000031df00bd20 pthread_cond_wait@@GLIBC_2.3.2
#1  0x00000031f9c236b0 PR_WaitCondVar
#2  0x00000031e65470de js::SourceCompressorThread::threadLoop()
[...]
TID 1699:
#0  0x00000031df00bd20 pthread_cond_wait@@GLIBC_2.3.2
#1  0x00000031f9c236b0 PR_WaitCondVar
#2  0x00000031e64b0700 js::GCHelperThread::threadLoop()
[...]
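
To illustrate why l_addr is misleading in this situation: l_addr is the difference between the addresses in the ELF file that was actually loaded and its addresses in memory, so for a DSO loaded at its prelinked address it is 0. If the retracer only has the non-prelinked copy from the RPM, that value does not describe the file on disk. The Python sketch below shows one way to obtain a bias that matches the on-disk file, using l_ld (the runtime address of the dynamic section) and the file's PT_DYNAMIC p_vaddr. That this mirrors the attached patch is an assumption; every address except the frame PC is hypothetical, and the function names are made up for illustration.

```python
# All addresses below are hypothetical, chosen only to resemble the report.
PRELINK_BASE = 0x31E6400000      # address prelink assigned on the crashing host
DYNAMIC_OFFSET = 0x3A0000        # p_vaddr of PT_DYNAMIC in the non-prelinked file

# What the core dump's link_map entry would record for a prelinked DSO:
L_ADDR = 0x0                             # loaded exactly where it was prelinked
L_LD = PRELINK_BASE + DYNAMIC_OFFSET     # runtime address of the dynamic section


def bias_from_l_ld(l_ld, file_dynamic_vaddr):
    """Bias to add to the on-disk file's addresses to get core addresses."""
    return l_ld - file_dynamic_vaddr


def file_address(pc, bias):
    """Translate an address from the core into the on-disk file's address space."""
    return pc - bias


pc = 0x31E65470DE                        # frame #2 of TID 1700 in this report
naive = file_address(pc, L_ADDR)         # trusts l_addr against the wrong file
fixed = file_address(pc, bias_from_l_ld(L_LD, DYNAMIC_OFFSET))

print(hex(naive))   # unchanged absolute address, outside the non-prelinked file
print(hex(fixed))   # small file-relative address that can be looked up
```

With l_addr taken at face value, the module would be reported at the wrong place for the non-prelinked file, so a PC inside libmozjs-17.0.so could not be matched to its unwind data or symbols.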

Comment 3 Jan Kratochvil 2014-07-28 19:04:10 UTC
The fix for Problem 2 is now checked in upstream:
  475849fdb25265706772905b856cd7028c566a71

The NEEDINFO:
Jakub Filak has asked for a new rpm release with this fix, IIUC for all Fedoras.

Comment 4 Mark Wielaard 2014-07-28 22:08:20 UTC
I backported the fix to rawhide as elfutils-0.159-8.fc22. If that works for you, we can push that version to older Fedora releases too.

Comment 5 Mark Wielaard 2014-09-01 12:05:13 UTC
elfutils 0.160 has been released and is included in Fedora f21/rawhide. As said in comment #4, let us know if that works for you; then we can decide whether to backport it to earlier Fedora releases. Thanks.

Comment 6 Martin Milata 2014-09-02 16:12:35 UTC
Verified that 0.160 works fine - thanks for the quick fix! A backport to earlier versions would be much appreciated.

Comment 7 Fedora Update System 2014-09-08 15:05:46 UTC
elfutils-0.160-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/elfutils-0.160-1.fc20

Comment 8 Fedora Update System 2014-09-08 15:08:55 UTC
elfutils-0.160-1.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/elfutils-0.160-1.fc19

Comment 9 Fedora Update System 2014-09-19 10:04:25 UTC
elfutils-0.160-1.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 10 Fedora Update System 2014-09-25 10:41:58 UTC
elfutils-0.160-1.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.