Bug 1129777

Summary: RFE: dwfl_core_file_report: use NT_FILE core note if the link_map chain is broken
Product: [Fedora] Fedora Reporter: Martin Milata <mmilata>
Component: elfutilsAssignee: Jan Kratochvil <jan.kratochvil>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhideCC: aoliva, fche, jakub, jan.kratochvil, mjw, mmilata, pmachata, roland
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: elfutils-0.161-2.fc20 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-01-19 01:35:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
elfutils trunk patch for NT_FILE support none

Description Martin Milata 2014-08-13 15:46:37 UTC
The dwfl_core_file_report function follows dynamic linker's link_map chain in order to determine the shared libraries used by the executable. As this data structure is located in writable memory it can be overwritten by garbage, which is sometimes the case [1].

Since version 3.7 (commit 2aa362c49), Linux kernel adds NT_FILE note to core files which contains the files mapped by the process, including shared libraries.

Would it make sense to try to detect in dwfl_core_file_report if the link_map chain is broken and fall back on NT_FILE in this case?

I'm aware we can save the /proc/PID/maps file and call dwfl_linux_proc_maps_report on it, however it does not report the VDSO segment to elfutils as is the case with dwfl_core_file_report.

[1] https://github.com/abrt/satyr/issues/127#issuecomment-46957546

Comment 1 Jan Kratochvil 2014-08-13 21:05:34 UTC
elfutils already has been (even before the unwinder implementation) scanning all mmap()ped files in address space / core file. This works even for core files thanks to the first ELF page being dumped, build-id present and /usr/lib/debug/.build-id/ links present.

Isn't the problem that AFAIK ABRT does not provide /usr/lib/debug/.build-id/ with all the links expected there?

(I admit I haven't tried to reproduce it now, there may be some bugs.)

The possible disadvantage is that elfutils may have false positives - mapping more libraries than those really dlopen()ed.  For example a library mmap()ed as a data file for tools-kind application.  But for the unwinding purposes false positives should not matter.

Comment 2 Jan Kratochvil 2014-08-13 21:11:52 UTC
Still NT_FILE would sure work even without /usr/lib/debug/.build-id/ present.
So as a general elfutils RFE it is valid but AFAIK ABRT could provide the /usr/lib/debug/.build-id/ links.

Comment 3 Martin Milata 2014-08-15 10:23:41 UTC
Indeed, the /usr/lib/debug/.build-id/ links usually aren't there - they are provided by the -debuginfo packages which we don't want to rely on (due to their size).

Modifying the rpmbuild script to include the link to binary/library in the main package (and doing mass rebuild) would solve this. I guess ABRT should probably investigate crash-time live-process unwinding which would solve this issue as well.

Comment 4 Jan Kratochvil 2014-08-28 20:48:35 UTC
(In reply to Martin Milata from comment #3)
> I guess ABRT should probably investigate crash-time live-process unwinding
> which would solve this issue as well.

That should be the most reliable way, elfutils fix of deleted files for it is now pending in Bug 1130781.

Comment 5 Jan Kratochvil 2014-09-02 20:40:38 UTC
Created attachment 933905 [details]
elfutils trunk patch for NT_FILE support

The binary updated file tests/test-core.core.bz2 is required for:
PASS: run-unstrip-n.sh

Comment 6 Jan Kratochvil 2014-09-02 20:42:54 UTC
Martin, could you test the patch that it has no regressions against ABRT core files testsuite with removed that its own implementation?

If there is no such test possible I can sure post it for elfutils as is.

Comment 7 Jan Kratochvil 2014-09-02 20:43:57 UTC
The patch is also on elfutils branch: jankratochvil/NT_FILE
https://git.fedorahosted.org/cgit/elfutils.git/commit/?h=jankratochvil/NT_FILE&id=71f16023e472cd520f7b9186c6865cee08c56015

Comment 8 Martin Milata 2014-09-04 14:59:25 UTC
Tested the patched version (on a core with undamaged link map) and it produces the same output as the unpatched version - no regression here.

Comment 9 Jan Kratochvil 2014-09-07 20:02:38 UTC
Thanks for this test, it is also useful. But I wanted to rather test if the NT_FILE elfutils implementation brings benefits to ABRT:

At least from
  https://github.com/abrt/satyr/commit/29a96f46028808994dfaf03fb0c29dde9601f44e
I guess that inferior with corrupted link_map would get elfutils-backtraced before the revert above (as it was using /proc/PID/maps and not link_map).  While after the revert corrupted link_map breaks the elfutils-backtrace and after applying the NT_FILE elfutils above it the elfutils-backtrace with corrupted link_map could be working again.

(Feel free to IRC me if it is not fully clear.)

If you do not have such testcase at hand that is sure also OK.

Comment 10 Martin Milata 2014-09-08 09:30:01 UTC
Yes, I don't have such test case at hand. But it probably should be part of ABRT once elfutils have this patch merged. I'll look into creating it.

Comment 11 Jan Kratochvil 2014-09-08 11:39:07 UTC
BTW elfutils has a testcase tests/*linkmap-cut*:
  https://git.fedorahosted.org/cgit/elfutils.git/commit/?h=jankratochvil/NT_FILE

link_map was corrupted there by hand (by hexedit) and NT_FILE pathnames were made relative (/home/../filename -> ./filename).

I was more expecting you can grep some real world core files from ABRT for such case but if you do not have any I can go ahead with the patch I have.
Thanks for the regression testing.

Comment 12 Martin Milata 2014-09-08 14:43:01 UTC
I think I'd be able to find some real world example - downloading all the matching libraries is bit of a pain, though. I have created "artificial" test case[1] and the unwinding works fine with patched elfutils (as opposed to unpatched).

[1] https://github.com/sorki/will-crash/pull/8/files

Comment 13 Jan Kratochvil 2014-09-10 06:50:30 UTC
[PATCH 1/2] Add is_executable to Dwfl_Module.
https://lists.fedorahosted.org/pipermail/elfutils-devel/2014-September/004129.html
[PATCH 2/2] Support note NT_FILE for locating files
https://lists.fedorahosted.org/pipermail/elfutils-devel/2014-September/004130.html

Comment 14 Jan Kratochvil 2014-09-26 20:47:20 UTC
Checked into upstream repository now.
Keeping this Bug open for next Fedora elfutils update.

Comment 15 Fedora Update System 2015-01-13 12:06:58 UTC
elfutils-0.161-2.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/elfutils-0.161-2.fc21

Comment 16 Fedora Update System 2015-01-13 13:08:29 UTC
elfutils-0.161-2.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/elfutils-0.161-2.fc20

Comment 17 Fedora Update System 2015-01-14 07:32:29 UTC
Package elfutils-0.161-2.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing elfutils-0.161-2.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-0677/elfutils-0.161-2.fc20
then log in and leave karma (feedback).

Comment 18 Fedora Update System 2015-01-19 01:35:24 UTC
elfutils-0.161-2.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 19 Fedora Update System 2015-01-20 20:59:35 UTC
elfutils-0.161-2.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.