The dwfl_core_file_report function follows the dynamic linker's link_map chain to determine the shared libraries used by the executable. As this data structure is located in writable memory, it can be overwritten by garbage, which sometimes happens [1]. Since version 3.7 (commit 2aa362c49), the Linux kernel adds an NT_FILE note to core files, which lists the files mapped by the process, including shared libraries. Would it make sense for dwfl_core_file_report to detect whether the link_map chain is broken and fall back on NT_FILE in that case? I'm aware we can save the /proc/PID/maps file and call dwfl_linux_proc_maps_report on it; however, unlike dwfl_core_file_report, it does not report the VDSO segment to elfutils.

[1] https://github.com/abrt/satyr/issues/127#issuecomment-46957546
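For reference, the kernel (fs/binfmt_elf.c) lays out the NT_FILE descriptor as a count, a page size, then count (start, end, file_ofs) triples with file_ofs in page-size units, and finally count NUL-terminated filenames. The sketch below decodes that layout. It is not the elfutils implementation; it assumes a 64-bit core whose byte order matches the host, so every numeric field is 8 bytes (32-bit cores use 4-byte fields). The names nt_file_entry, nt_file_u64 and parse_nt_file are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One decoded entry of a 64-bit NT_FILE note. */
struct nt_file_entry {
  uint64_t start;     /* first address of the mapping */
  uint64_t end;       /* end address of the mapping (exclusive) */
  uint64_t file_ofs;  /* offset into the file, converted to bytes */
  const char *name;   /* points into the descriptor buffer */
};

/* Reads one 8-byte field at byte offset OFF (host byte order assumed). */
static uint64_t nt_file_u64(const unsigned char *desc, size_t off)
{
  uint64_t v;
  memcpy(&v, desc + off, sizeof v);
  return v;
}

/* Decodes the NT_FILE descriptor DESC of DESCSZ bytes.  Stores up to MAX_OUT
   entries in OUT and returns the number of entries in the note, or -1 if the
   descriptor is truncated or malformed. */
static long parse_nt_file(const unsigned char *desc, size_t descsz,
                          struct nt_file_entry *out, size_t max_out)
{
  if (descsz < 16)
    return -1;
  uint64_t count = nt_file_u64(desc, 0);
  uint64_t page_size = nt_file_u64(desc, 8);

  /* A garbage count must not let the triples run past the buffer. */
  if (count > (descsz - 16) / 24)
    return -1;

  size_t name_off = 16 + (size_t) count * 24;  /* filenames follow the triples */
  for (size_t i = 0; i < count; i++) {
    size_t off = 16 + i * 24;
    if (name_off >= descsz)
      return -1;
    const unsigned char *nul = memchr(desc + name_off, '\0', descsz - name_off);
    if (nul == NULL)
      return -1;  /* last filename is not NUL-terminated */
    if (i < max_out) {
      out[i].start = nt_file_u64(desc, off);
      out[i].end = nt_file_u64(desc, off + 8);
      out[i].file_ofs = nt_file_u64(desc, off + 16) * page_size;
      out[i].name = (const char *) (desc + name_off);
    }
    name_off = (size_t) (nul - desc) + 1;
  }
  return (long) count;
}
```

Validating count against the descriptor size matters here for the same reason the link_map chain is unreliable: anything read from a crashed process should be bounds-checked before it is trusted.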
elfutils has already been scanning all mmap()ped files in the address space / core file (even before the unwinder was implemented). This works even for core files, thanks to the first ELF page being dumped, the build-id being present and the /usr/lib/debug/.build-id/ links being present. Isn't the problem that, AFAIK, ABRT does not provide /usr/lib/debug/.build-id/ with all the links expected there? (I admit I haven't tried to reproduce it now; there may be some bugs.) The possible disadvantage is that elfutils may have false positives, i.e. it may map more libraries than were really dlopen()ed, for example a library mmap()ed as a data file by a tools-like application. But for unwinding purposes false positives should not matter.
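The scanning described above relies on each mapping's first page, and therefore its ELF header, being present in the core. A minimal sketch of recognizing such a page, assuming the caller already has the dumped bytes; elf_type_of_dumped_page is a hypothetical helper, not an elfutils API, and its constants mirror ET_EXEC/ET_DYN and ELFDATA2LSB/ELFDATA2MSB from <elf.h>.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if PAGE (the dumped first bytes of a file mapping)
   starts with an ELF header, returns its e_type (2 = ET_EXEC, 3 = ET_DYN, ...);
   otherwise returns -1.  e_type sits at offset 16 in both ELF32 and ELF64. */
static int elf_type_of_dumped_page(const unsigned char *page, size_t len)
{
  static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
  if (page == NULL || len < 18 || memcmp(page, magic, sizeof magic) != 0)
    return -1;
  switch (page[5]) {              /* EI_DATA: byte order of the header */
  case 1:                         /* ELFDATA2LSB */
    return page[16] | (page[17] << 8);
  case 2:                         /* ELFDATA2MSB */
    return (page[16] << 8) | page[17];
  default:
    return -1;
  }
}
```

A mapping that was not dumped (page == NULL) or whose first page is not ELF is simply skipped. Any mmap()ped ELF file passes this check, which is where the false positives mentioned above come from.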
Still, NT_FILE would surely work even without /usr/lib/debug/.build-id/ present. So as a general elfutils RFE it is valid, but AFAIK ABRT could provide the /usr/lib/debug/.build-id/ links.
Indeed, the /usr/lib/debug/.build-id/ links usually aren't there; they are provided by the -debuginfo packages, which we don't want to rely on (due to their size). Modifying the rpmbuild script to include the links to the binaries/libraries in the main package (and doing a mass rebuild) would solve this. I guess ABRT should probably investigate crash-time live-process unwinding, which would solve this issue as well.
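The links in question live under /usr/lib/debug/.build-id/: the first build-id byte, in hex, names a subdirectory and the remaining bytes form the file name, with a ".debug" suffix for the separate debuginfo file and no suffix for the link to the binary or library itself. That is the usual Fedora layout; treat the details as an assumption. The helper below, build_id_link_path, is hypothetical and only shows how such a path is formed from the raw build-id bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes DEBUGDIR/.build-id/NN/NNNN...[SUFFIX] into buf,
   where NN is the first build-id byte in hex and the remaining bytes form
   the file name.  Returns 0 on success, -1 if idlen < 2 or buf is too small. */
static int build_id_link_path(char *buf, size_t buflen, const char *debugdir,
                              const unsigned char *id, size_t idlen,
                              const char *suffix)
{
  static const char hex[] = "0123456789abcdef";
  static const char sub[] = "/.build-id/";
  const char *sfx = suffix != NULL ? suffix : "";
  size_t dlen = strlen(debugdir), slen = strlen(sfx);

  if (idlen < 2)
    return -1;
  /* debugdir, "/.build-id/", 2 hex digits, '/', remaining hex, suffix, NUL. */
  size_t need = dlen + (sizeof sub - 1) + 2 + 1 + 2 * (idlen - 1) + slen + 1;
  if (need > buflen)
    return -1;

  char *p = buf;
  memcpy(p, debugdir, dlen);
  p += dlen;
  memcpy(p, sub, sizeof sub - 1);
  p += sizeof sub - 1;
  for (size_t i = 0; i < idlen; i++) {
    *p++ = hex[id[i] >> 4];
    *p++ = hex[id[i] & 0x0f];
    if (i == 0)
      *p++ = '/';   /* the first byte names the subdirectory */
  }
  memcpy(p, sfx, slen + 1);   /* copies the terminating NUL too */
  return 0;
}
```

For example, a build-id starting ab cd ef would map to /usr/lib/debug/.build-id/ab/cdef for the binary link and /usr/lib/debug/.build-id/ab/cdef.debug for the debuginfo link.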
(In reply to Martin Milata from comment #3)
> I guess ABRT should probably investigate crash-time live-process unwinding
> which would solve this issue as well.

That should be the most reliable way; the elfutils fix for deleted files, which it needs, is now pending in Bug 1130781.
Created attachment 933905
elfutils trunk patch for NT_FILE support

The updated binary file tests/test-core.core.bz2 is required to get: PASS: run-unstrip-n.sh
Martin, could you test the patch for regressions against the ABRT core file testsuite, with ABRT's own implementation removed? If no such test is possible, I can just post it for elfutils as is.
The patch is also on elfutils branch: jankratochvil/NT_FILE https://git.fedorahosted.org/cgit/elfutils.git/commit/?h=jankratochvil/NT_FILE&id=71f16023e472cd520f7b9186c6865cee08c56015
Tested the patched version (on a core with an undamaged link map); it produces the same output as the unpatched version, so there is no regression here.
Thanks for this test, it is also useful. But I rather wanted to test whether the NT_FILE elfutils implementation brings benefits to ABRT: at least from https://github.com/abrt/satyr/commit/29a96f46028808994dfaf03fb0c29dde9601f44e I guess that an inferior with a corrupted link_map could be backtraced by elfutils before the revert above (as it was using /proc/PID/maps and not link_map). After the revert, a corrupted link_map breaks the elfutils backtrace, and with the NT_FILE elfutils patch above applied, the elfutils backtrace with a corrupted link_map could work again. (Feel free to IRC me if it is not fully clear.) If you do not have such a testcase at hand, that is also OK.
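Falling back on NT_FILE only when needed requires some test for a corrupted chain. One possible heuristic walks the chain from r_map and requires every node to be readable, every l_prev to point back at the previously visited node, and the walk to end within a bounded number of steps. This is an assumption for illustration, not what elfutils or satyr actually do; lm_node and lm_lookup are hypothetical stand-ins for reading struct link_map fields out of the core.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified snapshot of one struct link_map node read from
   the core: only the fields the check needs. */
struct lm_node {
  uint64_t addr;    /* address of this node in the crashed process */
  uint64_t l_next;
  uint64_t l_prev;
};

/* Stand-in for reading inferior memory: finds the node stored at ADDR. */
static const struct lm_node *lm_lookup(const struct lm_node *mem, size_t nmem,
                                       uint64_t addr)
{
  for (size_t i = 0; i < nmem; i++)
    if (mem[i].addr == addr)
      return &mem[i];
  return NULL;
}

/* Returns true if the chain starting at R_MAP looks intact: it has at least
   one node, every node is readable, every l_prev points back at the node
   visited before it (which rejects any cycle), and l_next == 0 is reached
   within MAX_NODES steps (a defensive cap). */
static bool link_map_chain_looks_sane(const struct lm_node *mem, size_t nmem,
                                      uint64_t r_map, size_t max_nodes)
{
  uint64_t prev = 0, cur = r_map;
  for (size_t n = 0; cur != 0; n++) {
    if (n >= max_nodes)
      return false;
    const struct lm_node *node = lm_lookup(mem, nmem, cur);
    if (node == NULL || node->l_prev != prev)
      return false;
    prev = cur;
    cur = node->l_next;
  }
  return prev != 0;
}
```

A real check would read l_next/l_prev through the core's memory rather than a snapshot array, and a chain that passes can still hold garbage in other fields such as l_name, so this only catches the grosser kinds of corruption.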
Right, I don't have such a test case at hand. But it should probably be part of ABRT once elfutils has this patch merged. I'll look into creating it.
BTW, elfutils has a testcase tests/*linkmap-cut*: https://git.fedorahosted.org/cgit/elfutils.git/commit/?h=jankratochvil/NT_FILE
The link_map was corrupted there by hand (with hexedit) and the NT_FILE pathnames were made relative (/home/../filename -> ./filename). I was rather hoping you could grep some real-world core files from ABRT for such a case, but if you do not have any, I can go ahead with the patch I have. Thanks for the regression testing.
I think I'd be able to find some real-world example, though downloading all the matching libraries is a bit of a pain. I have created an "artificial" test case [1], and unwinding works fine with patched elfutils (as opposed to unpatched).

[1] https://github.com/sorki/will-crash/pull/8/files
[PATCH 1/2] Add is_executable to Dwfl_Module.
https://lists.fedorahosted.org/pipermail/elfutils-devel/2014-September/004129.html
[PATCH 2/2] Support note NT_FILE for locating files
https://lists.fedorahosted.org/pipermail/elfutils-devel/2014-September/004130.html
Checked into the upstream repository now. Keeping this bug open for the next Fedora elfutils update.
elfutils-0.161-2.fc21 has been submitted as an update for Fedora 21. https://admin.fedoraproject.org/updates/elfutils-0.161-2.fc21
elfutils-0.161-2.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/elfutils-0.161-2.fc20
Package elfutils-0.161-2.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing elfutils-0.161-2.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-0677/elfutils-0.161-2.fc20
then log in and leave karma (feedback).
elfutils-0.161-2.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.
elfutils-0.161-2.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.