Bug 852390

Summary: Unable to extract build-ids from ARM coredumps
Product: [Fedora] Fedora
Reporter: Michal Toman <mtoman>
Component: kernel
Assignee: Peter Robinson <pbrobinson>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: medium
Version: 17
CC: fche, gansalmon, itamar, jan.kratochvil, jonathan, kernel-maint, madhu.chinakonda, mjw, mjw, pbrobinson, pknirsch, pmachata, roland, rvokal
Target Milestone: ---   
Target Release: ---   
Hardware: arm   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-03-31 14:54:09 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 245418    
Attachments:
Description Flags
eu-unstrip output from binary, libraries, live process and core none

Description Michal Toman 2012-08-28 07:13:47 EDT
Created attachment 607478 [details]
eu-unstrip output from binary, libraries, live process and core

Description of problem:
As mentioned a few months ago on the ARM list (http://lists.fedoraproject.org/pipermail/arm/2012-May/003253.html), eu-unstrip is unable to extract the build-id information that ABRT requires for debugging from ARM coredumps. I'm not sure whether the information is even present in the coredump itself.

Version-Release number of selected component (if applicable):
elfutils-0.154-2.fc17.armv7hl
but the behavior has been the same since the F15 bootstrap

How reproducible:
always

Steps to Reproduce:
1. set "ulimit -c unlimited"
2. crash any application (e.g. "sleep 100 & kill -11 %")
3. run "eu-unstrip -n --core core.pid"
  
Actual results:
very few build-ids (or none) are shown

Expected results:
all build-ids are listed correctly

Additional info:
After installing all relevant debuginfos manually, the debug process works and I am able to get a full backtrace.

The results are not influenced by the presence of the ABRT coredump hook.

eu-unstrip works for binaries, libraries and live processes (sample output for sleep is attached).
Comment 1 Petr Machata 2013-02-24 00:27:54 EST
Looking into the core dump taken at
  http://mtoman.fedorapeople.org/arm/core.12331

Of the segments that are physically present in the file, none seem to be ELF files.  E.g. consider this:

  Type           Offset   VirtAddr   PhysAddr   FileSiz  MemSiz   Flg Align
  NOTE           0x0002f4 0x00000000 0x00000000 0x00037c 0x000000     0x0
  LOAD           0x001000 0x00008000 0x00000000 0x000000 0x005000 R E 0x1000
  LOAD           0x001000 0x00014000 0x00000000 0x001000 0x001000 R   0x1000

The first loadable segment is not in the file.  The second one (at offset 0x1000) contains the following data:

00001000: 0020 a0e3 0030 a0e3 0000 00eb f1ff ffea  . ...0..........
00001010: f040 2de9 14d0 4de2 0210 90e9 0060 a0e1  .@-...M......`..
00001020: 2840 9de5 0100 5ce1 0400 000a 2840 8de5  (@....\.....(@..
00001030: 0600 a0e1 14d0 8de2 f040 bde8 1bf4 ffea  .........@......
[...]

That's not an ELF header.  The ELF header was probably in the previous segment, and that one was elided.  The same story repeats with all R E segments, i.e. those that would probably contain the ELF header.
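
For anyone who wants to repeat this check on another core file, here is a minimal Python sketch.  It assumes a 32-bit little-endian ELF core (as on armv7hl) and uses only the standard library; the function name is made up for illustration and is not part of elfutils.  It walks the program headers and reports, for each LOAD segment, whether the bytes stored in the file begin with the ELF magic.

```python
import struct

ELF_MAGIC = b"\x7fELF"
PT_LOAD = 1  # program header type of loadable segments


def load_segments_elf32le(core: bytes):
    """Yield (vaddr, filesz, memsz, has_elf_magic) for every PT_LOAD entry.

    Assumes an ELF32 little-endian image; other ELF classes are not handled.
    """
    if core[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # Elf32_Ehdr: e_phoff at byte 28, e_phentsize at 42, e_phnum at 44.
    (e_phoff,) = struct.unpack_from("<I", core, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", core, 42)
    for i in range(e_phnum):
        # Elf32_Phdr starts with: p_type, p_offset, p_vaddr, p_paddr,
        # p_filesz, p_memsz (six 32-bit words).
        p_type, p_offset, p_vaddr, _p_paddr, p_filesz, p_memsz = struct.unpack_from(
            "<6I", core, e_phoff + i * e_phentsize
        )
        if p_type != PT_LOAD:
            continue
        # A segment with FileSiz 0 has no bytes in the file to inspect.
        head = core[p_offset:p_offset + 4] if p_filesz >= 4 else b""
        yield p_vaddr, p_filesz, p_memsz, head == ELF_MAGIC
```

Feeding it the contents of a core file (read in binary mode) and finding no entry with the last field set to True would match what is described above: none of the stored segments starts with an ELF header.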

As a backup plan, libdwfl looks for DT_DEBUG.  For that it needs to load a PHDR, which is at address 0x8034, i.e. in the elided segment.  I don't have an ARM machine handy at this moment, but I'll experiment with it more on Monday.  For now it seems libdwfl simply doesn't have enough information to figure out what was loaded.
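
To make the address arithmetic concrete, here is a small Python sketch using only the two LOAD entries quoted above.  The type and function names are hypothetical.  It maps a virtual address to a file offset and returns None when the byte lies beyond the segment's FileSiz, i.e. when it was not written to the core file.

```python
from typing import NamedTuple, Optional, Sequence


class Load(NamedTuple):
    offset: int  # Offset column
    vaddr: int   # VirtAddr column
    filesz: int  # FileSiz column
    memsz: int   # MemSiz column


# The two LOAD rows from the program header table quoted in this comment.
SEGMENTS = [
    Load(offset=0x1000, vaddr=0x00008000, filesz=0x0000, memsz=0x5000),
    Load(offset=0x1000, vaddr=0x00014000, filesz=0x1000, memsz=0x1000),
]


def file_offset_of(addr: int, segments: Sequence[Load]) -> Optional[int]:
    """Return the file offset that stores addr, or None if it is not in the file."""
    for seg in segments:
        if seg.vaddr <= addr < seg.vaddr + seg.memsz:
            delta = addr - seg.vaddr
            # Only the first filesz bytes of a segment are present in the file.
            return seg.offset + delta if delta < seg.filesz else None
    return None


print(file_offset_of(0x8034, SEGMENTS))        # None: inside the elided segment
print(hex(file_offset_of(0x14010, SEGMENTS)))  # 0x1010: present in the file
```

Note that 0x8034 is 0x34 (52) bytes past 0x8000, and 52 is the size of an ELF32 header, which is consistent with the program headers sitting right after the ELF header of the main executable.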
Comment 2 Petr Machata 2013-02-24 16:18:52 EST
Looking into some other core dumps, it seems that the build ID bits tend to be exactly in those R E segments that the linked core dump lacks.  Any such note would be preceded by a NOTE header with the word GNU in it, e.g. like this:

0000f1c0: 0100 0000 0000 0000 0400 0000 1400 0000  ................
0000f1d0: 0300 0000 474e 5500 2eb7 febf a4e0 3eb0  ....GNU.......>.
0000f1e0: b46d 3fe6 3021 431a f376 8f77 0000 0000  .m?.0!C..v.w....

Here, the note starts at 0xf1c8; it's a GNU note of type NT_GNU_BUILD_ID.  There are no such patterns visible in the ARM core dump.  So there really is no way whatsoever to deduce what modules that core dump consists of; the information simply is absent.  It might be a kernel bug that those segments are elided, but I'll need to look more into why exactly the kernel does that.
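
As an illustration of how that note decodes, here is a minimal Python sketch that parses the three dump rows quoted above.  It assumes little-endian byte order and the usual ELF note layout (namesz, descsz, type, then name and descriptor, each padded to 4 bytes); the helper names are made up for this example.

```python
import struct

NT_GNU_BUILD_ID = 3


def align4(n: int) -> int:
    """Round n up to the next multiple of 4 (note fields are 4-byte padded)."""
    return (n + 3) & ~3


def parse_note(buf: bytes, off: int = 0):
    """Parse one ELF note at buf[off:] and return (name, type, desc)."""
    namesz, descsz, ntype = struct.unpack_from("<3I", buf, off)
    name_start = off + 12
    name = buf[name_start:name_start + namesz].rstrip(b"\0")
    desc_start = name_start + align4(namesz)
    desc = buf[desc_start:desc_start + descsz]
    return name, ntype, desc


# The three rows of the hex dump above, starting at file offset 0xf1c0.
dump = bytes.fromhex(
    "0100 0000 0000 0000 0400 0000 1400 0000"
    "0300 0000 474e 5500 2eb7 febf a4e0 3eb0"
    "b46d 3fe6 3021 431a f376 8f77 0000 0000"
)

# The note begins at 0xf1c8, i.e. 8 bytes into the dumped region.
name, ntype, desc = parse_note(dump, 0xF1C8 - 0xF1C0)
print(name, ntype == NT_GNU_BUILD_ID, desc.hex())
```

The header fields decode to namesz 4 ("GNU" plus a NUL), descsz 0x14 and type 3, and the 20 descriptor bytes that follow are the build ID.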
Comment 3 Roland McGrath 2013-02-25 14:10:26 EST
It's probably a kernel configuration problem.
Check the /proc/PID/coredump_filter value while running the kernel that produced the core dump.  If it's not 0x33 then the configuration is not what it should be.
Check CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS.
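
A short Python sketch of that check, for reference.  The bit meanings follow the kernel's documentation of /proc/PID/coredump_filter (only bits 0 to 6 are listed here); the function names are hypothetical.  With CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS disabled, bit 4 is missing from the default, which gives 0x23 instead of 0x33.

```python
# Bit meanings as described in the kernel documentation for coredump_filter.
COREDUMP_FILTER_BITS = {
    0: "anonymous private memory",
    1: "anonymous shared memory",
    2: "file-backed private memory",
    3: "file-backed shared memory",
    4: "ELF header pages in file-backed private memory",
    5: "hugetlb private memory",
    6: "hugetlb shared memory",
}
ELF_HEADERS_BIT = 1 << 4


def decode_coredump_filter(value: int):
    """Return the descriptions of all known bits set in value."""
    return [name for bit, name in COREDUMP_FILTER_BITS.items() if value & (1 << bit)]


def dumps_elf_headers(value: int) -> bool:
    """True if ELF header pages are included in core dumps."""
    return bool(value & ELF_HEADERS_BIT)


def read_coredump_filter(pid="self") -> int:
    """Read the filter of a process; the file holds a hex number without a 0x prefix."""
    with open(f"/proc/{pid}/coredump_filter") as f:
        return int(f.read().strip(), 16)


print(dumps_elf_headers(0x33))  # True
print(dumps_elf_headers(0x23))  # False
```

On the affected machine, read_coredump_filter() for the crashing process (or "self" from a shell child) would be the value to compare against 0x33.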
Comment 4 Michal Toman 2013-02-26 11:29:30 EST
That's it. I've rebuilt the kernel with CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y and everything works fine.
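
For checking an existing kernel without rebuilding, a small Python sketch that looks the option up in a kernel config file.  The path /boot/config-<release> is an assumption about how the distribution ships its config, and the function name is made up for illustration.

```python
from typing import Optional

OPTION = "CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS"


def kconfig_value(config_text: str, option: str) -> Optional[str]:
    """Return the value of option ("y", "m", ...), "n" if explicitly unset,
    or None if the option does not appear in the config text at all."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


# Example usage (assumes the config is installed under /boot):
#   import os
#   with open(f"/boot/config-{os.uname().release}") as f:
#       print(kconfig_value(f.read(), OPTION))
```

A result of "y" corresponds to the rebuilt kernel described here; "n" or None would match the problematic configuration.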
Comment 5 Peter Robinson 2013-02-26 11:35:26 EST
This issue is unique to the omap config, for historical reasons. It'll be fixed in 3.8.0+ on f17/18 and 3.9+ on rawhide. It's not an issue on tegra/unified kernels.
Comment 6 Mark Wielaard 2013-02-26 13:29:06 EST
(In reply to comment #5)
> This issue is unique to the omap config, for historical reasons. It'll be
> fixed in 3.8.0+ on f17/18 and 3.9+ on rawhide. It's not an issue on
> tegra/unified kernels.

Should we close this bug or move it to the kernel till it is updated?
Or is the kernel already updated?
Comment 7 Peter Robinson 2013-03-31 14:54:09 EDT
Fixed in 3.8+
Comment 8 Jan Kratochvil 2013-07-23 14:29:32 EDT
Just reassigning this closed bug, as it was fixed in the kernel, not in elfutils.