Bug 1317289

Summary: GDB should warn about library mismatches between coredump and installed system
Product: [Fedora] Fedora Reporter: Florian Weimer <fweimer>
Component: gdb Assignee: Jan Kratochvil <jan.kratochvil>
Status: CLOSED UPSTREAM QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhide CC: fweimer, gbenson, jan.kratochvil, palves, pmuldoon, sergiodj, tom
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-03-13 21:07:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Florian Weimer 2016-03-13 19:34:56 UTC
I tried to use a coredump as a support tool, and this resulted in a ping-pong with the bug submitter to figure out what library versions were actually installed.  The problem here was that the installation was severely corrupted, to the degree that RPM would not run.  See bug 1314592 for the gory details.

I was mystified initially because GDB did not print any warning that the core file referenced libraries which differed between the run-time system and my installation.  It would be great if GDB could do a better job of identifying library mismatches, maybe even providing a few helpful hints about what the library version was.  File size and mtime might be a good start.  Mtime was sufficient for locating the correct RPM version in Koji in my case.

I understand this may need kernel support, but I have to assign this bug to a component.

Comment 1 Jan Kratochvil 2016-03-13 19:43:05 UTC
All file sizes, NVRA etc. can be misleading.  It was decided the best/only valid binary file identifier is its build-id.
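
As an illustration of what the build-id is at the file level, here is a minimal sketch that extracts it from raw ELF note data.  It assumes little-endian notes with 4-byte alignment (as on x86_64); the function name find_build_id is hypothetical and this is not how GDB itself is implemented.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type that carries the GNU build-id


def find_build_id(notes: bytes):
    """Return the build-id as a hex string, or None if no such note exists.

    `notes` is the raw content of an ELF note section or PT_NOTE segment,
    assumed little-endian with 4-byte alignment.
    """
    offset = 0
    while offset + 12 <= len(notes):
        # Each note starts with three 32-bit words: namesz, descsz, type.
        namesz, descsz, ntype = struct.unpack_from("<III", notes, offset)
        offset += 12
        name = notes[offset:offset + namesz]
        offset += (namesz + 3) & ~3  # name is padded to a multiple of 4
        desc = notes[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # descriptor is padded the same way
        if name == b"GNU\x00" and ntype == NT_GNU_BUILD_ID:
            return desc.hex()
    return None
```

The same note format appears in the program/DSO on disk and, when ELF headers are dumped, in the core file, which is what makes a direct comparison possible.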

Fedora GDB has some build-id support.  Why did it not work?  You should provide some copy-pasted screen output.  There is some rpm support on top of it, but even if rpm fails, Fedora GDB should still print the missing build-ids.

There is some new rework of that build-id support but that is unfinished and it is questionable if it will ever be finished (at least by me).

There was also support for downloading the proper libraries (+executable), that is, for mapping build-id to NVRA for files not on local disk:
  https://fedoraproject.org/wiki/Darkserver
Darkserver itself is now discontinued, but Kushal Das should prepare some new variant of the same service.  Besides that, the ABRT project has some similar support, but it runs on the ABRT retrace server, and while they do provide all the tooling as free software, I am not aware of anyone running it on their own.

Comment 2 Florian Weimer 2016-03-13 19:54:14 UTC
(In reply to Jan Kratochvil from comment #1)
> All file sizes, NVRA etc. can be misleading.  It was decided the best/only
> valid binary file identifier is its build-id.
> 
> Fedora GDB has some build-id support.  Why did it not work?

I think the build-id support only covers the link between program/DSO and debuginfo.  What I needed was a link between process image (coredump in my case, but I think this applies to running processes as well) and program/DSO.

You can reproduce this if you run GDB on the coredump from bug 1314592 comment 8, on an up-to-date Fedora 23 system.  Here is the output you get:

$ gdb rpm /tmp/core.rpm.0.0aa738bc898246da8ee147c017e492e2.3562.1457546053000000 
GNU gdb (GDB) Fedora 7.10.1-30.fc23
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from rpm...Reading symbols from /usr/lib/debug/usr/bin/rpm.debug...done.
done.
[New LWP 3562]

warning: .dynamic section for "/lib64/libsqlite3.so.0" is not at the expected address (wrong library or version mismatch?)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `rpm -qf /home/george'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007efdfea9ba98 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
55	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.6-19.fc23.x86_64
(gdb) bt
#0  0x00007efdfea9ba98 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007efdfea9d69a in __GI_abort () at abort.c:89
#2  0x00007efdfd494ce4 in _mm_srli_epi64 (__B=<optimized out>, __A=...) at /usr/lib/gcc/x86_64-redhat-linux/5.3.1/include/emmintrin.h:1216
#3  poly1305_combine (bytes=93888924937728, m=0x7ffc27bf56d0 "hp360.GnK.localnet", st=0xdea) at poly1305-donna-x64-sse2-incremental-source.c:452
#4  Poly1305Finish (state=<optimized out>, mac=0x2612000 <error: Cannot access memory at address 0x2612000>)
    at poly1305-donna-x64-sse2-incremental-source.c:541
#5  0x0000000000000000 in ?? ()


The backtrace is completely nonsensical.  There is no warning that the rpm binary does not match the coredump, or that the libfreebl3.so version does not match, either.

Comment 3 Jan Kratochvil 2016-03-13 20:18:29 UTC
(In reply to Florian Weimer from comment #2)
> I think the build-id support only covers the link between program/DSO and
> debuginfo.  What I needed was a link between process image (coredump in my
> case, but I think this applies to running processes as well) and program/DSO.

No, there is specific build-id support both in kernel (by Roland McGrath):
  /proc/PID/coredump_filter - bit 4 - Dump ELF headers (=build-id)
And there is specific core dump build-id support in Fedora GDB.
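
A minimal sketch of checking that bit for a process, assuming the file contains a single hexadecimal value as described in the kernel's core(5) documentation.  The function names are hypothetical.

```python
ELF_HEADERS_BIT = 4  # "Dump ELF headers", the bit named above


def dumps_elf_headers(filter_text: str) -> bool:
    """True if the hexadecimal coredump_filter value has bit 4 set."""
    return bool(int(filter_text.strip(), 16) & (1 << ELF_HEADERS_BIT))


def current_process_dumps_elf_headers() -> bool:
    """Check the filter of the calling process (Linux only)."""
    with open("/proc/self/coredump_filter") as f:
        return dumps_elf_headers(f.read())
```

If the bit is clear for the crashing process, the core file will not carry the ELF headers, so the build-ids cannot be recovered from it.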


> You can reproduce this if you run GDB on the coredump from bug 1314592
> comment 8, on an up-to-date Fedora 23 system.  Here is the output you get:
> 
> $ gdb rpm
> /tmp/core.rpm.0.0aa738bc898246da8ee147c017e492e2.3562.1457546053000000 

You should not enter the command name.  Enter only the core file name.  I did not want to break upstream GDB functionality as this feature is not upstreamed.

gdb -q core.rpm.0.0aa738bc898246da8ee147c017e492e2.3562.1457546053000000 
[New LWP 3562]
Missing separate debuginfo for the main executable file
Try: dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/6b/e6b0ddcb9da5681eb72310118c209c8fd3d22d
Core was generated by `rpm -qf /home/george'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007efdfea9ba98 in ?? ()
"/tmp/core.rpm.0.0aa738bc898246da8ee147c017e492e2.3562.1457546053000000" is a core file.
Please specify an executable to debug.
(gdb) _

This is IMO reasonably valid output.  If you insist, GDB can still give you a numeric-only backtrace, but that does not work well in GDB because GDB cannot fetch .eh_frame even if it was recorded in the core file.  eu-stack can.

When I run the suggested command "on an up-to-date Fedora 23 system" I get:

# dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/6b/e6b0ddcb9da5681eb72310118c209c8fd3d22d
No package /usr/lib/debug/.build-id/6b/e6b0ddcb9da5681eb72310118c209c8fd3d22d available.
Error: Unable to find a match.

That is the problem I was talking about: Fedora repos contain only the release GA variant and the latest stable update, nothing more.  Darkserver or the ABRT retrace server would be a solution for that, but neither is available for end-user workstations now.
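
For reference, the path passed to dnf above is derived from the build-id by splitting off its first two hex digits as a directory name.  A minimal sketch of that mapping follows; the function name build_id_debug_path is hypothetical, and the default root is simply the directory seen in the GDB output above.

```python
def build_id_debug_path(build_id: str,
                        root: str = "/usr/lib/debug/.build-id") -> str:
    """Map a hex build-id to the .build-id path layout shown in the GDB hint."""
    build_id = build_id.lower()
    # Need at least one digit after the two-digit directory prefix.
    if len(build_id) < 3 or any(c not in "0123456789abcdef" for c in build_id):
        raise ValueError(f"not a usable hex build-id: {build_id!r}")
    return f"{root}/{build_id[:2]}/{build_id[2:]}"
```

The path only names what is wanted; as the dnf output shows, a repository still has to provide a package that owns it.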

I am not sure what more you expect; NEEDINFOed.  You can maybe close it or reassign it to some virtual DarkServer-v2 component.

Comment 4 Florian Weimer 2016-03-13 20:59:41 UTC
(In reply to Jan Kratochvil from comment #3)
> You should not enter the command name.  Enter only the core file name.  I
> did not want to break upstream GDB functionality as this feature is not
> upstreamed.

Oh, that was the critical piece of information I was missing.  Is there a reason not to print the mismatch warnings when the command is specified?  This is quite confusing.

I verified that I get warnings if RPM is the correct version, but some libraries are not, as long as I leave out the command name.  So this is purely a usability issue.

(The question of getting the actual debuginfo RPMs is a separate matter.)

Comment 5 Jan Kratochvil 2016-03-13 21:07:50 UTC
(In reply to Florian Weimer from comment #4)
> Is there a reason not to print the mismatch warnings when the command is
> specified?  This is quite confusing.

No matter what the current Fedora patch does or does not do, I consider it obsolete.  Technically, if you specify just the core file, GDB internally switches to a different mode of loading files, restricting loads to files verifiable by their build-id.  A new implementation of that was written here:
  [PATCH v12 00/32] Validate binary before use
  https://sourceware.org/ml/gdb-patches/2015-08/msg00590.html
But the new one is not yet even feature-complete compared with the old one.  I am not sure how it will all end up.
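
The kind of check requested in this bug can be sketched as follows.  This is only an illustration of comparing the build-id recorded in the core file with the one found in the file on disk; it is not the logic of the Fedora patch or of the upstream patch series, and the function name and message wording are hypothetical.

```python
from typing import Optional


def mismatch_warning(path: str,
                     core_build_id: Optional[str],
                     disk_build_id: Optional[str]) -> Optional[str]:
    """Return a warning if the on-disk file cannot be verified, else None."""
    if core_build_id is None:
        return f"warning: no build-id recorded in core for {path}; cannot verify"
    if disk_build_id is None:
        return f"warning: {path} has no build-id; core expects {core_build_id}"
    if core_build_id.lower() != disk_build_id.lower():
        return (f"warning: {path} does not match the core file "
                f"(core has build-id {core_build_id}, file has {disk_build_id})")
    return None  # build-ids agree, the file can be used
```

Applying such a check regardless of whether the executable is named on the command line would address the usability issue raised in comment 4.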