Bug 702089

Summary: RFE: Track package build-id
Product: [Fedora] Fedora
Component: rpm
Version: rawhide
Hardware: Unspecified
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Target Milestone: ---
Target Release: ---
Keywords: FutureFeature
Doc Type: Enhancement
Reporter: Jan Kratochvil <jan.kratochvil>
Assignee: Fedora Packaging Toolset Team <packaging-team>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: ffesti, herrold, james.antill, mjw, pmatilai, roland, sankarshan
Last Closed: 2019-08-29 10:22:37 UTC
Bug Depends On:
Bug Blocks: 708321

Description Jan Kratochvil 2011-05-04 18:25:57 UTC
Description of problem:
Fedora binaries carry a built-in "build-id" that uniquely identifies each file.  There should be a way to ask rpm "which file has this build-id?"

Version-Release number of selected component (if applicable):
FutureFeature

How reproducible:
FutureFeature

Steps to Reproduce:
$ rpm -q --whatprovidesbuildid 34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210
info-4.13a-13.fc14.x86_64 /usr/bin/info
$ rpm -q --buildid info
34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210 /usr/bin/info
7dfd1912703394851fbf8f2044dcfacd52f98ef8 /sbin/install-info 
[...]

The information is currently stored only in the separate debuginfo rpms; when those are not installed, it is not available to GDB and other tools.

$ ls -l /usr/lib/debug/.build-id/34/e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210
lrwxrwxrwx 1 root root 20 May  4 20:21 /usr/lib/debug/.build-id/34/e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210 -> ../../../../bin/info*
$ rpm -qf /usr/lib/debug/.build-id/34/e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210
texinfo-debuginfo-4.13a-13.fc14.x86_64
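The symlink path in the listing is derived from the build-id itself: the first two hex digits name the directory and the remaining digits name the link.  A minimal sketch of that mapping, assuming the /usr/lib/debug/.build-id/ layout shown here (the helper name buildid_link_path is hypothetical, not an rpm or GDB command):

```shell
# Hypothetical helper: map a hex build-id to its symlink path.
buildid_link_path() {
    case $1 in
        ''|?|??|*[!0-9a-f]*)
            # Reject empty, too-short, or non-hex input.
            echo "invalid build-id: $1" >&2
            return 1 ;;
    esac
    # First two hex digits name the directory; the remainder names the link.
    printf '/usr/lib/debug/.build-id/%s/%s\n' \
        "$(printf '%s' "$1" | cut -c1-2)" "$(printf '%s' "$1" | cut -c3-)"
}

buildid_link_path 34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210
```

The result can be passed to rpm -qf, as in the query shown here, but that only works while the debuginfo rpm owning the link is installed.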

Displaying build-id of a binary:
$ eu-readelf -n /usr/bin/info
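For scripting, the hex string can be filtered out of the note listing.  A rough sketch, assuming the tool prints a line of the form "Build ID: <hex>" (the helper name extract_build_id is hypothetical, and the sample line is typed by hand, not captured from a real run):

```shell
# Hypothetical helper: read note output on stdin, print each build-id found.
# Assumption: the build-id appears on a line of the form "Build ID: <hex>".
extract_build_id() {
    sed -n 's/^[[:space:]]*Build ID:[[:space:]]*\([0-9a-f][0-9a-f]*\)[[:space:]]*$/\1/p'
}

# Hand-written sample line standing in for real tool output.
printf '    Build ID: %s\n' 34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210 | extract_build_id
```

With a real binary this would be used as: eu-readelf -n /usr/bin/info | extract_build_id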

Additional info:
In fact there can be multiple files with the same build-id, such as hard links, and sometimes even two regular files have the same build-id if they are built from the same sources.  That issue is unrelated to this Bug.  (See Bug 641377.)

One could also just move the binary symlinks from *-debuginfo.rpm to the main binary rpms.  It would have some rpm size costs and target filesystem costs.
I thought it would be cheaper with native rpm headers + database support.
IIRC the decision to have the symlinks only in *-debuginfo.rpm was made by Roland McGrath.

Comment 1 James Antill 2011-05-04 19:12:30 UTC
 The most obvious question is: Why do you need it mapped to the file specifically, and not just to the package (which contains the files, so you can narrow it down later)?

 The "obvious" solution is to have something which would add:

Provides: build-id(34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210)
Provides: build-id(7dfd1912703394851fbf8f2044dcfacd52f98ef8)

...to the info package, you could then easily get at it for installed packages and search for it in available packages.
 Do you have rough numbers for how many provides entries that would add?
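 A sketch of how such entries could be generated from a list of build-ids.  The "build-id(...)" capability name is only the proposal from this comment, and the helper name buildid_provides is hypothetical:

```shell
# Hypothetical helper: turn build-ids (one per line on stdin) into the
# "Provides:" lines proposed in this comment.  Non-hex lines are skipped.
buildid_provides() {
    while IFS= read -r id; do
        case $id in
            ''|*[!0-9a-f]*) continue ;;
        esac
        printf 'Provides: build-id(%s)\n' "$id"
    done
}

printf '%s\n' 34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210 \
              7dfd1912703394851fbf8f2044dcfacd52f98ef8 | buildid_provides
```

 If packages carried such entries, an ordinary capability query like rpm -q --whatprovides 'build-id(34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210)' would find the installed package, though it would not by itself name the file.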

Comment 2 Jan Kratochvil 2011-05-05 01:28:05 UTC
for i in $(repoquery -qa --enablerepo='*-debuginfo' '*-debuginfo'|sort -u);do echo "$(($(repoquery --enablerepo='*-debuginfo' -ql "$i"|grep -c '/usr/lib/debug/.build-id....')/2)) $i";done

 2236 kernel-debug-debuginfo-0:2.6.35.12-90.fc14.x86_64
  422 grass-debuginfo-0:6.3.0-17.fc14.x86_64
  417 wine-debuginfo-0:1.3.18-1.fc14.x86_64
  408 paraview-debuginfo-0:3.8.1-3.fc14.x86_64
  354 qt-debuginfo-1:4.7.2-8.fc14.x86_64
  325 openoffice.org-debuginfo-1:3.3.0-20.5.fc14.x86_64
-----
39962 total /usr/lib/debug/.build-id/ links (79924 as the whole .debug pair)
      (the total count may be overstated because some packages are counted more than once above)

# kernel lookup time (used by GDB for vdso.ko):
find /lib/modules/`uname -r` -name "*.ko"|sort -u >/tmp/kos
sync; echo 3 > /proc/sys/vm/drop_caches
time eu-readelf -n `cat /tmp/kos` >/dev/null
real	0m17.774s
# minus benchmark startup overhead:
sync; echo 3 > /proc/sys/vm/drop_caches
time eu-readelf -n `head -n1 /tmp/kos` >/dev/null
real	0m1.147s

It may be bearable, but given the serious Red Hat investments in GDB performance, such as
  http://fedoraproject.org/wiki/Features/GdbIndex
I do not find such a by-design performance degradation acceptable.
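Taking the single-file run as the startup overhead, the net cold-cache cost of reading the notes from all modules can be estimated by subtracting the two "real" times.  A small sketch (the helper name net_seconds is hypothetical):

```shell
# Subtract the single-file run (treated as startup overhead) from the full run.
# Both arguments are the "real" times reported in this comment, in seconds.
net_seconds() {
    LC_ALL=C awk -v total="$1" -v overhead="$2" \
        'BEGIN { printf "%.3f\n", total - overhead }'
}

net_seconds 17.774 1.147    # prints 16.627
```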

BTW I do not see why `kernel-debug-debuginfo' is present but `kernel-debuginfo' is not:
$ repoquery -q --enablerepo='*-debuginfo' kernel-debuginfo
$ _

Comment 3 Panu Matilainen 2011-05-05 13:41:06 UTC
Hmm, files containing build-ids are a relatively small percentage of all files (on my two F15 boxes it's around 5.5%) so the overhead for build-id data and an index for it doesn't seem outright unacceptable.

As for build-id provides in debuginfo packages, that would be quite trivial to add (and AFAICS SUSE does this, among some other quite significant changes to debuginfo packaging).

Roland (CC'd), I've seen you comment about a "grand debuginfo revamp" on a couple of occasions - could you outline what you have in mind?

Comment 4 Jan Kratochvil 2011-05-05 14:05:12 UTC
Roland: ^ Comment 3.  The revamp AFAIK has never been completed/coded.

Comment 5 Roland McGrath 2011-05-05 17:33:13 UTC
The main idea was based on the DWARF compression work and the idea of a "consolidated debug archive".  mjw is now in charge of that work.

In that vision, the compression stuff would yield a single file representing all the information (hopefully in much smaller form) that is now in the many .debug files.  That file would have a build ID index in it as well as a file name index, so upon finding the right CDAR (consolidated debug archive file), it would be efficient to look up a build ID and find the information now stored in individual .debug files in debuginfo rpms.

I never got very clear on the rest of the plan.  There are two key aspects to be determined.  First, how to go from build ID to the right CDAR.  A simple idea is to use the existing symlink convention, and just have all those symlinks point to the one file.  A probably better plan is to have some sort of system-wide database that maps a build ID to both the main file (i.e. stripped binary in /usr/bin/foo or wherever) and to the CDAR or individual .debug file.  That would have to be maintained by rpm, yum, or another magic database-maintaining program that they invoke either magically or in %post scripts or something.  (Ideally this plan would include a common database for installed systems that could be used by non-rpm systems as well.)
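To make the database idea concrete, here is a rough sketch of a lookup against a flat index.  Everything in it is an assumption for illustration: the tab-separated record format, the helper name buildid_lookup, and the sample paths.  No such database format is defined in this report.

```shell
# Hypothetical index format, one record per line, tab-separated:
#   <build-id> <TAB> <main file> <TAB> <debug file or CDAR>
buildid_lookup() {
    # $1 = index file, $2 = build-id; prints "main<TAB>debug" for each match.
    # Appending "" forces string comparison, so an ID like 12e5 is never
    # compared as a number.
    awk -F '\t' -v id="$2" '($1 "") == (id "") { print $2 "\t" $3; found = 1 }
                            END { exit !found }' "$1"
}

index=$(mktemp)
# Illustrative record only; the .debug path is an assumed example.
printf '%s\t%s\t%s\n' \
    34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210 /usr/bin/info \
    /usr/lib/debug/usr/bin/info.debug > "$index"
buildid_lookup "$index" 34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210
rm -f "$index"
```

A linear scan like this is only meant to show the mapping; a real system-wide database would need an indexed store and updates on package install and erase.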

Second is how to change the debuginfo rpm packaging.  The simplest thing is to keep it about how it is now, with each srpm generating one debuginfo rpm that contains the CDAR file and the source files.  A better idea is to split it into separate subpackages for the source and for the CDAR (openSUSE does that now AIUI).  A related question is whether to keep a single big debuginfo package (i.e. single CDAR) for everything in one srpm, or to split it into a separate debuginfo subpackage for each binary subpackage.  AIUI openSUSE does the latter.  Doing that requires built-in support in rpmbuild (what I gather openSUSE has done) or some fancier new spec macro magic that I don't know how to do.  Today's debuginfo rpms are generated by relatively straightforward spec macro magic.  To generate different debuginfo subpackages corresponding to each binary subpackage, the equivalent of find-debuginfo.sh would have to have a way to know what all the binary subpackages are and which binaries go into which one.

People today like the idea of separate debuginfo subpackages for each binary subpackage, because then if you want just the debuginfo for one thing you don't download/install a whole big other thing.  However, the hope behind the CDAR design is that the compression of DWARF and other information (ELF symbol table data would be somewhat consolidated by the CDAR design too) would make one CDAR for several related binaries smaller than the aggregate size of the individual .debug files for the same set of binaries would be even after doing DWARF compression on each file.

Comment 6 Jan Kratochvil 2011-05-09 00:19:30 UTC
(In reply to comment #5)
> First, how to go from build ID to the right CDAR.  A simple idea is to use
> the existing symlink convention, and just have all those symlinks point to
> the one file.  A probably better plan is to have some sort of system-wide
> database that maps a build ID to both the main file (i.e. stripped binary in
> /usr/bin/foo or wherever) and to the CDAR or individual .debug file.

kernel-2.6.35.12-90.fc14.x86_64

# find /usr/lib/debug/.build-id/ | wc -l
13604
# find /usr/lib/debug/.build-id/ -type l | wc -l
13347
# du /var/lib/rpm
158088  /var/lib/rpm

Difference after: cp -a /usr/lib/debug/.build-id/ /mnt/

Filesystem           1K-blocks      Used Available Use% Mounted on
/tmp/300m.ext4-1k       297485     10254    271871   4% /mnt
/tmp/300m.ext4-1k       297485     18160    263965   7% /mnt
18160-10254=7906
/tmp/300m.ext4-4k       297560     16552    265648   6% /mnt
/tmp/300m.ext4-4k       297560     44196    238004  16% /mnt
44196-16552=27644
/tmp/300m.btrfs         307200        56    307144   1% /mnt
/tmp/300m.btrfs         307200     17824    289376   6% /mnt
17824-56=17768

I find the size acceptable.  A full scan of the symlink tree is very slow, but that is never needed.  Lookups are fast and updates (rpm -i, rpm -e) are IMO bearable.
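The differences can be turned into an approximate per-entry cost.  A small sketch, assuming the df figures are 1K-blocks and dividing by the 13604 entries (links plus directories) counted by find; the helper name bytes_per_entry is hypothetical:

```shell
# Integer bytes of filesystem space per .build-id entry.
# $1 = growth in 1K-blocks (df "Used" after minus before), $2 = entry count.
bytes_per_entry() {
    echo $(( $1 * 1024 / $2 ))
}

entries=13604   # find /usr/lib/debug/.build-id/ | wc -l
echo "ext4-1k: $(bytes_per_entry 7906 "$entries") bytes/entry"    # 595
echo "ext4-4k: $(bytes_per_entry 27644 "$entries") bytes/entry"   # 2080
echo "btrfs:   $(bytes_per_entry 17768 "$entries") bytes/entry"   # 1337
```

On the 1k-block ext4 image the 7906 KiB of growth is about 5% of the 158088 KiB reported for /var/lib/rpm.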

Comment 7 Fedora Admin XMLRPC Client 2012-04-13 23:06:27 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 8 Fedora Admin XMLRPC Client 2012-04-13 23:10:04 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 9 Mark Wielaard 2012-04-23 10:54:04 UTC
This would also benefit the darkserver project https://darkserver.fedoraproject.org/
Darkserver is a service written to help people find details of build-ids. People will be able to query the service by build-id or by rpm package name. The service will provide its output in JSON format, as that is easier for other tools to parse. https://fedoraproject.org/wiki/Darkserver

Comment 10 Panu Matilainen 2019-08-29 10:22:37 UTC
(In reply to Jan Kratochvil from comment #0)
> One could also just move the binary symlinks from *-debuginfo.rpm to the
> main binary rpms.  

This landed in rpm as part of the major debuginfo rewrite a couple of years ago, and all Fedora packages starting with F27 have the info embedded. So I guess we can consider the case closed.

(/me, going through old bugs)