Description of problem:
Fedora binaries have a built-in so-called "build-id" uniquely identifying each file. There should be a way to ask rpm "which file has this build-id?"

Version-Release number of selected component (if applicable):
FutureFeature

How reproducible:
FutureFeature

Steps to Reproduce:
$ rpm -q --whatprovidesbuildid 34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210
info-4.13a-13.fc14.x86_64 /usr/bin/info
$ rpm -q --buildid info
34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210 /usr/bin/info
7dfd1912703394851fbf8f2044dcfacd52f98ef8 /sbin/install-info
[...]

The information is currently stored only in the separate debuginfo rpms; when those are not installed, it is not available to GDB and other tools.

$ ls -l /usr/lib/debug/.build-id/34/e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210
lrwxrwxrwx 1 root root 20 May 4 20:21 /usr/lib/debug/.build-id/34/e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210 -> ../../../../bin/info*
$ rpm -qf /usr/lib/debug/.build-id/34/e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210
texinfo-debuginfo-4.13a-13.fc14.x86_64

Displaying the build-id of a binary:
$ eu-readelf -n /usr/bin/info

Additional info:
In fact there can be multiple files with the same build-id, such as hard links, and sometimes even two regular files can have the same build-id if they come from the same sources. This is unrelated to this bug. (See Bug 641377.)

One could also just move the binary symlinks from *-debuginfo.rpm to the main binary rpms. That would have some costs in rpm size and on the target filesystem. I thought native rpm header + database support would be cheaper. IIRC the decision to have the symlinks only in *-debuginfo.rpm was made by Roland McGrath.
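What `eu-readelf -n` prints as the build-id is the descriptor of an ELF note of type NT_GNU_BUILD_ID (3) in the "GNU" namespace. The following is a minimal Python sketch of decoding that note, assuming the caller has already extracted the raw bytes of the note section (e.g. .note.gnu.build-id), that the file is little-endian (as on x86_64), and that notes use 4-byte alignment. It does not locate the section inside the ELF file; the helper names are hypothetical.

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # note type for build-ids in the "GNU" note namespace


def _align4(n: int) -> int:
    """Round n up to a multiple of 4 (assumes 4-byte note alignment)."""
    return (n + 3) & ~3


def build_id_from_notes(data: bytes, byteorder: str = "<") -> Optional[str]:
    """Return the build-id as lowercase hex from raw ELF note section bytes.

    Each note is three 32-bit words (namesz, descsz, type) followed by the
    name and the descriptor, each padded to a 4-byte boundary. Returns None
    if no GNU build-id note is present.
    """
    offset = 0
    while offset + 12 <= len(data):
        namesz, descsz, ntype = struct.unpack_from(byteorder + "III", data, offset)
        offset += 12
        name = data[offset:offset + namesz]
        offset += _align4(namesz)
        desc = data[offset:offset + descsz]
        offset += _align4(descsz)
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\0":
            return desc.hex()
    return None


# Example: a synthetic note carrying the build-id of /usr/bin/info shown above.
bid = bytes.fromhex("34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210")
note = struct.pack("<III", 4, len(bid), NT_GNU_BUILD_ID) + b"GNU\0" + bid
print(build_id_from_notes(note))  # 34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210
```

The 160-bit (20-byte) build-id in the example is already a multiple of 4, so no descriptor padding is needed; the name "GNU\0" is exactly 4 bytes.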
The most obvious question is: why do you need it mapped to the file specifically, and not just to the package (which contains the files, so you can narrow it down later)?

The "obvious" solution is to have something that would add:

Provides: build-id(34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210)
Provides: build-id(7dfd1912703394851fbf8f2044dcfacd52f98ef8)

...to the info package. You could then easily get at it for installed packages and search for it in available packages. Do you have rough numbers for how many Provides entries that would add?
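The Provides approach can be modelled in a few lines. This is a hypothetical in-memory sketch, not rpm's implementation: it formats one `build-id(HEX)` Provides line per distinct build-id of a package and inverts package -> build-ids into the index a "what provides" query would consult. Package names and build-ids are taken from the report above; the function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_id_provides(build_ids: Iterable[str]) -> List[str]:
    """Format one 'Provides: build-id(HEX)' line per distinct build-id."""
    return [f"Provides: build-id({bid})" for bid in sorted({b.lower() for b in build_ids})]


def whatprovides_index(packages: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Invert package -> build-ids into 'build-id(HEX)' -> packages providing it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for pkg, bids in packages.items():
        for bid in sorted({b.lower() for b in bids}):
            index[f"build-id({bid})"].append(pkg)
    return dict(index)


pkgs = {
    "info-4.13a-13.fc14.x86_64": [
        "34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210",  # /usr/bin/info
        "7dfd1912703394851fbf8f2044dcfacd52f98ef8",  # /sbin/install-info
    ],
}
for line in build_id_provides(pkgs["info-4.13a-13.fc14.x86_64"]):
    print(line)
print(whatprovides_index(pkgs)["build-id(34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210)"])
```

Note that this only answers "which package", which is exactly the trade-off raised above: the file inside the package still has to be found separately.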
for i in $(repoquery -qa --enablerepo='*-debuginfo' '*-debuginfo' | sort -u); do
  echo "$(( $(repoquery --enablerepo='*-debuginfo' -ql $i | grep '/usr/lib/debug/.build-id....' | wc -l) / 2 )) $i"
done
2236 kernel-debug-debuginfo-0:2.6.35.12-90.fc14.x86_64
422 grass-debuginfo-0:6.3.0-17.fc14.x86_64
417 wine-debuginfo-0:1.3.18-1.fc14.x86_64
408 paraview-debuginfo-0:3.8.1-3.fc14.x86_64
354 qt-debuginfo-1:4.7.2-8.fc14.x86_64
325 openoffice.org-debuginfo-1:3.3.0-20.5.fc14.x86_64
-----
39962 total /usr/lib/debug/.build-id/ links (79924 counting both links of each binary/.debug pair)
(The total may be overstated because some packages appear more than once above.)

# Kernel lookup time (used by GDB for vdso.ko):
find /lib/modules/`uname -r` -name "*.ko" | sort -u >/tmp/kos
sync; echo 3 > /proc/sys/vm/drop_caches
time eu-readelf -n `cat /tmp/kos` >/dev/null
real 0m17.774s
# Minus benchmark startup overhead:
sync; echo 3 > /proc/sys/vm/drop_caches
time eu-readelf -n `head -n1 /tmp/kos` >/dev/null
real 0m1.147s

That is about 16.6 s spent scanning the modules. It may be bearable, but with serious Red Hat investments in GDB performance like http://fedoraproject.org/wiki/Features/GdbIndex I do not find a performance degradation built into the design acceptable.

BTW, I do not see why 'kernel-debug-debuginfo' is present but 'kernel-debuginfo' is not:
$ repoquery -q --enablerepo='*-debuginfo' kernel-debuginfo
$ _
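The loop above divides the number of matching paths by 2 on the assumption that every binary contributes exactly one link to itself and one `.debug` link. A minimal Python sketch of counting distinct build-ids instead, assuming a package file list like the one `repoquery -ql` prints and the lowercase-hex `XX/YYYY...` layout seen earlier in this bug (the function name is hypothetical):

```python
import re
from typing import Iterable, Set

# Matches /usr/lib/debug/.build-id/XX/YYYY... with an optional .debug suffix;
# the bare XX/ subdirectory entries do not match.
_LINK_RE = re.compile(r"^/usr/lib/debug/\.build-id/([0-9a-f]{2})/([0-9a-f]+)(\.debug)?$")


def unique_build_ids(paths: Iterable[str]) -> Set[str]:
    """Collect distinct build-ids from a debuginfo package file list.

    Stripping the .debug suffix and deduplicating gives the per-binary
    count without assuming that every link has its pair.
    """
    ids: Set[str] = set()
    for path in paths:
        m = _LINK_RE.match(path)
        if m:
            ids.add(m.group(1) + m.group(2))
    return ids


files = [
    "/usr/lib/debug/.build-id/34",
    "/usr/lib/debug/.build-id/34/e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210",
    "/usr/lib/debug/.build-id/34/e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210.debug",
    "/usr/lib/debug/usr/bin/info.debug",
]
print(len(unique_build_ids(files)))
```

For a fully paired list this gives the same number as the `/ 2` in the shell loop; it differs only when a link is missing its partner.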
Hmm, files containing build-ids are a relatively small percentage of all files (on my two F15 boxes it's around 5.5%), so the overhead of build-id data and an index for it doesn't seem outright unacceptable. As for build-id provides in debuginfo packages, that would be quite trivial to add (and AFAICS SUSE does this, among some other quite significant changes to debuginfo packaging). Roland (CC'd), I've seen you comment about a "grand debuginfo revamp" on a couple of occasions; could you outline what you have in mind?
Roland: ^ Comment 3. The revamp AFAIK has never been completed/coded.
The main idea was based on the DWARF compression work and the idea of a "consolidated debug archive". mjw is now in charge of that work. In that vision, the compression work would yield a single file representing all the information (hopefully in much smaller form) that is now spread across the many .debug files. That file would have a build ID index in it as well as a file name index, so once the right CDAR (consolidated debug archive file) is found, it would be efficient to look up a build ID and find the information now stored in individual .debug files in debuginfo rpms.

I never got very clear on the rest of the plan. There are two key aspects to be determined.

First, how to go from a build ID to the right CDAR. A simple idea is to use the existing symlink convention and just have all those symlinks point to the one file. A probably better plan is to have some sort of system-wide database that maps a build ID both to the main file (i.e. the stripped binary in /usr/bin/foo or wherever) and to the CDAR or individual .debug file. That would have to be maintained by rpm, yum, or another magic database-maintaining program that they invoke, either implicitly or from %post scripts or something similar. (Ideally this plan would include a common database for installed systems that could also be used by non-rpm systems.)

Second, how to change the debuginfo rpm packaging. The simplest thing is to keep it about as it is now, with each srpm generating one debuginfo rpm that contains the CDAR file and the source files. A better idea is to split it into separate subpackages for the sources and for the CDAR (openSUSE does that now, AIUI). A related question is whether to keep a single big debuginfo package (i.e. a single CDAR) for everything in one srpm, or to split it into a separate debuginfo subpackage for each binary subpackage. AIUI openSUSE does the latter.
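The "system-wide database" option could look roughly like the following. This is a hypothetical sketch only: the comment above does not say how such a database would be stored, so SQLite (via Python's standard `sqlite3` module) and the `build_ids` schema are assumptions. The primary key is (build_id, main_file) because, as noted in the original report, several files (e.g. hard links) can share one build-id; `debug_file` can name either an individual .debug file or a CDAR.

```python
import sqlite3
from typing import List, Optional, Tuple


def open_build_id_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and create if needed) a hypothetical system-wide build-id table."""
    conn = sqlite3.connect(path)
    # debug_file: individual .debug file or CDAR; NULL if debuginfo is absent.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS build_ids (
               build_id   TEXT NOT NULL,
               main_file  TEXT NOT NULL,
               debug_file TEXT,
               PRIMARY KEY (build_id, main_file)
           )"""
    )
    return conn


def register(conn: sqlite3.Connection, build_id: str, main_file: str,
             debug_file: Optional[str] = None) -> None:
    """Record a file at install time (e.g. from rpm -i or a %post script)."""
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT OR REPLACE INTO build_ids VALUES (?, ?, ?)",
            (build_id.lower(), main_file, debug_file),
        )


def unregister(conn: sqlite3.Connection, main_file: str) -> None:
    """Drop a file's rows at erase time (rpm -e)."""
    with conn:
        conn.execute("DELETE FROM build_ids WHERE main_file = ?", (main_file,))


def lookup(conn: sqlite3.Connection, build_id: str) -> List[Tuple[str, Optional[str]]]:
    """Indexed lookup via the primary key: build-id -> [(main_file, debug_file)]."""
    return conn.execute(
        "SELECT main_file, debug_file FROM build_ids WHERE build_id = ? ORDER BY main_file",
        (build_id.lower(),),
    ).fetchall()


conn = open_build_id_db()
register(conn, "34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210", "/usr/bin/info",
         "/usr/lib/debug/usr/bin/info.debug")
print(lookup(conn, "34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210"))
```

Keeping both the main file and the debug location in one row is what lets a single query answer both "which binary is this?" and "where is its debug info?".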
Doing that requires built-in support in rpmbuild (which I gather is what openSUSE has done) or some fancier new spec macro magic that I don't know how to do. Today's debuginfo rpms are generated by relatively straightforward spec macro magic. To generate a separate debuginfo subpackage corresponding to each binary subpackage, the equivalent of find-debuginfo.sh would need a way to know what all the binary subpackages are and which binaries go into which one.

People today like the idea of a separate debuginfo subpackage for each binary subpackage, because then if you want just the debuginfo for one thing, you don't download/install a whole big other thing. However, the hope behind the CDAR design is that compressing the DWARF and other information (ELF symbol table data would be somewhat consolidated by the CDAR design too) would make one CDAR for several related binaries smaller than the aggregate size of the individual .debug files for the same set of binaries, even after DWARF compression of each file.
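The bookkeeping that a per-subpackage split needs ("what all the binary subpackages are and which binaries go into which one") can be sketched as a simple grouping step. This is a hypothetical illustration, not how find-debuginfo.sh or rpmbuild work: it assumes the per-subpackage file lists (e.g. derived from the %files sections) and a path -> build-id map are already available, and the `<name>-debuginfo` naming is an assumption.

```python
from typing import Dict, List


def split_debuginfo(
    subpackage_files: Dict[str, List[str]],
    build_ids: Dict[str, str],
) -> Dict[str, List[str]]:
    """Group build-ids by the binary subpackage whose file list ships the binary.

    subpackage_files: binary subpackage name -> files it ships (assumed input).
    build_ids: path of each stripped binary -> its build-id.
    Returns "<subpackage>-debuginfo" -> sorted build-ids that subpackage would carry;
    subpackages without any binaries get no debuginfo subpackage.
    """
    result: Dict[str, List[str]] = {}
    for subpkg, files in subpackage_files.items():
        ids = sorted({build_ids[f] for f in files if f in build_ids})
        if ids:
            result[f"{subpkg}-debuginfo"] = ids
    return result


files = {
    "info": ["/usr/bin/info", "/sbin/install-info"],
    "texinfo": ["/usr/share/doc/texinfo/README"],  # hypothetical non-binary file
}
ids = {
    "/usr/bin/info": "34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210",
    "/sbin/install-info": "7dfd1912703394851fbf8f2044dcfacd52f98ef8",
}
print(split_debuginfo(files, ids))
```

With a single CDAR per srpm this step is unnecessary, which is the size/convenience trade-off described above.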
(In reply to comment #5)
> First, how to go from build ID to the right CDAR. A simple idea is to use
> the existing symlink convention, and just have all those symlinks point to
> the one file. A probably better plan is to have some sort of system-wide
> database that maps a build ID to both the main file (i.e. stripped binary in
> /usr/bin/foo or wherever) and to the CDAR or individual .debug file.

kernel-2.6.35.12-90.fc14.x86_64
# find /usr/lib/debug/.build-id/ | wc -l
13604
# find /usr/lib/debug/.build-id/ -type l | wc -l
13347
# du /var/lib/rpm
158088	/var/lib/rpm

Difference before/after: cp -a /usr/lib/debug/.build-id/ /mnt/
Filesystem        1K-blocks  Used Available Use% Mounted on
/tmp/300m.ext4-1k    297485 10254    271871   4% /mnt
/tmp/300m.ext4-1k    297485 18160    263965   7% /mnt
18160-10254=7906
/tmp/300m.ext4-4k    297560 16552    265648   6% /mnt
/tmp/300m.ext4-4k    297560 44196    238004  16% /mnt
44196-16552=27644
/tmp/300m.btrfs      307200    56    307144   1% /mnt
/tmp/300m.btrfs      307200 17824    289376   6% /mnt
17824-56=17768

I find the size acceptable. A full scan is very slow, but one is never needed. Lookups are fast and updates (rpm -i, rpm -e) are IMO bearable.
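The reason lookups through the symlink tree are fast is that the link path is computed directly from the build-id, so no scan is needed. A minimal Python sketch, assuming the layout visible in the `ls -l` output in the original report (first two hex digits as a subdirectory, the remaining digits as the name, `.debug` appended for the debug file). The helper names are hypothetical, and `resolve_link` resolves the relative target lexically; unlike `os.path.realpath`, it does not follow intermediate symlinks.

```python
import posixpath

BUILD_ID_DIR = "/usr/lib/debug/.build-id"


def build_id_link(build_id: str, debug: bool = False) -> str:
    """Return the link path for a build-id; debug=True selects the .debug link."""
    bid = build_id.lower()
    return f"{BUILD_ID_DIR}/{bid[:2]}/{bid[2:]}" + (".debug" if debug else "")


def resolve_link(link_path: str, target: str) -> str:
    """Resolve a symlink target (as read with os.readlink) relative to the link's directory."""
    if posixpath.isabs(target):
        return posixpath.normpath(target)
    return posixpath.normpath(posixpath.join(posixpath.dirname(link_path), target))


link = build_id_link("34e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210")
print(link)  # /usr/lib/debug/.build-id/34/e7ab8bdcfd85fcb903d8a8f1e9b1d181fa6210
print(resolve_link(link, "../../../../bin/info"))  # /usr/bin/info
```

On a real system the target would come from `os.readlink(link)`; the example hard-codes the target shown in the report so that it runs anywhere.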
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
This would also benefit the Darkserver project https://darkserver.fedoraproject.org/. Darkserver is a service written to help people find details of build-ids. People will be able to query the service by build-id or by rpm package name. The service will provide output in JSON format, as that is easier for other tools to parse. https://fedoraproject.org/wiki/Darkserver
(In reply to Jan Kratochvil from comment #0)
> One could also just move the binary symlinks from *-debuginfo.rpm to the
> main binary rpms.

This landed in rpm as part of the major debuginfo rewrite a couple of years ago, and all Fedora packages starting with F27 have the info embedded. So I guess we can consider the case closed. (/me, going through old bugs)