Bug 1002341 - undesired cross-rpm elf build-id duplication due to strict content-hash handling in debugedit.c
Summary: undesired cross-rpm elf build-id duplication due to strict content-hash handl...
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: rpm
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Packaging Maintenance Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 901635 1031060 1031063
TreeView+ depends on / blocked
 
Reported: 2013-08-29 00:00 UTC by Frank Ch. Eigler
Modified: 2017-04-04 15:12 UTC (History)
18 users (show)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
: 1031060 1031063 (view as bug list)
Environment:
Last Closed: 2017-04-04 15:12:50 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
prototype for build-id salting at the debugedit level (3.71 KB, patch)
2013-08-29 00:01 UTC, Frank Ch. Eigler
no flags Details | Diff
find-debuginfo.sh + debugedit.c patch (2.69 KB, patch)
2013-08-31 19:56 UTC, Jan Kratochvil
no flags Details | Diff
%{buildsubdir} -> NVRA rpm patch (1.71 KB, patch)
2013-09-04 05:46 UTC, Jan Kratochvil
no flags Details | Diff
%{buildsubdir} -> NVRA rpm patch #2 (2.23 KB, patch)
2013-09-09 20:20 UTC, Jan Kratochvil
no flags Details | Diff

Description Frank Ch. Eigler 2013-08-29 00:00:47 UTC
It has been observed in the wild that distinct RPM builds (usually of
related software, slight version variance) can result in some files
with ELF build-id tags that match each other.  This is undesirable if
there is a need to concurrently install the related -debuginfo
subrpms, because then file conflicts are detected by rpm under
/usr/lib/debug/.build-id/XX/YYYYYY*.

It is sometimes desirable to install debuginfo for multiple versions,
if e.g. more than one kernel happens to be installed and one may want
to debug each of them after a reboot; or to have a machine serve as a
debuginfo repository for other purposes (such as systemtap module
compilation).

The problem comes from the build-id numbers being coded in Fedora/RHEL
as the default ELF-content-hash.  For similar enough software, the
compiler's output content may match for some executables/libraries,
and thus result in the duplicate-build-id.  That results in conflicts.

Roland McGrath has opined [1] that using a uuid (random, apprx
guaranteed non-conflicting) build-id would not be appropriate because
it would sacrifice repeatable builds.  I'm not so sure about that this
is actually so valuable.  Still, it turns out that we don't have to
sacrifice the property while having unique build-ids across similar
builds.  The proposed answer is to salt the content hash: to include
an additional build-identification string into the hash calculations,
for example the RPM's n-v-r-a full name.

The attached patch does this at the rpm/find-debuginfo/debugedit
level.  Testing indicates that the resulting files/packages are fine:

- just as functional as before, both execution (.text segments
  bitwise identical) and debugging
- stripped .debug files are indeed named differently by n-v-r-a,
  even for identical sources & direct object code
- the new .debug hashes are preserved across recompilation of the
  n-v-r-a rpm (let's call this maintaining the Roland Invariant) 
- the change is system-wide, but transparent in that existing
  debuggers etc. work without modification or awareness that the
  hashing algorithm has changed
- no changes are required to binutils/ld (because debugedit already
  recalculates the build-ids from scratch (!)).  The same salting
  concept may be welcome there (for non-fedora-distro usage).
- no changes are required to individual packages / spec files

[1] http://fedoraproject.org/wiki/Releases/FeatureBuildId

Comment 1 Frank Ch. Eigler 2013-08-29 00:01:51 UTC
Created attachment 791552 [details]
prototype for build-id salting at the debugedit level

Comment 2 Mark Wielaard 2013-08-29 08:01:59 UTC
Nice. I like the simple solution, that it works transparently and the fact that it maintains the "Roland Invariant". Since you use the ${name} in the salt this should also help in the case of an SCL (Software Collection) packages (which changes the package name, but might have some identical binaries).

Comment 3 Kyle McMartin 2013-08-29 13:31:55 UTC
Nice work, Frank. I wish I'd thought of it myself. ;-)

Comment 4 Roland McGrath 2013-08-29 17:58:04 UTC
The original plan for this, which still is the actually right answer, is that no two builds not intended to be identical should use the same source file names.  That is, include N-V-R.A in the source directory name, where historically only N-V was included.  If that's done, then no two builds would ever collide on build IDs unless they lacked -g, and any such lacks should just be fixed.

Frank's hack is an approximation of this, and suffices for the build ID part of the reason that N-V-R.A in source directory names is desireable.  It doesn't address the other reasons that it's desireable, and by papering over the build ID problem with the status quo it just further delays the proper resolution (which has already been delayed five years).

Comment 5 Mark Wielaard 2013-08-30 08:23:44 UTC
(In reply to Roland McGrath from comment #4)
> The original plan for this, which still is the actually right answer, is
> that no two builds not intended to be identical should use the same source
> file names.  That is, include N-V-R.A in the source directory name, where
> historically only N-V was included.  If that's done, then no two builds
> would ever collide on build IDs unless they lacked -g, and any such lacks
> should just be fixed.

Aha, so that is why debugedit recalculates the build-id.
Because it changes the DWARF source file locations.

Currently we call debugedit (from find-debuginfo.sh) with:
-b "$RPM_BUILD_DIR" -d /usr/src/debug

So one solution would just be to call it with:
-b "$RPM_BUILD_DIR" -d /usr/src/debug/${N-V-R.A}

But maybe we should adjust rpmbuild/fedpkg/koji to use ${N-V-R.A} directly?
I don't know where that is precisely.
Is that something that is controlled by an rpm macro?

Comment 6 Mark Wielaard 2013-08-30 08:35:26 UTC
Just an note to make sure to double check the ${N-V-R.A} is truly unique with respect to SCL (Software Collection) packages. Currently it seems they are both build with the same "basename" as their non-SCLified counterparts (e.g. both are build inside a directory called elfutils-0.154). But I am probably mixing my terminology. Just want to make sure that when using the  ${N-V-R.A} for the source directory name inside the RPM_BUILD_DIR is indeed unique for a "base package" and a "SCL" package.

Comment 7 Jan Kratochvil 2013-08-30 09:04:44 UTC
I wrote some patches for NVRA rpm source dirs in 2008:
  http://people.redhat.com/jkratoch/multidebug/
This is probably why I got the Cc from Roland.  Going to rebase them and recheck what those patches did.

Comment 8 Jan Kratochvil 2013-08-31 19:56:55 UTC
Created attachment 792472 [details]
find-debuginfo.sh + debugedit.c  patch

(In reply to Jan Kratochvil from comment #7)
> I wrote some patches for NVRA rpm source dirs in 2008:
>   http://people.redhat.com/jkratoch/multidebug/

This was too ugly.


(In reply to Mark Wielaard from comment #5)
> Currently we call debugedit (from find-debuginfo.sh) with:
> -b "$RPM_BUILD_DIR" -d /usr/src/debug
> 
> So one solution would just be to call it with:
> -b "$RPM_BUILD_DIR" -d /usr/src/debug/${N-V-R.A}

I have tried this idea and it seems to work (tested on dwdiff.fc19) but I haven't tried a test-mass-rebuild with it.

Comment 9 Frank Ch. Eigler 2013-09-03 14:08:02 UTC
I was under the impression that debugedit with a larger destdir was avoided
partly because of string length limitations: rpm tools/debugedit.c:

      if (strlen (dest_dir) > strlen (base_dir))
        {
          fprintf (stderr, "Dest dir longer than base dir is not supported\n");
          exit (1);
        }

Comment 10 Kyle McMartin 2013-09-03 14:10:04 UTC
yeah, the build path (base_dir) has to be larger than the install path (dest_dir) since debugedit just does a simple replacement and pad with NUL instead of fiddling with the binary in any way.

Comment 11 Jan Kratochvil 2013-09-03 14:12:03 UTC
Is it a problem in this case?  Test build went fine for me but I did not test it in Koji as it would require a chain build I am not aware how to do.

Or maybe we could finally rewrite debugedit as full read-in/write-out?

Comment 12 Kyle McMartin 2013-09-03 14:20:20 UTC
We can ask rel-eng for a private tag for this if you want.

I don't think it's a problem, but I don't off-hand remember what the base_dir is... I *think* it's /builddir/build/BUILD/N-V but my memory is hazy.

Comment 13 Jan Kratochvil 2013-09-03 14:40:34 UTC
If here is an agreement with my Comment 8 we could go with the test rebuild.

In fact Koji path should be equal to mock path so it should be all reproducible locally.  I will ask at #fedora-devel whether Koji tag is OK or if I rather should run a test mass rebuild on some Red Hat machine.

Comment 14 Frank Ch. Eigler 2013-09-03 15:17:25 UTC
Without mechanically ensuring enlargement of the builddir path length, you're likely to get sporadic failures, e.g. from packages that do their own
insufficiently-long BuildRoot:-or-equivalent setup in their spec files.

Comment 15 Jan Kratochvil 2013-09-03 19:30:50 UTC
repoquery --disablerepo='*' --enablerepo=rawhide-source --archlist=src --qf '-%{release}.x86_64 %{name}-%{version}' -qa|perl -lne 'print length((/^(\S+)/)[0])." $_"'|sort -n|tail
41 -0.9.20120403gitf7d3f802f3750.fc20.x86_64 gauche-gtk-0.6

So the largest enlargement is by 41 characters which can be workarounded for mock (from its original '/builddir') by:

config_opts['chroothome'] = '/builddir11111111112222222222333333333344444444445555555555'
config_opts['environment']['HOME'] = config_opts['chroothome']
config_opts['macros']['%_topdir'] = '%s/build' % config_opts['chroothome']

But that also means such packages may no longer be buildable by users with regular rpmbuild.  Maybe that will at least force their maintainers to make the revision strings shorter. :-)

Comment 16 Jan Kratochvil 2013-09-03 19:31:19 UTC
s/revision strings/release strings/

Comment 18 Jan Kratochvil 2013-09-03 19:40:35 UTC
Although it may be better to fix rather the build source directory, which will have no problems with debugedit then.  It may be incompatible with some .spec files but Fedora can fix that for the next release.

Comment 19 Frank Ch. Eigler 2013-09-03 19:49:20 UTC
(... whereas the salting approach works for current & unmodified spec files.)

Comment 20 Jan Kratochvil 2013-09-04 05:46:41 UTC
Created attachment 793468 [details]
%{buildsubdir} -> NVRA rpm patch

But the salting approach works only for the kernel package, isn't it?
Other packages conflict due to /usr/src/debug/N-V/ .

Running a new mass rebuild with this patch instead.  It needs no mock changes and it has no problem with debugedit.

Comment 21 Jan Kratochvil 2013-09-04 20:31:22 UTC
Mass rebuild results so far for the Comment 20 patch:

352 rebuilt fine
 28 failed - 5 are unrelated FTBFS, 4 are real regressions; rest still untested.

A clear reason can be seen for example in:
Canna-3.7p3-40.fc20 in .spec:
 - cat $RPM_BUILD_DIR/Canna-%{version}/zipcode[...]

But apparently the number of regressions needing a fix should be very small.

Comment 22 Jan Kratochvil 2013-09-09 20:20:05 UTC
Created attachment 795744 [details]
%{buildsubdir} -> NVRA rpm patch #2

Comment 23 Jan Kratochvil 2013-09-10 13:25:20 UTC
  4093 pass
   243 fail
    81 cleanfail
    42 cleanpass
-> 100.0*42/(4093+243) = 0.968% of packages PASS->FAIL to build with new rpm
With 14166 Fedora packages it is about 137 packages needing to be fixed.

I have verified few of the packages that they really have needless .spec assumptions:
Canna-3.7p3-40.fc20.src.rpm
 - cat $RPM_BUILD_DIR/Canna-%{version}/zipcode[...]
GMT-4.5.9-6.fc21.src.rpm
 - ln -s %{gmthome}/coast $RPM_BUILD_DIR/GMT%{version}/share
GAPDoc-1.5.1-4.fc20.src.rpm
 - cp -a %{name}-%{version} $RPM_BUILD_ROOT%{_gap_dir}/pkg
R-3.0.1-4.fc20.src.rpm
 - %define __perl_provides %{_builddir}/R-%{version}/%{name}-prov
   chmod +x %{__perl_provides}
android-tools-20130123git98d0789-2.fc20.src.rpm
 - Wrong %setup, also not using -T + NVRA cannot work with two top directories
   %setup -q -b 1 -n extras
   %setup -q -b 0 -n %{packdname}

Is the Comment 22 patch acceptable for F-21 rpmbuild?  It needs doc updates and to write a Fedora Change page.

While it fixes the build-id conflicts it still does not enable installing multiple *-debuginfo.rpm (except for kernel-debuginfo.rpm) because files like /usr/lib/debug/usr/bin/BINARY will fail.  It would need an additional change (so that /usr/lib/debug/.build-id/** turn symlinks->files, /usr/lib/debug/usr/** turn files->symlinks and conflicts of those symlinks are ignored, like with some multi-arch coloring).

Comment 24 Frank Ch. Eigler 2013-09-10 14:28:11 UTC
"It would need an additional change (so that /usr/lib/debug/.build-id/** turn symlinks->files, /usr/lib/debug/usr/** turn files->symlinks and conflicts of those symlinks are ignored, like with some multi-arch coloring)."

Those changes are IMHO well overdue.

Comment 25 Jan Zeleny 2013-09-11 11:55:40 UTC
Changing the needinfo to Panu, as he is the most qualified one to answer this.

Comment 26 Panu Matilainen 2013-10-15 08:11:48 UTC
I've a rather uneasy feeling about the directory rename. A less disruptive way of achieving NVRA in the paths could be arranging $RPM_BUILD_DIR to be per-package private, ie create an intermediate N(E)VRA directory as $RPM_BUILD_DIR / %{_builddir}. AFAICS this would achieve the desired N(E)VRA salting in absolutely all packages, avoid the breakages witnessed in comment #23 and make the whole thing a bit "safer" against bad tarballs and the like.

Comment 27 Jan Kratochvil 2013-10-15 08:21:36 UTC
The disadvantage would be that /usr/src/debug/* directories would all have one additional subdirectory (of mostly the same name).  This does not seem clean to me.

With the few .spec files failing compatibility test I checked in Comment 23 it mostly pointed to already imperfectly coded .spec files anyway.

Comment 28 Panu Matilainen 2013-10-15 08:53:56 UTC
The specs in comment #23 aren't doing anything particularly evil though, and if Fedora specs have such "imperfections" despite all the gazillion guidelines, think of all the specs elsewhere in the world. Unnecessarily breaking non-evil specs is bad.

There might well be better alternatives yet, but the private builddir approach solves other (ages old) issues besides the one at hand here.

Comment 29 Frank Ch. Eigler 2013-10-15 09:40:50 UTC
(In reply to Panu Matilainen from comment #26)
> I've a rather uneasy feeling about the directory rename. A less disruptive
> way of achieving NVRA in the paths could be arranging $RPM_BUILD_DIR to be
> per-package private, ie create an intermediate N(E)VRA directory as
> $RPM_BUILD_DIR / %{_builddir}.

Yes, that's fine.

> AFAICS this would achieve the desired N(E)VRA salting in absolutely all packages

No, because there exist packages (e.g. ocaml) that generate debuginfo without
embedded build-tree name references, which lead to pure content-based 
hash collisions.  The debugedit-level hash salting would still be beneficial.

Comment 30 Panu Matilainen 2013-10-15 11:11:42 UTC
(In reply to Frank Ch. Eigler from comment #29)
> (In reply to Panu Matilainen from comment #26)
> > AFAICS this would achieve the desired N(E)VRA salting in absolutely all packages
> 
> No, because there exist packages (e.g. ocaml) that generate debuginfo without
> embedded build-tree name references, which lead to pure content-based 
> hash collisions.  The debugedit-level hash salting would still be beneficial.

Ah, what I really meant (I suppose) is that it'd cause practically all packages to contain the "salting" in their source filenames no matter what the spec does. Whether other tooling actually uses the source filenames for this is a different issue.

Anyway, purely from rpm POV the debugedit-level salting would be by far the least disruptive (and thus preferred) option and one that seems more independent of other tooling as well.

Comment 31 Jan Kratochvil 2013-10-15 11:42:06 UTC
So the pros and cons are:
pro: keeping compatibility with ~0.968% of .spec files
con: needless single subdirectory in /usr/src/debug/*/
con: it is only rpm-level hack, upstream tools are still not fixed

Comment 32 Jan Kratochvil 2013-10-16 16:15:21 UTC
FYI I am not going to work on the remaining part if rpms are going to be broken this way.  I believed it can be done the right way and not just hacked up.

Comment 33 Panu Matilainen 2013-10-21 12:04:05 UTC
Um, what exactly are you referring with "this way"? I see about three different approaches being suggested, each with different set of pros and cons.

Comment 34 Jan Kratochvil 2013-10-21 12:06:10 UTC
I was referring to the Comment 28 introducing /usr/src/debug/* directories containing always just one subdirectory.

Comment 35 Panu Matilainen 2017-04-04 15:12:50 UTC
This should be fixed now in rawhide as of rpm >= 4.13.0.1-4 as a side-effect of the backports to enable parallel-installable debuginfo.


Note You need to log in before you can comment on or make changes to this bug.