Bug 501582 - ld corrupts build ID generation
Summary: ld corrupts build ID generation
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: binutils
Version: 11
Hardware: All
OS: Linux
low
medium
Target Milestone: ---
Assignee: Nick Clifton
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-05-19 20:30 UTC by Roland McGrath
Modified: 2011-01-28 20:02 UTC (History)
8 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2009-07-14 09:50:23 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
tar file contains input files for link (231.10 KB, application/octet-stream)
2009-05-19 20:30 UTC, Roland McGrath
no flags Details
Patch/workaround to avoid reading uninitialised memory (1.65 KB, patch)
2009-05-27 14:47 UTC, Nick Clifton
no flags Details | Diff
Fix elf_checksum_contents (661 bytes, patch)
2009-07-13 12:20 UTC, Andreas Schwab
no flags Details | Diff

Description Roland McGrath 2009-05-19 20:30:52 UTC
Created attachment 344692 [details]
tar file contains input files for link

Description of problem:

ld --build-id is not producing the same ID bits for two identical runs.
This is a regression since binutils-2.18.50.0.9-8.fc10.x86_64

Version-Release number of selected component (if applicable):
binutils-2.19.51.0.2-17.fc11.x86_64

How reproducible:
100%

Steps to Reproduce:
1. unpack attached tarball
2. /usr/bin/ld -b elf64-x86-64 -m elf_x86_64  --build-id=sha1 -L. -o hello4 -T link.res; eu-unstrip -n -e hello4
3. repeat #2, see different build ID bits
  
Actual results:

Repeated links get different build ID bits every time.

Expected results:

Repeated links should produce the same bits every time.

Additional info:

See attached file with test input files.

Comment 1 Nick Clifton 2009-05-27 14:47:21 UTC
Created attachment 345618 [details]
Patch/workaround to avoid reading uninitialised memory

Comment 2 Nick Clifton 2009-05-27 14:53:25 UTC
Hi Roland,

  I have uploaded a patch which fixes this problem, although to my mind it is more of a workaround than a real fix.

  The problem is that when computing the contribution of the contents of the .got section to the build-id, the values being examined are random.  The patch works around the problem by making sure that the bfd_malloc_and_get_section function always uses zero-initialised memory for the buffer that it returns.

  I think that the real issue is why are the contents of the .got section not available at the time when the build-id is computed ?  Presumably the build-id is being computed too early, but I have yet to investigate this.

  Anyway let me know what you think of the patch, and I will continue to look at this problem, although it may take me some time.

Cheers
  Nick

Comment 3 Roland McGrath 2009-05-27 17:59:32 UTC
I don't grok ld internals enough to have any opinion about the patch.
The ID calculation needs to be the very last thing done before writing out the contents, since it includes all of them.  The original placement of the hook was as advised by somebody you knows ld guts far better than I do.

What is most curious to me is how this changed from the old binutils version that does not exhibit the bug.

Comment 4 Bug Zapper 2009-06-09 16:07:49 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 5 John Mallett 2009-07-06 12:49:40 UTC
line 95 on the spec file needs to be changed 

%define fpcopt -k"--build-id -z noexecstack"
to 
%define fpcopt "--build-id -z noexecstack"
removed the -k 
also you could have --build-id sha1 if you want

Comment 6 Andreas Schwab 2009-07-13 12:20:42 UTC
Created attachment 351466 [details]
Fix elf_checksum_contents

The real problem is that elf_checksum_contents calls bfd_malloc_and_get_section with a pointer to sec->contents as buffer pointer.  Then bfd_malloc_and_get_section allocates a buffer and stores it into the buffer pointer, ie. sec->contents.  Then bfd_get_section_contents looks whether SEC_IN_MEMORY is set for this section, and when it is it copies the (uninitialized) contents from sec->contents over itself.

Comment 7 Nick Clifton 2009-07-14 09:50:23 UTC
Hi Andreas,

  Thanks for sorting this out.  I have modified the build-id patch in the rawhide binutils to include your fix and checked it in.  See  binutils-2_19_51_0_11-25_fc12.

Cheers
  Nick

Comment 8 Andreas Schwab 2009-07-14 12:18:45 UTC
The change to section.c is no longer needed.

Comment 9 Nick Clifton 2009-07-17 10:14:36 UTC
Hi Andreas,

> The change to section.c is no longer needed.

True but it does no harm, so I am not going to create a new point release just for this. :-)

Cheers
  Nick

Comment 10 Peter Bergner 2010-09-16 22:39:12 UTC
Hi Nick,

The section.c change actually seems to be causing a large link time degradation for programs with a large bss section.  This was first noticed on the stream benchmark, but the test case below (powerpc64-linux assembler, but this seems to be arch independent) takes nearly a minute to link with the section.c change and well below 1 second when using --build-id=none on the link (strace and gdb confirm that the bfd_zmalloc() call in bfd_malloc_and_get_section() is where we're spending all of our link time).

[bergner@alien3 ~]$ cat build-id-bug.s 
        .file   "foo.c"
        .section        ".toc","aw"
        .section        ".text"
        .align 2
        .p2align 4,,15
        .globl main
        .section        ".opd","aw"
        .align 3
main:
        .quad   .L.main,.TOC.@tocbase
        .previous
        .type   main, @function
.L.main:
        li 3,0
        blr
        .long 0
        .section        ".bss"
        .align 3
        .type   a, @object
        .size   a, 800000000
a:
        .zero   800000000

[bergner@alien3 ~]$ time gcc -m64 build-id-bug.s 

real    0m53.586s
user    0m1.713s
sys     0m0.195s

[bergner@alien3 ~]$ time gcc -m64 -Wl,--build-id=none build-id-bug.s

real    0m0.034s
user    0m0.008s
sys     0m0.003s

sometime later...the following C test case also seems to generate the slowdown in either 32-bit or 64-bit modes:

int a[200000000];
int
main (void)
{
  return a[10];
}

Peter

Comment 11 Jakub Jelinek 2011-01-28 16:09:26 UTC
If the only place where section contents is requested for SHT_NOBITS sections is the elf_checksum_contents routine, then obviously for SHT_NOBITS it doesn't make any sense to checksum all the zeros, so it should just skip SHT_NOBITS sections.

Comment 12 Roland McGrath 2011-01-28 18:39:08 UTC
Note that this is a non-problem for Fedora rpm builds (no rebuilds needed, phew!) even if Fedora's ld became buggy again for some package builds, because rpmbuild's debugedit recomputes the build IDs from scratch anyway.

Comment 13 Jakub Jelinek 2011-01-28 19:29:35 UTC
Has that changed today?
http://koji.fedoraproject.org/koji/getfile?taskID=2747376&name=build.log
The reason I raised it up (readded the binutils build-id patch in the rawhide build and filed upstream bug) was because I couldn't build gcc rpms today, with errors like:
extracting debug info from /builddir/build/BUILDROOT/gcc-4.6.0-0.5.fc15.i386/usr/bin/i686-redhat-linux-gcc
extracting debug info from /builddir/build/BUILDROOT/gcc-4.6.0-0.5.fc15.i386/usr/bin/gcjh
*** ERROR: same build ID in nonidentical files!
        /usr/bin/gcjh
   and  /usr/bin/gtnameserv

Comment 14 Roland McGrath 2011-01-28 19:55:44 UTC
Oh, right.  I forgot about those checks.  Nothing has changed.  I was just pointing out that there should be no danger of silent bad build IDs in rpmbuild packages.  But those checks are based on the ID that debugedit emits, so if that is happening then there might be a problem with debugedit, which is worrying.

You can look at the IDs that ld produced with eu-readelf -n.
Then you can run /usr/lib/rpm/debugedit -b /... -d /usr/src/debug -i by hand.  It should recompute the IDs after editting the DWARF, and then emit what it wrote.  Then eu-readelf -n should show that new ID.

Comment 15 Roland McGrath 2011-01-28 20:02:36 UTC
Hmm, debugedit doesn't recompute the ID unless it changed any DWARF data.  So perhaps those are binaries that didn't have any DWARF in them?


Note You need to log in before you can comment on or make changes to this bug.