Created attachment 344692 [details] tar file contains input files for link Description of problem: ld --build-id is not producing the same ID bits for two identical runs. This is a regression since binutils-2.18.50.0.9-8.fc10.x86_64 Version-Release number of selected component (if applicable): binutils-2.19.51.0.2-17.fc11.x86_64 How reproducible: 100% Steps to Reproduce: 1. unpack attached tarball 2. /usr/bin/ld -b elf64-x86-64 -m elf_x86_64 --build-id=sha1 -L. -o hello4 -T link.res; eu-unstrip -n -e hello4 3. repeat #2, see different build ID bits Actual results: Repeated links get different build ID bits every time. Expected results: Repeated links should produce the same bits every time. Additional info: See attached file with test input files.
Created attachment 345618 [details] Patch/workaround to avoid reading uninitialised memory
Hi Roland, I have uploaded a patch which fixes this problem, although to my mind it is more of a workaround than a real fix. The problem is that when computing the contribution of the contents of the .got section to the build-id, the values being examined are random. The patch works around the problem by making sure that the bfd_malloc_and_get_section function always uses zero-initialised memory for the buffer that it returns. I think that the real issue is why are the contents of the .got section not available at the time when the build-id is computed ? Presumably the build-id is being computed too early, but I have yet to investigate this. Anyway let me know what you think of the patch, and I will continue to look at this problem, although it may take me some time. Cheers Nick
I don't grok ld internals enough to have any opinion about the patch. The ID calculation needs to be the very last thing done before writing out the contents, since it includes all of them. The original placement of the hook was as advised by somebody you knows ld guts far better than I do. What is most curious to me is how this changed from the old binutils version that does not exhibit the bug.
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
line 95 on the spec file needs to be changed %define fpcopt -k"--build-id -z noexecstack" to %define fpcopt "--build-id -z noexecstack" removed the -k also you could have --build-id sha1 if you want
Created attachment 351466 [details] Fix elf_checksum_contents The real problem is that elf_checksum_contents calls bfd_malloc_and_get_section with a pointer to sec->contents as buffer pointer. Then bfd_malloc_and_get_section allocates a buffer and stores it into the buffer pointer, ie. sec->contents. Then bfd_get_section_contents looks whether SEC_IN_MEMORY is set for this section, and when it is it copies the (uninitialized) contents from sec->contents over itself.
Hi Andreas, Thanks for sorting this out. I have modified the build-id patch in the rawhide binutils to include your fix and checked it in. See binutils-2_19_51_0_11-25_fc12. Cheers Nick
The change to section.c is no longer needed.
Hi Andreas, > The change to section.c is no longer needed. True but it does no harm, so I am not going to create a new point release just for this. :-) Cheers Nick
Hi Nick, The section.c change actually seems to be causing a large link time degradation for programs with a large bss section. This was first noticed on the stream benchmark, but the test case below (powerpc64-linux assembler, but this seems to be arch independent) takes nearly a minute to link with the section.c change and well below 1 second when using --build-id=none on the link (strace and gdb confirm that the bfd_zmalloc() call in bfd_malloc_and_get_section() is where we're spending all of our link time). [bergner@alien3 ~]$ cat build-id-bug.s .file "foo.c" .section ".toc","aw" .section ".text" .align 2 .p2align 4,,15 .globl main .section ".opd","aw" .align 3 main: .quad .L.main,.TOC.@tocbase .previous .type main, @function .L.main: li 3,0 blr .long 0 .section ".bss" .align 3 .type a, @object .size a, 800000000 a: .zero 800000000 [bergner@alien3 ~]$ time gcc -m64 build-id-bug.s real 0m53.586s user 0m1.713s sys 0m0.195s [bergner@alien3 ~]$ time gcc -m64 -Wl,--build-id=none build-id-bug.s real 0m0.034s user 0m0.008s sys 0m0.003s sometime later...the following C test case also seems to generate the slowdown in either 32-bit or 64-bit modes: int a[200000000]; int main (void) { return a[10]; } Peter
If the only place where section contents is requested for SHT_NOBITS sections is the elf_checksum_contents routine, then obviously for SHT_NOBITS it doesn't make any sense to checksum all the zeros, so it should just skip SHT_NOBITS sections.
Note that this is a non-problem for Fedora rpm builds (no rebuilds needed, phew!) even if Fedora's ld became buggy again for some package builds, because rpmbuild's debugedit recomputes the build IDs from scratch anyway.
Has that changed today? http://koji.fedoraproject.org/koji/getfile?taskID=2747376&name=build.log The reason I raised it up (readded the binutils build-id patch in the rawhide build and filed upstream bug) was because I couldn't build gcc rpms today, with errors like: extracting debug info from /builddir/build/BUILDROOT/gcc-4.6.0-0.5.fc15.i386/usr/bin/i686-redhat-linux-gcc extracting debug info from /builddir/build/BUILDROOT/gcc-4.6.0-0.5.fc15.i386/usr/bin/gcjh *** ERROR: same build ID in nonidentical files! /usr/bin/gcjh and /usr/bin/gtnameserv
Oh, right. I forgot about those checks. Nothing has changed. I was just pointing out that there should be no danger of silent bad build IDs in rpmbuild packages. But those checks are based on the ID that debugedit emits, so if that is happening then there might be a problem with debugedit, which is worrying. You can look at the IDs that ld produced with eu-readelf -n. Then you can run /usr/lib/rpm/debugedit -b /... -d /usr/src/debug -i by hand. It should recompute the IDs after editting the DWARF, and then emit what it wrote. Then eu-readelf -n should show that new ID.
Hmm, debugedit doesn't recompute the ID unless it changed any DWARF data. So perhaps those are binaries that didn't have any DWARF in them?