Description of problem:
The gcc update pushed to FC6 updates on January 10 caused a major performance regression in linking the files compiled by this gcc. This makes the write-compile-test cycle for Mozilla development quite painful.

Version-Release number of selected component (if applicable):
Jan 10 14:27:20 Updated: libgcc.i386 4.1.1-51.fc6
Jan 10 14:28:21 Updated: libstdc++.i386 4.1.1-51.fc6
Jan 10 14:28:28 Updated: libgcj.i386 4.1.1-51.fc6
Jan 10 14:28:29 Updated: libgomp.i386 4.1.1-51.fc6
Jan 10 14:28:30 Updated: libgfortran.i386 4.1.1-51.fc6
Jan 10 14:28:51 Updated: libgcj-devel.i386 4.1.1-51.fc6
Jan 10 14:29:01 Updated: libstdc++-devel.i386 4.1.1-51.fc6
Jan 10 14:29:02 Updated: libmudflap.i386 4.1.1-51.fc6
Jan 10 14:29:04 Updated: cpp.i386 4.1.1-51.fc6
Jan 10 14:29:08 Updated: gcc.i386 4.1.1-51.fc6
Jan 10 14:29:10 Updated: libobjc.i386 4.1.1-51.fc6
Jan 10 14:29:33 Updated: gcc-gfortran.i386 4.1.1-51.fc6
Jan 10 14:31:03 Updated: gcc-debuginfo.i386 4.1.1-51.fc6
Jan 10 14:31:05 Updated: gcc-c++.i386 4.1.1-51.fc6
Jan 10 14:31:15 Updated: gcc-java.i386 4.1.1-51.fc6
Jan 10 14:32:24 Updated: libgnat.i386 4.1.1-51.fc6

How reproducible:
Always

Steps to Reproduce:
My steps to reproduce were building Mozilla trunk with --enable-optimize="-O2 -fno-omit-frame-pointer" and --enable-debug. I'm looking at the time it takes to link libgklayout.so, which is built in mozilla/layout/build/ and consists of the code compiled in the mozilla/dom/, mozilla/content/, and mozilla/layout/ subdirectories of the Mozilla tree.

Expected results:
Before the gcc update above, i.e., with all of the above packages at version 4.1.1-30, compiling dom, content, and layout produced .o files and .a files that led to libgklayout linking on my laptop in:
real 0m22.756s
user 0m10.965s
sys 0m1.543s

Actual results:
After upgrading the packages to 4.1.1-51.fc6, linking libgklayout.so takes:
real 2m49.734s
user 2m25.087s
sys 0m4.782s
(Both of these timings are the second link in a row, to get "warm" times.)

Additional info: I did test that the performance regression is related to the compiler used to compile the .o/.a files, not the compiler used for the final link command.
The most noticeable difference between an objdump -Cd of the before and after .o files is that inline functions used to be output into sections like ".gnu.linkonce.t._Z12VERIFY_COORDi" (the fast case) and are now output into sections like ".text._Z12VERIFY_COORDi" (the slow case).
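To make the before/after comparison above easier to repeat across many object files, here is a minimal sketch that tallies section names by the two naming styles mentioned in this report. The function names `classify_section` and `summarize` are hypothetical, and the input is assumed to be a list of section names already extracted from something like `objdump -h` output. This is only a name-based heuristic: `.text.<symbol>` sections are also produced by `-ffunction-sections`, and COMDAT group membership is not visible from the name alone (`readelf -g` lists section groups).

```python
from collections import Counter


def classify_section(name: str) -> str:
    """Classify a section name by the two styles seen in this report."""
    if name.startswith(".gnu.linkonce.t."):
        # Old style: one link-once text section per inline function.
        return "linkonce"
    if name.startswith(".text."):
        # New style: per-function text section (heuristic, see lead-in).
        return "per-function .text"
    return "other"


def summarize(section_names):
    """Count how many sections fall into each category."""
    return Counter(classify_section(n) for n in section_names)


if __name__ == "__main__":
    # Sample names; the first two are the ones quoted in the report.
    sample = [
        ".gnu.linkonce.t._Z12VERIFY_COORDi",
        ".text._Z12VERIFY_COORDi",
        ".text",
        ".data",
    ]
    for category, count in sorted(summarize(sample).items()):
        print(f"{category}: {count}")
```

Running this over the section lists of a "before" and an "after" object file should show the count moving from the first category to the second, if the observation in the report holds for that file.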
The change on the GCC side was an intentional bugfix. GCC configury wasn't able to parse the enhanced FC6+ binutils version numbers (containing -%{release} at the end) and therefore assumed the linker doesn't support COMDAT groups. See e.g. #215317 for an example of a bug that was fixed by this. Now, unlike in FC5, ld shouldn't be horribly slow with this; see binutils-2.17.50.0.6-kept-section.patch. Some slowdown is certainly to be expected. I guess if you prepare a tarball with all the objects you are linking together and the exact ld command line, some oprofiling of ld could reveal one or two spots that can still be sped up.
I have a 67MB tar.bz2 file with an ld command that links (i686 arch). I've posted it at http://dbaron.org/tmp/rh-bug-223181.tar.bz2 . Please let me know once you've downloaded it so I can remove it from my Web space. (It's not quite pure, since I've downgraded gcc, and I forgot about one of the directories involved, mozilla/view/, so I didn't save it and thus a single one of the .a files with only three .o files is actually built using the old compiler. However, it nevertheless shows the performance problems.)
Please try rawhide binutils. It seems binutils-2.17.50.0.6-kept-section.patch doesn't measurably help on this testcase (but the time is spent mostly in _bfd_elf_check_kept_section and functions it calls). rawhide binutils contains a different fix for this: http://sources.redhat.com/ml/binutils/2006-11/msg00190.html but even backporting that patch alone didn't make any visible difference. With 2.17.50.0.6 binutils ld takes around 120 sec on my box, while 2.17.50.0.9 binutils takes just 7 sec.
It has been fixed in binutils in CVS:

[hjl@gnu rh-bug-223181]$ time make LD=/usr/bin/ld
/usr/bin/ld --eh-frame-hdr ... ...

real 2m20.714s
user 2m14.343s
sys 0m5.264s
[hjl@gnu rh-bug-223181]$ time make
./ld --eh-frame-hdr ... ...

real 0m16.837s
user 0m13.281s
sys 0m1.749s
In fact, this bug is fixed in Linux binutils 2.17.50.0.6 from kernel.org:

[hjl@gnu rh-bug-223181]$ LD_LIBRARY_PATH=~/usr/lib time make LD=~/usr/bin/ld
/export/home/hjl/usr/bin/ld --eh-frame-hdr ... ...
13.75user 1.74system 0:17.06elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+77781minor)pagefaults 0swaps

But you have to run patches/README to apply patches to correct this before building binutils.
Yeah, it's also fixed for me in binutils from development. It would be good to get the necessary patches into updates; I've heard complaints on IRC from some other people as well.
There are 2 choices going forward:

1. I run patches/README before creating Linux binutils tar ball.
2. Red Hat adds "patches/README" at the end of %setup in binutils.spec, which will fix this bug with a rebuild.
binutils-2.17.50.0.6-3.fc6 in FC6 testing updates should fix this, please check it out. It contains my version of the fix rather than hjl's which I wasn't aware of until I wrote the patch.
It took longer than I hoped, but binutils-2.17.50.0.6-3.fc6 has finally been pushed today: https://www.redhat.com/archives/fedora-test-list/2007-January/msg00259.html
The binutils in updates-testing are a big improvement over the released binutils, but they're not as good as the ones in devel. I'm seeing:
real 0m46.389s
user 0m27.146s
sys 0m4.655s
with binutils-2.17.50.0.6-3.fc6 (updates-testing), whereas with binutils-2.17.50.0.9-1 (development) I see:
real 0m31.537s
user 0m14.255s
sys 0m2.377s
(In both cases, those are the second time, so that the files are in memory.)
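For a quick side-by-side view of the reporter's numbers, the following sketch converts the four "real" timings quoted in this report into seconds and expresses each as a multiple of the original 4.1.1-30 baseline. It assumes all four were measured on the same machine, which the thread does not state explicitly. The helper name `to_seconds` and the dictionary labels are made up for illustration.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a bash `time` value such as '2m49.734s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if m is None:
        raise ValueError(f"unrecognized time format: {stamp!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# "real" timings copied from the report; labels are illustrative only.
REAL_TIMES = {
    "gcc 4.1.1-30 objects (baseline)": "0m22.756s",
    "gcc 4.1.1-51 objects, released binutils": "2m49.734s",
    "binutils-2.17.50.0.6-3.fc6 (updates-testing)": "0m46.389s",
    "binutils-2.17.50.0.9-1 (development)": "0m31.537s",
}


def slowdowns(times, baseline_label):
    """Return each timing as a multiple of the baseline timing."""
    base = to_seconds(times[baseline_label])
    return {label: to_seconds(t) / base for label, t in times.items()}


if __name__ == "__main__":
    ratios = slowdowns(REAL_TIMES, "gcc 4.1.1-30 objects (baseline)")
    for label, ratio in ratios.items():
        print(f"{label}: {to_seconds(REAL_TIMES[label]):.3f}s ({ratio:.2f}x)")
```

Only wall-clock ("real") time is compared here; the user and sys figures could be handled the same way.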
The code in F7 binutils is more invasive, certainly not appropriate for FC6.