Bug 1543912 - rpmbuild: crash in debuginfo generation with lto and "-Wl,--gc-sections"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: binutils
Version: 28
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Nick Clifton
QA Contact: Fedora Extras Quality Assurance
Reported: 2018-02-09 14:44 UTC by Zbigniew Jędrzejewski-Szmek
Modified: 2019-05-03 07:23 UTC (History)

Fixed In Version: binutils-2.29.1-23.fc28
Clone Of:
Environment:
Last Closed: 2019-05-03 07:23:52 UTC
Type: Bug
Embargoed:


Attachments
Corrupt-looking "systemd-bootchart" ELF x86_64, generated as per comment #7 (527.84 KB, application/x-sharedlib)
2018-02-28 14:21 UTC, Dave Malcolm
Test attachment (506 bytes, text/plain)
2018-03-06 19:44 UTC, Dave Malcolm
/tmp/cc* as per comment #19 (1.18 KB, application/x-gzip)
2018-03-06 19:50 UTC, Dave Malcolm
Proposed patch (2.70 KB, patch)
2018-03-12 17:49 UTC, Nick Clifton
Proposed patch (3.52 KB, patch)
2018-03-13 11:42 UTC, Nick Clifton
cdrom_id executable (137.94 KB, application/x-sharedlib)
2018-03-19 18:38 UTC, Dave Malcolm


Links
System ID Private Priority Status Summary Last Updated
GNU Compiler Collection 84847 0 P3 RESOLVED [8 Regression] Incompatibility between early LTO debug and "-Wl,--gc-sections" leads to corrupt DWARF debuginfo 2020-06-22 12:42:32 UTC
Red Hat Bugzilla 1558551 0 unspecified CLOSED "Invalid DW_AT_decl_file file number" messages emitted by dwz 2021-02-22 00:41:40 UTC
Sourceware 20882 0 P2 RESOLVED GNU ld discards sections required by relocations in .debug_info with --gc-sections 2020-06-22 12:42:32 UTC

Internal Links: 1558551

Description Zbigniew Jędrzejewski-Szmek 2018-02-09 14:44:30 UTC
Description of problem:
Building systemd fails in rawhide:
https://kojipkgs.fedoraproject.org//work/tasks/730/24890730/build.log


Version-Release number of selected component (if applicable):
rpm-build-4.14.1-6.fc28   

How reproducible:
Happens also in mock rawhide root, so I think it's reproducible.

Steps to Reproduce:
1. fedpkg build in the systemd master branch

Actual results:
+ /usr/lib/rpm/find-debuginfo.sh -j6 --strict-build-id -m -i --build-id-seed 237-1.git04a361e.fc28 --unique-debug-suffix -237-1.git04a361e.fc28.x86_64 --unique-debug-src-base systemd-237-1.git04a361e.fc28.x86_64 --run-dwz --dwz-low-mem-die-limit 10000000 --dwz-max-die-limit 110000000 -S debugsourcefiles.list /builddir/build/BUILD/systemd-04a361e18f1574098d33dbdb2e030f4a44de59ee
extracting debug info from /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/busctl
extracting debug info from /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/systemd-inhibit
extracting debug info from /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/systemd-escape
extracting debug info from /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/bootctl
/usr/lib/rpm/debugedit: /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/systemd-escape: Invalid .line_table offset 0x2b0803
/usr/lib/rpm/debugedit: /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/busctl: Invalid .line_table offset 0x350803
extracting debug info from /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/systemd-tty-ask-password-agent
/usr/lib/rpm/debugedit: /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/bootctl: Invalid .line_table offset 0x4020000
/usr/lib/rpm/debugedit: /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/bootctl: Could not find DWARF abbreviation 116
/usr/lib/rpm/debugedit: /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/systemd-inhibit: Invalid .line_table offset 0x4020000
extracting debug info from /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/systemd-cgtop
/usr/lib/rpm/debugedit: /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/systemd-tty-ask-password-agent: Invalid .line_table offset 0x300803
/usr/lib/rpm/debugedit: /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/systemd-cgtop: Invalid .line_table offset 0x70803
/usr/lib/rpm/debugedit: /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/systemd-cgtop: Could not find DWARF abbreviation 89
../../gdb/dwarf2read.c:18867: internal-error: could not find partial DIE 0x0 in cache [from module /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/bootctl]
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) [answered Y; input not from terminal]
../../gdb/dwarf2read.c:18867: internal-error: could not find partial DIE 0x0 in cache [from module /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/bootctl]
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) [answered Y; input not from terminal]
This is a bug, please report it.  For instructions, see:
<http://www.gnu.org/software/gdb/bugs/>.
../../gdb/dwarf2read.c:18867: internal-error: could not find partial DIE 0x123f in cache [from module /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/systemd-cgtop]
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) [answered Y; input not from terminal]
../../gdb/dwarf2read.c:18867: internal-error: could not find partial DIE 0x123f in cache [from module /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/systemd-cgtop]
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) [answered Y; input not from terminal]
This is a bug, please report it.  For instructions, see:
<http://www.gnu.org/software/gdb/bugs/>.
/usr/bin/gdb-add-index: line 61: 20412 Aborted                 (core dumped) $GDB --batch -nx -iex 'set auto-load no' -ex "file $file" -ex "save gdb-index $dir"
gdb-add-index: gdb error generating index for /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/bootctl
/usr/bin/gdb-add-index: line 61: 20415 Aborted                 (core dumped) $GDB --batch -nx -iex 'set auto-load no' -ex "file $file" -ex "save gdb-index $dir"
gdb-add-index: gdb error generating index for /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/systemd-cgtop
extracting debug info from /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/hostnamectl
/usr/lib/rpm/debugedit: /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/hostnamectl: Invalid .line_table offset 0x2b0803
extracting debug info from /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/systemd-hwdb
/usr/lib/rpm/debugedit: /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/systemd-hwdb: Invalid .line_table offset 0x4020000
/usr/lib/rpm/find-debuginfo.sh: line 474: /tmp/find-debuginfo.j6ARIH/res.*: No such file or directory
RPM build errors:
error: Bad exit status from /var/tmp/rpm-tmp.A02H5o (%install)
    Bad exit status from /var/tmp/rpm-tmp.A02H5o (%install)
Child return code was: 1
EXCEPTION: [Error()]
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/mockbuild/trace_decorator.py", line 89, in trace
    result = func(*args, **kw)
  File "/usr/lib/python3.6/site-packages/mockbuild/util.py", line 582, in do
    raise exception.Error("Command failed. See logs for output.\n # %s" % (command,), child.returncode)
mockbuild.exception.Error: Command failed. See logs for output.
 # bash --login -c /usr/bin/rpmbuild -bb --target x86_64 --nodeps /builddir/build/SPECS/systemd.spec

Comment 1 Mark Wielaard 2018-02-09 15:20:04 UTC
I see that these binaries are built with -flto. At least before gcc8 -flto might generate somewhat bogus debuginfo (gcc8 got early-debug support, which at least gives it a fighting chance to produce non-bogus DWARF).

It seems debugedit doesn't like some parts of the debuginfo, but doesn't crash. However gdb-add-index, which is just a script around gdb, really does crash while trying to read the debuginfo.

Obviously crashing isn't good, even on invalid DWARF.
But could you try building without lto as a workaround for now?

Comment 2 Zbigniew Jędrzejewski-Szmek 2018-02-09 21:46:25 UTC
It builds fine without lto. The result is slightly bigger, but not too much (~7%).  

> At least before gcc8 -flto might generate somewhat bogus debuginfo (gcc8 got early-debug support, which at least gives it a fighting chance to produce non-bogus DWARF).

But those failures were with gcc-8.0.1-0.12.fc28. Should I wait for some newer version before enabling lto again? Or the updated binutils?

Comment 3 Mark Wielaard 2018-02-09 22:12:37 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #2)
> It builds fine without lto. The result is slightly bigger, but not too much
> (~7%).  
> 
> > At least before gcc8 -flto might generate somewhat bogus debuginfo (gcc8 got early-debug support, which at least gives it a fighting chance to produce non-bogus DWARF).
> 
> But those failures were with gcc-8.0.1-0.12.fc28. Should I wait for some
> newer version before enabling lto again? Or the updated binutils?

Ah, sorry, I missed that this was on rawhide.
Let's reassign this to gcc for now.
I really don't know how good debuginfo generation with lto is/should be.
Hopefully the gcc hackers can tell us what to expect.

Summary for gcc hackers: When systemd is compiled with gcc8 -flto and -g then rpm debugedit will complain about Invalid .line_table offsets and Could not find DWARF abbreviations. And gdb (which is used for gdb-add-index) will crash with an internal-error: could not find partial DIE.
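
A minimal sketch, assuming only the message shapes visible in the build log in the description, of how the debugedit and gdb complaints could be grouped per binary when triaging such a log. The helper name `summarize` and the `ROOT` shorthand are illustrative and not part of rpm or gdb:

```python
import re
from collections import defaultdict

# Build root copied from the log in the description; ROOT is only a shorthand.
ROOT = "/builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc28.x86_64/usr/bin/"

# A few lines taken from the log above.
LOG = [
    f"extracting debug info from {ROOT}busctl",
    f"/usr/lib/rpm/debugedit: {ROOT}systemd-escape: Invalid .line_table offset 0x2b0803",
    f"/usr/lib/rpm/debugedit: {ROOT}bootctl: Invalid .line_table offset 0x4020000",
    f"/usr/lib/rpm/debugedit: {ROOT}bootctl: Could not find DWARF abbreviation 116",
    "../../gdb/dwarf2read.c:18867: internal-error: could not find partial DIE 0x0 "
    f"in cache [from module {ROOT}bootctl]",
]

# debugedit prefixes each complaint with its own path and the binary's path.
DEBUGEDIT = re.compile(r"^/usr/lib/rpm/debugedit: (?P<path>\S+): (?P<msg>.+)$")
# gdb names the offending binary in a trailing "[from module ...]" suffix.
GDB = re.compile(r"internal-error: (?P<msg>.+?) \[from module (?P<path>\S+)\]$")


def summarize(lines):
    """Map each binary path to the list of (tool, message) complaints about it."""
    problems = defaultdict(list)
    for line in lines:
        match = DEBUGEDIT.match(line)
        if match:
            problems[match["path"]].append(("debugedit", match["msg"]))
            continue
        match = GDB.search(line)
        if match:
            problems[match["path"]].append(("gdb", match["msg"]))
    return dict(problems)


report = summarize(LOG)
for path, issues in report.items():
    print(path.rsplit("/", 1)[-1], issues)
```

Lines that merely announce "extracting debug info" are ignored, so binaries that processed cleanly do not appear in the report.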

Comment 4 Marek Polacek 2018-02-12 10:27:33 UTC
I also see this when building systemd-bootchart:
+ /usr/lib/rpm/find-debuginfo.sh -j40 --strict-build-id -m -i --build-id-seed 233-1.fc28 --unique-debug-suffix -233-1.fc28.x86_64 --unique-debug-src-base systemd-bootchart-233-1.fc28.x86_64 --run-dwz --dwz-low-mem-die-limit 10000000 --dwz-max-die-limit 110000000 -S debugsourcefiles.list /builddir/build/BUILD/systemd-bootchart-233
extracting debug info from /builddir/build/BUILDROOT/systemd-bootchart-233-1.fc28.x86_64/usr/lib/systemd/systemd-bootchart
/usr/lib/rpm/debugedit: /builddir/build/BUILDROOT/systemd-bootchart-233-1.fc28.x86_64/usr/lib/systemd/systemd-bootchart: Invalid .line_table offset 0x4020000

Comment 5 Fedora End Of Life 2018-02-20 15:38:59 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 28 development cycle.
Changing version to '28'.

Comment 6 Zbigniew Jędrzejewski-Szmek 2018-02-22 19:35:53 UTC
Result is more or less the same today with
gcc.x86_64 8.0.1-0.15.fc29
binutils.x86_64 2.30-6.fc29 
gdb-headless.x86_64 8.1-8.fc28

scratch build: https://koji.fedoraproject.org/koji/taskinfo?taskID=25239618

Comment 7 Dave Malcolm 2018-02-27 22:10:07 UTC
I was able to reproduce the issue seen by Marek in comment #4 for systemd-bootchart (I'm assuming that this is a simpler reproducer than all of systemd).

On minimizing the flags needed to trigger the issue, it seems to be an incompatibility between LTO and -Wl,--gc-sections.

Building systemd-bootchart leads to the crash in debugedit described above.
Examining the binary with "eu-readelf -w" and looking for "compile_unit" shows that the compile_unit DIEs are insane: e.g. all of the strings seem to be using the zeroth string, leading to clearly wrong values for attributes like the filename or compilation directory.

I reduced the command for the final link from:

  gcc -Wall -Wextra -Wundef -Wformat=2 -Wformat-security -Wformat-nonliteral -Wlogical-op -Wmissing-include-dirs -Wold-style-definition -Wpointer-arith -Winit-self -Wdeclaration-after-statement -Wfloat-equal -Wsuggest-attribute=noreturn -Werror=missing-prototypes -Werror=implicit-function-declaration -Werror=missing-declarations -Werror=return-type -Werror=shadow -Wstrict-prototypes -Wredundant-decls -Wmissing-noreturn -Wshadow -Wendif-labels -Wstrict-aliasing=2 -Wwrite-strings -Wno-unused-parameter -Wno-missing-field-initializers -Wno-unused-result -Wno-format-signedness -Werror=overflow -Wdate-time -Wnested-externs -ffast-math -fno-common -fdiagnostics-show-option -fno-strict-aliasing -fvisibility=hidden -fstack-protector -fstack-protector-strong -fPIE --param=ssp-buffer-size=4 -flto -ffat-lto-objects -ffunction-sections -fdata-sections -O2 -g -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -mcet -fcf-protection -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined -Wl,-z -Wl,relro -Wl,-z -Wl,now -pie -Wl,-z -Wl,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -o systemd-bootchart src/bootchart.o src/store.o src/svg.o  ./.libs/libutils.a -lsystemd

to:
  gcc -g -o systemd-bootchart src/bootchart.o src/store.o src/svg.o  ./.libs/libutils.a -lsystemd -Wl,--gc-sections

while preserving this behavior:
  $ /usr/lib/rpm/debugedit systemd-bootchart
  /usr/lib/rpm/debugedit: systemd-bootchart: Invalid .line_table offset 0x4020000
  Segmentation fault (core dumped)

Removing the "-Wl,--gc-sections" fixes the issue.

If I delete the final "systemd-bootchart" binary and edit out "-Wl,--gc-sections" from the Makefile's OUR_LDFLAGS and then rerun "make", then the final link re-runs; this time debugedit succeeds, and examining the binary with "eu-readelf -w" shows sane compile_unit DIEs.

Building without "-flto -ffat-lto-objects" in the Makefile's OUR_CFLAGS also "fixes" it, but presumably you want LTO.

So it seems to be an issue with "-Wl,--gc-sections" with LTO-enabled .o files.

systemd-bootchart's Makefile.in has:
  OUR_LDFLAGS = @OUR_LDFLAGS@

which is coming from this in the configure.ac:
      AS_CASE([$CFLAGS], [*-O[[12345sz\ ]]*],
          [CC_CHECK_FLAGS_APPEND([with_ldflags], [LDFLAGS], [\
                 -Wl,--gc-sections])],
          [AC_MSG_RESULT([skipping --gc-sections, optimization not enabled])])
      AC_SUBST([OUR_CFLAGS], "$with_ldflags $sanitizer_cflags")
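
To make the AS_CASE condition concrete, here is a rough sketch of which CFLAGS values would pull in -Wl,--gc-sections. It assumes the m4 pattern expands to the shell glob `*-O[12345sz ]*` and uses Python's `fnmatch` as a stand-in for the shell's `case` matching; the function name `adds_gc_sections` is made up for the illustration:

```python
from fnmatch import fnmatchcase

# Assumed shell-level form of the AS_CASE pattern: "-O" followed by one of
# 1-5, s, z, or a space, anywhere in $CFLAGS.
PATTERN = "*-O[12345sz ]*"


def adds_gc_sections(cflags: str) -> bool:
    """Return True when the configure check would append -Wl,--gc-sections."""
    return fnmatchcase(cflags, PATTERN)


for flags in ("-O2 -g", "-g -O0", "-O -g", "-g"):
    print(repr(flags), adds_gc_sections(flags))
```

Under that assumption the Fedora build flags quoted above, which contain -O2, take the --gc-sections branch, while an -O0 build would skip it.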

I'll see if I can debug further.

Comment 8 Mark Wielaard 2018-02-27 22:52:03 UTC
What gcc version is needed to reproduce this? Or would you mind attaching, or uploading, the systemd-bootchart that makes debugedit crash? Even if the produced DWARF is insane, debugedit really shouldn't crash, but provide an error message so the user knows what is insane about it.

Comment 9 Igor Gnatenko 2018-02-28 06:03:55 UTC
(In reply to Mark Wielaard from comment #8)
> What gcc version is needed to reproduce this? Or would you mind attaching,
> or uploading, the systemd-bootchart that makes debugedit crash? Even if the
> produced DWARF is insane, debugedit really shouldn't crash, but provide an
> error message so the user knows what is insane about it.

You can grab systemd-bootchart sources from dist-git (fedpkg co systemd-bootchart). And I guess he is using latest gcc/gdb/whatsoever available in rawhide.

Comment 10 Dave Malcolm 2018-02-28 14:17:05 UTC
FWIW, the testing in comment #7 was via:
  mock -r fedora-rawhide-x86_64 systemd-bootchart-233-2.fc28.src.rpm
and then:
  mock -r fedora-rawhide-x86_64 shell
when it fails.

where the chroot has:
  gcc-8.0.1-0.14.fc28.x86_64
  rpm-build-4.14.1-7.fc28.x86_64
  binutils-2.29.1-19.fc28.x86_64

I'm working on generating a minimal reproducer.

Comment 11 Dave Malcolm 2018-02-28 14:21:31 UTC
Created attachment 1401902 [details]
Corrupt-looking "systemd-bootchart" ELF x86_64, generated as per comment #7

Here's the buggy-looking systemd-bootchart (x86_64), built with "-Wl,--gc-sections" and LTO as per comment #7 and comment #10.

"eu-readelf -w" shows (amongst other things) the insane strings for the "compile_unit" DIEs (apparently all using string 0):

[...snip...]

DWARF section [32] '.debug_info' at offset 0x18868:
 [Offset]
 Compilation unit at offset 0:
 Version: 4, Abbreviation section offset: 0, Address size: 8, Offset size: 4
 [     b]  compile_unit
           producer             (strp) "__builtin_memset"
           language             (data1) C99 (12)
           name                 (strp) "__builtin_memset"
           comp_dir             (strp) "__builtin_memset"
 [    29]    formal_parameter
             abstract_origin      (ref_addr) [     0]
             location             (exprloc) 
              [   0] const1u 7
 Compilation unit at offset 5351:
 Version: 4, Abbreviation section offset: 0, Address size: 8, Offset size: 4
 [  14f2]  compile_unit
           producer             (strp) "__builtin_memset"
           language             (data1) C99 (12)
           name                 (strp) "__builtin_memset"
           comp_dir             (strp) "__builtin_memset"
 [  1510]    formal_parameter
             abstract_origin      (ref_addr) [     0]
             location             (exprloc) 
              [   0] const1u 7
 Compilation unit at offset 9108:
 Version: 4, Abbreviation section offset: 0, Address size: 8, Offset size: 4
 [  239f]  compile_unit
           producer             (strp) "__builtin_memset"
           language             (data1) C99 (12)
           name                 (strp) "__builtin_memset"
           comp_dir             (strp) "__builtin_memset"
           ranges               (sec_offset) range list [     2]
           low_pc               (addr) +0x000000250e2d0100

[...snip...]

$ /usr/lib/rpm/debugedit systemd-bootchart
/usr/lib/rpm/debugedit: systemd-bootchart: Invalid .line_table offset 0x4020000
Segmentation fault (core dumped)

Comment 12 Dave Malcolm 2018-03-05 19:32:24 UTC
I've created a minimal reproducer for this here:
  https://github.com/davidmalcolm/rhbz-1543912

Building within "mock -r fedora-rawhide-x86_64 shell" gives:

$ make
mkdir build
gcc -I ./src -flto -O2 -g -c src/bootchart.c -o build/bootchart.o
gcc -I ./src -flto -O2 -g -c src/log.c -o build/log.o
gcc -flto -g -Wl,--gc-sections \
  build/bootchart.o build/log.o \
          -o build/systemd-bootchart
/usr/lib/rpm/debugedit build/systemd-bootchart
/usr/lib/rpm/debugedit: build/systemd-bootchart: Invalid .line_table offset 0x2b0803
make: *** [Makefile:21: test] Segmentation fault (core dumped)

with bogus-looking compile_unit DIEs in the binary, as seen by eu-readelf -w, similar to those in comment #11:

DWARF section [23] '.debug_info' at offset 0x10a6:
 [Offset]
 Compilation unit at offset 0:
 Version: 4, Abbreviation section offset: 0, Address size: 8, Offset size: 4
 [     b]  compile_unit
           producer             (strp) "/builddir/build/BUILD/minimizing-2"
           language             (data1) C99 (12)
           name                 (strp) "/builddir/build/BUILD/minimizing-2"
           comp_dir             (strp) "/builddir/build/BUILD/minimizing-2"
           ranges               (sec_offset) range list [     2]
           low_pc               (addr) 0x000000250e030100
 Compilation unit at offset 105:
 Version: 4, Abbreviation section offset: 0, Address size: 8, Offset size: 4
 [    74]  compile_unit
           producer             (strp) "/builddir/build/BUILD/minimizing-2"
           language             (data1) C99 (12)
           name                 (strp) "/builddir/build/BUILD/minimizing-2"
           comp_dir             (strp) "/builddir/build/BUILD/minimizing-2"
           ranges               (sec_offset) range list [     2]
           low_pc               (addr) 0x00005f0006030100

(note how every string is using string 0)
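
A small sketch of why "string 0" shows up everywhere. A `strp` attribute is just an offset into the `.debug_str` string table, so if every offset ends up as 0 each attribute resolves to whatever string happens to sit first. That the broken offsets are literally zero is an assumption made for the illustration, and the table below is a toy one with the producer string abbreviated:

```python
def read_strp(debug_str: bytes, offset: int) -> str:
    """Return the NUL-terminated string starting at `offset` in a .debug_str blob."""
    end = debug_str.index(b"\0", offset)
    return debug_str[offset:end].decode("utf-8")


# Toy string table; the first entry is the string seen in the broken output.
TABLE = ["/builddir/build/BUILD/minimizing-2", "GNU C17 8.0.1", "src/bootchart.c"]
debug_str = b"".join(s.encode("utf-8") + b"\0" for s in TABLE)

# Record where each string starts inside the blob.
offset_of = {}
position = 0
for s in TABLE:
    offset_of[s] = position
    position += len(s) + 1  # +1 for the terminating NUL

# Offsets a healthy compile_unit DIE would carry.
intended = {
    "producer": offset_of["GNU C17 8.0.1"],
    "name": offset_of["src/bootchart.c"],
    "comp_dir": offset_of["/builddir/build/BUILD/minimizing-2"],
}
# Assumed failure mode: every offset collapses to 0.
zeroed = {attr: 0 for attr in intended}

sane = {attr: read_strp(debug_str, off) for attr, off in intended.items()}
broken = {attr: read_strp(debug_str, off) for attr, off in zeroed.items()}
print(sane)
print(broken)
```

With the zeroed offsets, producer, name and comp_dir all print the same first string, which matches the shape of the eu-readelf output above.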

Removing the -Wl,--gc-sections fixes the segfault, and leads to sane-looking DIEs:

 Compilation unit at offset 0:
 Version: 4, Abbreviation section offset: 0, Address size: 8, Offset size: 4
 [     b]  compile_unit
           producer             (strp) "GNU C17 8.0.1 20180218 (Red Hat 8.0.1-0.14) -mtune=generic -march=x86-64 -g -O2 -flto"
           language             (data1) C99 (12)
           name                 (strp) "src/bootchart.c"
           comp_dir             (strp) "/builddir/build/BUILD/minimizing-2"
 [    19]    variable
             name                 (strp) "program_invocation_name"
(snip)
 Compilation unit at offset 105:
 Version: 4, Abbreviation section offset: 100, Address size: 8, Offset size: 4
 [    74]  compile_unit
           producer             (strp) "GNU C17 8.0.1 20180218 (Red Hat 8.0.1-0.14) -mtune=generic -march=x86-64 -g -O2 -flto"
           language             (data1) C99 (12)
           name                 (strp) "src/log.c"
           comp_dir             (strp) "/builddir/build/BUILD/minimizing-2"
 [    82]    subprogram
             external             (flag_present) yes
             name                 (strp) "log_internal"
             decl_file            (data1) 1
             decl_line            (data1) 3
             decl_column          (data1) 6
             prototyped           (flag_present) yes
 Compilation unit at offset 139:
 Version: 4, Abbreviation section offset: 131, Address size: 8, Offset size: 4
 [    96]  compile_unit
           producer             (strp) "GNU GIMPLE 8.0.1 20180218 (Red Hat 8.0.1-0.14) -mtune=generic -march=x86-64 -mtune=generic -march=x86-64 -g -O2 -fno-openmp -fno-openacc -fltrans"
           language             (data1) C99 (12)
           name                 (strp) "<artificial>"
           comp_dir             (strp) "/builddir/build/BUILD/minimizing-2"
           ranges               (sec_offset) range list [     0]
           low_pc               (addr) 000000000000000000 <bootchart.c.fb8dcb95>
           stmt_list            (sec_offset) 0
 [    b4]    imported_unit
             import               (ref_addr) [     b]

Comment 13 Dave Malcolm 2018-03-05 19:34:49 UTC
(In reply to Dave Malcolm from comment #12)
> I've created a minimal reproducer for this here:
>   https://github.com/davidmalcolm/rhbz-1543912
> 
> Building within "mock -r fedora-rawhide-x86_64 shell" gives:

rpm versions as in comment #10, fwiw.

Comment 14 Nick Clifton 2018-03-06 11:53:27 UTC
(In reply to Dave Malcolm from comment #12)
> I've created a minimal reproducer for this here:
 
> gcc -flto -g -Wl,--gc-sections \
>   build/bootchart.o build/log.o \
>           -o build/systemd-bootchart
> /usr/lib/rpm/debugedit build/systemd-bootchart

Adding -Wl,--print-gc-sections to this command line gives a little more 
information:

ld: Removing unused section '.rodata.cst4' in file '/lib64/crt1.o'
ld: Removing unused section '.data' in file '/lib64/crt1.o'
ld: Removing unused section '.data' in file '.../crtbegin.o'
ld: Removing unused section '.debug_abbrev' in file '/dev/shm/ccqhNEybdebugobj'
ld: Removing unused section '.debug_str' in file '/dev/shm/ccqhNEybdebugobj'
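
A minimal sketch of filtering this output for discarded debug sections, assuming the "Removing unused section" line format shown above stays as printed. The helper name `removed_debug_sections` is illustrative:

```python
import re

# Lines copied from the --print-gc-sections output above.
OUTPUT = [
    "ld: Removing unused section '.rodata.cst4' in file '/lib64/crt1.o'",
    "ld: Removing unused section '.data' in file '/lib64/crt1.o'",
    "ld: Removing unused section '.data' in file '.../crtbegin.o'",
    "ld: Removing unused section '.debug_abbrev' in file '/dev/shm/ccqhNEybdebugobj'",
    "ld: Removing unused section '.debug_str' in file '/dev/shm/ccqhNEybdebugobj'",
]

# The program name may be printed as "ld" or as a full path ending in "ld".
REMOVED = re.compile(
    r"^\S*ld: Removing unused section '(?P<section>[^']+)' in file '(?P<file>[^']+)'$"
)


def removed_debug_sections(lines):
    """Return (section, file) pairs for every discarded .debug_* section."""
    found = []
    for line in lines:
        match = REMOVED.match(line)
        if match and match["section"].startswith(".debug_"):
            found.append((match["section"], match["file"]))
    return found


suspicious = removed_debug_sections(OUTPUT)
for section, path in suspicious:
    print(section, path)
```

Only the two sections from the debugobj file are reported; the crt1.o and crtbegin.o removals are ordinary garbage collection.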


> (note how every string is using string 0)

I am guessing that it is the removal of the .debug_str section that is causing
this.

Cheers
  Nick

Comment 15 Jakub Jelinek 2018-03-06 11:57:53 UTC
Does ld --gc-sections really remove .debug_* sections as unused?  That sounds very wrong.  Why doesn't it remove all other .debug_* sections?  Has it some list of .debug_* sections it keeps?

Comment 16 Nick Clifton 2018-03-06 12:22:27 UTC
(In reply to Jakub Jelinek from comment #15)

Hi Jakub,

> Does ld --gc-sections really remove .debug_* sections as unused?  That
> sounds very wrong.  Why doesn't it remove all other .debug_* sections?  Has
> it some list of .debug_* sections it keeps?

It keeps all debug sections, *except* those sections that are in a section
group that contains both debug sections and non-debug sections.  In this
particular case the debug sections can be discarded if the non-debug sections
in the group are also going to be discarded.
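
The rule described here can be sketched as a toy model. This is not ld's actual code: the `Section` class, the `live` flag standing in for the result of the GC mark phase, and all section and group names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Section:
    name: str
    is_debug: bool
    group: Optional[str] = None  # section group name, if any
    live: bool = False           # non-debug only: survived the GC mark phase


def kept_sections(sections: List[Section]) -> List[str]:
    """Apply the rule: debug sections stay unless they share a group with
    non-debug sections that are all being discarded."""
    groups = defaultdict(list)
    for s in sections:
        if s.group is not None:
            groups[s.group].append(s)

    kept = []
    for s in sections:
        if not s.is_debug:
            if s.live:
                kept.append(s.name)
            continue
        non_debug = [m for m in groups.get(s.group, []) if not m.is_debug]
        if non_debug and not any(m.live for m in non_debug):
            continue  # mixed group whose non-debug members are all discarded
        kept.append(s.name)
    return kept


example = [
    Section(".text.main", is_debug=False, live=True),
    Section(".debug_info", is_debug=True),
    Section(".text.unused", is_debug=False, group="g_unused"),
    Section(".debug_info.unused", is_debug=True, group="g_unused"),
    Section(".text.used", is_debug=False, group="g_used", live=True),
    Section(".debug_info.used", is_debug=True, group="g_used"),
]
result = kept_sections(example)
print(result)
```

In this model a debug section outside any group is always kept, which is why the removal of ungrouped .debug_* sections in this bug looked surprising.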

At least that is what should happen.  If someone can tell me how to capture 
the object files that are being produced by the LTO optimizer then I can
investigate further and find out why those specific sections are being
discarded.

Cheers
  Nick

PS.  The linker will also discard fragmentary debug sections, but that should
  not be the case with this particular BZ.

Comment 17 Jakub Jelinek 2018-03-06 12:23:48 UTC
Doesn't -save-temps keep them around?

Comment 18 Nick Clifton 2018-03-06 12:49:02 UTC
(In reply to Jakub Jelinek from comment #17)
> Doesn't -save-temps keep them around?

Sadly no.  It only preserves the assembler and object files that are the inputs to LTO.  It does not preserve the files that are the output from LTO.

Additionally the "-v" gcc directive will not display the linker/collect2 command 
line that is used to link the LTO object file once it has been created. :-(

Comment 19 Dave Malcolm 2018-03-06 19:31:38 UTC
Nick: I don't know if this is what you're after, but I'm able to capture the "ld" invocation and debug it, stepping through where those "Removing unused section" lines are emitted.

Using the reproducer from comment #12, I can run collect2 under the debugger by adding:
  -save-temps -v -wrapper gdb,--args
to the gcc invocation:

gcc -flto -g -Wl,--gc-sections \
  build/bootchart.o build/log.o \
          -o build/systemd-bootchart \
  -save-temps -v -wrapper gdb,--args

Then, debugging collect2:

  (gdb) break main
  (gdb) run
[hits the breakpoint]
  (gdb) set variable debug = true

collect2 then prints the invocation args of ld, along with the pertinent values of COLLECT_GCC and COLLECT_GCC_OPTIONS (albeit quoted in a way that's a little fiddly to re-do in the shell).

I was then able to re-run that invocation of ld under the debugger; in my case it was:

  COLLECT_GCC=gcc \
  COLLECT_GCC_OPTIONS='-dA -save-temps -v -flto -g -o build/systemd-bootchart -mtune=generic -march=x86-64' \
  gdb --args \
    /usr/bin/ld \
       -plugin /usr/libexec/gcc/x86_64-redhat-linux/8/liblto_plugin.so -plugin-opt=/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper -plugin-opt=-fresolution=log.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --build-id --no-add-needed --eh-frame-hdr --hash-style=gnu -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o build/systemd-bootchart /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crt1.o /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crti.o /usr/lib/gcc/x86_64-redhat-linux/8/crtbegin.o -L/usr/lib/gcc/x86_64-redhat-linux/8 -L/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/8/../../.. --gc-sections --print-gc-sections build/bootchart.o build/log.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-redhat-linux/8/crtend.o /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crtn.o

Breakpoints in "ld" in bfd_elf_gc_sections and elf_gc_sweep let me step through that code, and the emission of:
  /usr/bin/ld: Removing unused section '.debug_info' in file '/tmp/cc24KR4Wdebugobj'
albeit with elf_gc_sweep inlined into bfd_elf_gc_sections.

(gdb) bt
#0  bfd_elf_gc_sections (abfd=0x55555591df70, info=0x5555559044c0 <link_info>) at elflink.c:13456
#1  0x0000555555575de4 in lang_gc_sections () at ldlang.c:6858
#2  lang_process () at ldlang.c:7297
#3  0x00005555555628f2 in main (argc=-142166400, argv=0x7ffff786c8b0 <_IO_stdfile_2_lock>) at ./ldmain.c:408

Presumably one can capture the pertinent object files and avoid LTO in the reproducer with a strategically-placed breakpoint in ld?  (or just debug it directly?)

This happens after exec_lto_wrapper; putting a breakpoint there and stepping past it, I can locate the /tmp/cc*debugobj in the filesystem.  Is this what you're after?

Comment 20 Dave Malcolm 2018-03-06 19:44:46 UTC
Created attachment 1404986 [details]
Test attachment

(trying to debug why BZ won't let me attach the ELF binary)

Comment 21 Dave Malcolm 2018-03-06 19:50:35 UTC
Created attachment 1404987 [details]
/tmp/cc* as per comment #19

Here's an example of a /tmp/cc*debugobj as found on the filesystem immediately after the return from exec_lto_wrapper within ld (and the rest of /tmp/cc* as a .tar.gz, since BZ seems to be blocking me from attaching that binary directly).

Comment 22 Nick Clifton 2018-03-07 12:28:07 UTC
Hi Dave,

  Thanks - that helped.  I did not know about the -wrapper option for gcc 
  so that was very interesting.  Just FYI, I found out that you can achieve
  the same results by adding: -Wl,-debug to the gcc command line, as this
  turns on debugging in collect2 (rather than in ld).

  I have now found out what is happening, although I am still a little bit
  confused as to why it should be occurring.  The LTO pass is producing an
  object file that only contains debug information:

    /dev/shm/ccqhNEybdebugobj

  (There was a clue in the filename, but I did not pay attention to this).

  The linker's garbage collection code reasons that any file that does not
  have *any* allocatable sections[1] can automatically be discarded, since it
  cannot have any effect on the execution of the resulting binary, right ? ...

  So this is why the .debug_abbrev and .debug_str sections are being discarded.
  I do not know why the LTO pass is producing this file.  Perhaps someone else
  more familiar with LTO can explain.

  A simple fix would be to remove the code from the linker's garbage collector
  that discards non-allocatable input files, but this seems rather
  heavy-handed.  I would rather know if there is some way to deduce that a given
  file really is needed, despite it only containing debug information, and
  then choose to keep it.  I guess it all comes down to why the LTO pass is
  generating this file, and what it contains.

Cheers
  Nick

[1].  Actually the linker first checks to see if any allocatable sections
  can be discarded.  Then if the file did not contain any allocatable
  sections, or all of the allocatable sections can be discarded, then the
  whole file is discarded.
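
The file-level decision in footnote [1] can be modelled in a few lines. This is a sketch of the described behaviour only, not the linker's implementation; the `Section` tuple and the section list given for the ltrans object are assumptions.

```python
from typing import List, NamedTuple


class Section(NamedTuple):
    name: str
    allocatable: bool
    live: bool = False  # allocatable only: still needed after garbage collection


def whole_file_discarded(sections: List[Section]) -> bool:
    """A file goes away when it has no allocatable section that survives GC.

    This is vacuously true for a file with no allocatable sections at all.
    """
    allocatable = [s for s in sections if s.allocatable]
    return not any(s.live for s in allocatable)


# The debug-only object produced by the LTO step: nothing allocatable.
debugobj = [
    Section(".debug_info", allocatable=False),
    Section(".debug_abbrev", allocatable=False),
    Section(".debug_str", allocatable=False),
]
# Assumed shape of the ltrans object: live code plus its own debug info.
ltrans = [
    Section(".text", allocatable=True, live=True),
    Section(".debug_info", allocatable=False),
]

print(whole_file_discarded(debugobj))
print(whole_file_discarded(ltrans))
```

The debug-only object falls into the "no allocatable sections" case, so all of its sections are dropped even though other files' debug info still refers to them.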

Comment 23 Mark Wielaard 2018-03-07 15:28:39 UTC
Although this is a compiler/linker bug, rpm debugedit really should not crash and burn. So I submitted a patch upstream so that it gives a better error message in these cases: http://lists.rpm.org/pipermail/rpm-maint/2018-March/007542.html

So on the testcase it will now say:

debugedit: ./baddebug: Invalid .line_table offset 0x4020000
debugedit: Bad string pointer index 2561577

Comment 24 Dave Malcolm 2018-03-08 21:39:45 UTC
(In reply to Nick Clifton from comment #22)

[...]

>   I have now found out what is happening, although I am still a little bit
>   confused as to why it should be occurring.  The LTO pass is producing an
>   object file that only contains debug information:
> 
>     /dev/shm/ccqhNEybdebugobj

[...]

>   I do not know why the LTO pass is producing this file.  Perhaps someone
> else
>   more familiar with LTO can explain.

[...]

Structurally, I believe things look like this:

gcc: invokes collect2
  collect2: invokes ld
    ld: calls various LTO plugin callbacks
      lto-plugin.c: all_symbols_read_handler calls exec_lto_wrapper
        lto-plugin.c: exec_lto_wrapper:
           captures stdout from lto-wrapper and calls add_output_files on it,
           which calls add_input_file on each one

Running lto-wrapper, it emits the following to stdout:
  /tmp/ccfyHbEUdebugobj
  /tmp/ccLI33Xx.ltrans0.ltrans.o
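
As a rough Python analogue of that hand-off (the real code is C inside lto-plugin.c and is not reproduced here), the plugin treats each line of lto-wrapper's stdout as one more input file for the link. The function names mirror those mentioned above, but their signatures in this sketch are invented:

```python
from typing import Callable, List


def add_output_files(wrapper_stdout: str, add_input_file: Callable[[str], None]) -> None:
    """Feed every non-empty line printed by lto-wrapper back to the linker."""
    for line in wrapper_stdout.splitlines():
        path = line.strip()
        if path:
            add_input_file(path)


# The two paths printed by lto-wrapper in this comment.
captured = "/tmp/ccfyHbEUdebugobj\n/tmp/ccLI33Xx.ltrans0.ltrans.o\n"

inputs: List[str] = []
add_output_files(captured, inputs.append)

# The early-debug object is recognisable by the "debugobj" suffix of its temp name.
early_debug = [p for p in inputs if p.endswith("debugobj")]
print(inputs)
print(early_debug)
```

Both files reach the linker as ordinary inputs, so the debug-only one is subject to the same garbage collection as any other object.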

The pertinent code that makes the debugobj is in gcc/lto-wrapper.c:

  /* Handle early generated debug information.  At compile-time
     we output early DWARF debug info into .gnu.debuglto_ prefixed
     sections.  LTRANS object DWARF debug info refers to that.
     So we need to transfer the .gnu.debuglto_ sections to the final
     link.  Ideally the linker plugin interface would allow us to
     not claim those sections and instruct the linker to keep
     them, renaming them in the process.  For now we extract and
     rename those sections via a simple-object interface to produce
     regular objects containing only the early debug info.  We
     then partially link those to a single early debug info object
     and pass that as additional output back to the linker plugin.  */

  /* Prepare the partial link to gather the compile-time generated
     debug-info into a single input for the final link.  */
  debug_obj = make_temp_file ("debugobj");

This was added in 7b53e7148ee92203b6f970fae6ea437394f14863 (r251220):

    2017-08-21  Richard Biener  <rguenther>

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=7b53e7148ee92203b6f970fae6ea437394f14863

I believe this is the "early LTO debug" information:

  https://gcc.gnu.org/wiki/early-debug
  https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01842.html

I believe the idea is to bridge the gap between the language-specific frontends vs the language-independent lto1: we capture richer, frontend-specific debuginfo (on things like types, inheritance hierarchies??) while the frontends are running, and then merge it with the debuginfo describing how to step through the lto1-optimized function bodies.  I believe the cc*debugobj is the former, the ltrans.o is the latter.

Perhaps a quick-and-dirty fix might be for the plugin to somehow mark the sections from the files added by lto-wrapper as not to be discarded?  Does ld have an API for that?

Caveat: I'm hazy on the details in places here, sorry.

(Aldy: did you work on this? I have a vague recollection of that, but maybe it was just from the frontend side?)

Comment 25 Aldy Hernandez 2018-03-12 08:08:36 UTC
(In reply to Dave Malcolm from comment #24)

> I believe this is the "early LTO debug" information:
> 
>   https://gcc.gnu.org/wiki/early-debug
>   https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01842.html
...
...
> (Aldy: did you work on this? I have a vague recollection of that, but maybe
> it was just from the frontend side?)

Unfortunately my work was only from the front-end side.  All the magic bits with LTO were done by Richi.

Comment 26 Nick Clifton 2018-03-12 17:49:21 UTC
Created attachment 1407308 [details]
Proposed patch

Hi Guys,

  Unfortunately the linker plugin API does not provide a way to mark sections
  as important, nor does the linker command line.  But there is a hack-ish kind
  of way, by using a linker script fragment.

  The attached patch demonstrates this idea, although I am not ready to post it
  to gcc-patches yet as I am still running the regression tests.  But I thought
  that anyone reading this BZ might be interested to see the idea and maybe
  suggest a better way to solve the problem.

Cheers
  Nick

Comment 27 Dave Malcolm 2018-03-12 18:11:57 UTC
(In reply to Nick Clifton from comment #26)
> Created attachment 1407308 [details]
> Proposed patch

I don't have a better solution right now, sorry.

FWIW here are some nitpicks on the patch:
[...]
+/* Create a linker script fragment that will ensure that all of the contents
+   of DEBUG_OBJ are kept.  This is necessary as normally the linker's garbage
+   collection algorithm will consider any file that just contains
+   non-allocatable sections as extraneous, and remove them.  See BZ 1543912

Probably should be "RHBZ", rather than just "BZ" given this is for upstream
gcc.

I'll open an upstream gcc bug for this.

+   Note - this does mean that the debug information will kept even if it
+   could be safely removed, but this is safer than having the linker remove
+   debug information that really is needed.  The correct way to solve this
+   problem is not to generate a separate debug information file at all, but
+   instead include the debug sections in section groups, along with the code
+   sections that they describe.  Then the linker can accurately determine
+   whether or not the debug information is needed.  */

This sounds plausible, but maybe we're too late into stage 4 of gcc 8 to be reorganizing that.

+static const char *
+gen_debug_obj_keeper (const char * debug_obj)
+{
+  const char * keep_debug_obj = make_temp_file ("debugobjkeeper");

make_temp_file returns a "char *", malloc-ed buffer, so this local, and the return type should lose the "const" qualifier.

+  FILE * keeper = fopen (keep_debug_obj, "w");
+
+  if (keeper == NULL)
+    return NULL;

This error-handling path should free "keep_debug_obj", or it's a leak.

+  /* FIXME: We do not need to KEEP all of the debug sections.
+     In fact KEEPing just one should be enough to ensure that the
+     entire file is not just discarded.  But we reallly ought to make

s/reallly/really/

+     sure that the linker script fragment that we are generating
+     does reference at least one section in debug_obj.  For now we
+     rely upon the fact that pretty much any debug file is going to
+     contain strings and/or abbreviations.  */
+  fprintf (keeper, "\
+SECTIONS\n\
+{\n\
+	.debug_str : { KEEP (%s(.debug_str)) }\n\
+	.debug_abbrev : { KEEP (%s(.debug_abbrev)) }\n\
+}\n", debug_obj, debug_obj);
+  fclose (keeper);
+  return keep_debug_obj;
+}
+
 /* Execute gcc. ARGC is the number of arguments. ARGV contains the arguments. */
 
 static void
@@ -1460,12 +1498,18 @@
       skip_debug = true;
     }
 
+  
   if (lto_mode == LTO_MODE_LTO)
     {
       printf ("%s\n", flto_out);
       if (!skip_debug)
 	{
+	  const char * keep_debug_obj;

Lose the "const" qualifier here.

+
 	  printf ("%s\n", debug_obj);
+	  keep_debug_obj = gen_debug_obj_keeper (debug_obj);

gen_debug_obj_keeper can fail and return NULL, so we're missing a check for that here.  I'm not sure how to handle it though.

+	  printf ("%s\n", keep_debug_obj);
+	  free ((void *) keep_debug_obj);

Without the "const" qualifier, you can drop the cast here.

 	  free (debug_obj);
 	  debug_obj = NULL;
 	}
@@ -1619,7 +1663,12 @@
 	}
       if (!skip_debug)
 	{
+	  const char * keep_debug_obj;

Likewise re the "const" qualifier and cast.

Should this repeated code be moved to a subroutine?

+
 	  printf ("%s\n", debug_obj);
+	  keep_debug_obj = gen_debug_obj_keeper (debug_obj);
+	  printf ("%s\n", keep_debug_obj);
+	  free ((void *) keep_debug_obj);
 	  free (debug_obj);
 	  debug_obj = NULL;
 	}
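
Taken together, the nitpicks above suggest a helper along the following lines.  This is only a sketch and not the revised attachment: mkstemp stands in for libiberty's make_temp_file so that the example is self-contained, the /tmp template is an assumption, and it still KEEPs only the two sections named in the first version of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write a linker script fragment that KEEPs sections of DEBUG_OBJ.
   Returns a malloc-ed path that the caller must free, or NULL on failure.
   Sketch only: mkstemp replaces make_temp_file, and the template path
   is an assumption.  */
char *
gen_debug_obj_keeper (const char *debug_obj)
{
  assert (debug_obj != NULL);

  /* Non-const: the buffer is owned by us and later freed by the caller.  */
  char *keep_debug_obj = strdup ("/tmp/debugobjkeeperXXXXXX");
  if (keep_debug_obj == NULL)
    return NULL;

  int fd = mkstemp (keep_debug_obj);
  FILE *keeper = fd >= 0 ? fdopen (fd, "w") : NULL;
  if (keeper == NULL)
    {
      /* Do not leak the descriptor, the file or the name on failure.  */
      if (fd >= 0)
        {
          close (fd);
          unlink (keep_debug_obj);
        }
      free (keep_debug_obj);
      return NULL;
    }

  fprintf (keeper,
           "SECTIONS\n"
           "{\n"
           "\t.debug_str : { KEEP (%s(.debug_str)) }\n"
           "\t.debug_abbrev : { KEEP (%s(.debug_abbrev)) }\n"
           "}\n",
           debug_obj, debug_obj);

  if (fclose (keeper) != 0)
    {
      unlink (keep_debug_obj);
      free (keep_debug_obj);
      return NULL;
    }
  return keep_debug_obj;
}
```

A caller would check the result for NULL before printing it, and then free it without needing a cast.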

Comment 28 Nick Clifton 2018-03-13 11:42:59 UTC
Created attachment 1407504 [details]
Proposed patch

Hi Dave,

  Thanks for the patch review.  I am attaching a revised version which I think
  addresses all of the points you raised.  Well, except for the issue of what
  to do if the code fails to generate a linker script fragment.  I did not think
  that such a problem should be a fatal error, but on the other hand there does
  not appear to be a warning() type function.  So for now the code just silently
  fails.  :-(

  I also ran some more tests and found that my comment about "only KEEP one
  section" was wrong.  (Although I fail to see why, when I review the linker
  code, oh well).  So I have extended the KEEP script generator to cover all
  DWARF debug sections.

  I also ran a regression test for an x86_64 toolchain and the good news is 
  that there were no problems.

  If you let me know the upstream GCC bug number I will post this patch there
  too.

Cheers
  Nick

Comment 29 Dave Malcolm 2018-03-13 14:17:03 UTC
Bug filed upstream:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84847

Comment 30 Dave Malcolm 2018-03-13 15:18:19 UTC
According to H.J. Lu in 
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84847#c9
this was fixed by this upstream binutils commit on master:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=b7c871edcd83ccdc5fcd8148a7f433efd6b52255

(not yet in a released tarball)

Nick: would cherry-picking that fix into our binutils rpms work?

Comment 31 Dave Malcolm 2018-03-13 15:19:32 UTC
(In reply to Dave Malcolm from comment #30)
> (not yet in a released tarball)
Actually, am not sure about that, sorry; it's dated 2017-05-17

Comment 32 Dave Malcolm 2018-03-13 15:45:38 UTC
(In reply to Dave Malcolm from comment #31)
> (In reply to Dave Malcolm from comment #30)
> > (not yet in a released tarball)
> Actually, am not sure about that, sorry; it's dated 2017-05-17

Indeed, that patch is in
  /usr/src/debug/binutils-2.30-6.fc29.x86_64/bfd/elflink.c
in the chroot I've been debugging this in.

Comment 33 Nick Clifton 2018-03-13 15:57:48 UTC
I am not sure why H.J. thinks that the bug is fixed, as it definitely is
still there for me.  This is with:

  gcc (GCC) 8.0.1 20180313 (experimental) [trunk revision 258480]
and
  GNU ld (GNU Binutils) 2.30.51.20180312

and the reproducer from comment #12.  My guess is that H.J. is using a slightly different version of gcc.  I will ask in PR 84847.

Comment 34 Nick Clifton 2018-03-14 11:40:54 UTC
H.J. has created a much better patch than mine, and one that applies to the linker rather than gcc, so I can take care of it myself...

Fixed in binutils-2.29.1-23.fc28 and binutils-2.30-13.fc29.

Comment 35 Dave Malcolm 2018-03-14 12:48:32 UTC
(In reply to Nick Clifton from comment #34)
> H.J. has created a much better patch than mine, and one that applies to the
> linker rather than gcc, so I can take care of it myself...
> 
> Fixed in binutils-2.29.1-23.fc28 and binutils-2.30-13.fc29.

Thanks.  This fixes the reproducer here:
 https://github.com/davidmalcolm/rhbz-1543912

Tested with:
  binutils-2.30-12.fc29.x86_64: /tmp/cc*debugobj sections are removed; bogus debuginfo
  binutils-2.30-13.fc29.x86_64: /tmp/cc*debugobj sections are not removed; valid-looking debuginfo

I'm going to test the build of systemd and systemd-bootchart with this.

Comment 36 Dave Malcolm 2018-03-14 13:06:42 UTC
(In reply to Dave Malcolm from comment #35)
[...]
> I'm going to test the build of systemd and systemd-bootchart with this.

Building systemd-bootchart-233-2.fc28.src.rpm, using gcc-8.0.1-0.17.fc29.x86_64:

  With binutils-2.30-12.fc29.x86_64:
    Fails, as before

  With binutils-2.30-13.fc29.x86_64:
    Succeeds, with sane-looking DIEs.
    That said, I do see:
+ /usr/lib/rpm/find-debuginfo.sh -j4 --strict-build-id -m -i --build-id-seed 233-2.fc29 --unique-debug-suffix -233-2.fc29.x86_64 --unique-debug-src-base systemd-bootchart-233-2.fc29.x86_64 --run-dwz --dwz-low-mem-die-limit 10000000 --dwz-max-die-limit 110000000 -S debugsourcefiles.list /builddir/build/BUILD/systemd-bootchart-233
extracting debug info from /builddir/build/BUILDROOT/systemd-bootchart-233-2.fc29.x86_64/usr/lib/systemd/systemd-bootchart
BUILDSTDERR: dwz: ./usr/lib/systemd/systemd-bootchart-233-2.fc29.x86_64.debug: Invalid DW_AT_decl_file file number 1
/usr/lib/rpm/sepdebugcrcfix: Updated 0 CRC32s, 1 CRC32s did match.
BUILDSTDERR: 678 blocks

Comment 37 Dave Malcolm 2018-03-14 14:54:59 UTC
Building systemd-237-1.git04a361e.fc29.src.rpm (last revision before https://src.fedoraproject.org/rpms/systemd/c/339b0245dff6d4ae617854a3b2757e5976c4d537?branch=master ) with gcc-8.0.1-0.17.fc29.x86_64:


With binutils-2.30-12.fc29.x86_64:
  Reproduces the problem in comment #0:

/usr/lib/rpm/debugedit: /usr/lib/rpm/debugedit: /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc29.x86_64/usr/lib64/libudev.so.1.6.9: Invalid .line_table offset 0x4020000
/usr/lib/rpm/debugedit: /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc29.x86_64/usr/lib64/libnss_resolve.so.2: Invalid .line_table offset 0x2b0803
/builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc29.x86_64/usr/lib64/libnss_myhostname.so.2: Invalid .line_table offset 0x2b0803
/usr/lib/rpm/debugedit: /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc29.x86_64/usr/lib64/security/pam_systemd.so: Invalid .line_table offset 0x4020000
/usr/lib/rpm/find-debuginfo.sh: line 474: /tmp/find-debuginfo.p1VQRm/res.*: No such file or directory


With binutils-2.30-13.fc29.x86_64:
  Build succeeds.
  I do see the "Invalid DW_AT_decl_file file number 1" mentioned in comment #36, one per ELF file:

extracting debug info from /builddir/build/BUILDROOT/systemd-237-1.git04a361e.fc29.x86_64/usr/lib/udev/cdrom_id
dwz: ./usr/lib64/libnss_mymachines.so.2-237-1.git04a361e.fc29.x86_64.debug: Invalid DW_AT_decl_file file number 1
dwz: ./usr/lib64/libnss_systemd.so.2-237-1.git04a361e.fc29.x86_64.debug: Invalid DW_AT_decl_file file number 1
dwz: ./usr/lib64/security/pam_systemd.so-237-1.git04a361e.fc29.x86_64.debug: Invalid DW_AT_decl_file file number 1
[...snip...]
dwz: ./usr/lib/udev/mtd_probe-237-1.git04a361e.fc29.x86_64.debug: Invalid DW_AT_decl_file file number 1
dwz: Too few files for multifile optimization
/usr/lib/rpm/sepdebugcrcfix: Updated 0 CRC32s, 289 CRC32s did match.
28996 blocks

but it doesn't stop the LTO build from succeeding. I don't know if this is a problem, and if so, if it's a pre-existing one.

Comment 38 Dave Malcolm 2018-03-14 14:56:35 UTC
Given that, it looks like the systemd maintainers can re-enable LTO once that binutils RPM is in the buildroots.

Comment 39 Nick Clifton 2018-03-15 10:54:48 UTC
Hi Dave,

(In reply to Dave Malcolm from comment #37)
> With binutils-2.30-13.fc29.x86_64:

>   Build succeeds.
>   I do see the "Invalid DW_AT_decl_file file number 1" mentioned in comment
> #36, one per ELF file:

That is a little worrying.  Can you add -Wl,--print-gc-sections to the gcc command line, so that we can see if any other debug sections are being discarded ?

Also, if you have the time, would you mind running:

   "readelf -w cdrom_id > /dev/null"

on the cdrom_id file that dwz is complaining about.  (The idea is to just
catch the error messages, if any, from readelf's parsing of the dwarf
information in cdrom_id).  IE I am wondering if dwz is confused about
something, or if readelf also thinks that the debug information is corrupt.

Cheers
  Nick

Comment 40 Dave Malcolm 2018-03-19 18:36:51 UTC
(In reply to Nick Clifton from comment #39)
> (In reply to Dave Malcolm from comment #37)
> > With binutils-2.30-13.fc29.x86_64:
> 
> >   Build succeeds.
> >   I do see the "Invalid DW_AT_decl_file file number 1" mentioned in comment
> > #36, one per ELF file:
> 
> That is a little worrying.  Can you add -Wl,--print-gc-sections to the gcc
> command line, so that we can see if any other debug sections are being
> discarded ?

The link line seems to be:
  cc  -o src/udev/cdrom_id 'src/udev/cdrom_id@exe/cdrom_id_cdrom_id.c.o' -flto -Wl,--no-undefined -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -pie -Wl,--gc-sections -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -mcet -fcf-protection -Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -Wl,--start-group src/udev/libudev.a src/shared/libsystemd-shared-237.so src/udev/libudev-basic.a -Wl,--end-group '-Wl,-rpath,$ORIGIN/../shared' -Wl,-rpath-link,/builddir/build/BUILD/systemd-stable-04a361e18f1574098d33dbdb2e030f4a44de59ee/x86_64-redhat-linux-gnu/src/shared


Adding "-Wl,--print-gc-sections" gives:

/usr/bin/ld: Removing unused section '.rodata.cst4' in file '/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/Scrt1.o'
/usr/bin/ld: Removing unused section '.data' in file '/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/Scrt1.o'


> Also, if you have the time, would you mind running:
> 
>    "readelf -w cdrom_id > /dev/null"
> 
> on the cdrom_id file that dwz is complaining about.  (The idea is to just
> catch the error messages, if any, from readelf's parsing of the dwarf
> information in cdrom_id).  IE I am wondering if dwz is confused about
> something, or if readelf also thinks that the debug information is corrupt.

Running "readelf -w cdrom_id > /dev/null" on the file shows nothing on stderr.

However, running "eu-readelf -w" on it shows 574 lines of the form:
eu-readelf: cannot get attribute value: invalid DWARF

Comment 41 Dave Malcolm 2018-03-19 18:38:20 UTC
Created attachment 1410075 [details]
cdrom_id executable

The cdrom_id executable that shows the errors

Comment 42 Mark Wielaard 2018-03-19 19:00:45 UTC
(In reply to Dave Malcolm from comment #40)
> However, running "eu-readelf -w" on it shows 574 lines of the form:
> eu-readelf: cannot get attribute value: invalid DWARF

Make sure you have the latest eu-readelf installed.
When building with GCC 8 there will be DW_AT_GNU_locviews attributes, and older eu-readelf doesn't know that this is a sec_offset into .debug_loc.

Comment 43 Dave Malcolm 2018-03-19 20:22:57 UTC
(In reply to Mark Wielaard from comment #42)
> (In reply to Dave Malcolm from comment #40)
> > However, running "eu-readelf -w" on it shows 574 lines of the form:
> > eu-readelf: cannot get attribute value: invalid DWARF
> 
> Make sure you have the latest eu-readelf installed.
> When building with GCC8 there will be DW_AT_GNU_locviews attributes and
> older eu-readelf doesn't know is a sec_offset into the .debug_loc.

FWIW this was with elfutils-0.170-1.fc27.x86_64, which seems to be the most recent package in rawhide.

https://koji.fedoraproject.org/koji/packageinfo?packageID=476 shows a elfutils-0.170-10.fc27, but all of the builds after -1 into rawhide seem to have failed.

(binutils-2.30-13.fc29.x86_64 and gcc-8.0.1-0.18.fc29.x86_64)

Comment 44 Mark Wielaard 2018-03-19 20:32:05 UTC
(In reply to Dave Malcolm from comment #43)
> (In reply to Mark Wielaard from comment #42)
> > (In reply to Dave Malcolm from comment #40)
> > > However, running "eu-readelf -w" on it shows 574 lines of the form:
> > > eu-readelf: cannot get attribute value: invalid DWARF
> > 
> > Make sure you have the latest eu-readelf installed.
> > When building with GCC8 there will be DW_AT_GNU_locviews attributes and
> > older eu-readelf doesn't know is a sec_offset into the .debug_loc.
> 
> FWIW this was with elfutils-0.170-1.fc27.x86_64, which seems to be the most
> recent package in rawhide.
> 
> https://koji.fedoraproject.org/koji/packageinfo?packageID=476 shows a
> elfutils-0.170-10.fc27, but all of the builds after -1 into rawhide seem to
> have failed.

Yes, you need 0.170-10. There is an issue in rawhide with aarch64; I guess that prevents all other arches from getting updates?

Comment 45 Nick Clifton 2018-03-20 12:02:32 UTC
Hmm, well I can find nothing wrong with the debug information in cdrom_id.

Mark - any idea why eu-readelf is not happy ?

Comment 46 Mark Wielaard 2018-03-20 12:42:03 UTC
(In reply to Nick Clifton from comment #45)
> Hmm, well I can find nothing wrong with the debug information in cdrom_id.
> 
> Mark - any idea why eu-readelf is not happy ?

I submitted a patch to produce an error message when it cannot resolve the DW_AT_decl_file attribute:
https://sourceware.org/ml/elfutils-devel/2018-q1/msg00083.html

And with that you see:

$ LD_LIBRARY_PATH=backends:libelf:libdw src/readelf --debug-dump=info ~/Downloads/cdrom_id >/dev/null
src/readelf: no srcfiles for CU [b]
[...lots more...]
src/readelf: no srcfiles for CU [293b]
[...lots more...]

And I think that is "correct". If you look at those CUs:

 [     b]  compile_unit         abbrev: 1
           producer             (strp) "GNU C99 8.0.1 20180312 (Red Hat 8.0.1-0.18) -m64 -mtune=generic -mcet -march=x86-64 -g -O2 -std=gnu99 -flto -ffast-math -fno-common -fno-strict-aliasing -fvisibility=hidden -fstack-protector -fstack-protector-strong -fPIE -ffunction-sections -fdata-sections -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection=full --param ssp-buffer-size=4 -fplugin=annobin"
           language             (data1) C99 (12)
           name                 (strp) "../src/udev/cdrom_id/cdrom_id.c"
           comp_dir             (strp) "/builddir/build/BUILD/systemd-stable-04a361e18f1574098d33dbdb2e030f4a44de59ee/x86_64-redhat-linux-gnu"

 [  293b]  compile_unit         abbrev: 1
           producer             (strp) "GNU C99 8.0.1 20180312 (Red Hat 8.0.1-0.18) -m64 -mtune=generic -mcet -march=x86-64 -g -O2 -std=gnu99 -flto -ffast-math -fno-common -fno-strict-aliasing -fvisibility=hidden -fstack-protector -fstack-protector-strong -ffunction-sections -fdata-sections -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection=full -fPIC -fvisibility=default --param ssp-buffer-size=4 -fplugin=annobin"
           language             (data1) C99 (12)
           name                 (strp) "../src/libudev/libudev.c"
           comp_dir             (strp) "/builddir/build/BUILD/systemd-stable-04a361e18f1574098d33dbdb2e030f4a44de59ee/x86_64-redhat-linux-gnu"

Note that both are missing a DW_AT_stmt_list.

The other CU (the <artificial> one at 0x3148) does have a DW_AT_stmt_list.

Comment 47 Nick Clifton 2018-03-20 13:08:13 UTC
(In reply to Mark Wielaard from comment #46)


> src/readelf: no srcfiles for CU [b]

> And I think that is "correct". If you look at those CUs:
> 
>  [     b]  compile_unit         abbrev: 1
>            producer             (strp) "GNU C99 8.0.1 20180312 (Red Hat

> Note that both are missing a DW_AT_stmt_list.
> 
> The other CU (the <artificial> one at 0x3148) does have a DW_AT_stmt_list.

But ... this does correspond to the abbrevs that these CUs are using.  The CU 
at offset [   b] for example uses abbrev 1 at offset 0 in the .debug_abbrev 
section, which looks like this:

  Number TAG (0x0)
   1      DW_TAG_compile_unit    [has children]
    DW_AT_producer     DW_FORM_strp
    DW_AT_language     DW_FORM_data1
    DW_AT_name         DW_FORM_strp
    DW_AT_comp_dir     DW_FORM_strp
    DW_AT value: 0     DW_FORM value: 0

ie - no DW_AT_stmt_list is part of the abbrev.
Whereas the abbrev that CU <artificial> is using looks like this:

  Number TAG (0x47a)
   1      DW_TAG_compile_unit    [has children]
    DW_AT_producer     DW_FORM_strp
    DW_AT_language     DW_FORM_data1
    DW_AT_name         DW_FORM_strp
    DW_AT_comp_dir     DW_FORM_strp
    DW_AT_ranges       DW_FORM_sec_offset
    DW_AT_low_pc       DW_FORM_addr
    DW_AT_stmt_list    DW_FORM_sec_offset
    DW_AT value: 0     DW_FORM value: 0

So if there is a problem, it is with the abbrevs being generated for cdrom_id.c and libudev.c.  Not with the linker discarding debug information, right ?

(I am basically saying that I think that this problem is now a gcc debug information generation problem and not a linker problem and that I can 
leave it to wiser heads than mine....)

Comment 48 Mark Wielaard 2018-03-20 13:16:59 UTC
(In reply to Nick Clifton from comment #47)
> So if there is a problem, it is with the abbrevs being generated for
> cdrom_id.c and libudev.c.  Not with the linker discarding debug information,
> right ?
> 
> (I am basically saying that I think that this problem is now a gcc debug
> information generation problem and not a linker problem and that I can 
> leave it to wiser heads than mine....)

I think what you are saying is that at the very lowest level, the .debug_abbrev and .debug_info sections match up, so this could be considered "valid DWARF".

The problem is that the .debug_line section no longer matches up with the file attribute references in the .debug_info, because of the missing DW_AT_stmt_list entries in the CU DIEs. So it is still invalid DWARF, just at a slightly different place/level.

I assume you are right, that is caused by the compiler, not the linker.
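
To make the mismatch concrete, here is a small sketch that checks whether a CU's abbrev carries DW_AT_stmt_list at all.  The attribute codes are the standard DWARF values; the function name and the array representation of an abbrev are hypothetical, and this is not how readelf or eu-readelf implement the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Attribute codes as assigned by the DWARF specification.  */
enum
{
  DW_AT_name = 0x03,
  DW_AT_stmt_list = 0x10,
  DW_AT_low_pc = 0x11,
  DW_AT_language = 0x13,
  DW_AT_comp_dir = 0x1b,
  DW_AT_producer = 0x25,
  DW_AT_ranges = 0x55
};

/* Return true if the list of attributes in a compile unit's abbrev
   includes DW_AT_stmt_list.  Without it the CU has no pointer into
   .debug_line, so the file numbers used by DW_AT_decl_file in that CU
   cannot be resolved.  Hypothetical helper, for illustration only.  */
bool
cu_has_line_table (const unsigned *attrs, size_t count)
{
  assert (attrs != NULL || count == 0);

  for (size_t i = 0; i < count; i++)
    if (attrs[i] == DW_AT_stmt_list)
      return true;
  return false;
}
```

Fed the four attributes of the abbrev at offset 0 it returns false, and fed the seven attributes of the abbrev used by the <artificial> CU it returns true, which is the difference dwz and eu-readelf are tripping over.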

Comment 49 Dave Malcolm 2018-03-20 13:25:22 UTC
(In reply to Mark Wielaard from comment #48)
> I assume you are right, that is caused by the compiler, not the linker.

Thanks Nick and Mark.

This feels like it ought to be tracked as a separate bug, so I've opened bug 1558551 for it; I'm looking into it from the gcc side.

Comment 50 Ben Cotton 2019-05-02 21:13:56 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 51 Mark Wielaard 2019-05-03 07:23:52 UTC
This bug is fixed. But the additional bug #1558551 isn't yet.

