Bug 1546608 - ld does not merge .gnu.build.attributes
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: binutils
Version: 28
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Nick Clifton
QA Contact: Fedora Extras Quality Assurance
Reported: 2018-02-19 01:59 UTC by John Reiser
Modified: 2019-02-02 03:34 UTC (History)
Last Closed: 2019-02-02 03:34:43 UTC



Description John Reiser 2018-02-19 01:59:57 UTC
Description of problem: The output Section for .gnu.build.attributes is just the relocated concatenation of all the corresponding input Sections, without any merging of the same attributes for adjacent address ranges.  So if there are N separate compilation units all compiled by the same compiler configuration, then the output Section will be N copies of the same .gnu.build.attributes. That wastes space as N grows, such as in a large project or when using larger .a archive libraries.

Also, the static binder "ld" is the program that handles alignment, and is the only program that knows when two address ranges become contiguous because of alignment.  For example (reported by "readelf --wide --notes"):
  GA$<tool>gcc 8.0.1 20180131  0x00000000       OPEN        Applies to region from 0x10580 to 0x1178a
  GA$<tool>gcc 8.0.1 20180131  0x00000000       OPEN        Applies to region from 0x11790 to 0x119a9
No program other than ld knows that the range from 0x1178a to 0x11790 has been "bridged" by alignment.  This means that all the annobin checkers (built-by.sh, check-abi.sh, etc.) are working with incomplete information: the range 0x1178a to 0x11790 could be occupied by code that was not produced by gcc 8.0.1 20180131.  readelf (binutils-2.29.1-19.fc28.x86_64) does not diagnose such a case.


Version-Release number of selected component (if applicable):
binutils-2.29.1-19.fc28.x86_64


How reproducible: every time


Steps to Reproduce:
1. readelf --wide --notes /bin/date   ## /bin/date from coreutils-8.29-3.fc28.x86_64

Actual results: 31 copies of the 12 lines
=====
  GA$<version>3p4              0x00000010       OPEN        Applies to region from 0x3760 to 0x3ef7
  GA$<tool>gcc 8.0.1 20180131  0x00000000       OPEN        Applies to region from 0x3760 to 0x3ef7
  GA*GOW:0x452a                0x00000000       OPEN        Applies to region from 0x3760 to 0x3ef7
  GA*<stack prot>strong        0x00000000       OPEN        Applies to region from 0x3760 to 0x3ef7
  GA+stack_clash:true          0x00000000       OPEN        Applies to region from 0x3760 to 0x3ef7
  GA*cf_protection:0x8         0x00000000       OPEN        Applies to region from 0x3760 to 0x3ef7
  GA+GLIBCXX_ASSERTIONS:true   0x00000000       OPEN        Applies to region from 0x3760 to 0x3ef7
  GA*FORTIFY:0x2               0x00000000       OPEN        Applies to region from 0x3760 to 0x3ef7
  GA*<PIC>pic                  0x00000000       OPEN        Applies to region from 0x3760 to 0x3ef7
  GA!<short enum>false         0x00000000       OPEN        Applies to region from 0x3760 to 0x3ef7
  GA*<ABI>0x7001100000012      0x00000000       OPEN        Applies to region from 0x3760 to 0x3ef7
  GA*cet status:0x2020102      0x00000000       OPEN        Applies to region from 0x3760 to 0x3ef7
=====
with a different address range per copy.


Expected results: Merge contiguous ranges (after alignment) of the same attribute.  In this case there would be only 12 lines, each covering the range from the minimum to the maximum address.
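
The merging requested above can be sketched in a few lines of Python. This is only an illustration of the idea, not how ld or objcopy implement it. The `Note` type, the `merge_notes` function and the `max_gap` parameter are hypothetical names, and the end address printed by readelf is assumed to be exclusive, which matches the 6-byte gap quoted in the description.

```python
from typing import NamedTuple


class Note(NamedTuple):
    name: str   # attribute text, e.g. "GA$<tool>gcc 8.0.1 20180131"
    start: int  # first address covered
    end: int    # address just past the range (assumed exclusive)


def merge_notes(notes, max_gap=0):
    """Merge ranges of identical attributes that touch or overlap.

    max_gap is a hypothetical knob: ranges separated by at most this
    many bytes (for example alignment padding) are also joined.
    """
    merged = []
    last = {}  # attribute name -> index in merged of its latest range
    for note in sorted(notes, key=lambda n: (n.name, n.start)):
        i = last.get(note.name)
        if i is not None and note.start - merged[i].end <= max_gap:
            prev = merged[i]
            merged[i] = prev._replace(end=max(prev.end, note.end))
        else:
            last[note.name] = len(merged)
            merged.append(note)
    return merged


tool = "GA$<tool>gcc 8.0.1 20180131"
notes = [Note(tool, 0x10580, 0x1178a), Note(tool, 0x11790, 0x119a9)]
strict = merge_notes(notes)                # the 6-byte gap keeps them apart
bridged = merge_notes(notes, max_gap=15)   # the gap is treated as padding
```

With `max_gap=0` the two example ranges stay separate, which is the situation described above: only a tool that knows the gap is alignment padding can safely join them.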


Additional info:
There are also empty ranges, such as:
  GA$<version>3p4              0x00000010       OPEN        Applies to region from 0x3ef7 to 0x3ef7
and unbounded ranges, such as:
  GA$<tool>gcc 8.0.1 20180131  0x00000000       OPEN        Applies to region from 0x3ef7

but I suppose those are the fault of -fplugin=annobin at compilation, or of expansion by readelf.

Comment 1 Nick Clifton 2018-02-19 12:23:41 UTC
Hi John,

  The notes can be merged by the objcopy program.  It has a special option:
  --merge-notes to do exactly this.
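
  For illustration only, a build script could call objcopy along these lines.  The
  file names and the two helper functions are hypothetical; only the objcopy
  program and its --merge-notes option come from this comment.

```python
import subprocess


def merge_notes_command(src, dst):
    """Build the objcopy command line that merges the notes in src into dst."""
    return ["objcopy", "--merge-notes", src, dst]


def merge_notes_in_file(src, dst):
    # Needs binutils objcopy on PATH; raises CalledProcessError on failure.
    subprocess.run(merge_notes_command(src, dst), check=True)


# Hypothetical input and output file names.
cmd = merge_notes_command("date", "date.merged")
```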

  The ability was not put into the linker in order to keep things simple -
  ie less chance of introducing bugs.  It also means that older versions
  of the linker can still correctly process files containing the annobin
  notes.

Cheers
  Nick

Comment 2 Jakub Jelinek 2018-02-19 12:33:26 UTC
Is objcopy --merge-notes invoked during rpm post-processing?

Comment 3 Nick Clifton 2018-02-19 16:08:06 UTC
(In reply to Jakub Jelinek from comment #2)
> Is objcopy --merge-notes invoked during rpm post-processing?

I do not think so.  Maybe it should be, but I think that there are enough new things in the build process at the moment, so leaving it out (for now) is not such a bad thing.  Note: the annobin notes are in a non-loadable section, so they do not take up space in the run-time image, only the on-disk image.

Cheers
  Nick

Comment 4 Nick Clifton 2018-02-19 16:21:44 UTC
Hi John,

(In reply to John Reiser from comment #0)

> Also, the static binder "ld" is the program that handles alignment, and is
> the only program that knows when two address ranges become contiguous
> because of alignment.  For example (reported by "readelf --wide --notes"):
>   GA$<tool>gcc 8.0.1 20180131  0x00000000       OPEN        Applies to
> region from 0x10580 to 0x1178a
>   GA$<tool>gcc 8.0.1 20180131  0x00000000       OPEN        Applies to
> region from 0x11790 to 0x119a9
> No program other than ld knows that the range from 0x1178a to 0x11790 has
> been "bridged" by alignment.  This means that all the annobin checkers
> (built-by.sh, check-abi.sh, etc.) are working with incomplete information:
> the range 0x1178a to 0x11790 could be occupied by code that was not produced
> by gcc 8.0.1 20180131.

Seriously ?  You are worried about the case where 6 bytes of unannotated code might have been included in an executable's code space ?  Theoretically possible, true, but is it really worth worrying about ?

Currently readelf will detect and warn about gaps of 16 bytes or larger in the code coverage of annobin notes.  I could reduce that threshold, but in order to prevent false positives when there is a real linker-inserted alignment adjustment, the code would then have to check whether there were any symbols in the adjustment area.  That would make readelf slower and would not help if the unannotated code did not contain any symbols.
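
As a rough illustration of the gap check described above (not readelf's actual code), the sketch below reports uncovered gaps of at least a given size.  The `find_gaps` name and the treatment of range ends as exclusive are assumptions; the 16-byte default is the threshold mentioned above.

```python
def find_gaps(ranges, threshold=16):
    """Return (start, end, size) for uncovered gaps of at least threshold bytes."""
    gaps = []
    covered_to = None  # highest address covered so far
    for start, end in sorted(ranges):
        if covered_to is not None and start - covered_to >= threshold:
            gaps.append((covered_to, start, start - covered_to))
        covered_to = end if covered_to is None else max(covered_to, end)
    return gaps


# The two ranges quoted from comment #0.
ranges = [(0x10580, 0x1178a), (0x11790, 0x119a9)]
default_gaps = find_gaps(ranges)               # 6-byte gap is below 16
strict_gaps = find_gaps(ranges, threshold=1)   # any gap is reported
```

With the default threshold the 6-byte gap goes unreported; lowering the threshold reports it, but then real alignment padding would be flagged as well, which is the false-positive problem noted above.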

I should also note that I am working on an enhancement to the assembler, such that any time it creates an object file which does not contain any annobin notes, it automatically adds a note of its own, saying basically: "this unannotated region came from assembler input 'foo'".  It may also include the assembler command line options.  I am not sure if that is needed or not at the moment.

Cheers
  Nick

Comment 5 John Reiser 2018-02-19 17:29:29 UTC
Hi Nick,

(In reply to Nick Clifton from comment #4)

> Seriously ?  You are worried about the case where 6 bytes of unannotated
> code might have been included in an executable's code space ?  Theoretically
> possible, but is it really worth worrying about ?

6 to 15 bytes is enough to allow damage, particularly when there is such a region adjacent to the code for most compilation units.

Here's an idea: When ld generates bytes because of alignment, then ld could extend the previous region in .gnu.build.attributes so as to cover those bytes (as long as the new bytes are the only extra bytes.)  That allows making any subsequent adjacency/contiguity checking stronger.  It does "paper over the alignment holes", although this is controlled by the designation of filler bytes in the [usually defaulted] linker script.
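
A minimal sketch of that idea, assuming the linker has a list of the padding intervals it generated (the `extend_over_padding` name, the `padding` list and the exclusive range ends are all hypothetical, and this is not existing ld behavior):

```python
def extend_over_padding(ranges, padding):
    """Grow each range's end across padding that starts exactly where it ends."""
    pad_end = dict(padding)  # padding start -> padding end
    return [(start, pad_end.get(end, end)) for start, end in ranges]


# The two ranges from comment #0 and the padding between them.
ranges = [(0x10580, 0x1178a), (0x11790, 0x119a9)]
padding = [(0x1178a, 0x11790)]
extended = extend_over_padding(ranges, padding)
```

After the extension the first range ends where the second begins, so a later checker can require strict contiguity instead of tolerating small gaps.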

Comment 6 Fedora End Of Life 2018-02-20 15:26:10 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 28 development cycle.
Changing version to '28'.

Comment 7 Fedora Update System 2019-01-30 12:46:14 UTC
binutils-2.31.1-17.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-ba3cbcfd20

Comment 8 Fedora Update System 2019-01-31 02:30:09 UTC
binutils-2.31.1-17.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-ba3cbcfd20

Comment 9 Fedora Update System 2019-02-02 03:34:43 UTC
binutils-2.31.1-17.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

