Bug 2023666 - Building annobin fails with "build-id too small", but only for ARM architecture
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: clang
Version: 36
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Tom Stellard
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-11-16 10:11 UTC by Nick Clifton
Modified: 2022-07-22 02:57 UTC (History)

Fixed In Version: redhat-rpm-config-207-1.fc36
Clone Of:
Environment:
Last Closed: 2022-07-22 02:57:21 UTC
Type: Bug
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | rpm-software-management rpm issues 950 | 0 | None | closed | Please support smaller build-ids | 2021-11-16 10:59:42 UTC
LLVM | 44138 | 0 | P | NEW | please consider using a 128 bit fast build-id | 2021-11-16 11:00:05 UTC

Description Nick Clifton 2021-11-16 10:11:52 UTC
Description of problem:

  Builds of the annobin package in rawhide have started to fail for the ARM 
  architecture.  The error message says that a build-id is too small.  All 
  other architectures build successfully.

Version-Release number of selected component (if applicable):

  annobin-10.23
  rpm-build-4.17.0-1
  clang-13.0.0-5

How reproducible:

 100%

Steps to Reproduce:
1. fedpkg clone annobin
2. fedpkg srpm
3. fedpkg scratch-build --srpm annobin-10.23-1.fc36.src.rpm --arches armv7hl

Actual results:

  build (rawhide, annobin-10.23-1.fc36.src.rpm) failed

Expected results:

  Successful build

Additional info:

  The build log contains this information:

RPM build errors:
error: build-id found in /builddir/build/BUILDROOT/annobin-10.23-1.fc36.arm/usr/lib/llvm/12.0.1/annobin-for-llvm.so too small
error: Generating build-id links failed

  For an example of a failed build, see:

https://koji.fedoraproject.org/koji/taskinfo?taskID=78945337

  The annobin-for-llvm.so binary is built by clang, so the problem may be related to that compiler.  The command line is:

clang++  -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -D_FORTIFY_SOURCE=2 -D_GLIBCXX_ASSERTIONS -shared -fPIC -Wall -O2 -flto -g -grecord-gcc-switches -Wl,--build-id -Wl,-z,now -I./.. annobin.cpp -o annobin-for-llvm.so

Comment 1 Nick Clifton 2021-11-16 10:55:49 UTC
Reassigning to Clang....

The problem appears to have started because of this change:

  https://src.fedoraproject.org/rpms/clang/c/6699b0a7c677c2b7ab77db146bfcc0580d1fdb42?branch=rawhide

The issue is that the default build-id created by the LLD linker is too small.  (The default hash algorithm is chosen for speed, not cryptographic integrity).
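
To illustrate what "too small" means here, the following is a minimal Python sketch that extracts a build-id from raw ELF note data and compares its length against a minimum. It is not rpm's actual code. The 16-byte minimum and the 8-byte size of LLD's default "fast" build-id are assumptions, not values stated in this report. The helper names (find_build_id, build_id_too_small, make_note) are hypothetical, and little-endian note data is assumed.

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3       # ELF note type used for GNU build-ids
MIN_BUILD_ID_BYTES = 16   # ASSUMPTION: smallest build-id that rpm accepts


def _align4(n: int) -> int:
    """Round n up to the next multiple of 4 (ELF note field padding)."""
    return (n + 3) & ~3


def find_build_id(note_data: bytes) -> Optional[bytes]:
    """Return the build-id bytes from raw little-endian note data, or None."""
    offset = 0
    while offset + 12 <= len(note_data):
        namesz, descsz, note_type = struct.unpack_from("<III", note_data, offset)
        offset += 12
        name = note_data[offset:offset + namesz]
        offset += _align4(namesz)
        desc = note_data[offset:offset + descsz]
        offset += _align4(descsz)
        if note_type == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc
    return None


def build_id_too_small(note_data: bytes) -> bool:
    """True if a build-id is present but shorter than the assumed minimum."""
    build_id = find_build_id(note_data)
    return build_id is not None and len(build_id) < MIN_BUILD_ID_BYTES


def make_note(desc: bytes) -> bytes:
    """Build a synthetic .note.gnu.build-id blob for demonstration only."""
    name = b"GNU\x00"
    padding = b"\x00" * (_align4(len(desc)) - len(desc))
    header = struct.pack("<III", len(name), len(desc), NT_GNU_BUILD_ID)
    return header + name + desc + padding


if __name__ == "__main__":
    # ASSUMPTION: an 8-byte id stands in for LLD's default "fast" style.
    print(build_id_too_small(make_note(bytes(8))))
    # A 20-byte id stands in for a SHA1-style build-id.
    print(build_id_too_small(make_note(bytes(20))))
```

The sketch works on the contents of the note section only; locating that section inside a real ELF file is left out.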

A workaround is to add -Wl,--build-id=md5 to any Clang/Clang++/LLVM command line that involves linking.
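
As a rough sketch of how a build wrapper might apply this workaround automatically, the Python helper below rewrites a bare --build-id passed through -Wl, into one with an explicit style, and appends the option if it is missing. The function name force_build_id_style is hypothetical, and this is not how the eventual redhat-rpm-config change is implemented.

```python
from typing import List


def force_build_id_style(args: List[str], style: str = "md5") -> List[str]:
    """Return a copy of args in which the linker is given an explicit build-id style.

    A bare "--build-id" inside a "-Wl," argument is rewritten to
    "--build-id=<style>". An explicit style is left untouched. If no
    build-id option is present at all, one is appended.
    """
    result = []
    seen = False
    for arg in args:
        if arg.startswith("-Wl,"):
            parts = arg.split(",")
            for i, part in enumerate(parts):
                if part == "--build-id":
                    parts[i] = "--build-id=" + style
                    seen = True
                elif part.startswith("--build-id="):
                    seen = True
            arg = ",".join(parts)
        result.append(arg)
    if not seen:
        result.append("-Wl,--build-id=" + style)
    return result


if __name__ == "__main__":
    cmd = ["clang++", "-shared", "-fPIC", "-Wl,--build-id", "-Wl,-z,now",
           "annobin.cpp", "-o", "annobin-for-llvm.so"]
    print(" ".join(force_build_id_style(cmd)))
```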

Comment 2 Mark Wielaard 2021-11-16 10:59:42 UTC
Upstream bug is https://bugs.llvm.org/show_bug.cgi?id=44138

The default build-ids generated by lld are too tiny. They need to be globally unique and rpm enforces that:
https://github.com/rpm-software-management/rpm/issues/950
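
To give a feel for why the number of bits matters for global uniqueness, here is a small Python sketch of the standard birthday-bound approximation. It assumes build-ids behave like uniformly random values, which is an idealisation, and the figure of one billion binaries is purely hypothetical.

```python
import math


def collision_probability(num_ids: int, bits: int) -> float:
    """Approximate chance that at least two of num_ids random ids collide.

    Uses the birthday approximation p = 1 - exp(-n*(n-1) / (2 * 2**bits)).
    """
    pairs = num_ids * (num_ids - 1) / 2
    return -math.expm1(-pairs / 2.0 ** bits)


if __name__ == "__main__":
    binaries = 10 ** 9  # hypothetical number of distinct binaries
    for bits in (64, 128, 160):
        print(bits, collision_probability(binaries, bits))
```

Under these assumptions, a 64-bit id has a collision chance of a few percent across a billion binaries, while for 128 bits it is negligible.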

Comment 3 Tom Stellard 2021-11-16 16:30:45 UTC
This should be fixed by this redhat-rpm-config change, which needs review:

https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/155

Comment 4 Tom Stellard 2021-11-16 16:31:31 UTC
@

Comment 5 Tom Stellard 2021-11-16 16:33:03 UTC
@nickc Should we use the md5 or the sha1 algorithm?

Comment 6 Nick Clifton 2021-11-16 17:05:26 UTC
(In reply to Tom Stellard from comment #5)
> @nickc Should we use the md5 or the sha1 algorithm?

Neither!  Both have been deprecated.  Ideally we should be using something like SHA-256 or Blake3.  But this would mean adding new code to LLD.  So if we have to choose between MD5 and SHA1 then I would recommend SHA1.

Comment 7 Tom Stellard 2021-11-16 17:33:42 UTC
(In reply to Nick Clifton from comment #6)
> (In reply to Tom Stellard from comment #5)
> > @nickc Should we use the md5 or the sha1 algorithm?
> 
> Neither!  Both have been deprecated.  Ideally we should be using something
> like SHA-256 or Blake3.  But this would mean adding new code to LLD.  So if
> we have to choose between MD5 and SHA1 then I would recommend SHA1.

What's the bfd default algorithm?

Comment 8 Mark Wielaard 2021-11-16 17:44:04 UTC
(In reply to Tom Stellard from comment #7)
> What's the bfd default algorithm?

sha1:

       --build-id
       --build-id=style
           Request the creation of a ".note.gnu.build-id" ELF note section or
           a ".buildid" COFF section.  The contents of the note are unique
           bits identifying this linked file.  style can be "uuid" to use 128
           random bits, "sha1" to use a 160-bit SHA1 hash on the normative
           parts of the output contents, "md5" to use a 128-bit MD5 hash on
           the normative parts of the output contents, or "0xhexstring" to use
           a chosen bit string specified as an even number of hexadecimal
           digits ("-" and ":" characters between digit pairs are ignored).
           If style is omitted, "sha1" is used.
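
The sizes listed in the man page excerpt above can be reproduced with a short Python sketch. It only mirrors the documented styles using the standard library and does not call the linker. The function name build_id_length is hypothetical.

```python
import hashlib
import uuid


def build_id_length(style: str = "sha1") -> int:
    """Return the build-id length in bytes for a bfd-style --build-id value."""
    if style == "uuid":
        return len(uuid.uuid4().bytes)       # 128 random bits
    if style == "sha1":
        return hashlib.sha1().digest_size    # 160-bit SHA1 hash
    if style == "md5":
        return hashlib.md5().digest_size     # 128-bit MD5 hash
    if style.startswith("0x"):
        # "-" and ":" between digit pairs are ignored, as the man page says.
        digits = style[2:].replace("-", "").replace(":", "")
        return len(bytes.fromhex(digits))
    raise ValueError("unknown build-id style: " + style)


if __name__ == "__main__":
    for style in ("uuid", "sha1", "md5", "0xde:ad-be:ef"):
        print(style, build_id_length(style), "bytes")
```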

Comment 9 Nick Clifton 2021-11-17 12:23:05 UTC
Serge has actually submitted an LLD patch upstream to add support for generating SHA-256 based build-ids:

  https://reviews.llvm.org/D113991

It is nice and short as the algorithm is already implemented in the code.  There is pushback, however,
because it is perceived as being unnecessary and slow.

My argument is that a malicious actor could replace an existing library with a corrupt one, and if the
build-ids are based on 'fast' or 'md5' or 'sha1' then all of these can be spoofed.  So users would not
notice any change in the debugging experience for example.

Comment 11 Mark Wielaard 2021-11-17 13:36:35 UTC
(In reply to Nick Clifton from comment #9)
> My argument is that a malicious actor could replace an existing library with
> a corrupt one, and if the
> build-ids are based on 'fast' or 'md5' or 'sha1' then all of these can be
> spoofed.  So users would not
> notice any change in the debugging experience for example.

Spoofing and malicious actors are not really the point imho (anyone who wants to be malicious can just hard-code a build-id in their binaries).
The real point is that you need enough bits and a good enough hashing function to guarantee that the build-id is a globally unique identifier and that you don't get accidental collisions. It is also desirable to get the same build-id for reproducible builds.

So you want at least 128 bits, and using a secure hashing algorithm makes sure that you don't get accidental collisions (and that the build-id is reproducible). But in theory it doesn't need to be a secure hash; it just needs to be strong enough to produce unique hashes. Any such hashing algorithm that produces at least 128 bits should be fine. You can even simply use just 16 bytes of the result if the algorithm produces more. Just pick the fastest algorithm that produces at least 128 unique bits.
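
The "use just 16 bytes of the result" idea can be sketched in Python as follows. SHA-256 is used here only because it ships with the standard library, not because it is the fastest choice. The function name truncated_build_id is hypothetical, and this is not what LLD does.

```python
import hashlib


def truncated_build_id(contents: bytes, num_bytes: int = 16) -> bytes:
    """Derive a reproducible id by keeping the first num_bytes of a longer hash."""
    return hashlib.sha256(contents).digest()[:num_bytes]


if __name__ == "__main__":
    first = truncated_build_id(b"contents of some linked output")
    second = truncated_build_id(b"contents of some linked output")
    print(first.hex())
    print(len(first) * 8, "bits, reproducible:", first == second)
```

The same input always yields the same 128-bit id, which matches the reproducible-build property described above.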

Comment 12 Ben Cotton 2022-02-08 21:27:32 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 36 development cycle.
Changing version to 36.

Comment 13 Tom Stellard 2022-07-22 02:57:21 UTC
This was fixed in redhat-rpm-config-207-1.fc36.

