Description of problem:
Builds of the annobin package in rawhide have started to fail for the ARM architecture. The error message says that a build-id is too small. All other architectures build successfully.

Version-Release number of selected component (if applicable):
annobin-10.23
rpm-build-4.17.0-1
clang-13.0.0-5

How reproducible:
100%

Steps to Reproduce:
1. fedpkg clone annobin
2. fedpkg srpm
3. fedpkg scratch-build --srpm annobin-10.23-1.fc36.src.rpm --arches armv7hl

Actual results:
build (rawhide, annobin-10.23-1.fc36.src.rpm) failed

Expected results:
Successful build

Additional info:
The build log contains this information:

  RPM build errors:
    error: build-id found in /builddir/build/BUILDROOT/annobin-10.23-1.fc36.arm/usr/lib/llvm/12.0.1/annobin-for-llvm.so too small
    error: Generating build-id links failed

For an example of a failed build see:
https://koji.fedoraproject.org/koji/taskinfo?taskID=78945337

The annobin-for-llvm.so binary is built by clang, so the problem may be related to that compiler. The command line is:

  clang++ -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -D_FORTIFY_SOURCE=2 -D_GLIBCXX_ASSERTIONS -shared -fPIC -Wall -O2 -flto -g -grecord-gcc-switches -Wl,--build-id -Wl,-z,now -I./.. annobin.cpp -o annobin-for-llvm.so
Reassigning to Clang.

The problem appears to have started because of this change:
https://src.fedoraproject.org/rpms/clang/c/6699b0a7c677c2b7ab77db146bfcc0580d1fdb42?branch=rawhide

The issue is that the default build-id created by the LLD linker is too small. (The default hash algorithm is chosen for speed, not cryptographic integrity.) A workaround is to add -Wl,--build-id=md5 to any Clang/Clang++/LLVM command line that involves linking.
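For anyone scripting the workaround, here is a minimal Python sketch of rewriting a compiler command line so that the linker is asked for an explicit build-id style. The helper name with_build_id_style is hypothetical, and the shortened command below is only an illustration based on the one in the report. It is not what redhat-rpm-config does.

```python
def with_build_id_style(args, style="md5"):
    """Return a copy of args that requests an explicit build-id style.

    A bare -Wl,--build-id (or an existing -Wl,--build-id=STYLE) is replaced;
    if neither is present the flag is appended.
    """
    flag = f"-Wl,--build-id={style}"
    out = []
    replaced = False
    for arg in args:
        if arg == "-Wl,--build-id" or arg.startswith("-Wl,--build-id="):
            if not replaced:
                out.append(flag)
                replaced = True
            # Any further build-id flags are dropped to avoid conflicts.
        else:
            out.append(arg)
    if not replaced:
        out.append(flag)
    return out


# Shortened version of the command line from the report (illustrative only).
cmd = ["clang++", "-shared", "-fPIC", "-O2", "-flto", "-g",
       "-Wl,--build-id", "-Wl,-z,now", "annobin.cpp",
       "-o", "annobin-for-llvm.so"]
print(" ".join(with_build_id_style(cmd)))
```

Passing style="sha1" instead selects the other style discussed later in this bug.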
Upstream bug is https://bugs.llvm.org/show_bug.cgi?id=44138

The default build-ids generated by lld are too tiny. They need to be globally unique and rpm enforces that:
https://github.com/rpm-software-management/rpm/issues/950
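To make the size check concrete, here is a rough Python sketch that parses a GNU build-id note and compares the id length against a minimum. The 16-byte minimum and the 8-byte size used for the short id are assumptions based on my understanding of rpm and of lld's default style. They are not taken from this report. The helper names are hypothetical, and the note is built in memory instead of being read from a real ELF file.

```python
import struct

NT_GNU_BUILD_ID = 3        # note type used for .note.gnu.build-id
MIN_BUILD_ID_BYTES = 16    # assumption: the lower bound rpm applies


def make_note(build_id: bytes) -> bytes:
    """Build a little-endian GNU build-id note (hypothetical test helper)."""
    name = b"GNU\0"
    pad = (-len(build_id)) % 4  # descriptor is padded to 4-byte alignment
    header = struct.pack("<III", len(name), len(build_id), NT_GNU_BUILD_ID)
    return header + name + build_id + b"\0" * pad


def parse_build_id(note: bytes) -> bytes:
    """Extract the build-id bytes from a little-endian GNU build-id note."""
    namesz, descsz, ntype = struct.unpack_from("<III", note, 0)
    name_start = 12
    name = note[name_start:name_start + namesz]
    if ntype != NT_GNU_BUILD_ID or name != b"GNU\0":
        raise ValueError("not a GNU build-id note")
    desc_start = name_start + ((namesz + 3) & ~3)  # name is 4-byte aligned
    return note[desc_start:desc_start + descsz]


def is_large_enough(build_id: bytes) -> bool:
    return len(build_id) >= MIN_BUILD_ID_BYTES


short_id = parse_build_id(make_note(bytes(8)))    # assumed size of a "fast" id
sha1_sized = parse_build_id(make_note(bytes(20)))  # 160-bit id
print(len(short_id), is_large_enough(short_id))
print(len(sha1_sized), is_large_enough(sha1_sized))
```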
This should be fixed by this redhat-rpm-config change, which needs review: https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/155
@nickc Should we use the md5 or the sha1 algorithm?
(In reply to Tom Stellard from comment #5)
> @nickc Should we use the md5 or the sha1 algorithm?

Neither! Both have been deprecated. Ideally we should be using something like SHA-256 or Blake3. But this would mean adding new code to LLD. So if we have to choose between MD5 and SHA1 then I would recommend SHA1.
(In reply to Nick Clifton from comment #6)
> (In reply to Tom Stellard from comment #5)
> > @nickc Should we use the md5 or the sha1 algorithm?
>
> Neither! Both have been deprecated. Ideally we should be using something
> like SHA-256 or Blake3. But this would mean adding new code to LLD. So if
> we have to choose between MD5 and SHA1 then I would recommend SHA1.

What's the bfd default algorithm?
(In reply to Tom Stellard from comment #7)
> What's the bfd default algorithm?

sha1:

  --build-id
  --build-id=style
      Request the creation of a ".note.gnu.build-id" ELF note section or a ".buildid" COFF section. The contents of the note are unique bits identifying this linked file. style can be "uuid" to use 128 random bits, "sha1" to use a 160-bit SHA1 hash on the normative parts of the output contents, "md5" to use a 128-bit MD5 hash on the normative parts of the output contents, or "0xhexstring" to use a chosen bit string specified as an even number of hexadecimal digits ("-" and ":" characters between digit pairs are ignored). If style is omitted, "sha1" is used.
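As a quick cross-check of the sizes quoted in that excerpt, the following Python sketch asks the standard library for the width of each style. It only illustrates the bit counts and does not reproduce how ld hashes the output file.

```python
import hashlib
import uuid

# Width in bits of the three named styles from the ld documentation above.
STYLE_BITS = {
    "uuid": len(uuid.uuid4().bytes) * 8,        # 128 random bits
    "md5": hashlib.md5().digest_size * 8,       # 128-bit hash
    "sha1": hashlib.sha1().digest_size * 8,     # 160-bit hash
}

for style, bits in STYLE_BITS.items():
    print(f"{style}: {bits} bits ({bits // 8} bytes)")
```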
Serge has actually submitted an LLD patch upstream to add support for generating SHA-256 based build-ids:
https://reviews.llvm.org/D113991

It is nice and short as the algorithm is already implemented in the code. There is pushback, however, because it is perceived as being unnecessary and slow.

My argument is that a malicious actor could replace an existing library with a corrupt one, and if the build-ids are based on 'fast' or 'md5' or 'sha1' then all of these can be spoofed. So users would not notice any change in the debugging experience, for example.
(In reply to Nick Clifton from comment #9)
> My argument is that a malicious actor could replace an existing library with
> a corrupt one, and if the
> build-ids are based on 'fast' or 'md5' or 'sha1' then all of these can be
> spoofed. So users would not
> notice any change in the debugging experience for example.

Spoofing and malicious actors are not really the point imho (if someone wants to be malicious they would just hard code a build-id in their binaries).

The real point is that you need enough bits and a good enough hashing function to guarantee the build-id is a globally unique identifier and you don't get accidental collisions. It is also desirable to get the same build-id for reproducible builds. So you would like to get at least 128 bits, and using a secure hashing algorithm makes sure that you don't get accidental collisions (and that the build-id is reproducible).

But in theory it doesn't need to be a secure hash; it only needs to be strong enough to produce unique hashes. Any such hashing algorithm that produces at least 128 bits should be fine. You can even simply use just 16 bytes of the result if the algorithm produces more. Just pick the fastest algorithm that produces at least 128 unique bits.
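A small Python sketch of the "use just 16 bytes of the result" idea. SHA-256 is used here only because it is in the standard library. It is not what any linker does, and the function name build_id_from_contents is hypothetical. Any sufficiently strong hash of at least 128 bits could be substituted.

```python
import hashlib


def build_id_from_contents(contents: bytes, nbytes: int = 16) -> str:
    """Derive a content-based id and keep only the first nbytes bytes.

    The id depends only on the input bytes, so identical contents
    (a reproducible build) always give the same id.
    """
    digest = hashlib.sha256(contents).digest()
    if nbytes > len(digest):
        raise ValueError("requested more bytes than the hash produces")
    return digest[:nbytes].hex()


first = build_id_from_contents(b"example linked output")
again = build_id_from_contents(b"example linked output")
other = build_id_from_contents(b"example linked output, changed")

print(first)           # 32 hex digits, i.e. 128 bits
print(first == again)  # same contents give the same id
print(first == other)  # different contents give a different id
```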
This bug appears to have been reported against 'rawhide' during the Fedora 36 development cycle. Changing version to 36.
This was fixed in redhat-rpm-config-207-1.fc36.