Description of problem: chromium (or more specifically, the ffmpeg part of chromium) is failing on aarch64 on F33+ (everywhere else it is fine). Version-Release number of selected component (if applicable): 84.0.4147.125-2 How reproducible: Every aarch64 build fails. Actual results: g++ -shared -Wl,--fatal-warnings -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,-z,defs -Wl,--as-needed -Wl,-O2 -Wl,--gc-sections -rdynamic -o "./libclearkeycdm.so" -Wl,-soname="libclearkeycdm.so" @"./libclearkeycdm.so.rsp" obj/third_party/ffmpeg/ffmpeg_internal/videodsp.o: in function `ff_prefetch_aarch64': (.text+0x10): relocation truncated to fit: R_AARCH64_CONDBR19 against symbol `ff_prefetch_aarch64' defined in .text section in obj/third_party/ffmpeg/ffmpeg_internal/videodsp.o collect2: error: ld returned 1 exit status Expected results: No errors. Additional info: third_party/ffmpeg/libavcodec/aarch64/videodsp.S #include "libavutil/aarch64/asm.S" function ff_prefetch_aarch64, export=1 subs w2, w2, #2 prfm pldl1strm, [x0] prfm pldl1strm, [x0, x1] add x0, x0, x1, lsl #1 b.gt X(ff_prefetch_aarch64) ret endfunc
Created attachment 1711927 [details] preprocessed source (gcc -E) preprocessed source
One note I just realized: this error is being thrown at link time, but only for the clear_cdm_key target. The code is compiled for the headless_shell and chrome targets as well, and it links successfully into those binaries. The clear_cdm_key target is built from the chrome buildenv, so it inherits the chrome build of videodev.o, but it cannot link into the libclearkey.so file. I forced it to rebuild the ffmpeg bits again and it did not change the outcome. See: [6/6] g++ -shared -Wl,--fatal-warnings -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,-z,defs -Wl,--as-needed -Wl,-O2 -Wl,--gc-sections -rdynamic -o "./libclearkeycdm.so" -Wl,-soname="libclearkeycdm.so" @"./libclearkeycdm.so.rsp" FAILED: libclearkeycdm.so g++ -shared -Wl,--fatal-warnings -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,-z,defs -Wl,--as-needed -Wl,-O2 -Wl,--gc-sections -rdynamic -o "./libclearkeycdm.so" -Wl,-soname="libclearkeycdm.so" @"./libclearkeycdm.so.rsp" obj/third_party/ffmpeg/ffmpeg_internal/videodsp.o: in function `ff_prefetch_aarch64': (.text+0x10): relocation truncated to fit: R_AARCH64_CONDBR19 against symbol `ff_prefetch_aarch64' defined in .text section in obj/third_party/ffmpeg/ffmpeg_internal/videodsp.o collect2: error: ld returned 1 exit status ninja: build stopped: subcommand failed.
Created attachment 1711929 [details] rsp file
I have an F33 aarch64 instance at the ready to test or provide more debugging info.
When I downgraded binutils to 2.34-4.fc32, the issue disappears. Reassigning to binutils.
Additional data point: forcing gold at link time works around this error: g++ -shared -Wl,--fatal-warnings -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,-z,defs -Wl,--as-needed -Wl,-O2 -Wl,--gc-sections -rdynamic -o "./libclearkeycdm.so" -Wl,-soname="libclearkeycdm.so" @"./libclearkeycdm.so.rsp" -fuse-ld=gold This isn't really practical to do for chromium, because gold is not reliable for the other parts. Hopefully it does help narrow where the bug is though. Please let me know what more debugging I can generate for you.
Hi Tom, It sounds to me like this might due to how the linker is assigning sections to the address space. (ld.bfd and ld.gold have different heuristics for this). Reproducing the problem is going to be a pain without all of the object files specified in the .rsp file, and all of the system libraries too. As a possible workaround, if you change the code in ff_prefetch_aarch64 to: add x0, x0, x1, lsl #1 b.le 1f b X(ff_prefetch_aarch64) 1: ret Would that suffice ? Cheers Nick
That workaround resolves the issue. I still have this environment spun up on EC2, so if you need something to figure out what the proper binutils fix is, let me know.
Hi Tom, I have been talking to some of the engineers at ARM but there is a problem with the example code: function ff_prefetch_aarch64, export=1 [...] b.gt X(ff_prefetch_aarch64) Is this really a branch back to the start of the ff_prefetch_aarch64 function ? It seems likely that the X() macro mangles the name somehow, but this is not clear from the text. Are you able to upload a copy of the object file containing this assembled code ? That would help with the analysis. Also - does adding "-Wl,--stub-group-size=-1" or "-Wl,--stub-group-size=+1" to the final g++ command line make the problem go away ? Cheers Nick
Nick, I think you want the preprocessed assembly (i.e. videosp.s) instead, there are quite complicated macros and perhaps something changed on the binutils as side on how they are handled. See https://fossies.org/linux/ffmpeg/libavutil/aarch64/asm.S ?
(In reply to Jakub Jelinek from comment #10) > Nick, I think you want the preprocessed assembly (i.e. videosp.s) instead, Actually both would be good. The point about seeing the object file is that I can examine the relocations that have been created. This will tell me the real name of the function that is the destination of the branch, plus also the type of the relocation. The AArch64 ABI specifies that the R_AARCH64_CONDBR19 relocation should not be used if the branch could be out of range. So I need to know if this is an assembler problem - the assembler is not flagging up potentially bad code - or a linker problem - the linker is trying to relax the branch but generating the wrong relocation. Of course I can assemble the .s file, but there is the possibility that I might not be using the same set of command line options as Tom's build. So an already assembled file is best.
True. And with the workaround you've provided, it might be also interesting to see how it got linked (i.e. what the unconditional branch branched around by the negated conditional branch branches too, how far exactly it is).
Okay. 1. Neither -Wl,--stub-group-size=-1 nor -Wl,--stub-group-size=+1 had any impact on the issue, it still throws the error at link of libclearkeycdm.so. 2. I will attach a copy of the generated videodsp.o without the workaround patch as videodsp-no-patch.o 3. I generated preprocessed source without the workaround patch like this: gcc -MMD -MF obj/third_party/ffmpeg/ffmpeg_internal/videodsp.o.d -DHAVE_AV_CONFIG_H -D_POSIX_C_SOURCE=200112 -D_XOPEN_SOURCE=600 -DPIC -DFFMPEG_CONFIGURATION=NULL -D_ISOC99_SOURCE -D_LARGEFILE_SOURCE -DUSE_UDEV -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_NSS_CERTS=1 -DUSE_X11=1 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -DOPUS_FIXED_POINT -I../../third_party/ffmpeg/chromium/config/Chromium/linux/arm64 -I../../third_party/ffmpeg -I../../third_party/ffmpeg/compat/atomics/gcc -I../.. -Igen -I../../third_party/opus/src/include -DHAVE_VFP_ARGS=1 -fPIC -fno-strict-aliasing --param=ssp-buffer-size=4 -fstack-protector -funwind-tables -fPIC -pipe -pthread -std=gnu11 -g0 -c ../../third_party/ffmpeg/libavcodec/aarch64/videodsp.S -E &> videodsp-preprocessed-E-no-patch.output 4. I then applied the patch and compiled again, this copy of videodsp.o will be attached as videodsp-patched.o 5. Same for the preprocessed source (videodsp-preprocessed-E-patched.output).
Created attachment 1712386 [details] binary object, no patch
Created attachment 1712387 [details] binary object, with patch
Created attachment 1712388 [details] preprocessed source (gcc -E), no patch
Created attachment 1712389 [details] preprocessed source (gcc -E), patch applied
Hi Tom, Thanks for the uploads. So, it does look like the code is looping back on itself. The disassembled output for the no-patch object file looks like this: % objdump -dr videodsp-no-patch.o 0000000000000000 <ff_prefetch_aarch64>: 0: 71000842 subs w2, w2, #0x2 4: f9800001 prfm pldl1strm, [x0] 8: f8a16801 prfm pldl1strm, [x0, x1] c: 8b010400 add x0, x0, x1, lsl #1 10: 5400000c b.gt 0 <ff_prefetch_aarch64> 10: R_AARCH64_CONDBR19 ff_prefetch_aarch64 14: d65f03c0 ret The problem is that the ff_prefetch_aarch64 symbol is global, so the linker changes the branch to go to the entry in the PLT table for the function. Which, because the program is so big, is too far away for a conditional branch. (You were lucky with the other builds that the branch was in range. This problem could have arisen at any time in the past, if the build had been big enough). So this actually looks like a programming bug. What the assembly code should look like is this: function ff_prefetch_aarch64, export=1 1: subs w2, w2, #2 prfm pldl1strm, [x0] prfm pldl1strm, [x0, x1] add x0, x0, x1, lsl #1 b.gt 1b ret endfunc Ie a conditional branch back to a local symbol placed at the start of the function. (You should find that this code is actually faster than the old code, since now there will be no indirection via the PLT). In general the rule is that AArch64 assembler code should never use a conditional branch to move to the start of a global function. Does this make sense ? Cheers Nick
Okay, that makes sense, except that: 1. Downgrading to the old binutils works. 2. This code has been in ffmpeg for a long time without issue. Is it plausible that binutils has gotten stricter in some way?
Its certainly possible that binutils has gotten stricter, but that would also imply that the code wasn't being used before since if the branch was out of range you'd end up going to somewhere fairly unpredictable. It could also be the case that we've got more PLT entries or that the PLT entries are larger and as a result the branch no longer reaches the appropriate PLT entry. It could also be the case that there's been a layout change in the linker. There's other scenarios I can theorize about, but those are the most likely in my mind. Nick could probably dig this out but I'm not sure it's all that useful in the end -- a conditional branch to a PLT entry is a recipe for disaster on many architectures precisely because conditional branches usually have a limited branching range. The code really needs to get fixed.
Tom, do you have the library linked with older binutils? Where does the branch branch to in that case? To the start of the function, or the PLT, and if the latter, how far is it from the function? I've tried to look at chromium-libs in older Fedora releases for aarch64, but can't find ff_prefetch_aarch64 in there...
Created attachment 1712571 [details] libclearkeycdm.so linked with old binutils and no patch applied to videodsp.S
Created attachment 1712572 [details] videodsp, no patch, old (f32) binutils
In the old binutils built shared library: 000000000048a24c <ff_prefetch_aarch64>: 48a24c: 71000842 subs w2, w2, #0x2 48a250: f9800001 prfm pldl1strm, [x0] 48a254: f8a16801 prfm pldl1strm, [x0, x1] 48a258: 8b010400 add x0, x0, x1, lsl #1 48a25c: 54ffff8c b.gt 48a24c <ff_prefetch_aarch64> 48a260: d65f03c0 ret 48a264: d503201f nop 48a268: d503201f nop 48a26c: d503201f nop and no ff_prefetch_aarch64 in .plt section, and in the object file: 0000000000000000 <ff_prefetch_aarch64>: 0: 71000842 subs w2, w2, #0x2 4: f9800001 prfm pldl1strm, [x0] 8: f8a16801 prfm pldl1strm, [x0, x1] c: 8b010400 add x0, x0, x1, lsl #1 10: 5400000c b.gt 0 <ff_prefetch_aarch64> 10: R_AARCH64_CONDBR19 ff_prefetch_aarch64 14: d65f03c0 ret So, Nick, why is a PLT entry added for it with the new binutils?
Trying: .text .globl foo foo: testl %edi, %edi jne foo ret on x86_64-linux and i686-linux, in both cases when this is linked into a shared library the conditional branch goes to the local foo, rather than to a PLT slot. While when the conditional branch is to bar (undefined external), then it goes through PLT on x86_64 and on i686 becomes a DT_TEXTREL relocation. So I'd say that aarch64 should behave similarly.
Hi Jakub, What about symbol preemption ? If foo (or ff_prefetch_aarch64) has default visibility then it could be replaced by another definition in a different component. Hence you need to indirect via the PLT. Despite saying this however, there has been a change in the binutils which is probably related to this problem: * elfnn-aarch64.c (elfNN_aarch64_final_link_relocate): Club BFD_RELOC_AARCH64_BRANCH19 and BFD_RELOC_AARCH64_TSTBR14 cases with BFD_RELOC_AARCH64_JUMP26. (elfNN_aarch64_check_relocs): Likewise. This patch is intended to handle the case where a conditional branch is made to an undefined symbol. (The linker used to just change this to a branch to 0 and not issue an error message). I am talking to the ARM engineers who created the patch to see if they have any ideas on the issue. Cheers Nick
Hi Tom, Would you mind seeing how a build goes with the latest rawhide binutils: binutils-2.35-13.fc34 It won't fix the problem - you need to patch the assembler source code for that - but it should issue a better error message, explicitly stating that that kind of conditional branch is not allowed. Cheers Nick
(In reply to Nick Clifton from comment #27) > It won't fix the problem - you need to patch the assembler source code for > that - but it > should issue a better error message, explicitly stating that that kind of > conditional branch > is not allowed. Actually I take that back. The patch should fix the build issue, and you do not need to change the assembler. At least that is what I hope will happen. Please let me know.
This is all working now (has been for a while). Thanks!