Bug 2162798 - Red Hat bfd linker gives "undefined reference to symbol" for specific programs using LLVM 15 with OpenMP offload and LTO
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: binutils
Version: 9.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Nick Clifton
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-01-20 23:30 UTC by Daniel Woodworth
Modified: 2023-07-18 14:25 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-25 23:23:50 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-145944 0 None None None 2023-01-20 23:32:39 UTC

Description Daniel Woodworth 2023-01-20 23:30:48 UTC
Description of problem:

An "undefined reference to symbol" linker error is produced when linking a program under these very specific conditions:

* on Red Hat Enterprise Linux 9.0 or Fedora 34
* compiling with LLVM 15
* using the bfd linker
* the program has functions that have the same names as specific functions from zlib (such as "crc32_z")
* LTO is enabled (with -flto)
* OpenMP offloading is enabled (with "-fopenmp -fopenmp-targets=x86_64-pc-linux-gnu")

Version-Release number of selected component (if applicable):

2.35.2-17.el9 for Red Hat Enterprise Linux 9.0
2.35.2-6.fc34 for Fedora 34

How reproducible:

always

Steps to Reproduce:

$ yum install wget binutils-devel
$ wget https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-15.0.0.tar.gz
$ tar xzf llvmorg-15.0.0.tar.gz
$ cd llvm-project-llvmorg-15.0.0
$ mkdir build deploy
$ cmake -S llvm -B build -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_RUNTIMES=openmp -DCMAKE_INSTALL_PREFIX="$(readlink -f deploy)" -DLLVM_BINUTILS_INCDIR=/usr/include
$ cmake --build build --target install -j $(nproc)
$ export PATH="$(readlink -f deploy/bin):$PATH"
$ cd ..
$ echo "int main() {return 0;} void crc32_z(void) {}" > zlib-omptarget-clash.c
$ clang -flto -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu zlib-omptarget-clash.c -o zlib-omptarget-clash

Actual results:

/usr/bin/ld: /tmp/zlib-omptarget-clash-fa0dc9.o (symbol from plugin): undefined reference to symbol 'crc32_z@@ZLIB_1.2.9'
/usr/bin/ld: /usr/lib64/libz.so.1: error adding symbols: DSO missing from command line
/iusers/dwoodwor/rhel9-lto-omp-zlib-repro/llvm-project-llvmorg-15.0.0/deploy/bin/clang-linker-wrapper: error: 'ld' failed
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)

Expected results:

Linking completes without errors.

Additional info:

This bug is exposed by a change in LLVM 15 which links zlib indirectly into libomptarget. I've filed an LLVM bug for this here: https://github.com/llvm/llvm-project/issues/58712
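One way to check whether a given libomptarget build pulls in zlib is to look at its dynamic section. The following is a minimal sketch: the helper name needs_zlib is made up for illustration, and the library path assumes the deploy prefix from the reproduction steps. It only sees direct NEEDED entries, so a dependency reached through another shared library would need ldd instead.

```shell
#!/bin/sh
# needs_zlib LIB: succeed if LIB lists libz as a direct DT_NEEDED dependency.
# (Hypothetical helper; not part of LLVM or binutils.)
needs_zlib() {
    readelf -d "$1" 2>/dev/null | grep 'NEEDED.*libz\.so' >/dev/null
}

# Assumed install location from the steps above (CMAKE_INSTALL_PREFIX=deploy).
lib=llvm-project-llvmorg-15.0.0/deploy/lib/libomptarget.so

if needs_zlib "$lib"; then
    echo "$lib depends on zlib"
else
    echo "$lib does not list zlib as a direct dependency (or could not be read)"
fi
```

Running the same check against an LLVM 14 build of libomptarget would show whether the dependency is new in LLVM 15 on a particular system.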

One workaround to the bug is to link the program against zlib directly with "-lz".

The bug appears to only happen on Red Hat Enterprise Linux 9 and Fedora 34 specifically: Red Hat Enterprise Linux 8 and Fedora 33 and 35 are unaffected.

This seems to only happen with Red Hat builds of the bfd linker; I don't see this linker error when using lld (so "-fuse-ld=lld" is another workaround), and I also don't see it with a local build of upstream binutils 2.35.2.
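The two workarounds mentioned above only differ from the failing command by one extra flag each. As a sketch, the script below prints both command lines rather than running them, so it is safe to execute on a machine without the LLVM 15 build; the function names are illustrative only.

```shell
#!/bin/sh
# Reproducer command from the steps above, kept in one place.
BASE="clang -flto -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu zlib-omptarget-clash.c -o zlib-omptarget-clash"

# Workaround 1: name zlib explicitly on the link line.
workaround_lz() {
    echo "$BASE -lz"
}

# Workaround 2: link with lld instead of the bfd linker.
workaround_lld() {
    echo "$BASE -fuse-ld=lld"
}

# Only print the commands; copy one into a shell to actually run it.
workaround_lz
workaround_lld
```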

I'm not sure how to get the package sources for Red Hat Enterprise Linux, but I did find the ones for Fedora 34 in https://src.fedoraproject.org/rpms/binutils/tree/f34. I tested applying these patches to see which one causes the error, and that patch was binutils-plugin-as-needed.patch. binutils-elf-add-objects.patch also needs to be applied first for binutils-plugin-as-needed.patch to apply cleanly. I'm not very familiar with the linker sources myself, but the fix might be clear to someone with a better understanding of what binutils-plugin-as-needed.patch does.

Comment 1 Nick Clifton 2023-01-23 14:27:15 UTC
(In reply to Daniel Woodworth from comment #0)
Hi Daniel,

  I am currently unable to reproduce this problem :-(

> 2.35.2-17.el9 for Red Hat Enterprise Linux 9.0

Are you able to reproduce the problem using the 2.35.2-37.el9 binutils ?

  https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2340527

(This was the version that I used for my local tests).


Also - does the problem occur if you use the bfd linker (ld.bfd) rather than the gold linker (ld.gold) ?

 

> I tested applying
> these patches to see which one causes the error, and that patch was
> binutils-plugin-as-needed.patch. 

Hmm, interesting.  Once I can reproduce the problem, this should make fixing it a lot easier.

Cheers
  Nick

Comment 2 Nick Clifton 2023-01-23 15:31:12 UTC
Hi Daniel,

(In reply to Nick Clifton from comment #1)
>   I am currently unable to reproduce this problem :-(
 
By which I mean, I can follow all the steps outlined in the description, but the link does not fail.

However, another question does occur to me:

  $ echo "int main() {return 0;} void crc32_z(void) {}" > zlib-omptarget-clash.c
  $ clang -flto -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu zlib-omptarget-clash.c -o zlib-omptarget-clash

With these two commands you are creating a program that calls a zlib function, but you are not explicitly linking in the zlib library.  Why do you expect it to link correctly ? 

Sure, in the past the zlib library has been brought in by libomptarget, but you should not rely upon this.  If you use the zlib library, you should include it on the link command line.

I suspect that this might be a case of the linker giving you a valid error message...  

Cheers
  Nick

Comment 3 Daniel Woodworth 2023-01-23 17:00:17 UTC
I'm sorry for the squished source example—what it's doing is defining a function with the same name as one from the zlib library and _not_ calling it:

int main() {
  return 0;
}

void crc32_z(void) {
}

The zlib library was previously _not_ brought in by libomptarget, but the bug is exposed by a change which does bring it in in recent versions of LLVM.
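To avoid the one-line echo from the original steps, the same source can be written with a here-document, which keeps it readable. This is just an alternative way to create the file; the comment inside the C source is added here for clarity.

```shell
#!/bin/sh
# Write the reproducer source. The quoted 'EOF' prevents any shell expansion
# inside the here-document.
cat > zlib-omptarget-clash.c <<'EOF'
int main() {
  return 0;
}

/* Same name as a zlib function, but defined locally and never called. */
void crc32_z(void) {
}
EOF
```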

I am also seeing the problem with ld.bfd:

$ clang -flto -fuse-ld=bfd -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu zlib-omptarget-clash.c -o zlib-omptarget-clash
/usr/bin/ld.bfd: /tmp/zlib-omptarget-clash-a99f51.o (symbol from plugin): undefined reference to symbol 'crc32_z@@ZLIB_1.2.9'
/usr/bin/ld.bfd: /usr/lib64/libz.so.1: error adding symbols: DSO missing from command line
/iusers/dwoodwor/rhel9-lto-omp-zlib-repro/llvm-project-llvmorg-15.0.0/deploy/bin/clang-linker-wrapper: error: 'ld.bfd' failed
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)

I'm not sure how to upgrade to 2.35.2-37.el9 binutils; I have been able to get it to upgrade to 2.35.2-24.el9, but I'm still seeing the bug there. I tried following your link but am having problems connecting to brewweb.engineering.redhat.com; I can try again later in case it's just temporarily down. It sounds like it probably is fixed in the latest package version if you can't reproduce it there, but I should double-check it in my environment to make sure there isn't something else going on.

Comment 4 Daniel Woodworth 2023-01-23 21:46:02 UTC
I'm still not able to access https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2340527; is it an internal Red Hat link?

Comment 5 Nick Clifton 2023-01-24 10:13:59 UTC
(In reply to Daniel Woodworth from comment #3)
Hi Daniel,

> I'm sorry for the squished source example—what it's doing is defining a
> function with the same name as one from the zlib library and _not_ calling
> it:

Ah - sorry - I did misread the example.


> I am also seeing the problem with ld.bfd:

Hmm, interesting.

> I'm not sure how to upgrade to 2.35.2-37.el9 binutils; I have been able to
> get it to upgrade to 2.35.2-24.el9, but I'm still seeing the bug there.

Hmm, OK, I will see if I can reproduce the problem with the -24.el9 release...


> I tried following your link but am having problems connecting to
> brewweb.engineering.redhat.com;

Sorry about that.  It is an internal Red Hat site.  I just assumed that you
would have access to it.  My bad.  The -37.el9 build might be accessible now, 
as it has finally passed through gating, but if you want I could just upload
the rpms...

Cheers
  Nick

Comment 6 Florian Weimer 2023-01-24 10:20:29 UTC
Bug 1896772 comment 1 describes the same issue.

As an additional data point, do you expect interposition to happen in your case? Or should the definition of crc32_z remain private to the main program?

Comment 7 Nick Clifton 2023-01-24 10:33:08 UTC
Hi Daniel,

  So I downloaded the -24.el9 binutils rpm and unpacked it locally:

    % /home/nickc/Downloads/usr/bin/ld.bfd --version
    GNU ld version 2.35.2-24.el9

  But when I use it to reproduce the issue:

    % clang -flto -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu zlib-omptarget-clash.c -o zlib-omptarget-clash -fuse-ld=/home/nickc/Downloads/usr/bin/ld.bfd
    %
    
  (i.e. successful compilation and link)
   
  And adding "-v" to the command line to make sure that I am using the correct versions:

   % clang -flto -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu zlib-omptarget-clash.c -o zlib-omptarget-clash -fuse-ld=/home/nickc/Downloads/usr/bin/ld.bfd -v
   
    clang version 15.0.0
    Target: x86_64-unknown-linux-gnu
    [...]
    "/home/nickc/Downloads/usr/bin/ld.bfd" -pie --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o zlib-omptarget-clash /dev/shm/zlib-omptarget-clash-wrapper-72a801.o /usr/lib/gcc/x86_64-redhat-linux/12/../../../../lib64/Scrt1.o /usr/lib/gcc/x86_64-redhat-linux/12/../../../../lib64/crti.o /usr/lib/gcc/x86_64-redhat-linux/12/crtbeginS.o -L /usr/lib/gcc/x86_64-redhat-linux/12 -L /usr/lib/gcc/x86_64-redhat-linux/12/../../../../lib64 -L /lib/../lib64 -L /usr/lib/../lib64 -L /lib -L /usr/lib -plugin /home/nickc/work/sources/llvm/15.0/llvm-project-llvmorg-15.0.0/deploy/bin/../lib/LLVMgold.so -plugin-opt=mcpu=x86-64 /dev/shm/zlib-omptarget-clash-3cf022.o -l omp -l omptarget -rpath /home/nickc/work/sources/llvm/15.0/llvm-project-llvmorg-15.0.0/deploy/lib -L /home/nickc/work/sources/llvm/15.0/llvm-project-llvmorg-15.0.0/deploy/lib -l gcc --as-needed -l gcc_s --no-as-needed -l pthread -l c -l gcc --as-needed -l gcc_s --no-as-needed /usr/lib/gcc/x86_64-redhat-linux/12/crtendS.o /usr/lib/gcc/x86_64-redhat-linux/12/../../../../lib64/crtn.o

  So right compiler, right linker, right plugin.  What else could be different between your environment and mine ?

  I assume that you are running these tests on an x86_64 box right ?
  
  It must be the libraries, or the crt files.  I am running these tests on an x86_64 box with Fedora 36 installed.  I will try setting up a RHEL-9 mock environment and testing there.

Cheers
  Nick

Comment 8 Nick Clifton 2023-01-24 11:14:43 UTC
Hi Daniel,

  Sorry, even in a mock RHEL-9.0 environment with binutils-2.35.2-17.el9 installed I still cannot reproduce this problem. :-(  I tried both ld.bfd and ld.gold and both work.

  I am a bit stuck now.  Any ideas as to what could be different between your test environment and mine ?

Cheers
  Nick

Comment 9 Daniel Woodworth 2023-01-24 21:05:28 UTC
(In reply to Florian Weimer from comment #6)
> As an additional data point, do you expect interposition to happen in your
> case? Or should the definition of crc32_z remain private to the main program?

The original use case is a benchmark which builds with its own copy of the zlib sources instead of linking against the system zlib, I think mainly for reproducibility. crc32_z remaining private to the main program makes more sense for this, but if none of the other libraries the benchmark uses actually call into zlib it might not make a difference whether there's interposition or not.

(In reply to Nick Clifton from comment #7)
>   I assume that you are running these tests on an x86_64 box right ?

Yes, this is on x86_64.

(In reply to Nick Clifton from comment #8)
>   I am a bit stuck now.  Any ideas as to what could be different between
> your test environment and mine ?

My test environment is a RHEL-9.0 container running (via podman) on a RHEL-8.4 host. I didn't think that should make a big difference here, but I was able to hunt down a machine running RHEL-9.0 directly with the same binutils (and zlib and LLVM) version and that machine does not show this bug. Since you're also unable to reproduce it on your RHEL-9.0 environment, I think this problem might be specific to this container setup or even the specific container image. I'll see if I can figure out what's different between these two environments that causes the error.
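When two setups that should be identical behave differently, it can help to capture the same facts on both and diff the output. The following is a minimal sketch, assuming an RPM-based system; the function name env_report is hypothetical, and the package names queried are simply the ones discussed in this report.

```shell
#!/bin/sh
# Print a small fingerprint of the link environment so two machines can be diffed.
env_report() {
    for tool in ld ld.bfd ld.gold clang; do
        if path=$(command -v "$tool" 2>/dev/null); then
            # ld may be a symlink to ld.bfd or ld.gold, so resolve it.
            printf '%s: %s\n' "$tool" "$(readlink -f "$path")"
            "$tool" --version 2>/dev/null | head -n 1 || true
        else
            printf '%s: not found\n' "$tool"
        fi
    done
    # Package versions, if rpm is available.
    if command -v rpm >/dev/null 2>&1; then
        rpm -q binutils zlib 2>/dev/null || true
    fi
    return 0
}

env_report
```

Saving the output on each machine (for example `sh env-report.sh > container.txt` inside the container and `> host.txt` on the host, with env-report.sh being whatever name the script is saved under) and running diff on the two files would show at a glance whether `ld` resolves to a different linker or a package version differs.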

Comment 10 Daniel Woodworth 2023-01-25 23:23:50 UTC
I've been able to also reproduce this bug running the RHEL-9.0 container on the RHEL-9.0 system, and I was not able to reproduce it with a clean container image from https://catalog.redhat.com/software/containers/ubi9/ubi/615bcf606feffc5384e8452e?container-tabs=gti. It looks like this problem is specific to the internal RHEL-9.0 image I'm using and not a problem in RHEL-9.0 in general, so I'll close this bug and focus on getting that image fixed instead. Thanks for all the help triaging this!

Also, I've realized I was mistaken and was not using the gold linker after all, and it seems like it is actually unaffected. This bug seems to actually be bfd-specific; I'm also updating the title and description accordingly.

