_llvm-private_ no longer crashes when used together with more recent libstdc++ library versions
Previously, executable files in the _llvm-private_ package providing drivers for graphics rendering were linked statically against the `libstdc++` library. As a consequence, running a program using GLX, the Mesa *llvmpipe* renderer, and a different `libstdc++` version caused an unexpected termination with a message about "invalid pointer". _llvm-private_ has been changed and no longer statically links against `libstdc++`. As a result, programs using this driver no longer terminate unexpectedly in this situation.
Created attachment 1245912
Output of the crashed program
Description of problem:
When running a GLX program using Mesa's llvmpipe renderer and a newer libstdc++.so.6, it crashes with *** Error in `./quad': free(): invalid pointer: 0x.... *** (full output attached)
Version-Release number of selected component (if applicable): 3.8.1-1.el7
How reproducible: always
Steps to Reproduce:
1. Compile a newer gcc (5.2.0 up to 6.2.0 presented the same behaviour)
2. Compile the quad.c program with gcc quad.c -o quad -lX11 -lGL -lGLU
3. Run the program with LIBGL_ALWAYS_SOFTWARE=1 LD_LIBRARY_PATH=<PATH_TO_NEWER_LIBSTDC++> ./quad
Actual results:
Crashes with *** Error in `./quad': free(): invalid pointer: 0x.... ***
Expected results:
A quad is rendered in a window.
Additional info:
- The crash does not happen on CentOS 6, Ubuntu 16.04/16.10, SUSE11. A very suspicious difference between CentOS 7 and those distributions is that on CentOS 7 the mesa-private-llvm package is built using the -static-libstdc++ flag.
- After rebuilding the mesa-private-llvm package with -static-libstdc++ removed from llvm.spec, the crash stops happening.
- Also, the crash does not happen on CentOS 7 when using accelerated OpenGL (unset LIBGL_ALWAYS_SOFTWARE).
- valgrind output shows a few "Invalid free() / delete / delete[] / realloc()"
I noticed that libstdc++ symbols are indeed exported by libLLVM-mesa.so.
For example, the std::string::replace function (which caused one of the invalid frees) is defined both in libLLVM-mesa.so and in my newer libstdc++:
$ objdump -T /usr/lib64/libLLVM-3.8.1-mesa.so | grep _ZNSs7replaceEmmPKcm
0000000001246c00 w DF .text 00000000000001c1 Base _ZNSs7replaceEmmPKcm
$ objdump -T <PATH_TO_NEWER_LIBSTDC++>/libstdc++.so | grep _ZNSs7replaceEmmPKcm
00000000000d2a70 w DF .text 00000000000001eb GLIBCXX_3.4 _ZNSs7replaceEmmPKcm
So it does not seem to be a good idea for a system library to be statically linked against libstdc++, as this duplicates symbols and leads to undefined behaviour for C++ applications using a newer libstdc++.
As libstdc++ is intended to be backward compatible, I would expect that I can always replace the system's libstdc++ with my own newer version. Other developers trying to build portable binaries expect that too, e.g. Steam from Valve.
But that is not possible if some system libraries are statically linked against it.
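The objdump comparison above can be extended to every symbol, not only std::string::replace. The following is a minimal sketch, assuming the `objdump -T` output of each library has first been saved to a text file; `common_symbols` is a hypothetical helper name, not part of binutils.

```sh
# Hypothetical helper: print the dynamic symbols that are DEFINED in both
# of two saved `objdump -T` listings ($1 and $2).
# Symbol lines start with a hex address; undefined symbols carry *UND*
# as their section and are skipped. The symbol name is the last field.
common_symbols() {
    awk '
        $1 ~ /^[0-9a-f]+$/ && NF >= 4 && !/\*UND\*/ {
            if (FNR == NR) first[$NF] = 1        # still reading the first file
            else if ($NF in first) print $NF     # second file: already seen
        }' "$1" "$2" | sort -u
}
```

Possible usage, with the same paths as above: `objdump -T /usr/lib64/libLLVM-3.8.1-mesa.so > llvm.txt`, `objdump -T <PATH_TO_NEWER_LIBSTDC++>/libstdc++.so > stdcxx.txt`, then `common_symbols llvm.txt stdcxx.txt`. Every name printed is a candidate for the kind of mixing described here.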
Created attachment 1245913
GLX program to reproduce the crash
Created attachment 1245915
Created attachment 1245916
Created attachment 1245917
gdb full backtrace
(In reply to Guilherme Quentel Melo from comment #1)
> As libstdc++ is intended to be backward compatible, I would expect that I
> can always replace the system's libstdc++ with my own newer version. And
> others developers trying to build portable binaries expect that too, e.g.
> Steam from Valve.
Steam is actually one of the major reasons why we link libstdc++ statically. _We_ supply the driver, which is linked against the system version of libstdc++, but the Steam app often embeds its own (older) copy of libstdc++. Since the driver is loaded dynamically, the old version is the one in memory, so any newer C++ symbols used by the driver will fail to resolve.
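One rough way to see whether a driver would hit this resolution failure is to compare the highest GLIBCXX version tag the driver references with the highest one the application's bundled libstdc++ provides. The sketch below assumes `objdump -T` output on stdin; `max_glibcxx` is a hypothetical helper name, and the check is only a heuristic.

```sh
# Hypothetical helper: print the highest GLIBCXX_x.y[.z] version tag
# found in the text on stdin (for example `objdump -T` output).
max_glibcxx() {
    grep -o 'GLIBCXX_[0-9][0-9.]*' |
        sed 's/^GLIBCXX_//; s/\.*$//' |
        awk -F. '
            {
                # Zero-pad each component so string order equals numeric order.
                key = sprintf("%05d%05d%05d", $1, $2, $3)
                if (key > best) { best = key; ver = $0 }
            }
            END { if (ver != "") print "GLIBCXX_" ver }'
}
```

For example, `objdump -T /usr/lib64/dri/swrast_dri.so | max_glibcxx` against `objdump -T <PATH_TO_BUNDLED_LIBSTDC++>/libstdc++.so.6 | max_glibcxx` (the second path is a placeholder). If the first tag is newer than the second, the driver needs symbols the bundled library cannot supply.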
Thanks for the reply Adam.
Ok, but even Steam could have problems with this current configuration depending on the driver used. In my case I only got the crash when using llvmpipe, but it could happen with other drivers as well.
As shown by gdb and valgrind, Mesa is mixing some functions from the statically linked libstdc++ and some from my shipped libstdc++. Also valgrind shows some inlined functions from basic_string.h from the static libstdc++.
This does not seem to be safe and could crash with Steam and other applications. Isn't there any way to avoid that when linking libstdc++ statically?
So any other thoughts about this issue?
I'm really stuck here. At work we carefully designed our build system and dependencies to build portable binaries and can run the same executable flawlessly on a wide range of distros with glibc >=2.11.3 (we tested on CentOS/RedHat 6, Ubuntu >=14.04, Debian 8, SUSE 11) but there is no way we can run on CentOS/RedHat 7 (and probably Fedora) because of this issue.
Not even trying to LD_PRELOAD my newer libstdc++ works, as the inline code shown by valgrind continues to be executed.
I didn't want to give up and just tell customers that we only support proprietary drivers on those systems as going back to an old gcc is unfeasible. So please let me know if there is anything else I can help with.
I finally figured out what was happening.
In summary, my libstdc++.so was not defining static symbols as unique. The std lib uses a static variable to represent all empty strings.
In this case there was one empty string representation defined in mesa-private-llvm and another in my libstdc++.
Recompiling my gcc with a more recent binutils solved the problem. So for anyone having similar problems, I would recommend first checking whether your libstdc++ is using unique symbols:
nm -D -C <PATH_TO_LIBSTDC++> | grep std::string::_Rep::_S_empty_rep_storage
should give something like
0000000000386fa0 u std::string::_Rep::_S_empty_rep_storage
More details in https://gcc.gnu.org/ml/gcc-help/2017-04/msg00062.html
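The nm check above can be wrapped in a small script-friendly test. This is only a sketch: `empty_rep_is_unique` is a hypothetical helper name, and it assumes the `nm -D -C` output format shown above (address, type letter, demangled name).

```sh
# Hypothetical helper: reads `nm -D -C` output on stdin and succeeds only
# if std::string::_Rep::_S_empty_rep_storage is present and its type
# letter is "u" (GNU unique). Any other letter, or a missing symbol, fails.
empty_rep_is_unique() {
    awk '
        $3 == "std::string::_Rep::_S_empty_rep_storage" {
            found = 1
            if ($2 != "u") bad = 1
        }
        END { exit !(found && !bad) }'
}
```

Possible usage: `nm -D -C <PATH_TO_LIBSTDC++> | empty_rep_is_unique && echo unique || echo "NOT unique"`.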
(In reply to Guilherme Quentel Melo from comment #10)
> Recompiling my gcc with a more recent binutils solved the problem.
Thanks for the follow up. Which binutils were you using originally, and what was the more recent version you used that correctly combined the STB_GNU_UNIQUE symbols?
(In reply to Jonathan Wakely from comment #11)
> Thanks for the follow up. Which binutils were you using originally, and what
> was the more recent version you used that correctly combined the
> STB_GNU_UNIQUE symbols?
For this last investigation I built gcc 5.1 first under CentOS 6.8 with the default binutils package (2.20.51). Then I rebuilt, also under CentOS 6, using the latest binutils (2.28). I did not test other versions though.
Something weird is that I had also built gcc 4.8.5 on exactly the same CentOS 6 environment without using the recent binutils and oddly its libstdc++ was defining STB_GNU_UNIQUE symbols. I didn't investigate it further so I'm not sure if I did something different when building 4.8.5.
Just for reference, this is a more detailed explanation of how I found this out: https://gcc.gnu.org/ml/gcc-help/2017-05/msg00011.html
The previous crash was solved but I'm getting more crashes, now related to std::locale initialization:
But this time I don't see anything I could do to prevent that crash from happening.
Verified with build llvm-private-6.0.1-2.el7.
[root@lenovo-rd230-03 ~]# LIBGL_ALWAYS_SOFTWARE=1 LD_LIBRARY_PATH=usr/lib64 ./quad
Press any key to continue
libGL: OpenDriver: trying /usr/lib64/dri/tls/swrast_dri.so
libGL: OpenDriver: trying /usr/lib64/dri/swrast_dri.so
libGL: Can't open configuration file /root/.drirc: No such file or directory.
libGL: Can't open configuration file /root/.drirc: No such file or directory.
visual 0x121 selected
GL_VENDOR: VMware, Inc.
GL_RENDERER: llvmpipe (LLVM 6.0, 128 bits)
GL_VERSION: 2.1 Mesa 17.2.3
Press Enter to finish
It took some manual work to get it all working, but finally a nice colorful picture appeared in the separate window.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.