Bug 1417663 - invalid pointer running GLX program with llvmpipe
Status: ON_QA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: llvm-private
Version: 7.6
Hardware/OS: x86_64 Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Tom Stellard
QA Contact: qe-baseos-tools
Depends On:
Blocks: 1565233
Reported: 2017-01-30 10:20 EST by Guilherme Quentel Melo
Modified: 2018-06-07 15:46 EDT
CC List: 7 users

Doc Type: Bug Fix
Doc Text:
llvm-private no longer statically links against libstdc++
Type: Bug


Attachments
- Output of the crashed program (20.12 KB, text/plain), 2017-01-30 10:20 EST, Guilherme Quentel Melo
- GLX program to reproduce the crash (3.15 KB, text/plain), 2017-01-30 10:24 EST, Guilherme Quentel Melo
- Valgrind output (6.63 KB, text/plain), 2017-01-30 10:25 EST, Guilherme Quentel Melo
- gdb backtrace (3.79 KB, text/plain), 2017-01-30 10:26 EST, Guilherme Quentel Melo
- gdb full backtrace (34.49 KB, text/plain), 2017-01-30 10:26 EST, Guilherme Quentel Melo


External Trackers
CentOS 0012735 (Priority: None, Status: None, Summary: None, Last Updated: 2017-01-30 10:20 EST)

Description Guilherme Quentel Melo 2017-01-30 10:20:38 EST
Created attachment 1245912
Output of the crashed program

Description of problem:

When running a GLX program using Mesa's llvmpipe renderer and a newer libstdc++.so.6, it crashes with *** Error in `./quad': free(): invalid pointer: 0x.... *** (full output attached)

Version-Release number of selected component (if applicable): 3.8.1-1.el7


How reproducible: always


Steps to Reproduce:
1. Build a newer gcc (versions 5.2.0 through 6.2.0 all showed the same behaviour)
2. Compile the attached quad.c program with: gcc quad.c -o quad -lX11 -lGL -lGLU
3. Run the program with: LIBGL_ALWAYS_SOFTWARE=1 LD_LIBRARY_PATH=<PATH_TO_NEWER_LIBSTDC++> ./quad

Actual results:

Crashes with *** Error in `./quad': free(): invalid pointer: 0x.... ***


Expected results:

A quad is rendered on a window.


Additional info:

- The crash does not happen on CentOS 6, Ubuntu 16.04/16.10 or SUSE 11. A very suspicious difference between CentOS 7 and those distributions is that on CentOS 7 the mesa-private-llvm package is built with the -static-libstdc++ flag.

- After removing -static-libstdc++ from llvm.spec and rebuilding the mesa-private-llvm package, the crash no longer happens.

- The crash also does not happen on CentOS 7 when using accelerated OpenGL (LIBGL_ALWAYS_SOFTWARE unset).

- The valgrind output shows several "Invalid free() / delete / delete[] / realloc()" errors.
Comment 1 Guilherme Quentel Melo 2017-01-30 10:23:19 EST
I noticed that libstdc++ symbols are indeed exported by libLLVM-mesa.so.

For example, the std::string::replace function (which caused one of the invalid frees) is defined both in libLLVM-mesa.so and in my newer libstdc++:

$ objdump -T /usr/lib64/libLLVM-3.8.1-mesa.so | grep _ZNSs7replaceEmmPKcm
0000000001246c00 w DF .text 00000000000001c1 Base _ZNSs7replaceEmmPKcm



$ objdump -T <PATH_TO_NEWER_LIBSTDC++>/libstdc++.so | grep _ZNSs7replaceEmmPKcm
00000000000d2a70 w DF .text 00000000000001eb GLIBCXX_3.4 _ZNSs7replaceEmmPKcm


So it does not seem to be a good idea for a system library to be statically linked against libstdc++: this duplicates symbols and leads to undefined behaviour for C++ applications that use a newer libstdc++.
As libstdc++ is intended to be backward compatible, I would expect that I can always replace the system's libstdc++ with my own newer version. And other developers trying to build portable binaries expect that too, e.g. Steam from Valve.
But that is not possible if some system libraries are statically linked against it.
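The objdump comparison above checks a single symbol by hand. The same check can be run over every exported symbol to see how many libstdc++ definitions the two libraries share. This is a minimal POSIX shell sketch, not something from the original report: the function name shared_defined_symbols and the *.syms file names are hypothetical, and it assumes each input file holds `nm -D --defined-only` output (address, type letter, mangled name). It strips any `@VERSION` suffix that newer nm releases append, so that versioned libstdc++ names still match the unversioned copies exported by libLLVM.

```shell
# Hypothetical helper (not from the report): print every symbol name that is
# defined in BOTH of two `nm -D --defined-only` listings.
# $1, $2: files containing nm output lines of the form "ADDRESS TYPE NAME".
shared_defined_symbols() {
  awk '
    NF >= 3 {                       # undefined symbols have no address column
      name = $3
      sub(/@.*/, "", name)          # drop "@GLIBCXX_3.4" / "@@..." version tags
      if (FNR == NR) {              # first file: remember its symbols
        seen[name] = 1
      } else if ((name in seen) && !(name in printed)) {
        print name                  # second file: report each common symbol once
        printed[name] = 1
      }
    }' "$1" "$2"
}
```

Possible usage, with the paths from this comment:

  nm -D --defined-only /usr/lib64/libLLVM-3.8.1-mesa.so > llvm.syms
  nm -D --defined-only <PATH_TO_NEWER_LIBSTDC++>/libstdc++.so > stdcxx.syms
  shared_defined_symbols llvm.syms stdcxx.syms

Every name printed is a symbol that can be resolved to either copy at run time, as _ZNSs7replaceEmmPKcm is here.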
Comment 2 Guilherme Quentel Melo 2017-01-30 10:24 EST
Created attachment 1245913
GLX program to reproduce the crash
Comment 3 Guilherme Quentel Melo 2017-01-30 10:25 EST
Created attachment 1245915
Valgrind output
Comment 4 Guilherme Quentel Melo 2017-01-30 10:26 EST
Created attachment 1245916
gdb backtrace
Comment 5 Guilherme Quentel Melo 2017-01-30 10:26 EST
Created attachment 1245917
gdb full backtrace
Comment 7 Adam Jackson 2017-03-16 12:20:07 EDT
(In reply to Guilherme Quentel Melo from comment #1)

> As libstdc++ is intended to be backward compatible, I would expect that I
> can always replace the system's libstdc++ with my own newer version. And
> other developers trying to build portable binaries expect that too, e.g.
> Steam from Valve.

Steam is actually one of the major reasons why we link libstdc++ statically. _We_ supply the driver, which is linked against the system version of libstdc++, but the Steam app often embeds its own (older) copy of libstdc++. Since the driver is loaded dynamically, after the application has already loaded that older copy, the old version is the one in memory, so any newer C++ symbols used by the driver will fail to resolve.
Comment 8 Guilherme Quentel Melo 2017-03-16 12:57:37 EDT
Thanks for the reply Adam.

Ok, but even Steam could have problems with the current configuration, depending on the driver used. In my case I only got the crash when using llvmpipe, but it could happen with other drivers as well.

As shown by gdb and valgrind, Mesa is mixing some functions from the statically linked libstdc++ with others from the libstdc++ I ship. Valgrind also shows some functions from basic_string.h that were inlined from the static libstdc++.

This does not seem to be safe and could crash Steam and other applications. Isn't there any way to avoid that when linking libstdc++ statically?
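The mixing described above can also be observed without a debugger through glibc's dynamic-linker tracing: running the program with LD_DEBUG=bindings makes ld.so log, for each symbol it resolves, which object the reference was bound to. The sketch below is an illustrative addition, not from the thread; the function name show_bindings is hypothetical, and it assumes glibc's usual "binding file X [n] to Y [n]: normal symbol `NAME'" line format. It only reveals calls resolved through the dynamic linker; code inlined into libLLVM at build time (such as the basic_string.h functions valgrind reports) never appears in this log.

```shell
# Hypothetical helper (not from the report): filter glibc LD_DEBUG=bindings
# output, read on stdin, down to one mangled symbol and print
# "REFERENCING_OBJECT -> DEFINING_OBJECT" for each binding of that symbol.
# Assumes the usual glibc line format:
#   PID: binding file OBJ [N] to OBJ [N]: normal symbol `NAME' [VERSION]
# $1: mangled symbol name, e.g. _ZNSs7replaceEmmPKcm
show_bindings() {
  awk -v sym="$1" '
    index($0, "binding file ") && index($0, "`" sym "'\''") {
      from = ""; to = ""
      for (i = 1; i < NF; i++) {
        if ($i == "file" && from == "") from = $(i + 1)   # object making the reference
        if ($i == "to") { to = $(i + 1); break }         # object that satisfied it
      }
      print from " -> " to
    }'
}
```

LD_DEBUG writes to stderr, so a possible invocation for the reproducer is:

  LIBGL_ALWAYS_SOFTWARE=1 LD_DEBUG=bindings ./quad 2>&1 | show_bindings _ZNSs7replaceEmmPKcm

A line showing libLLVM-3.8.1-mesa.so bound to the newer libstdc++ would mean the out-of-line copy in libLLVM is being interposed while its inlined string code is not.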
Comment 9 Guilherme Quentel Melo 2017-04-12 20:24:52 EDT
So, are there any other thoughts about this issue?

I'm really stuck here. At work we carefully designed our build system and dependencies to produce portable binaries, and we can run the same executable flawlessly on a wide range of distros with glibc >= 2.11.3 (we tested on CentOS/RedHat 6, Ubuntu >= 14.04, Debian 8 and SUSE 11), but there is no way we can run on CentOS/RedHat 7 (and probably Fedora) because of this issue.

Even LD_PRELOADing my newer libstdc++ does not help, because the inlined code shown by valgrind is still executed.

I don't want to give up and just tell customers that we only support proprietary drivers on those systems, and going back to an old gcc is not feasible. So please let me know if there is anything else I can help with.
Comment 10 Guilherme Quentel Melo 2017-05-05 15:46:08 EDT
I finally figured out what was happening.

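In summary, my libstdc++.so was not marking static data symbols as unique (STB_GNU_UNIQUE). The standard library uses a single static object to represent all empty strings, so the process must contain exactly one copy of it.
In my case there were two empty-string representations: one defined in mesa-private-llvm and another in my libstdc++. A string created against one copy but destroyed by code that checks against the other is not recognized as the shared empty string, so its destructor tries to free() the static storage, which produces the "free(): invalid pointer" error.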

Recompiling my gcc with a more recent binutils solved the problem. So for anyone having similar problems, I would recommend first checking whether your libstdc++ uses unique symbols:

  nm -D -C <PATH_TO_LIBSTDC++> | grep std::string::_Rep::_S_empty_rep_storage

should give something like

  0000000000386fa0 u std::string::_Rep::_S_empty_rep_storage


More details in https://gcc.gnu.org/ml/gcc-help/2017-04/msg00062.html
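The nm check above can be scripted, for example in a build or packaging step that should reject a libstdc++ built without unique symbols. This is a minimal sketch, assuming nm -D -C prints the symbol type letter in the second column as in the sample output above; the function name check_empty_rep_unique is hypothetical. It also strips the "@@GLIBCXX_..." version suffix that newer nm releases may append to dynamic symbol names.

```shell
# Hypothetical helper (not from the report): read `nm -D -C` output on stdin
# and report whether std::string::_Rep::_S_empty_rep_storage is an
# STB_GNU_UNIQUE symbol, which nm shows with the type letter "u".
# Exit status: 0 = unique, 1 = present but not unique, 2 = not found.
check_empty_rep_unique() {
  awk '
    {
      name = $NF
      sub(/@.*/, "", name)           # newer nm may append "@@GLIBCXX_3.4"
    }
    name == "std::string::_Rep::_S_empty_rep_storage" {
      found = 1
      if ($2 == "u") unique = 1      # second column is the symbol type
    }
    END {
      if (!found) { print "not found"; exit 2 }
      if (unique) { print "unique"; exit 0 }
      print "not unique"; exit 1
    }'
}
```

Possible usage:

  nm -D -C <PATH_TO_LIBSTDC++> | check_empty_rep_unique

Relying on the exit status rather than the printed word lets a build script fail fast with a plain `|| exit 1`.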
Comment 11 Jonathan Wakely 2017-05-05 18:47:07 EDT
(In reply to Guilherme Quentel Melo from comment #10)
> Recompiling my gcc with a more recent binutils solved the problem.

Thanks for the follow-up. Which binutils were you using originally, and what was the more recent version you used that correctly combined the STB_GNU_UNIQUE symbols?
Comment 12 Guilherme Quentel Melo 2017-05-06 11:01:10 EDT
(In reply to Jonathan Wakely from comment #11)
> Thanks for the follow up. Which binutils were you using originally, and what
> was the more recent version you used that correctly combined the
> STB_GNU_UNIQUE symbols?

For this last investigation I first built gcc 5.1 under CentOS 6.8 with the default binutils package (2.20.51). Then I rebuilt it, also under CentOS 6, using the latest binutils (2.28). I did not test other versions, though.

Oddly, I had also built gcc 4.8.5 in exactly the same CentOS 6 environment without the recent binutils, and its libstdc++ did define STB_GNU_UNIQUE symbols. I didn't investigate further, so I'm not sure whether I did something different when building 4.8.5.
Comment 13 Guilherme Quentel Melo 2017-05-06 15:41:31 EDT
Just for reference, this is a more detailed explanation of how I found this out: https://gcc.gnu.org/ml/gcc-help/2017-05/msg00011.html
Comment 14 Guilherme Quentel Melo 2017-05-18 15:09:36 EDT
The previous crash is solved, but I'm now getting other crashes, this time related to std::locale initialization:

https://gcc.gnu.org/ml/gcc-help/2017-05/msg00164.html

This time, though, I don't see anything I can do to prevent the crash.
