1417663 – invalid pointer running GLX program with llvmpipe


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	llvm-private
Sub Component:
Version:	7.6
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Tom Stellard
QA Contact:	Miloš Prchlík
Docs Contact:	Vladimír Slávik
URL:
Whiteboard:
Depends On:
Blocks:	1565233

Reported:	2017-01-30 15:20 UTC by Guilherme Quentel Melo
Modified:	2018-11-19 17:20 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Release Note
Doc Text:	_llvm-private_ no longer crashes when used together with more recent libstdc++ library versions. Previously, executable files in the _llvm-private_ package providing drivers for graphics rendering were linked statically against the `libstdc++` library. As a consequence, running a program that used GLX, the Mesa llvmpipe renderer, and a different `libstdc++` version caused an unexpected termination with a message about an "invalid pointer". _llvm-private_ has been changed and no longer statically links against `libstdc++`. As a result, programs using this driver no longer terminate unexpectedly in this situation.
Clone Of:
Environment:
Last Closed:	2018-10-30 07:52:13 UTC
Target Upstream Version:
Embargoed:

Attachments
Output of the crashed program (20.12 KB, text/plain) 2017-01-30 15:20 UTC, Guilherme Quentel Melo	no flags	Details
GLX program to reproduce the crash (3.15 KB, text/plain) 2017-01-30 15:24 UTC, Guilherme Quentel Melo	no flags	Details
Valgrind output (6.63 KB, text/plain) 2017-01-30 15:25 UTC, Guilherme Quentel Melo	no flags	Details
gdb backtrace (3.79 KB, text/plain) 2017-01-30 15:26 UTC, Guilherme Quentel Melo	no flags	Details
gdb full backtrace (34.49 KB, text/plain) 2017-01-30 15:26 UTC, Guilherme Quentel Melo	no flags	Details

Links
System	ID	Private	Priority	Status	Summary	Last Updated
CentOS	0012735	0	None	None	None	2017-01-30 15:20:38 UTC
Red Hat Product Errata	RHEA-2018:3033	0	None	None	None	2018-10-30 07:52:44 UTC

Description Guilherme Quentel Melo 2017-01-30 15:20:38 UTC

Created attachment 1245912 [details]
Output of the crashed program

Description of problem:

When running a GLX program using Mesa's llvmpipe renderer and a newer libstdc++.so.6, it crashes with *** Error in `./quad': free(): invalid pointer: 0x.... *** (full output attached)

Version-Release number of selected component (if applicable): 3.8.1-1.el7


How reproducible: always


Steps to Reproduce:
1. Compile a newer gcc (5.2.0 up to 6.2.0 presented the same behaviour)
2. Compile quad.c program with gcc quad.c -o quad -lX11 -lGL -lGLU
3. Run the program with LIBGL_ALWAYS_SOFTWARE=1 LD_LIBRARY_PATH=<PATH_TO_NEWER_LIBSTDC++> ./quad

Actual results:

Crashes with *** Error in `./quad': free(): invalid pointer: 0x.... ***


Expected results:

A quad is rendered on a window.


Additional info:

- The crash does not happen on CentOS 6, Ubuntu 16.04/16.10, or SUSE 11. A very suspicious difference between CentOS 7 and those distributions is that on CentOS 7 the mesa-private-llvm package is built using the -static-libstdc++ flag.

- After removing -static-libstdc++ from llvm.spec and rebuilding the mesa-private-llvm package, the crash stops happening.

- Also, the crash does not happen on CentOS 7 when using accelerated OpenGL (LIBGL_ALWAYS_SOFTWARE unset).

- valgrind output shows a few "Invalid free() / delete / delete[] / realloc()"

Comment 1 Guilherme Quentel Melo 2017-01-30 15:23:19 UTC

I noticed that libstdc++ symbols are indeed exported by libLLVM-mesa.so.

For example std::string::replace function (which caused one of the invalid free) is defined both in libLLVM-mesa.so and my newer libstdc++:

$ objdump -T /usr/lib64/libLLVM-3.8.1-mesa.so | grep _ZNSs7replaceEmmPKcm
0000000001246c00 w DF .text 00000000000001c1 Base _ZNSs7replaceEmmPKcm



$ objdump -T <PATH_TO_NEWER_LIBSTDC++>/libstdc++.so | grep _ZNSs7replaceEmmPKcm
00000000000d2a70 w DF .text 00000000000001eb GLIBCXX_3.4 _ZNSs7replaceEmmPKcm


So, it does not seem to be a good idea for a system library to be statically linked against libstdc++, as this duplicates symbols and leads to undefined behaviour for C++ applications using a newer libstdc++.
As libstdc++ is intended to be backward compatible, I would expect that I can always replace the system's libstdc++ with my own newer version. Other developers trying to build portable binaries expect that too, e.g. Steam from Valve.
But that is not possible if some system libraries are statically linked against it.
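
The single-symbol check above can be extended to list every symbol name that two libraries both define. The following is only a sketch: `defined_names` and `shared_names` are hypothetical helper names, and it assumes each listing was saved from `nm -D --defined-only` without `-C`, so that the mangled name is the last whitespace-separated field.

```shell
# Print the symbol names found in `nm -D --defined-only` output read on stdin.
# Any "@VERSION" suffix is stripped so versioned and unversioned names compare equal.
defined_names() {
    awk 'NF >= 3 { name = $NF; sub(/@.*/, "", name); print name }' | LC_ALL=C sort -u
}

# Print the names defined in both listings.
# $1 and $2 are files holding saved nm output, one per library.
shared_names() {
    sn_a=$(mktemp)
    sn_b=$(mktemp)
    defined_names < "$1" > "$sn_a"
    defined_names < "$2" > "$sn_b"
    LC_ALL=C comm -12 "$sn_a" "$sn_b"
    rm -f "$sn_a" "$sn_b"
}

# Possible usage (paths are placeholders):
#   nm -D --defined-only /usr/lib64/libLLVM-3.8.1-mesa.so > llvm.txt
#   nm -D --defined-only <PATH_TO_NEWER_LIBSTDC++>/libstdc++.so > stdcxx.txt
#   shared_names llvm.txt stdcxx.txt
```

Any name printed is defined by both libraries, so which definition a caller ends up using depends on symbol resolution order at load time.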

Comment 2 Guilherme Quentel Melo 2017-01-30 15:24:51 UTC

Created attachment 1245913 [details]
GLX program to reproduce the crash

Comment 3 Guilherme Quentel Melo 2017-01-30 15:25:38 UTC

Created attachment 1245915 [details]
Valgrind output

Comment 4 Guilherme Quentel Melo 2017-01-30 15:26:10 UTC

Created attachment 1245916 [details]
gdb backtrace

Comment 5 Guilherme Quentel Melo 2017-01-30 15:26:31 UTC

Created attachment 1245917 [details]
gdb full backtrace

Comment 7 Adam Jackson 2017-03-16 16:20:07 UTC

(In reply to Guilherme Quentel Melo from comment #1)

> As libstdc++ is intended to be backward compatible, I would expect that I
> can always replace the system's libstdc++ with my own newer version. And
> others developers trying to build portable binaries expect that too, e.g.
> Steam from Valve.

Steam is actually one of the major reasons why we link libstdc++ statically. _We_ supply the driver, which is linked against the system version of libstdc++, but the steam app often embeds its own (older) copy of libstdc++. Since the driver is loaded dynamically, the old version is the one in memory, so any newer C++ symbols used by the driver will fail to resolve.

Comment 8 Guilherme Quentel Melo 2017-03-16 16:57:37 UTC

Thanks for the reply Adam.

Ok, but even Steam could have problems with the current configuration, depending on the driver used. In my case I only got the crash when using llvmpipe, but it could happen with other drivers as well.

As shown by gdb and valgrind, mesa is mixing some functions from the statically linked libstdc++ with some from the libstdc++ I ship. Valgrind also shows some inlined functions from basic_string.h that come from the static libstdc++.

This does not seem to be safe and could crash with steam and other applications. Isn't there any way to avoid that when linking libstdc++ statically?

Comment 9 Guilherme Quentel Melo 2017-04-13 00:24:52 UTC

So any other thoughts about this issue?

I'm really stuck here. At work we carefully designed our build system and dependencies to build portable binaries and can run the same executable flawlessly on a wide range of distros with glibc >=2.11.3 (we tested on CentOS/RedHat 6, Ubuntu >=14.04, Debian 8, SUSE 11) but there is no way we can run on CentOS/RedHat 7 (and probably Fedora) because of this issue.

Not even trying to LD_PRELOAD my newer libstdc++ works, as the inline code shown by valgrind continues to be executed.

I didn't want to give up and just tell customers that we only support proprietary drivers on those systems as going back to an old gcc is unfeasible. So please let me know if there is anything else I can help with.

Comment 10 Guilherme Quentel Melo 2017-05-05 19:46:08 UTC

I finally figured out what was happening.

In summary, my libstdc++.so was not defining its static data symbols as unique (STB_GNU_UNIQUE). The std lib uses a single static variable to represent all empty strings.
In this case there was one empty string representation defined in mesa-private-llvm and another in my libstdc++, so the two copies of the library did not agree on which object represents an empty string.

Recompiling my gcc with a more recent binutils solved the problem. So for anyone having similar problems, I recommend first checking whether your libstdc++ is using unique symbols:

  nm -D -C <PATH_TO_LIBSTDC++> | grep std::string::_Rep::_S_empty_rep_storage

should give something like

  0000000000386fa0 u std::string::_Rep::_S_empty_rep_storage


More details in https://gcc.gnu.org/ml/gcc-help/2017-04/msg00062.html
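
For scripting the check above, a small filter can classify the `nm` symbol type instead of inspecting it by eye. This is a minimal sketch: `empty_rep_binding` is a hypothetical helper name, and it assumes the demangled name appears as the third field exactly as in the sample output, optionally followed by an "@VERSION" suffix.

```shell
# Read `nm -D -C` output on stdin and report how the empty-string storage is bound.
# nm prints the type letter "u" for a unique global symbol (a GNU extension).
empty_rep_binding() {
    awk '
        # Field 2 is the nm symbol type; field 3 is the demangled name,
        # possibly followed by an "@VERSION" suffix.
        $3 ~ /^std::string::_Rep::_S_empty_rep_storage(@|$)/ {
            found = 1
            if ($2 == "u") print "unique"
            else print "not unique (type " $2 ")"
        }
        END { if (!found) print "symbol not found" }
    '
}

# Possible usage (path is a placeholder):
#   nm -D -C <PATH_TO_LIBSTDC++> | empty_rep_binding
```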

Comment 11 Jonathan Wakely 2017-05-05 22:47:07 UTC

(In reply to Guilherme Quentel Melo from comment #10)
> Recompiling my gcc with a more recent binutils solved the problem.

Thanks for the follow up. Which binutils were you using originally, and what was the more recent version you used that correctly combined the STB_GNU_UNIQUE symbols?

Comment 12 Guilherme Quentel Melo 2017-05-06 15:01:10 UTC

(In reply to Jonathan Wakely from comment #11)
> Thanks for the follow up. Which binutils were you using originally, and what
> was the more recent version you used that correctly combined the
> STB_GNU_UNIQUE symbols?

For this last investigation I built gcc 5.1 first under CentOS 6.8 with the default binutils package (2.20.51). Then I rebuilt, also under CentOS 6, using the latest binutils (2.28). I did not test other versions though.

Something weird is that I had also built gcc 4.8.5 on exactly the same CentOS 6 environment without using the recent binutils and oddly its libstdc++ was defining STB_GNU_UNIQUE symbols. I didn't investigate it further so I'm not sure if I did something different when building 4.8.5.
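
When comparing builds like the two above, it may help to list every dynamic symbol with STB_GNU_UNIQUE binding rather than checking a single name. The sketch below assumes the usual `readelf --dyn-syms -W` column layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name), where this binding is displayed as UNIQUE; `unique_symbols` is a hypothetical helper name.

```shell
# Read `readelf --dyn-syms -W` output on stdin and print the names of
# symbols whose Bind column (field 5) is UNIQUE.
unique_symbols() {
    awk '$5 == "UNIQUE" { print $8 }'
}

# Possible usage (path is a placeholder):
#   readelf --dyn-syms -W <PATH_TO_LIBSTDC++> | unique_symbols
```

An empty result would suggest that the library was built without unique symbols.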

Comment 13 Guilherme Quentel Melo 2017-05-06 19:41:31 UTC

Just for reference, this is a more detailed explanation on how I've found this out: https://gcc.gnu.org/ml/gcc-help/2017-05/msg00011.html

Comment 14 Guilherme Quentel Melo 2017-05-18 19:09:36 UTC

The previous crash was solved but I'm getting more crashes, now related to std::locale initialization:

https://gcc.gnu.org/ml/gcc-help/2017-05/msg00164.html

But this time I don't see anything I could do to prevent that crash from happening.

Comment 16 Miloš Prchlík 2018-09-10 08:29:06 UTC

Verified with build llvm-private-6.0.1-2.el7.

[root@lenovo-rd230-03 ~]# LIBGL_ALWAYS_SOFTWARE=1 LD_LIBRARY_PATH=usr/lib64 ./quad
Press any key to continue

libGL: OpenDriver: trying /usr/lib64/dri/tls/swrast_dri.so
libGL: OpenDriver: trying /usr/lib64/dri/swrast_dri.so
libGL: Can't open configuration file /root/.drirc: No such file or directory.
libGL: Can't open configuration file /root/.drirc: No such file or directory.

        visual 0x121 selected
GL_VENDOR: VMware, Inc.
GL_RENDERER: llvmpipe (LLVM 6.0, 128 bits)
GL_VERSION: 2.1 Mesa 17.2.3
GL_SHADING_LANGUAGE_VERSION: 1.30
GL_EXTENSIONS: 1.30
Press Enter to finish

[root@lenovo-rd230-03 ~]# 

It took some manual work to get it all working, but finally a nice colorful picture appeared in a separate window.

Comment 19 errata-xmlrpc 2018-10-30 07:52:13 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:3033
