Bug 1124987 - Fix static TLS usage in Fedora shared libraries.
Summary: Fix static TLS usage in Fedora shared libraries.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: calibre
Version: 21
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kevin Fenzi
QA Contact: Fedora Extras Quality Assurance
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:d0494b0cf2c91b36c2d2240a154...
Duplicates: 1133843
Depends On:
Blocks: 1218319
Reported: 2014-07-30 19:22 UTC by Adam Williamson
Modified: 2015-05-04 15:05 UTC (History)

Fixed In Version: glibc-2.19.90-36.fc21
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1218319 (view as bug list)
Environment:
Last Closed: 2014-09-11 02:44:15 UTC


Attachments
File: backtrace (7.89 KB, text/plain)
2014-07-30 19:22 UTC, Adam Williamson
File: environ (1.72 KB, text/plain)
2014-07-30 19:22 UTC, Adam Williamson
LD_DEBUG log from crash (requested by carlos) (4.28 MB, application/x-xz)
2014-08-08 12:41 UTC, Adam Williamson
readelf -a -W of libraries that calibre opens (1.33 MB, text/plain)
2014-08-18 21:50 UTC, Mikko Tiihonen
readelf -a -W of libraries that calibre opens (6.25 MB, application/x-gzip)
2014-08-18 21:56 UTC, Mikko Tiihonen
Patch for F-20 package (3.54 KB, patch)
2015-01-29 10:20 UTC, Tim Niemueller


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1133843 None None None Never

Internal Links: 1133843

Description Adam Williamson 2014-07-30 19:22:15 UTC
Description of problem:
Crashes on startup in current F21.

Version-Release number of selected component:
calibre-1.46.0-1.fc21

Additional info:
reporter:       libreport-2.2.3
cmdline:        python2 /usr/bin/calibre --detach
executable:     /usr/bin/calibre
kernel:         3.16.0-0.rc6.git2.2.fc22.x86_64
runlevel:       N 5
type:           Python
uid:            1001

Truncated backtrace:
#1 <module> in /usr/lib64/calibre/calibre/utils/magick/__init__.py:14
#2 <module> in /usr/lib64/calibre/calibre/db/backend.py:31
#3 <module> in /usr/lib64/calibre/calibre/db/legacy.py:17
#4 <module> in /usr/lib64/calibre/calibre/gui2/ui.py:26
#5 run_gui in /usr/lib64/calibre/calibre/gui2/main.py:320
#6 main in /usr/lib64/calibre/calibre/gui2/main.py:458
#7 <module> in /usr/bin/calibre:20

Comment 1 Adam Williamson 2014-07-30 19:22:18 UTC
Created attachment 922690 [details]
File: backtrace

Comment 2 Adam Williamson 2014-07-30 19:22:19 UTC
Created attachment 922691 [details]
File: environ

Comment 3 Colin Walters 2014-08-06 13:17:25 UTC
I suspect this is a glibc change.  We're seeing the same error message in an Anaconda environment from trying to dlopen(libgtk.so).

Comment 4 Adam Williamson 2014-08-06 13:20:12 UTC
So, let's CC the glibc maintainer (at least, the person to whom glibc bugs appear currently to be assigned).

Comment 5 Adam Williamson 2014-08-08 12:41:54 UTC
Created attachment 925164 [details]
LD_DEBUG log from crash (requested by carlos)

Comment 6 Carlos O'Donell 2014-08-08 12:51:31 UTC
Please get the list of all shared libraries loaded by python, then run `readelf -a -W` against all of them and dump that to a file. I want to know which of the libraries is using static TLS.

The error you are seeing is not a bug in glibc. The dynamic loader has a fixed amount of "super fast" static TLS reserved for core libraries that are always loaded. Normal libraries should not use static TLS, so you should not be running out of it; they should use dynamic TLS, which has no limits (not quite as fast as static TLS, but still fast).

Comment 7 Colin Walters 2014-08-08 13:30:38 UTC
This is a good post on the issue:

http://stackoverflow.com/questions/19268293/matlab-error-cannot-open-with-static-tls

Comment 8 Kevin Fenzi 2014-08-08 13:32:52 UTC
"Please get the list of all shared libraries loaded by python then run `readelf -a -W` against all of them and dump that to a file."

Is there some easy way to get this list without parsing the strace output?

Comment 9 Adam Williamson 2014-08-08 19:26:34 UTC
I thought it might be in the abrt data, but I don't see it in there at least in an obvious way. I can do the parsing, I just didn't want to embarrass myself doing it in my monkey way in front of carlos :P when I have some privacy to pore over the 'cut' manpages without anyone seeing, I'll get it done...

Comment 10 Carlos O'Donell 2014-08-09 07:20:38 UTC
(In reply to Adam Williamson (Red Hat) from comment #9)
> I thought it might be in the abrt data, but I don't see it in there at least
> in an obvious way. I can do the parsing, I just didn't want to embarrass
> myself doing it in my monkey way in front of carlos :P when I have some
> privacy to pore over the 'cut' manpages without anyone seeing, I'll get it
> done...

The abrt data should have a loaded library list. For example /proc/self/maps should have been dumped and provided? That will contain the full list of DSOs.
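
As a rough illustration of the /proc/self/maps approach, the sketch below pulls the DSO paths out of maps-style text. The function name and the file-name pattern are assumptions made for this example; it is not part of abrt or calibre.

```python
import re

# Matches names such as libc.so.6, libMagickCore-6.Q16.so.2 or plugin.so.
# This pattern is an assumption; unusual DSO names may need a looser match.
SO_NAME = re.compile(r"\.so(\.\d+)*$")


def shared_objects_from_maps(maps_text):
    """Return the sorted, de-duplicated DSO paths found in maps-style text."""
    paths = set()
    for line in maps_text.splitlines():
        # A maps line has five fields (range, perms, offset, dev, inode)
        # followed by an optional path.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        path = fields[5].strip()
        if SO_NAME.search(path):
            paths.add(path)
    return sorted(paths)
```

For a running process, the input would be the contents of /proc/<pid>/maps; each returned path could then be passed to `readelf -a -W`.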

Comment 11 Petr Schindler 2014-08-13 06:01:33 UTC
Another user experienced a similar problem:

During the first start I left the default values and clicked the Next button, then the application froze. Every subsequent start of the application ends with a traceback.

reporter:       libreport-2.2.3
cmdline:        python2 /usr/bin/calibre --detach
executable:     /usr/bin/calibre
kernel:         3.16.0-1.fc21.x86_64
package:        calibre-1.46.0-3.fc21
reason:         __init__.py:14:<module>:RuntimeError: Failed to load ImageMagick: dlopen: cannot load any more object with static TLS
runlevel:       N 5
type:           Python
uid:            1000
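
The reason line above is a dlopen error wrapped in a Python exception. A minimal sketch of how such a failure surfaces in Python, assuming the library is loaded through ctypes (calibre's own loading code may work differently, and `try_dlopen` is a name invented for this example):

```python
import ctypes


def try_dlopen(name):
    """Return (handle, None) on success, or (None, message) if loading fails."""
    try:
        return ctypes.CDLL(name), None
    except OSError as exc:
        # The loader's error text is carried in the exception message.
        return None, str(exc)
```

On an affected system the returned message would be expected to contain the loader's text, as in the report above.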

Comment 12 Mikko Tiihonen 2014-08-18 21:50:14 UTC
Created attachment 928076 [details]
readelf -a -W of libraries that calibre opens

strace -o /tmp/strace -f calibre

fgrep '.so"' /tmp/strace | grep ' open(' | grep -v ENOENT | cut -d'"' -f2 | sort -u > /tmp/strace.so.list.txt

for i in $(cat /tmp/strace.so.list.txt) ; do echo "readelf -a -W $i" >> /tmp/readelf.txt; readelf -a -W $i >> /tmp/readelf.txt; done

Comment 13 Mikko Tiihonen 2014-08-18 21:56:02 UTC
Created attachment 928078 [details]
readelf -a -W of libraries that calibre opens

Again with missing libraries added (sorry for the spam):

strace -o /tmp/strace -f calibre

fgrep '.so' /tmp/strace | grep ' open(' | grep -v ENOENT | cut -d'"' -f2 | sort -u > /tmp/strace.so.list.txt

for i in $(cat /tmp/strace.so.list.txt) ; do echo "readelf -a -W $i" >> /tmp/readelf.txt; readelf -a -W $i >> /tmp/readelf.txt; done
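
The same filtering can be sketched in Python. This is an approximation of the pipeline above, not a drop-in replacement: it skips every failed open() rather than only ENOENT, and it keeps only paths that end in a .so name, so files such as /etc/ld.so.cache are excluded. The function name and patterns are assumptions made for this example.

```python
import re

# open("path", flags...) = result, as printed by strace.
# Only open() is matched, as in the pipeline above; traces that log
# openat() instead would need an adjusted pattern.
OPEN_CALL = re.compile(r'\bopen\("([^"]+)",[^)]*\)\s*=\s*(-?\d+)')
SO_NAME = re.compile(r"\.so(\.\d+)*$")


def opened_shared_objects(strace_text):
    """Return sorted unique .so paths that were opened successfully."""
    paths = set()
    for line in strace_text.splitlines():
        match = OPEN_CALL.search(line)
        if not match:
            continue
        path, result = match.group(1), int(match.group(2))
        if result < 0:
            continue  # the open() call failed
        if SO_NAME.search(path):
            paths.add(path)
    return sorted(paths)
```

The input would be the contents of /tmp/strace from the commands above.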

Comment 14 Mikko Tiihonen 2014-08-19 21:00:56 UTC
cat readelf.txt | egrep 'readelf| TLS ' | grep -v ' 0 TLS ' | grep TLS -B1

384     /lib64/libpixman-1.so.0
272	/lib64/libc.so.6
224     /lib64/libselinux.so.1
108	/lib64/libgomp.so.1
78	/lib64/libuuid.so.1
56	/lib64/libsystemd.so.0
48	/lib64/libstdc++.so.6
32	/lib64/libglapi.so.0
25	/lib64/libcom_err.so.2
8	/lib64/libasound.so.2
8	/lib64/libdw.so.1
8	/lib64/libEGL.so.1
8	/lib64/libGL.so.1
8	/lib64/libQtCore.so.4
4	/lib64/libelf.so.1

= 1271 bytes (is there some additional rounding done by linker?)

After some googling I found out that we need -d flag to readelf to find which modules have static tls. Results are:

cat readelf.txt | egrep 'readelf|STATIC_TLS' | grep TLS -B1 | grep readelf | cut -d/ -f2-
272	/lib64/libc.so.6
108	/lib64/libgomp.so.1
32	/lib64/libglapi.so.0
8	/lib64/libEGL.so.1
8	/lib64/libGL.so.1
0	/lib64/libcrypt.so.1
0	/lib64/libm.so.6
0	/lib64/libnsl.so.1
0	/lib64/libnss_files.so.2
0	/lib64/libpthread.so.0
0	/lib64/libresolv.so.2
0	/lib64/librt.so.1
0	/lib64/libutil.so.1

The .so files with 0 bytes of STATIC_TLS usage only have references to glibc variables such as the following, so I did not count them:
0 TLS     GLOBAL DEFAULT  UND errno@GLIBC_PRIVATE (4)

If I googled right the .so libraries tagged STATIC_TLS fail to load if their TLS section does not fit into the static section. Others prefer static TLS (and thus use it even if not tagged static).

The actual .so load order (for libraries with TLS section) is:
/lib64/libc.so.6
/lib64/libcom_err.so.2
/lib64/libselinux.so.1
/lib64/libstdc++.so.6
/lib64/libQtCore.so.4
/lib64/libuuid.so.1
/lib64/libasound.so.2
/lib64/libGL.so.1
/lib64/libglapi.so.0
/lib64/libsystemd.so.0
/lib64/libdw.so.1
/lib64/libelf.so.1
/lib64/libpixman-1.so.0
/lib64/libEGL.so.1
/lib64/libgomp.so.1

Looking at the above lists my conclusion is that libgomp.so is the library that fails to load (the python tries to load libMagickCore-6.Q16.so.2, which depends on libgomp). But the libselinux and libpixman have already managed to store their large TLS sections into the static block.

Possible solutions:
1) make the calibre python somehow not load libselinux (could such switch be added to python?)
2) make calibre load libgomp/libMagickCore earlier, before libpixman (most likely libselinux is loaded so early that it cannot be avoided)
3) add a new flag RTLD_NO_AUTOMATIC_STATIC_TLS for dlopen function to _not_ use static TLS for libraries unless they request it with STATIC_TLS. And then make python load libraries with the new flag
4) recompile the fedora to have larger static TLS section (the glibc seems to define TLS_STATIC_SURPLUS as 64+DL_NNS*100, where DL_NNS is either 1 or 16)
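
The arithmetic in this comment can be checked with a short sketch. The sizes are copied from the first list above, and the formula is taken exactly as quoted in option 4; neither has been checked against the glibc source here, and the sketch says nothing about which of these segments actually occupy static TLS.

```python
# TLS segment sizes in bytes, copied from the first list in this comment.
TLS_SIZES = {
    "/lib64/libpixman-1.so.0": 384,
    "/lib64/libc.so.6": 272,
    "/lib64/libselinux.so.1": 224,
    "/lib64/libgomp.so.1": 108,
    "/lib64/libuuid.so.1": 78,
    "/lib64/libsystemd.so.0": 56,
    "/lib64/libstdc++.so.6": 48,
    "/lib64/libglapi.so.0": 32,
    "/lib64/libcom_err.so.2": 25,
    "/lib64/libasound.so.2": 8,
    "/lib64/libdw.so.1": 8,
    "/lib64/libEGL.so.1": 8,
    "/lib64/libGL.so.1": 8,
    "/lib64/libQtCore.so.4": 8,
    "/lib64/libelf.so.1": 4,
}


def tls_static_surplus(dl_nns):
    """Evaluate the formula as quoted in option 4: 64 + DL_NNS * 100."""
    return 64 + dl_nns * 100


total = sum(TLS_SIZES.values())
print(total)                   # 1271
print(tls_static_surplus(1))   # 164
print(tls_static_surplus(16))  # 1664
```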

Comment 15 Carlos O'Donell 2014-08-20 04:01:59 UTC
(In reply to Mikko Tiihonen from comment #14)
> cat readelf.txt | egrep 'readelf|STATIC_TLS' | grep TLS -B1 | grep readelf |
> cut -d/ -f2-
> 272	/lib64/libc.so.6
> 108	/lib64/libgomp.so.1
> 32	/lib64/libglapi.so.0
> 8	/lib64/libEGL.so.1
> 8	/lib64/libGL.so.1
> 0	/lib64/libcrypt.so.1
> 0	/lib64/libm.so.6
> 0	/lib64/libnsl.so.1
> 0	/lib64/libnss_files.so.2
> 0	/lib64/libpthread.so.0
> 0	/lib64/libresolv.so.2
> 0	/lib64/librt.so.1
> 0	/lib64/libutil.so.1

That many libraries should not overflow the reserved static TLS slots in the DTV. That is only 13 slots. We have 14 surplus slots.

The original error is:

"RuntimeError: Failed to load ImageMagick: dlopen: cannot load any more object with static TLS"

Which indicates overflow of the slots, not of the size of the allocated static TLS block itself.

For the record DL_NNS should be 16, so we should have 102,400 bytes of static surplus storage.

> The 0 byte STATIC_TLS usage .so files have references to glibc variables
> such as so I did not count them.
> 0 TLS     GLOBAL DEFAULT  UND errno@GLIBC_PRIVATE (4)

These are references to static TLS variables in other modules, e.g. libc.so.6, and because of those references the entire access type for the module is adjusted to be static TLS. Their size doesn't matter for now.
 
> If I googled right the .so libraries tagged STATIC_TLS fail to load if their
> TLS section does not fit into the static section. Others prefer static TLS
> (and thus use it even if not tagged static).

Libraries don't prefer static TLS, they must be compiled for it.

No shared libraries should be built with static TLS. I'm going to make it my quest to ban anything but the implementation from using static TLS in libraries because it leads to unmaintainable chaos at the distribution level :-(

Maybe a few key libraries might be allowed...

> The actual .so load order (for libraries with TLS section) is:
> /lib64/libc.so.6
> /lib64/libcom_err.so.2
> /lib64/libselinux.so.1
> /lib64/libstdc++.so.6
> /lib64/libQtCore.so.4
> /lib64/libuuid.so.1
> /lib64/libasound.so.2
> /lib64/libGL.so.1
> /lib64/libglapi.so.0
> /lib64/libsystemd.so.0
> /lib64/libdw.so.1
> /lib64/libelf.so.1
> /lib64/libpixman-1.so.0
> /lib64/libEGL.so.1
> /lib64/libgomp.so.1
> 
> Looking at the above lists my conclusion is that libgomp.so is the library
> that fails to load (the python tries to load libMagickCore-6.Q16.so.2, which
> depends on libgomp). But the libselinux and libpixman have already managed
> to store their large TLS sections into the static block.

Neither libselinux nor libpixman have static TLS AFAIK. How did you determine they did?

> Possible solutions:
> 1) make the calibre python somehow not load libselinux (could such switch be
> added to python?)

That's not a solution since libselinux doesn't use static TLS.

> 2) make the calibre load the libgomp/libMagicCore earlier, before libpixman
> (most likely libselinux is loaded so early that it cannot be avoided)

Not an option.

> 3) add a new flag RTLD_NO_AUTOMATIC_STATIC_TLS for dlopen function to _not_
> use static TLS for libraries unless they request it with STATIC_TLS. And
> then make python load libraries with the new flag

You are misunderstanding how static TLS works.

The compiler is either told to use static TLS, in which case the DSO's generated code *depends* upon it, and the DSO is marked with the STATIC_TLS dynamic section flag.

Or

The compiler is told not to use static TLS (the default), in which case the DSO's generated code uses a mode that allows it to be loaded fully dynamically.

There is no way to undo static TLS requirements without recompiling the DSO.

> 4) recompile the fedora to have larger static TLS section (the glibc seems
> to define TLS_STATIC_SURPLUS as 64+DL_NNS*100, where DL_NNS is either 1 or
> 16)

Users will continue to build more libraries with static TLS until you run out of room.

I think we can increase DTV_SURPLUS and TLS_STATIC_SURPLUS slightly, but we need to better understand which libraries are using it and get them to stop or figure out why they need it.

From this list:

> 272	/lib64/libc.so.6
- OK. May use STATIC_TLS, it's part of the implementation.

> 108	/lib64/libgomp.so.1
- OK. May use STATIC_TLS, it's part of the implementation. Language runtime support for gomp shared across all programs.

> 32	/lib64/libglapi.so.0
> 8	/lib64/libEGL.so.1
> 8	/lib64/libGL.so.1
- They should not use static TLS.

e.g.

[carlos@koi mesa-c40d7d6d948912a4d51cbf8f0854cf2ebe916636]$ grep -r 'initial-exec' *
docs/dispatch.html:    __attribute__((tls_model("initial-exec")));
src/glx/glxcurrent.c:__thread void *__glX_tls_Context __attribute__ ((tls_model("initial-exec")))
src/glx/glxclient.h:   __attribute__ ((tls_model("initial-exec")));
src/egl/main/eglcurrent.c:   __attribute__ ((tls_model("initial-exec")));
src/mesa/drivers/dri/common/dri_test.c:    __attribute__((tls_model("initial-exec")));
src/mesa/drivers/dri/common/dri_test.c:    __attribute__((tls_model("initial-exec")));
src/mapi/u_current.c:    __attribute__((tls_model("initial-exec")))
src/mapi/u_current.c:    __attribute__((tls_model("initial-exec")));
src/mapi/u_current.h:    __attribute__((tls_model("initial-exec")));
src/mapi/u_current.h:    __attribute__((tls_model("initial-exec")));
src/mapi/glapi/glapi.h:    __attribute__((tls_model("initial-exec")));
src/mapi/glapi/glapi.h:    __attribute__((tls_model("initial-exec")));

I know why they are using it. They want speed and force the model.

> 0	/lib64/libcrypt.so.1
> 0	/lib64/libm.so.6
> 0	/lib64/libnsl.so.1
> 0	/lib64/libnss_files.so.2
> 0	/lib64/libpthread.so.0
> 0	/lib64/libresolv.so.2
> 0	/lib64/librt.so.1
> 0	/lib64/libutil.so.1

- OK, all part of the implementation (glibc).

That's only 13 libraries though and we have max counted slots + 14.

I'm going to have to debug this myself to figure out what's wrong.

Maybe it fails as we are loading the Nth library, but doesn't get time to display that information.

Have you tried building a glibc with DTV_SURPLUS increased to 64?

e.g.
diff -urN glibc-2.19-883-g7e54fd0/sysdeps/generic/ldsodefs.h glibc-2.19-883-g7e54fd0.mod/sysdeps/generic/ldsodefs.h
--- glibc-2.19-883-g7e54fd0/sysdeps/generic/ldsodefs.h	2014-08-13 12:24:07.000000000 -0400
+++ glibc-2.19-883-g7e54fd0.mod/sysdeps/generic/ldsodefs.h	2014-08-19 23:52:33.636202348 -0400
@@ -389,7 +389,7 @@
 #define TLS_SLOTINFO_SURPLUS (62)
 
 /* Number of additional slots in the dtv allocated.  */
-#define DTV_SURPLUS	(14)
+#define DTV_SURPLUS	(64)
 
   /* Initial dtv of the main thread, not allocated with normal malloc.  */
   EXTERN void *_dl_initial_dtv;


Does it help? Can you get a trace of all the libraries using STATIC_TLS?

If I built you a scratch glibc would you try it?

Here is a scratch build with DTV_SURPLUS set to 64:
http://koji.fedoraproject.org/koji/taskinfo?taskID=7429595

Comment 16 Mikko Tiihonen 2014-08-20 07:48:53 UTC
Your glibc-2.19.90-34 build allows calibre to start.

And thank you for the long explanation on how things really are. My googling seems to have led me really far away from the actual problem :)

Comment 17 Carlos O'Donell 2014-08-20 13:44:07 UTC
(In reply to Mikko Tiihonen from comment #16)
> Your glibc-2.19.90-34 build allows calibre to start.
> 
> And thank you for the long explanation on how things really are. My googling
> seems to have led me really far away from the actual problem :)

You are very welcome. I'm happy to explain.

However, please don't walk away now.

What I need from you is help. How many STATIC_TLS libraries did Calibre need?

I bumped the surplus slot allowance up to 64, so obviously less than that.

Can you find them for me so I can figure out who I should talk to?

Comment 18 Mikko Tiihonen 2014-08-20 22:41:29 UTC
I'll be away from the machine with the problem for a week now. But here is what I had time to test with the new glibc:

Assuming my grepping skills are still ok the list of .so files loaded with static_tls is in order:

lib64/libc.so.6
lib64/libpthread.so.0
lib64/libutil.so.1
lib64/libm.so.6
lib64/libc.so.6
lib64/libresolv.so.2
lib64/librt.so.1
lib64/libGL.so.1
lib64/libglapi.so.0
lib64/libnsl.so.1
lib64/libEGL.so.1
lib64/libcrypt.so.1
lib64/libc.so.6
lib64/libc.so.6
lib64/libgomp.so.1
lib64/libnss_files.so.2

Thus 16 times, but only 13 unique. The libc.so.6 is loaded by different pids (threads?) like this:
23705 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
23705 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
23706 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
23709 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3

Does it still use a new slot every time?
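
The counting above can be reproduced with a small sketch. The list is copied from this comment, and the value 14 is the surplus slot count mentioned in comment 15. Whether repeated opens of the same library consume additional slots is the open question here and is not answered by this sketch.

```python
# Libraries with STATIC_TLS in the order listed in this comment.
LOADED = [
    "lib64/libc.so.6",
    "lib64/libpthread.so.0",
    "lib64/libutil.so.1",
    "lib64/libm.so.6",
    "lib64/libc.so.6",
    "lib64/libresolv.so.2",
    "lib64/librt.so.1",
    "lib64/libGL.so.1",
    "lib64/libglapi.so.0",
    "lib64/libnsl.so.1",
    "lib64/libEGL.so.1",
    "lib64/libcrypt.so.1",
    "lib64/libc.so.6",
    "lib64/libc.so.6",
    "lib64/libgomp.so.1",
    "lib64/libnss_files.so.2",
]

# Surplus slot count stated in comment 15, before the scratch build.
DTV_SURPLUS = 14


def unique_in_order(items):
    """Drop repeated entries while keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


unique = unique_in_order(LOADED)
print(len(LOADED), len(unique))    # 16 13
print(len(unique) <= DTV_SURPLUS)  # True
```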

Comment 19 Zbigniew Jędrzejewski-Szmek 2014-08-22 00:32:49 UTC
I can confirm that the scratch glibc allows calibre to start. I can downgrade and provide further debug info if required.

Comment 20 Kevin Fenzi 2014-08-23 00:07:40 UTC
So, calibre 2.0.0 came out today. It switches over to qt5 and doesn't have this issue anymore, due to loading different libs. ;( 

So, we could just close this now... but it might also still be worth finding out what libraries are to blame when we hit them with other applications?

Comment 21 Ankur Sinha (FranciscoD) 2014-08-26 09:22:48 UTC
Hi,

I seem to have run into the glibc bug too. I've filed a new bug here:

https://bugzilla.redhat.com/show_bug.cgi?id=1124987

Thanks
Ankur

Comment 22 Carlos O'Donell 2014-08-26 14:28:55 UTC
(In reply to Ankur Sinha (FranciscoD) from comment #21)
> Hi,
> 
> I seem to have run into the glibc bug too. I've filed a new bug here:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1124987

For the record it is not a glibc bug. It is a defect in the loaded libraries: they should not require static TLS. Fundamentally it's a defect that we allowed developers to build shared libraries with static TLS in the first place.

I will keep debugging this issue here and we'll get to a resolution.

Comment 23 Carlos O'Donell 2014-08-26 14:29:29 UTC
*** Bug 1133843 has been marked as a duplicate of this bug. ***

Comment 24 Ankur Sinha (FranciscoD) 2014-09-03 03:39:30 UTC
Hi Carlos,

Would you have any suggestions on how the vlc error could be fixed? I can get in touch with the package maintainer at rpmfusion and ask them to correct their build flags, for example. 

Thanks,
Ankur

Comment 25 Carlos O'Donell 2014-09-04 02:31:38 UTC
(In reply to Ankur Sinha (FranciscoD) from comment #24)
> Hi Carlos,
> 
> Would you have any suggestions on how the vlc error could be fixed? I can
> get in touch with the package maintainer at rpmfusion and ask them to
> correct their build flags, for example. 

There is nothing vlc can do. I'm going to fix this in glibc to work around the issue.

After an analysis I see ~44 distribution-wide libraries using static TLS, and you can only really have ~30 of them loaded at any given point (some are x11 drivers so you can only load one). Thus increasing the slots in glibc to 32 will fix this.

Comment 26 Carlos O'Donell 2014-09-06 18:25:56 UTC
http://koji.fedoraproject.org/koji/taskinfo?taskID=7538652

This should now be fixed in glibc-2.19.90-36.fc21 which is currently building. In this release I've bumped the surplus DTV slots to 32. An analysis of the distribution shows that's a reasonable minimum given glibc, MESA, and X11 usage.

Comment 27 Ankur Sinha (FranciscoD) 2014-09-07 05:38:15 UTC
(In reply to Carlos O'Donell from comment #26)
> http://koji.fedoraproject.org/koji/taskinfo?taskID=7538652
> 
> This should now be fixed in glibc-2.19.90-36.fc21 which is currently
> building. In this release I've bumped the surplus DTV slots to 32. An
> analysis of the distribution shows that's a reasonable minimum given glibc,
> MESA, and X11 usage.

Fixes my VLC issue. Thanks, Carlos.

Comment 28 Fedora Update System 2014-09-07 15:31:26 UTC
glibc-2.19.90-36.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/glibc-2.19.90-36.fc21

Comment 29 Fedora Update System 2014-09-08 16:08:48 UTC
Package glibc-2.19.90-36.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing glibc-2.19.90-36.fc21'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-10275/glibc-2.19.90-36.fc21
then log in and leave karma (feedback).

Comment 30 Fedora Update System 2014-09-11 02:44:15 UTC
glibc-2.19.90-36.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 31 Arkadiusz Miskiewicz 2014-11-24 17:01:15 UTC
(In reply to Carlos O'Donell from comment #25)

> After an analysis I see ~44 distribution-wide libraries using static TLS,

Could you tell us how to see/check which libraries are affected?

Comment 32 Carlos O'Donell 2014-11-24 17:32:04 UTC
(In reply to Arkadiusz Miskiewicz from comment #31)
> (In reply to Carlos O'Donell from comment #25)
> 
> > After an analysis I see ~44 distribution-wide libraries using static TLS,
> 
> Could you tell us how to see/check which libraries are affected?

All architectures should set the STATIC_TLS flag in the dynamic section if any variable uses static TLS. Some architectures, AArch64 I believe, fail to set STATIC_TLS (a bug), but in that case you can still see the static TLS usage from the relocations.

e.g.

[carlos@koi static-tls]$ readelf -a -W /lib64/libc.so.6 | grep STATIC
 0x000000000000001e (FLAGS)              BIND_NOW STATIC_TLS

OK, because libc.so.6 uses static TLS to accelerate the runtime.

[carlos@koi lib64]$ readelf -a -W libGL.so.1.2.0 | grep STATIC
 0x000000000000001e (FLAGS)              SYMBOLIC STATIC_TLS

Technically not OK because libGL is not part of the core runtime, but as a distribution we are coordinating the use of static TLS across multiple libraries to accelerate the distribution.

[carlos@koi lib64]$ readelf -a -W libGL.so.1.2.0 | grep TPOFF
0000003f66e91fb8  0000000000000012 R_X86_64_TPOFF64                          0
0000003f66e91ff0  000000ab00000012 R_X86_64_TPOFF64       0000000000000000 _glapi_tls_Dispatch + 0

If STATIC_TLS had not been set on x86_64, we could have looked for thread-pointer offset relocations to catch the use of static TLS. Here you can see that the libGL variable `_glapi_tls_Dispatch` uses static TLS and has a TPOFF relocation, which has the dynamic loader set up the variable properly while it assembles the in-memory application execution image.
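
The two checks described in this comment can be combined in a small sketch that scans saved `readelf -a -W` output. The function name is an assumption made for this example, and the relocation match follows the x86_64 naming shown above; other architectures name these relocations differently. DTPOFF relocations are deliberately excluded because they belong to the dynamic TLS models.

```python
import re

FLAGS_LINE = re.compile(r"\(FLAGS\)\s+(.*)$")
RELOC_NAME = re.compile(r"\b(R_[A-Z0-9_]+)\b")


def static_tls_evidence(readelf_text):
    """Return (has_static_tls_flag, tpoff_relocation_lines)."""
    has_flag = False
    tpoff_lines = []
    for line in readelf_text.splitlines():
        flags = FLAGS_LINE.search(line)
        if flags and "STATIC_TLS" in flags.group(1).split():
            has_flag = True
        reloc = RELOC_NAME.search(line)
        if reloc:
            name = reloc.group(1)
            # TPOFF is a thread-pointer offset; skip DTPOFF (dynamic model).
            if "TPOFF" in name and "DTPOFF" not in name:
                tpoff_lines.append(line.strip())
    return has_flag, tpoff_lines
```

A library for which this returns `(False, [])` shows neither sign of static TLS by these two checks; that is a heuristic, not proof.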

Comment 33 Tim Niemueller 2015-01-29 10:20:03 UTC
Any chance we see this small patch backported to F20? On our robots we use an architecture that loads plugins from shared libraries (dozens on a complex system) which build on a wide variety of existing system libraries. Therefore we frequently face this problem when loading (many) plugins which load a large number of shared libraries. Another problem for example is graphviz, which itself dlopens plugins for rendering.

We have rolled our own glibc (patch attached) and it works just fine. We would appreciate it if that patch could be rolled out in F20. Since it does not change API or ABI this should be safe.

A scratch build for testing (which we use on our production machines) is available at
http://koji.fedoraproject.org/koji/taskinfo?taskID=8722107

Comment 34 Tim Niemueller 2015-01-29 10:20:46 UTC
Created attachment 985507 [details]
Patch for F-20 package

