Bug 1722181 - libgomp.so.1: cannot allocate memory in static TLS block
Summary: libgomp.so.1: cannot allocate memory in static TLS block
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 31
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Carlos O'Donell
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1851184
Depends On:
Blocks: ARMTracker PPCTracker 1738752
Reported: 2019-06-19 15:36 UTC by Paul Whalen
Modified: 2020-12-21 16:55 UTC (History)

Fixed In Version: glibc-2.31.9000-20
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-04 13:22:22 UTC
Type: Bug


Attachments
anaconda log (34.33 KB, text/plain)
2019-06-19 15:37 UTC, Paul Whalen
program log (79.49 KB, text/plain)
2019-06-19 15:38 UTC, Paul Whalen
LD_DEBUG=libs output (55.84 KB, text/plain)
2019-06-20 08:40 UTC, Dan Horák

Description Paul Whalen 2019-06-19 15:36:53 UTC
Description of problem:

Recent rawhide aarch64 installs fail with:

Starting installer, one moment...
anaconda 31.15-1.fc31 for Fedora Rawhide (pre-release) started.
 * installation log files are stored in /tmp during the installation
 * shell is available on TTY2
 * if the graphical installation interface fails to start, try again with the
   inst.text bootoption to start text installation
 * when reporting a bug add logs from /tmp as separate text/plain attachments
15:27:52 Not asking for VNC because of an automated install
15:27:52 Not asking for VNC because text mode was explicitly asked for in kickstart
Traceback (most recent call last):
  File "/sbin/anaconda", line 598, in <module>
    display.setup_display(anaconda, opts)
  File "/usr/lib64/python3.7/site-packages/pyanaconda/display.py", line 346, in setup_display
    anaconda.initInterface()
  File "/usr/lib64/python3.7/site-packages/pyanaconda/anaconda.py", line 292, in initInterface
    self._intf = TextUserInterface(self.storage, self.payload)
  File "/usr/lib64/python3.7/site-packages/pyanaconda/anaconda.py", line 97, in payload

ne 49, in <module>
    import dnf
  File "/usr/lib/python3.7/site-packages/dnf/__init__.py", line 30, in <module>
    import dnf.base
  File "/usr/lib/python3.7/site-packages/dnf/base.py", line 31, in <module>
    from dnf.comps import CompsQuery
  File "/usr/lib/python3.7/site-packages/dnf/comps.py", line 27, in <module>
    from dnf.exceptions import CompsError
  File "/usr/lib/python3.7/site-packages/dnf/exceptions.py", line 22, in <module>
    import dnf.util
  File "/usr/lib/python3.7/site-packages/dnf/util.py", line 29, in <module>
    import dnf.callback
  File "/usr/lib/python3.7/site-packages/dnf/callback.py", line 22, in <module>
    import dnf.yum.rpmtrans
  File "/usr/lib/python3.7/site-packages/dnf/yum/rpmtrans.py", line 26, in <module>
    import rpm
  File "/usr/lib64/python3.7/site-packages/rpm/__init__.py", line 38, in <module>
    from rpm._rpm import *
ImportError: /lib64/libgomp.so.1: cannot allocate memory in static TLS block


Version-Release number of selected component (if applicable):
gcc-9.1.1-2

Comment 1 Paul Whalen 2019-06-19 15:37:42 UTC
Created attachment 1582306 [details]
anaconda log

Comment 2 Paul Whalen 2019-06-19 15:38:26 UTC
Created attachment 1582307 [details]
program log

Comment 3 Paul Whalen 2019-06-19 15:44:10 UTC
Rawhide composes are failing as a result (aarch64 cloud image).

Comment 4 Kevin Fenzi 2019-06-19 15:45:09 UTC
ppc64le also fails this same way, while x86_64 works...

Comment 5 Florian Weimer 2019-06-19 15:55:24 UTC
(In reply to Paul Whalen from comment #0)
>     import rpm
>   File "/usr/lib64/python3.7/site-packages/rpm/__init__.py", line 38, in <module>
>     from rpm._rpm import *
> ImportError: /lib64/libgomp.so.1: cannot allocate memory in static TLS block

Any idea why the rpm module for Python depends on libgomp?  Could you please run ldd on the Python DSO, and hunt down the source using readelf -dW?  Thanks.

Comment 6 Jakub Jelinek 2019-06-19 15:57:30 UTC
That is a bug in whatever dlopened libgomp.so.1 after having eaten all of the preallocated TLS area.
The GNU2 TLS model, which I'm afraid aarch64 uses, unfortunately eats from the same preallocated TLS pool as libraries that require static TLS, such as libgomp, where having static TLS is performance critical.
Either don't dlopen libgomp, or LD_PRELOAD it, link it with the application that dlopens it, cut down the other uses of TLS, or dlopen libgomp earlier.
There is nothing that can be done on the gcc side.

Comment 7 Jakub Jelinek 2019-06-19 15:59:32 UTC
And, ldd will not help, but LD_DEBUG=libs python ... could.

Comment 8 Kevin Fenzi 2019-06-19 22:12:54 UTC
Adding rpm maintainer here. Any ideas on rpm's libgomp usage?

Comment 9 Panu Matilainen 2019-06-20 05:48:01 UTC
In rpm >= 4.14.90, librpmbuild links to libgomp, but the thing that's dlopen()'ing stuff here is python, which dlopen()'s the rpm module which links to librpmbuild. 
Rpm uses dlopen() too but only to load transaction plugins and that doesn't happen on module load, and libgomp is not involved in there.

Comment 10 Panu Matilainen 2019-06-20 07:29:51 UTC
So... as an emergency measure we can of course disable rpmbuild threading support on aarch64. Or the build python bindings.
It's just that both options feel like quite terrible sacrifices for the cause - after all, anaconda doesn't need the spec bindings for anything. In recent years the build-bindings were in a separate module but we just re-merged them for 4.15 because it was too much of a pain in other ways.

Ideas welcome.

Comment 11 Dan Horák 2019-06-20 07:33:40 UTC
(In reply to Panu Matilainen from comment #10)
> So... as an emergency measure we can of course disable rpmbuild threading
> support on aarch64. Or the build python bindings.

also ppc64le suffers from the same problem, see eg. https://koji.fedoraproject.org/koji/taskinfo?taskID=35629481

> It's just that both options feel like quite terrible sacrifices for the
> cause - after all, anaconda doesn't need the spec bindings for anything. In
> recent years the build-bindings were in a separate module but we just
> re-merged them for 4.15 because it was too much of a pain in other ways.
> 
> Ideas welcome.

Comment 12 Jakub Jelinek 2019-06-20 07:38:35 UTC
What would be interesting to find out is which other library, dlopened before libgomp.so.1, eats the TLS block. Look with LD_DEBUG=libs at which libraries are loaded and in what order, and for each of them run:
readelf -Wl library | grep TLS
64-bit libgomp needs 120 bytes in TLS.

Comment 13 Dan Horák 2019-06-20 08:14:13 UTC
Good, I have a reproducer that can be run from the command line, outside of the regular installation, stay tuned.

Comment 14 Dan Horák 2019-06-20 08:40:17 UTC
Created attachment 1582562 [details]
LD_DEBUG=libs output

steps to reproduce
1.1 mock -r fedora-rawhide-ppc64le --enablerepo=local --init
1.2 mock -r fedora-rawhide-ppc64le --enablerepo=local --install anaconda
1.3 mock -r fedora-rawhide-ppc64le --enablerepo=local --update
^^ this is to get anaconda and the latest 4.14.90+ rpm into the rawhide chroot

2. mock -r fedora-rawhide-ppc64le shell
3. truncate -s 4G /mnt/image
4. LD_DEBUG=libs anaconda --image /mnt/image

Comment 15 Jakub Jelinek 2019-06-20 08:45:55 UTC
So now do:
for i in `sed -n 's/^.*trying file=//p' anaconda-ld-debug.log`; do echo $i; readelf -Wl $i | grep TLS; done
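The extraction step of the loop above (the sed expression) can also be sketched in Python, which may be handy when the log is post-processed on another machine. This is only an illustration: the function name tried_files and the sample log excerpt are made up for the example, and unlike the sed one-liner it drops repeated paths.

```python
def tried_files(ld_debug_log):
    """Return the paths that follow 'trying file=' in order of first appearance."""
    marker = "trying file="
    paths = []
    for line in ld_debug_log.splitlines():
        _, found, path = line.partition(marker)
        path = path.strip()
        if found and path and path not in paths:
            paths.append(path)
    return paths


# Illustrative excerpt in the usual LD_DEBUG=libs layout (PID and paths are made up).
SAMPLE_LOG = """\
      4242:\tfind library=libc.so.6 [0]; searching
      4242:\t search cache=/etc/ld.so.cache
      4242:\t  trying file=/lib64/libc.so.6
      4242:\tfind library=libgomp.so.1 [0]; searching
      4242:\t search cache=/etc/ld.so.cache
      4242:\t  trying file=/lib64/libgomp.so.1
"""

for path in tried_files(SAMPLE_LOG):
    print(path)
```

Each returned path could then be passed to readelf -Wl, as in the shell loop.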

Comment 16 Dan Horák 2019-06-20 08:49:40 UTC
<mock-chroot> sh-5.0# for l in `grep trying /tmp/anaconda-ld-debug.log | cut -d '=' -f 2`; do echo lib=$l; readelf -Wl $l | grep TLS; done
lib=/lib64/libc.so.6
  TLS            0x1ecaf0 0x00000000001fcaf0 0x00000000001fcaf0 0x000010 0x000090 R   0x8
lib=/lib64/libpython3.7m.so.1.0
lib=/lib64/libpthread.so.0
lib=/lib64/libdl.so.2
lib=/lib64/libutil.so.1
lib=/lib64/libm.so.6
lib=/lib64/libz.so.1
lib=/lib64/libbz2.so.1
lib=/lib64/liblzma.so.5
lib=/lib64/libcrypto.so.1.1
lib=/lib64/libglib-2.0.so.0
lib=/lib64/libgirepository-1.0.so.1
lib=/lib64/libgobject-2.0.so.0
lib=/lib64/libffi.so.6
lib=/lib64/libpcre.so.1
lib=/lib64/libgmodule-2.0.so.0
lib=/lib64/libgio-2.0.so.0
lib=/lib64/libmount.so.1
  TLS            0x08d898 0x000000000009d898 0x000000000009d898 0x000000 0x000006 R   0x2
lib=/lib64/libselinux.so.1
  TLS            0x03f500 0x000000000004f500 0x000000000004f500 0x000028 0x0000e8 R   0x8
lib=/lib64/libresolv.so.2
lib=/lib64/libblkid.so.1
  TLS            0x06b6a0 0x000000000007b6a0 0x000000000007b6a0 0x000000 0x000006 R   0x2
lib=/lib64/librt.so.1
lib=/lib64/libpcre2-8.so.0
lib=/lib64/libcairo.so.2
lib=/lib64/libcairo-gobject.so.2
lib=/lib64/libpixman-1.so.0
  TLS            0x08ba40 0x000000000009ba40 0x000000000009ba40 0x000000 0x000180 R   0x8
lib=/lib64/libfontconfig.so.1
lib=/lib64/libfreetype.so.6
lib=/lib64/libpng16.so.16
lib=/lib64/libxcb-shm.so.0
lib=/lib64/libxcb.so.1
lib=/lib64/libxcb-render.so.0
lib=/lib64/libXrender.so.1
lib=/lib64/libX11.so.6
lib=/lib64/libXext.so.6
lib=/lib64/libexpat.so.1
lib=/lib64/libXau.so.6
lib=/lib64/libssl.so.1.1
lib=/lib64/libuuid.so.1
  TLS            0x00fb10 0x000000000001fb10 0x000000000001fb10 0x000004 0x00004a R   0x8
lib=/lib64/libsystemd.so.0
  TLS            0x0dcf10 0x00000000000ecf10 0x00000000000ecf10 0x000004 0x000090 R   0x8
lib=/lib64/liblz4.so.1
lib=/lib64/libgcrypt.so.20
lib=/lib64/libgcc_s.so.1
lib=/lib64/libgpg-error.so.0
lib=/lib64/libblockdev.so.2
lib=/lib64/libbd_utils.so.2
  TLS            0x00f960 0x000000000001f960 0x000000000001f960 0x000000 0x000008 R   0x8
lib=/lib64/libudev.so.1
  TLS            0x03f5d8 0x000000000004f5d8 0x000000000004f5d8 0x000000 0x000004 R   0x4
lib=/lib64/libkmod.so.2
lib=/lib64/libbd_lvm.so.2
lib=/lib64/libdevmapper.so.1.02
lib=/lib64/libsepol.so.1
lib=/lib64/libbd_btrfs.so.2
lib=/lib64/libbytesize.so.1
lib=/lib64/libmpfr.so.4
  TLS            0x07dfd8 0x000000000008dfd8 0x000000000008dfd8 0x0000d8 0x0000f8 R   0x8
lib=/lib64/libgmp.so.10
lib=/lib64/libbd_swap.so.2
lib=/lib64/libbd_loop.so.2
lib=/lib64/libbd_crypto.so.2
lib=/lib64/libcryptsetup.so.12
lib=/lib64/libssl3.so
lib=/lib64/libsmime3.so
lib=/lib64/libnss3.so
lib=/lib64/libnssutil3.so
lib=/lib64/libplds4.so
lib=/lib64/libplc4.so
lib=/lib64/libnspr4.so
lib=/lib64/libvolume_key.so.1
lib=/lib64/libargon2.so.1
lib=/lib64/libjson-c.so.4
  TLS            0x01f848 0x000000000002f848 0x000000000002f848 0x000000 0x000008 R   0x8
lib=/lib64/libgpgme.so.11
  TLS            0x06f5e0 0x000000000007f5e0 0x000000000007f5e0 0x000000 0x000004 R   0x4
lib=/lib64/libassuan.so.0
lib=/lib64/libbd_mpath.so.2
lib=/lib64/libbd_dm.so.2
lib=/lib64/libdmraid.so.1
lib=/lib64/libdevmapper-event.so.1.02
lib=/lib64/libbd_mdraid.so.2
lib=/lib64/libbd_nvdimm.so.2
lib=/lib64/libndctl.so.6
lib=/lib64/libdaxctl.so.1
lib=/lib64/libparted.so.2
lib=/lib64/libdbus-1.so.3
lib=/lib64/libpwquality.so.1
lib=/lib64/libcrack.so.2
lib=/lib64/libcrypt.so.2
lib=/lib64/libreport.so.0
lib=/lib64/libtar.so.1
lib=/lib64/libaugeas.so.0
lib=/lib64/libsatyr.so.3
lib=/lib64/libfa.so.1
lib=/lib64/libxml2.so.2
lib=/lib64/librpm.so.9
  TLS            0x0ab618 0x00000000000bb618 0x00000000000bb618 0x000000 0x002000 R   0x8
lib=/lib64/librpmio.so.9
  TLS            0x03da70 0x000000000004da70 0x000000000004da70 0x000004 0x00019c R   0x8
lib=/lib64/libstdc++.so.6
  TLS            0x274218 0x0000000000284218 0x0000000000284218 0x000000 0x000020 R   0x8
lib=/lib64/libdw.so.1
  TLS            0x06df30 0x000000000007df30 0x000000000007df30 0x000000 0x000008 R   0x4
lib=/lib64/libelf.so.1
  TLS            0x01f930 0x000000000002f930 0x000000000002f930 0x000000 0x000004 R   0x4
lib=/lib64/libzstd.so.1
lib=/lib64/libpopt.so.0
lib=/lib64/libcap.so.2
lib=/lib64/libacl.so.1
lib=/lib64/liblua-5.3.so
lib=/lib64/libdb-5.3.so
lib=/lib64/libaudit.so.1
lib=/lib64/libattr.so.1
lib=/lib64/libcap-ng.so.0
  TLS            0x00fb08 0x000000000001fb08 0x000000000001fb08 0x000030 0x000030 R   0x4
lib=/lib64/libdnf.so.2
lib=/lib64/librepo.so.0
lib=/lib64/libsolv.so.1
lib=/lib64/libsolvext.so.1
lib=/lib64/libsqlite3.so.0
lib=/lib64/libmodulemd.so.1
lib=/lib64/libsmartcols.so.1
  TLS            0x04e320 0x000000000005e320 0x000000000005e320 0x000000 0x000006 R   0x2
lib=/lib64/libcurl.so.4
lib=/lib64/libzck.so.1
lib=/lib64/libyaml-0.so.2
lib=/lib64/libnghttp2.so.14
lib=/lib64/libidn2.so.0
lib=/lib64/libssh.so.4
  TLS            0x0ae450 0x00000000000be450 0x00000000000be450 0x000000 0x000014 R   0x8
lib=/lib64/libpsl.so.5
lib=/lib64/libgssapi_krb5.so.2
lib=/lib64/libkrb5.so.3
lib=/lib64/libk5crypto.so.3
lib=/lib64/libcom_err.so.2
  TLS            0x00fb48 0x000000000001fb48 0x000000000001fb48 0x000000 0x000019 R   0x8
lib=/lib64/libldap-2.4.so.2
lib=/lib64/liblber-2.4.so.2
lib=/lib64/libbrotlidec.so.1
lib=/lib64/libunistring.so.2
lib=/lib64/libkrb5support.so.0
lib=/lib64/libkeyutils.so.1
lib=/lib64/libsasl2.so.3
lib=/lib64/libbrotlicommon.so.1
lib=/lib64/librpmbuild.so.9
lib=/lib64/librpmsign.so.9
lib=/lib64/libmagic.so.1
lib=/lib64/libgomp.so.1
  TLS            0x04fc90 0x000000000005fc90 0x000000000005fc90 0x000000 0x000078 R   0x8
lib=/lib64/libimaevm.so.0
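
A listing like the one above is easier to read once the TLS sizes are converted to decimal and sorted. The following is a minimal sketch, assuming the readelf -Wl column order (Type, Offset, VirtAddr, PhysAddr, FileSiz, MemSiz, Flg, Align) so that the fifth hex field is the per-thread memory size. The function name tls_sizes is made up for the example, and the sample is three entries copied from the listing.

```python
import re

# One TLS program-header row from `readelf -Wl`:
# Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
TLS_ROW = re.compile(
    r"^\s*TLS\s+0x[0-9a-f]+\s+0x[0-9a-f]+\s+0x[0-9a-f]+\s+"
    r"0x[0-9a-f]+\s+(0x[0-9a-f]+)\s",
    re.IGNORECASE,
)


def tls_sizes(listing):
    """Map each library in a 'lib=...' / 'TLS ...' listing to its TLS MemSiz in bytes."""
    sizes = {}
    current = None
    for line in listing.splitlines():
        if line.startswith("lib="):
            current = line[len("lib="):].strip()
            continue
        match = TLS_ROW.match(line)
        if match and current is not None:
            sizes[current] = int(match.group(1), 16)
    return sizes


SAMPLE = """\
lib=/lib64/libselinux.so.1
  TLS            0x03f500 0x000000000004f500 0x000000000004f500 0x000028 0x0000e8 R   0x8
lib=/lib64/libresolv.so.2
lib=/lib64/librpm.so.9
  TLS            0x0ab618 0x00000000000bb618 0x00000000000bb618 0x000000 0x002000 R   0x8
lib=/lib64/libgomp.so.1
  TLS            0x04fc90 0x000000000005fc90 0x000000000005fc90 0x000000 0x000078 R   0x8
"""

sizes = tls_sizes(SAMPLE)
# Largest TLS consumers first.
for lib, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
    print(f"{size:6d}  {lib}")
```

On this sample librpm comes out at 8192 bytes, against 120 bytes for libgomp.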

Comment 17 Panu Matilainen 2019-06-20 09:52:09 UTC
Practical question of the day: do you want me to disable openmp use in rpm on these architectures while this is being investigated?

Comment 18 Dan Horák 2019-06-20 10:06:08 UTC
From the discussion we had on IRC it's an excessive TLS usage by the libs. Now waiting for Jakub or Florian to post an "executive" summary here.

If you would prepare a scratch build without openmp, then I can check it before making it official.

Comment 19 Mark Wielaard 2019-06-20 10:07:05 UTC
One issue, identified by Florian, is that librpm allocates a huge amount of TLS storage.
In particular it has this in lib/headerfmt.c:

RPM_GNUC_PRINTF(2, 3)
static void hsaError(headerSprintfArgs hsa, const char *fmt, ...)
{
    /* Use thread local static buffer as headerFormat() errmsg arg is const */
    static __thread char errbuf[BUFSIZ];

Where BUFSIZ is 8192 bytes.

Would it be possible to rewrite that code so that it doesn't use thread local storage for that buffer, or at least only a thread local pointer to an array?

Comment 20 Jakub Jelinek 2019-06-20 10:12:22 UTC
So, the above shows there are some completely insane usages of TLS, primarily in librpm. TLS is a scarce resource, and having 8192 bytes per thread in it is too much; the normal way would be to use a __thread pointer and, if some destruction is needed and it is in C, call __cxa_thread_atexit on it when allocated.
Then multiple libraries eat a lot, and with the aarch64/ppc64le/GNU2 TLS those eat from the static TLS extra buffer as well.
The biggest offenders are librpmio, libpixman, libmpfr, libselinux, and libsystemd.
So, either ensure libgomp is loaded, if needed, before those libraries, or not at all.
Another option is to move the code that uses OpenMP into a separate library, build that library twice, once with -fopenmp and once with -fno-openmp, and try to dlopen the -fopenmp one first and, if that fails, the non-openmp one.  That way, if there is still enough static TLS, you'd use parallelism; otherwise you'd fall back to the serial version.
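
The try-one-then-the-other loading described in the last paragraph can be sketched from the caller's side. This is an illustration only, written in Python because ctypes.CDLL wraps dlopen() and raises OSError when loading fails. The names load_first, libwork_omp.so and libwork_serial.so are hypothetical, and the demo uses a stand-in loader that simulates the static TLS failure instead of loading real libraries.

```python
import ctypes


def load_first(candidates, loader=ctypes.CDLL):
    """Return (name, handle) for the first candidate library that loads.

    With the default loader, a dlopen() failure such as
    'cannot allocate memory in static TLS block' surfaces as OSError,
    and the next candidate is tried.
    """
    errors = []
    for name in candidates:
        try:
            return name, loader(name)
        except OSError as exc:
            errors.append(f"{name}: {exc}")
    raise OSError("no candidate could be loaded: " + "; ".join(errors))


def simulated_loader(name):
    """Stand-in for ctypes.CDLL: the OpenMP build fails, the serial build loads."""
    if name == "libwork_omp.so":
        raise OSError("libgomp.so.1: cannot allocate memory in static TLS block")
    return f"<handle for {name}>"


chosen, handle = load_first(
    ["libwork_omp.so", "libwork_serial.so"], loader=simulated_loader
)
print("using", chosen)
```

Listing the -fopenmp build first means the parallel version is used whenever static TLS is still available, and the serial build is only a fallback.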

Comment 21 Dan Horák 2019-06-20 10:16:46 UTC
let me get the numbers also for the aarch64 platform ...

Comment 22 Panu Matilainen 2019-06-20 10:28:34 UTC
> TLS is a scarce resource

This may come as shock to you, but that's news to me. Probably "us" in place of "me" would be equally correct.
Oh well, it's a good day when you learn something new. I'll review those usages and fix, no problem with that.

I'll disable openmp on the problem architectures as the first remedy (I won't be here tomorrow so it's faster to just disable straight in rawhide)

Comment 23 Dan Horák 2019-06-20 10:45:22 UTC
and output for aarch64 is:

lib=/lib64/libc.so.6
  TLS            0x15d7b8 0x000000000016d7b8 0x000000000016d7b8 0x000010 0x000090 R   0x8
lib=/lib64/libpython3.7m.so.1.0
lib=/lib64/libpthread.so.0
lib=/lib64/libdl.so.2
lib=/lib64/libutil.so.1
lib=/lib64/libm.so.6
lib=/lib64/libz.so.1
lib=/lib64/libbz2.so.1
lib=/lib64/liblzma.so.5
lib=/lib64/libcrypto.so.1.1
lib=/lib64/libglib-2.0.so.0
lib=/lib64/libgirepository-1.0.so.1
lib=/lib64/libgobject-2.0.so.0
lib=/lib64/libffi.so.6
lib=/lib64/libpcre.so.1
lib=/lib64/libgmodule-2.0.so.0
lib=/lib64/libgio-2.0.so.0
lib=/lib64/libmount.so.1
  TLS            0x05da28 0x000000000006da28 0x000000000006da28 0x000000 0x000006 R   0x8
lib=/lib64/libselinux.so.1
  TLS            0x02f610 0x000000000003f610 0x000000000003f610 0x000028 0x0000e1 R   0x8
lib=/lib64/libresolv.so.2
lib=/lib64/libblkid.so.1
  TLS            0x05b920 0x000000000006b920 0x000000000006b920 0x000000 0x000006 R   0x8
lib=/lib64/librt.so.1
lib=/lib64/libpcre2-8.so.0
lib=/lib64/libcairo.so.2
lib=/lib64/libcairo-gobject.so.2
lib=/lib64/libpixman-1.so.0
  TLS            0x05c118 0x000000000006c118 0x000000000006c118 0x000000 0x000180 R   0x8
lib=/lib64/libfontconfig.so.1
lib=/lib64/libfreetype.so.6
lib=/lib64/libpng16.so.16
lib=/lib64/libxcb-shm.so.0
lib=/lib64/libxcb.so.1
lib=/lib64/libxcb-render.so.0
lib=/lib64/libXrender.so.1
lib=/lib64/libX11.so.6
lib=/lib64/libXext.so.6
lib=/lib64/libexpat.so.1
lib=/lib64/libXau.so.6
lib=/lib64/libssl.so.1.1
lib=/lib64/libuuid.so.1
  TLS            0x00fba8 0x000000000001fba8 0x000000000001fba8 0x000004 0x00004e R   0x8
lib=/lib64/libsystemd.so.0
  TLS            0x0acef8 0x00000000000bcef8 0x00000000000bcef8 0x000004 0x000090 R   0x8
lib=/lib64/liblz4.so.1
lib=/lib64/libgcrypt.so.20
lib=/lib64/libgcc_s.so.1
lib=/lib64/libgpg-error.so.0
lib=/lib64/libblockdev.so.2
lib=/lib64/libbd_utils.so.2
  TLS            0x00fa10 0x000000000001fa10 0x000000000001fa10 0x000000 0x000008 R   0x8
lib=/lib64/libudev.so.1
  TLS            0x02f668 0x000000000003f668 0x000000000003f668 0x000000 0x000004 R   0x4
lib=/lib64/libkmod.so.2
lib=/lib64/libbd_lvm.so.2
lib=/lib64/libdevmapper.so.1.02
lib=/lib64/libsepol.so.1
lib=/lib64/libbd_btrfs.so.2
lib=/lib64/libbytesize.so.1
lib=/lib64/libmpfr.so.4
  TLS            0x06e388 0x000000000007e388 0x000000000007e388 0x0000d8 0x0000f8 R   0x8
lib=/lib64/libgmp.so.10
lib=/lib64/libbd_swap.so.2
lib=/lib64/libbd_loop.so.2
lib=/lib64/libbd_crypto.so.2
lib=/lib64/libcryptsetup.so.12
lib=/lib64/libssl3.so
lib=/lib64/libsmime3.so
lib=/lib64/libnss3.so
lib=/lib64/libnssutil3.so
lib=/lib64/libplds4.so
lib=/lib64/libplc4.so
lib=/lib64/libnspr4.so
lib=/lib64/libvolume_key.so.1
lib=/lib64/libargon2.so.1
lib=/lib64/libjson-c.so.4
  TLS            0x00f8b8 0x000000000001f8b8 0x000000000001f8b8 0x000000 0x000008 R   0x8
lib=/lib64/libgpgme.so.11
  TLS            0x04f718 0x000000000005f718 0x000000000005f718 0x000000 0x000004 R   0x4
lib=/lib64/libassuan.so.0
lib=/lib64/libbd_mpath.so.2
lib=/lib64/libbd_dm.so.2
lib=/lib64/libdmraid.so.1
lib=/lib64/libdevmapper-event.so.1.02
lib=/lib64/libbd_mdraid.so.2
lib=/lib64/libbd_nvdimm.so.2
lib=/lib64/libndctl.so.6
lib=/lib64/libdaxctl.so.1
lib=/lib64/libparted.so.2
lib=/lib64/libdbus-1.so.3
lib=/lib64/libpwquality.so.1
lib=/lib64/libcrack.so.2
lib=/lib64/libcrypt.so.2
lib=/lib64/libreport.so.0
lib=/lib64/libtar.so.1
lib=/lib64/libaugeas.so.0
lib=/lib64/libsatyr.so.3
lib=/lib64/libfa.so.1
lib=/lib64/libxml2.so.2
lib=/lib64/librpm.so.9
  TLS            0x07b6d0 0x000000000008b6d0 0x000000000008b6d0 0x000000 0x002000 R   0x8
lib=/lib64/librpmio.so.9
  TLS            0x02db78 0x000000000003db78 0x000000000003db78 0x000004 0x000198 R   0x8
lib=/lib64/libstdc++.so.6
  TLS            0x1d5978 0x00000000001e5978 0x00000000001e5978 0x000000 0x000020 R   0x8
lib=/lib64/libdw.so.1
  TLS            0x04dfb0 0x000000000005dfb0 0x000000000005dfb0 0x000000 0x000008 R   0x4
lib=/lib64/libelf.so.1
  TLS            0x01f9f0 0x000000000002f9f0 0x000000000002f9f0 0x000000 0x000004 R   0x4
lib=/lib64/libzstd.so.1
lib=/lib64/libpopt.so.0
lib=/lib64/libcap.so.2
lib=/lib64/libacl.so.1
lib=/lib64/liblua-5.3.so
lib=/lib64/libdb-5.3.so
lib=/lib64/libaudit.so.1
lib=/lib64/libattr.so.1
lib=/lib64/libcap-ng.so.0
  TLS            0x00fbc8 0x000000000001fbc8 0x000000000001fbc8 0x000030 0x000030 R   0x8
lib=/lib64/libdnf.so.2
lib=/lib64/librepo.so.0
lib=/lib64/libsolv.so.1
lib=/lib64/libsolvext.so.1
lib=/lib64/libsqlite3.so.0
lib=/lib64/libmodulemd.so.1
lib=/lib64/libsmartcols.so.1
  TLS            0x03e4c0 0x000000000004e4c0 0x000000000004e4c0 0x000000 0x000006 R   0x8
lib=/lib64/libcurl.so.4
lib=/lib64/libzck.so.1
lib=/lib64/libyaml-0.so.2
lib=/lib64/libnghttp2.so.14
lib=/lib64/libidn2.so.0
lib=/lib64/libssh.so.4
  TLS            0x08e478 0x000000000009e478 0x000000000009e478 0x000000 0x000018 R   0x8
lib=/lib64/libpsl.so.5
lib=/lib64/libgssapi_krb5.so.2
lib=/lib64/libkrb5.so.3
lib=/lib64/libk5crypto.so.3
lib=/lib64/libcom_err.so.2
  TLS            0x00fbe8 0x000000000001fbe8 0x000000000001fbe8 0x000000 0x000019 R   0x8
lib=/lib64/libldap-2.4.so.2
lib=/lib64/liblber-2.4.so.2
lib=/lib64/libbrotlidec.so.1
lib=/lib64/libunistring.so.2
lib=/lib64/libkrb5support.so.0
lib=/lib64/libkeyutils.so.1
lib=/lib64/libsasl2.so.3
lib=/lib64/libbrotlicommon.so.1
lib=/lib64/librpmbuild.so.9
lib=/lib64/librpmsign.so.9
lib=/lib64/libmagic.so.1
lib=/lib64/libgomp.so.1
  TLS            0x03fd38 0x000000000004fd38 0x000000000004fd38 0x000000 0x000078 R   0x8
lib=/lib64/libimaevm.so.0

Comment 24 Panu Matilainen 2019-06-20 11:26:38 UTC
So it turns out fixing the actual issue is easier than fighting auto*foo :)
The excessive TLS use should be now gone in rawhide as of rpm-4.14.90-0.git14653.17.fc31

Let me know whether that fixes the problem, if not I'll supply the build with openmp entirely disabled for the problem archs.

Comment 25 Dan Horák 2019-06-20 11:27:19 UTC
So the problem is still there with rpm-4.14.90-0.git14653.17.fc31, which decreased the TLS size to 0x000008 :-(

Comment 26 Dan Horák 2019-06-20 11:46:19 UTC
As an experiment I have dropped the __thread modifier for some variables in the librpmio source and this unbreaks anaconda. Some libs need to be fixed by reducing the TLS size ...

Comment 27 Dan Horák 2019-06-20 11:47:53 UTC
(In reply to Dan Horák from comment #26)
> As an experiment I have dropped the __thread modifier for some variables in
> the librpmio source and this unbreaks anaconda. Some libs need to be fixed
> by reducing the TLS size ...

this is on top of Panu's rpm-4.14.90-0.git14653.17.fc31

Comment 28 Panu Matilainen 2019-06-20 11:53:46 UTC
Right, those sigset_t's are pretty big too. I'll need to review the sanity of that all, it's probably not so sane :)

I'll try to have a look at it next week before I go on a long vacation, but in the meanwhile, here's an aarch64 scratch-build with openmp disabled: https://koji.fedoraproject.org/koji/taskinfo?taskID=35647700, if you can verify it I'll push it to rawhide to get things rolling again.

Comment 29 Dan Horák 2019-06-20 12:13:13 UTC
with rpms from https://koji.fedoraproject.org/koji/taskinfo?taskID=35648014 I can't reproduce the problem

same when 2 sigset_t static vars converted to non-TLS, with only 1 converted it still crashes

Comment 30 Dan Horák 2019-06-20 12:40:35 UTC
and retested with rpm-4.14.90-0.git14653.18.fc31, where Panu additionally removed all TLS usage from librpmio, and we should be good now, anaconda doesn't crash with the TLS error

Comment 31 Ben Cotton 2019-08-13 17:13:12 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to '31'.

Comment 32 Ben Cotton 2019-08-13 17:33:32 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to 31.

Comment 33 Jerry James 2019-10-02 14:34:55 UTC
Koschei just informed me that python-networkx's builds started failing in Rawhide with this error.

https://koji.fedoraproject.org/koji/taskinfo?taskID=37908747

I don't know what changed, but I have never seen this error before.

Comment 34 Kevin Fenzi 2019-10-18 20:34:47 UTC
rawhide composes just hit it again too. So something must have gotten reverted or changed around this recently?

Traceback (most recent call last):
  File "/sbin/anaconda", line 576, in <module>
    display.setup_display(anaconda, opts)
  File "/usr/lib64/python3.8/site-packages/pyanaconda/display.py", line 339, in setup_display
    anaconda.initInterface()
  File "/usr/lib64/python3.8/site-packages/pyanaconda/anaconda.py", line 295, in initInterface
    self._intf = TextUserInterface(self.storage, self.payload)
  File "/usr/lib64/python3.8/site-packages/pyanaconda/anaconda.py", line 100, in payload
    from pyanaconda.payload.dnfpayload import DNFPayload
  File "/usr/lib64/python3.8/site-packages/pyanaconda/payload/dnfpayload.py", line 49, in <module>
    import dnf
  File "/usr/lib/python3.8/site-packages/dnf/__init__.py", line 30, in <module>
    import dnf.base
  File "/usr/lib/python3.8/site-packages/dnf/base.py", line 31, in <module>
    from dnf.comps import CompsQuery
  File "/usr/lib/python3.8/site-packages/dnf/comps.py", line 27, in <module>
    from dnf.exceptions import CompsError
  File "/usr/lib/python3.8/site-packages/dnf/exceptions.py", line 22, in <module>
    import dnf.util
  File "/usr/lib/python3.8/site-packages/dnf/util.py", line 29, in <module>
    import dnf.callback
  File "/usr/lib/python3.8/site-packages/dnf/callback.py", line 22, in <module>
    import dnf.yum.rpmtrans
  File "/usr/lib/python3.8/site-packages/dnf/yum/rpmtrans.py", line 26, in <module>
    import rpm
  File "/usr/lib64/python3.8/site-packages/rpm/__init__.py", line 38, in <module>
    from rpm._rpm import *
ImportError: /lib64/libgomp.so.1: cannot allocate memory in static TLS block

Pane is dead (status 1, Fri Oct 18 20:28:45 2019)

If I can identify what's causing it, I can untag that to unblock rawhide, but it's not super clear... I guess rpm?

Comment 35 Dan Horák 2019-10-19 17:02:47 UTC
I suspect the recent mpfr update, but I will look into the details again.

Comment 36 Dan Horák 2019-10-19 17:04:04 UTC
Also, I brought the problem to the glibc mailing list - https://sourceware.org/ml/libc-alpha/2019-09/msg00512.html - and there was a good discussion about it.

Comment 37 Kevin Fenzi 2019-10-19 17:44:25 UTC
Thanks Dan. 

I suppose if we wanted to untag to get back to a working state right now we would have to back out the entire mpfr update and all things rebuilt against it?

Or is there a simple workaround we could do soon to get composes working again?

Comment 38 Jerry James 2019-10-20 23:20:37 UTC
(In reply to Dan Horák from comment #35)
> I suspect the recent mpfr update, but I will look into the details again.

Why do you suspect it?  Note that the python-networkx problem noted in comment 33 was *before* I started doing the mpfr 4 builds.

Comment 39 Dan Horák 2019-10-21 13:28:51 UTC
20191014 compose
lib=/lib64/libmpfr.so.4
  TLS            0x07dfd8 0x000000000008dfd8 0x000000000008dfd8 0x0000d8 0x0000f8 R   0x8

20191018 compose
lib=/lib64/libmpfr.so.6
  TLS            0x09dba8 0x00000000000adba8 0x00000000000adba8 0x0000d8 0x000364 R   0x8

So, mpfr increased its TLS storage significantly between version 3 and 4 ...
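
For scale, the two MemSiz values above can be converted to decimal and compared with what libgomp needs. This is just the arithmetic spelled out, using the libgomp figure (0x78, i.e. 120 bytes) from the earlier listings; the variable names are made up for the example.

```python
# MemSiz values copied from the two readelf rows above.
MPFR_OLD = 0x0000F8  # libmpfr.so.4, 20191014 compose
MPFR_NEW = 0x000364  # libmpfr.so.6, 20191018 compose
LIBGOMP_NEED = 0x000078  # libgomp.so.1 TLS MemSiz from the earlier listings

growth = MPFR_NEW - MPFR_OLD
print(f"mpfr TLS: {MPFR_OLD} -> {MPFR_NEW} bytes (+{growth})")
print(f"growth is {growth / LIBGOMP_NEED:.1f}x the {LIBGOMP_NEED} bytes libgomp needs")
```

The growth of 620 bytes alone is several times the 120 bytes that libgomp asks for.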

Comment 40 Dan Horák 2019-10-21 14:02:46 UTC
and mpfr is the only significant update between the 2 composes for the anaconda run-root

Comment 41 Kevin Fenzi 2019-10-21 18:57:42 UTC
We are going to temporarily mark the aarch64 cloud base image as failable so we can get a rawhide compose out, then we will revert that change. 

So, we still need some solution or workaround here sooner rather than later...

Comment 42 Jerry James 2019-10-21 20:47:43 UTC
Isn't this the real problem? https://sourceware.org/bugzilla/show_bug.cgi?id=25051

I can try doing something about mpfr's use of TLS, but again, see comment 33.  That happened before the mpfr update, so there is no guarantee that "fixing" mpfr will help.

MPFR itself does not use the initial-exec model, so far as I am able to determine.  At least, it does not provide the -ftls-model option to gcc, nor does it use attributes to change the model.  Therefore, since it is a shared object built with -fPIC, all of its TLS variables should be global-dynamic, right?

The growth in mpfr's use of TLS appears to be due in large part to the addition of an array of mpz_t values in src/pool.c (mpz_tab).  If you would like, I can see if that pool can be disabled, or if it can be allocated dynamically so that only a single pointer is in TLS.  This would have to be done without changing the ABI in any way, of course, lest gcc be broken in the process.

I am skeptical that that will fix the problem.

Comment 43 Jerry James 2019-10-21 21:02:46 UTC
The discussions on the glibc and gcc mailing lists imply that invoking anaconda with this in its environment:

LD_PRELOAD=libgomp.so.1

may work around the issue by ensuring that libgomp gets the statically allocated TLS space that it needs, before the optimization of handing out that space to other libraries kicks in.
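
For a launcher written in Python, the same workaround could look roughly like the sketch below. It is an assumption-laden illustration, not anaconda's actual startup code: the helper name env_with_preload is made up, and the commented-out subprocess call only shows where the environment would be passed.

```python
import os


def env_with_preload(library, base_env=None):
    """Return a copy of the environment with `library` first in LD_PRELOAD.

    ld.so accepts spaces or colons as separators in LD_PRELOAD, so any
    existing value is split on both and re-joined with colons.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "").replace(":", " ").split()
    entries = [library] + [item for item in existing if item != library]
    env["LD_PRELOAD"] = ":".join(entries)
    return env


# Demo on a small made-up environment.
demo = env_with_preload("libgomp.so.1", {"PATH": "/usr/bin", "LD_PRELOAD": "libfoo.so"})
print(demo["LD_PRELOAD"])

# Hypothetical use when starting the installer:
#   import subprocess
#   subprocess.run(["anaconda", "--image", "/mnt/image"],
#                  env=env_with_preload("libgomp.so.1"))
```

Putting libgomp first keeps any preload entries that were already set while still making sure it is loaded at process start.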

Comment 44 Dan Horák 2019-10-21 21:04:24 UTC
What I'm trying to say is that a change in mpfr caused the failed anaconda run. Yes, there is a plan for a global fix in glibc. But an open question is what we can do until the fix is available. I was told that good coding practice for handling bigger data structures as static TLS data is to store only pointers in TLS, while the structures themselves are allocated dynamically.

Comment 45 Dan Horák 2019-10-21 21:13:56 UTC
(In reply to Jerry James from comment #43)
> The discussions on the glibc and gcc mailing lists imply that invoking
> anaconda with this in its environment:
> 
> LD_PRELOAD=libgomp.so.1
> 
> may work around the issue by ensuring that libgomp gets the statically
> allocated TLS space that it needs, before the optimization of handing out
> that space to other libraries kicks in.

which should probably go into https://github.com/rhinstaller/anaconda/blob/master/data/systemd/anaconda-direct.service#L11

Now I wonder if an updates image will be sufficient for testing or a whole new compose with patched anaconda package will be required ...

Comment 46 Kevin Fenzi 2019-10-21 23:20:09 UTC
Ugh, somehow I marked this private and used the wrong account. ;( 

Fixed.

Comment 47 Florian Weimer 2019-10-22 08:31:37 UTC
We could also revert the RPM change which added the OpenMP dependency, and maybe rework it so that it is specific to rpmbuild before it is activated again.  It may be the less invasive change at this time.

Comment 48 Panu Matilainen 2019-10-22 08:58:54 UTC
The libgomp dependency is specific to librpmbuild only, but the python bindings drag that in as well.
Rpm can be built without OpenMP support but disabling parallelism in rpmbuild because it indirectly breaks the installer seems ... you know.

Comment 49 Florian Weimer 2019-10-22 09:54:25 UTC
(In reply to Panu Matilainen from comment #48)
> The libgomp dependency is specific to librpmbuild only, but the python
> bindings drag that in as well.

Can this be made optional, loading librpmbuild lazily?

> Rpm can be built without OpenMP support but disabling parallelism in
> rpmbuild because it indirectly breaks the installer seems ... you know.

An installer that depends on librpmbuild and libgomp is also a bit strange, given that it doesn't use it at all.

It's also not clear that thread-based parallelism is appropriate in this context.  Bug 1729382 showed that it does not work on 32-bit platforms.

Comment 50 Peter Robinson 2019-10-22 11:21:19 UTC
> An installer that depends on librpmbuild and libgomp is also a bit strange,
> given that it doesn't use it at all.

That's because the rpm python bindings are one big collection and are not broken into rpmbuild and standard rpm runtime bindings. If they were split (no idea about the feasibility of that), you should lose all the rpmbuild deps.

Comment 51 Panu Matilainen 2019-10-22 11:46:29 UTC
Actually the python bindings *were* split into separate sub-libraries for several years, and were just reunited in 4.15 because of all the pain-and-no-gain the split caused over the years.

Comment 52 Dan Horák 2019-10-23 12:46:12 UTC
I just checked (on ppc64le) that anaconda can be fixed with

diff --git a/data/tmux.conf b/data/tmux.conf
index c909aca31..25de1704f 100644
--- a/data/tmux.conf
+++ b/data/tmux.conf
@@ -18,7 +18,7 @@ set-option -g history-limit 10000
 
 # The idea here is to detach the client started here via anaconda.service, and
 # then re-attach to it in the tmux service run on the console tty.
-new-session -d -s anaconda -n main "anaconda"
+new-session -d -s anaconda -n main "LD_PRELOAD=libgomp.so.1 anaconda"
 
 set-option status-right '#[fg=blue]#(echo -n "Switch tab: Alt+Tab | Help: F1 ")'
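
Where anaconda is started from a program rather than from tmux.conf, the same workaround amounts to adding libgomp.so.1 to LD_PRELOAD in the child's environment, so that the library is loaded at program startup instead of later through dlopen. A minimal Python sketch of that idea follows; `env_with_preload` is a hypothetical helper written for this illustration, not code taken from anaconda or lorax.

```python
import os
import re


def env_with_preload(library, base_env=None):
    """Return a copy of the environment with `library` first in LD_PRELOAD.

    Hypothetical helper: existing LD_PRELOAD entries are kept, and the
    caller's mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    # ld.so accepts spaces or colons as LD_PRELOAD separators.
    current = [p for p in re.split(r"[ :]+", env.get("LD_PRELOAD", "")) if p]
    if library not in current:
        current.insert(0, library)
    env["LD_PRELOAD"] = ":".join(current)
    return env


# Example: the environment a launcher could pass to its child process,
# e.g. subprocess.run(["anaconda"], env=child_env).
child_env = env_with_preload("libgomp.so.1", {"PATH": "/usr/bin"})
print(child_env["LD_PRELOAD"])
```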

Comment 53 Kevin Fenzi 2019-10-30 22:13:38 UTC
Sadly, the workaround doesn't seem to be working for Cloud-Base images. ;( 

We can discuss that issue over in https://bugzilla.redhat.com/show_bug.cgi?id=1764666 though....

Comment 54 Adam Williamson 2019-11-27 18:20:35 UTC
We also seem to be hitting this when building Fedora-Workstation-Live for ppc64le:

https://koji.fedoraproject.org/koji/taskinfo?taskID=39373511

Comment 55 Dan Horák 2019-11-27 18:38:02 UTC
Hmm, is anaconda invoked there differently?

Comment 56 Dan Horák 2019-12-12 11:29:31 UTC
(In reply to Dan Horák from comment #55)
> Hmm, is anaconda invoked there differently?

It is; it seems to run in a chroot from livemedia-creator and not as a service through tmux.

Comment 57 Dan Horák 2020-01-10 08:36:17 UTC
A fix has been proposed in https://sourceware.org/ml/libc-alpha/2020-01/msg00099.html

Comment 58 Dan Horák 2020-01-10 10:24:32 UTC
livemedia-creator workaround proposed in https://github.com/weldr/lorax/pull/942

Comment 59 Carlos O'Donell 2020-01-10 14:18:05 UTC
The action item for me here is that the upstream patch needs to be reviewed and moved forward.

Comment 60 Adam Williamson 2020-05-22 16:30:22 UTC
So this seems to have stalled upstream...

Comment 61 Carlos O'Donell 2020-05-22 16:36:33 UTC
(In reply to Adam Williamson from comment #60)
> So this seems to have stalled upstream...

This is 3rd on my public patchwork review queue.

Comment 62 Dan Horák 2020-06-02 12:29:36 UTC
For the record, the issue is back with the Fedora-Rawhide-20200601.n.1 compose, Fedora-Rawhide-20200530.n.0 is OK. Also I think it's slightly different this time, because it seems to affect only GUI installation, while previously even text mode failed.

Comment 63 Dan Horák 2020-06-02 17:58:07 UTC
(In reply to Dan Horák from comment #62)
> For the record, the issue is back with the Fedora-Rawhide-20200601.n.1
> compose, Fedora-Rawhide-20200530.n.0 is OK. Also I think it's slightly
> different this time, because it seems to affect only GUI installation, while
> previously even text mode failed.

It's really caused by a change in anaconda. https://github.com/rhinstaller/anaconda/pull/2631 will fix it (i.e. adapt the workaround to the recent changes).

Comment 65 Carlos O'Donell 2020-06-09 13:52:08 UTC
This is now 2nd on my review queue. We've given Arm some feedback about this already with regards to splitting the patch into tunable + changes.

Comment 66 Carlos O'Donell 2020-06-23 13:14:37 UTC
I reviewed the initial implementation from Arm to fix this, and they have reposted a v2 and I'm going to review the v2 for inclusion in glibc 2.32.

Comment 67 Bastien Nocera 2020-06-26 08:19:14 UTC
*** Bug 1851184 has been marked as a duplicate of this bug. ***

Comment 68 Carlos O'Donell 2020-08-04 13:22:22 UTC
This is now fixed.

The solution is as follows:
* TLS static space is reserved for the implementation and not entirely consumed by the TLS descriptors.
* You have access to various tunables to increase the amount of static TLS that is reserved, so it can be adjusted for a given deployed workload.
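
As a hedged illustration of the second point: glibc tunables are passed through the GLIBC_TUNABLES environment variable as colon-separated name=value pairs, and the glibc manual documents glibc.rtld.optional_static_tls (surplus static TLS in bytes, default 512) as of 2.32. The sketch below only builds such a string in Python; the `glibc_tunables` helper and the value 2048 are assumptions made for the example, not values recommended anywhere in this bug.

```python
def glibc_tunables(settings, existing=""):
    """Merge tunables into a GLIBC_TUNABLES string.

    Hypothetical helper: `existing` is a current GLIBC_TUNABLES value,
    `settings` maps tunable names to new values (which take precedence).
    """
    merged = {}
    for item in existing.split(":"):
        if "=" in item:
            name, value = item.split("=", 1)
            merged[name] = value
    for name, value in settings.items():
        merged[name] = str(value)
    return ":".join(f"{name}={value}" for name, value in merged.items())


# 2048 is an arbitrary example size; the right value depends on the TLS
# needs of the libraries the workload loads via dlopen.
value = glibc_tunables({"glibc.rtld.optional_static_tls": 2048})
print(f"GLIBC_TUNABLES={value}")
```

The resulting string would be exported in the environment of the affected process before it starts, since tunables are read at program startup.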

This is now CLOSED/RAWHIDE.

Comment 69 Dan Horák 2020-08-04 13:46:39 UTC
We have also verified that the fix in glibc allows the installer to run correctly without the LD_PRELOAD workaround.

