Description of problem:
I see a strange segmentation fault in test_local_lib of python-meson-python. At first, I thought this was related to Python 3.14, which is what I was testing, but it happens with older Pythons as well.

The segmentation fault reproduces with binutils 2.43.50-5.fc42, but not with binutils 2.43.1-2.fc42.

Version-Release number of selected component:
2.43.50-5.fc42

How reproducible:

# dnf install uv git-core cmake python3.13-devel python3.14-devel gcc patchelf gdb
$ git clone https://github.com/mesonbuild/meson-python.git
$ cd meson-python
$ uv venv --python=python3.14 venv  # or python3.13
$ . venv/bin/activate
$ uv pip install ninja .[test]
$ python -m pytest -k test_local_lib
...
============================= test session starts ==============================
platform linux -- Python 3.14.0a1, pytest-8.3.3, pluggy-1.5.0
rootdir: /meson-python
configfile: pyproject.toml
testpaths: tests
plugins: cov-5.0.0, mock-3.14.0
collected 123 items / 122 deselected / 1 selected

tests/test_wheel.py F [100%]

=================================== FAILURES ===================================
________________________________ test_local_lib ________________________________

venv = <tests.conftest.VEnv object at 0x7fb577566f90>
wheel_link_against_local_lib = PosixPath('/tmp/pytest-of-root/pytest-5/test0/mesonpy-test-5tupkd1z/link_against_local_lib-1.0.0-cp314-cp314-linux_x86_64.whl')

    @pytest.mark.skipif(sys.platform not in {'linux', 'darwin'}, reason='Not supported on this platform')
    def test_local_lib(venv, wheel_link_against_local_lib):
        venv.pip('install', wheel_link_against_local_lib)
>       output = venv.python('-c', 'import example; print(example.example_sum(1, 2))')

tests/test_wheel.py:160:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/conftest.py:114: in python
    return subprocess.check_output([self.executable, *args]).decode()
/usr/lib64/python3.14/subprocess.py:472: in check_output
    return run(*popenargs, stdout=PIPE,
timeout=timeout, check=True, _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ input = None, capture_output = False, timeout = None, check = True popenargs = (['/tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/bin/python3.14', '-c', 'import example; print(example.example_sum(1, 2))'],) kwargs = {'stdout': -1} process = <Popen: returncode: -11 args: ['/tmp/pytest-of-root/pytest-5/mesonpy-test-ve...> stdout = b'', stderr = None, retcode = -11 def run(*popenargs, input=None, capture_output=False, timeout=None, check=False, **kwargs): """Run command with arguments and return a CompletedProcess instance. The returned instance will have attributes args, returncode, stdout and stderr. By default, stdout and stderr are not captured, and those attributes will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them, or pass capture_output=True to capture both. If check is True and the exit code was non-zero, it raises a CalledProcessError. The CalledProcessError object will have the return code in the returncode attribute, and output & stderr attributes if those streams were captured. If timeout is given, and the process takes too long, a TimeoutExpired exception will be raised. There is an optional argument "input", allowing you to pass bytes or a string to the subprocess's stdin. If you use this argument you may not also use the Popen constructor's "stdin" argument, as it will be used internally. By default, all communication is in bytes, and therefore any "input" should be bytes, and the stdout and stderr will be bytes. If in text mode, any "input" should be a string, and stdout and stderr will be strings decoded according to locale encoding, or by "encoding" if set. Text mode is triggered by setting any of text, encoding, errors or universal_newlines. The other arguments are the same as for the Popen constructor. 
""" if input is not None: if kwargs.get('stdin') is not None: raise ValueError('stdin and input arguments may not both be used.') kwargs['stdin'] = PIPE if capture_output: if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None: raise ValueError('stdout and stderr arguments may not be used ' 'with capture_output.') kwargs['stdout'] = PIPE kwargs['stderr'] = PIPE with Popen(*popenargs, **kwargs) as process: try: stdout, stderr = process.communicate(input, timeout=timeout) except TimeoutExpired as exc: process.kill() if _mswindows: # Windows accumulates the output in a single blocking # read() call run on child threads, with the timeout # being done in a join() on those threads. communicate() # _after_ kill() is required to collect that and add it # to the exception. exc.stdout, exc.stderr = process.communicate() else: # POSIX _communicate already populated the output so # far into the TimeoutExpired exception. process.wait() raise except: # Including KeyboardInterrupt, communicate handled that. process.kill() # We don't call process.wait() as .__exit__ does that for us. raise retcode = process.poll() if check and retcode: > raise CalledProcessError(retcode, process.args, output=stdout, stderr=stderr) E subprocess.CalledProcessError: Command '['/tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/bin/python3.14', '-c', 'import example; print(example.example_sum(1, 2))']' died with <Signals.SIGSEGV: 11>. 
/usr/lib64/python3.14/subprocess.py:577: CalledProcessError ---------------------------- Captured stdout setup ----------------------------- Initialized empty Git repository in /meson-python/tests/packages/link-against-local-lib/.git/ + meson setup /meson-python/tests/packages/link-against-local-lib /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7 -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md --native-file=/meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7/meson-python-native-file.ini The Meson build system Version: 1.6.0 Source dir: /meson-python/tests/packages/link-against-local-lib Build dir: /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7 Build type: native build Project name: link-against-local-lib Project version: 1.0.0 C compiler for the host machine: cc (gcc 14.2.1 "cc (GCC) 14.2.1 20240912 (Red Hat 14.2.1-4)") C linker for the host machine: cc ld.bfd 2.43.50.20241014 Host machine cpu family: x86_64 Host machine cpu: x86_64 Program python found: YES (/meson-python/venv/bin/python) Found pkg-config: YES (/usr/bin/pkg-config) 2.3.0 Run-time dependency python found: YES 3.14 WARNING: Please do not define rpath with a linker argument, use install_rpath or build_rpath properties instead. This will become a hard error in a future Meson release. 
Build targets in project: 2 link-against-local-lib 1.0.0 User defined options Native files: /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7/meson-python-native-file.ini b_ndebug : if-release b_vscrt : md buildtype : release Found ninja-1.11.1.git.kitware.jobserver-1 at /meson-python/venv/bin/ninja + /meson-python/venv/bin/ninja [1/5] Compiling C object lib/libexample.so.p/examplelib.c.o [2/5] Linking target lib/libexample.so [3/5] Compiling C object example.cpython-314-x86_64-linux-gnu.so.p/examplemod.c.o [4/5] Generating symbol file lib/libexample.so.p/libexample.so.symbols [5/5] Linking target example.cpython-314-x86_64-linux-gnu.so [1/2] /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7/lib/libexample.so [2/2] /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7/example.cpython-314-x86_64-linux-gnu.so =========================== short test summary info ============================ FAILED tests/test_wheel.py::test_local_lib - subprocess.CalledProcessError: Command '['/tmp/pytest-of-root/pytest-5/meso... ====================== 1 failed, 122 deselected in 3.01s ======================= # /tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/bin/python3.14 -c 'import example; print(example.example_sum(1, 2))' Segmentation fault (core dumped) # gdb /tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/bin/python3.14 (gdb) run -c 'import example; print(example.example_sum(1, 2))' Program received signal SIGSEGV, Segmentation fault. 0x00007f05fbbe1294 in ?? () (gdb) bt #0 0x00007f05fbbe1294 in ?? 
() #1 0x00007f05fbbfc310 in call_init (l=0x56104552bed0, argc=3, argv=0x7fffbdb0b0f8, env=0x7fffbdb0b118) at dl-init.c:60 #2 call_init (l=0x56104552bed0, argc=3, argv=0x7fffbdb0b0f8, env=0x7fffbdb0b118) at dl-init.c:26 #3 0x00007f05fbbfc42d in _dl_init (main_map=0x56104552bed0, argc=3, argv=0x7fffbdb0b0f8, env=0x7fffbdb0b118) at dl-init.c:121 #4 0x00007f05fbbf9562 in __GI__dl_catch_exception ( exception=exception@entry=0x0, operate=operate@entry=0x7f05fbc030a0 <call_dl_init>, args=args@entry=0x7fffbdb09fc0) at dl-catch.c:215 #5 0x00007f05fbc03039 in dl_open_worker (a=a@entry=0x7fffbdb09fc0) at dl-open.c:785 #6 0x00007f05fbbf94c3 in __GI__dl_catch_exception ( exception=exception@entry=0x7fffbdb09fa0, operate=operate@entry=0x7f05fbc02fb0 <dl_open_worker>, args=args@entry=0x7fffbdb09fc0) at dl-catch.c:241 #7 0x00007f05fbc03424 in _dl_open ( file=0x7f05fb0d94f0 "/tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/lib64/python3.14/site-packages/example.cpython-314-x86_64-linux-gnu.so", mode=<optimized out>, caller_dlopen=0x7f05fb869e21 <_imp_create_dynamic+929>, nsid=<optimized out>, argc=3, argv=0x7fffbdb0b0f8, env=0x7fffbdb0b118) at dl-open.c:860 #8 0x00007f05fb47a9b4 in dlopen_doit () from /lib64/libc.so.6 #9 0x00007f05fbbf94c3 in __GI__dl_catch_exception ( exception=exception@entry=0x7fffbdb0a1b0, operate=0x7f05fb47a950 <dlopen_doit>, args=0x7fffbdb0a270) at dl-catch.c:241 #10 0x00007f05fbbf9619 in _dl_catch_error (objname=0x7fffbdb0a218, errstring=0x7fffbdb0a220, mallocedp=0x7fffbdb0a217, operate=<optimized out>, args=<optimized out>) at dl-catch.c:260 #11 0x00007f05fb47a4a3 in _dlerror_run () from /lib64/libc.so.6 #12 0x00007f05fb47aa6f in dlopen.5 () from /lib64/libc.so.6 #13 0x00007f05fb869e21 in _imp_create_dynamic () from /lib64/libpython3.14.so.1.0 #14 0x00007f05fb77dccb in cfunction_vectorcall_FASTCALL () from /lib64/libpython3.14.so.1.0 #15 0x00007f05fb75b05a in _PyEval_EvalFrameDefault () from /lib64/libpython3.14.so.1.0 #16 0x00007f05fb77aec2 in object_vacall 
() from /lib64/libpython3.14.so.1.0 #17 0x00007f05fb7b441e in PyObject_CallMethodObjArgs () from /lib64/libpython3.14.so.1.0 #18 0x00007f05fb7b35bd in PyImport_ImportModuleLevelObject () from /lib64/libpython3.14.so.1.0 #19 0x00007f05fb75d7c9 in _PyEval_EvalFrameDefault () from /lib64/libpython3.14.so.1.0 #20 0x00007f05fb82d3bb in PyEval_EvalCode () from /lib64/libpython3.14.so.1.0 #21 0x00007f05fb852050 in run_eval_code_obj () from /lib64/libpython3.14.so.1.0 #22 0x00007f05fb84af83 in run_mod () from /lib64/libpython3.14.so.1.0 #23 0x00007f05fb83d8ee in _PyRun_StringFlagsWithName.constprop.0 () from /lib64/libpython3.14.so.1.0 #24 0x00007f05fb83d798 in _PyRun_SimpleStringFlagsWithName () from /lib64/libpython3.14.so.1.0 #25 0x00007f05fb8647e4 in Py_RunMain () from /lib64/libpython3.14.so.1.0 #26 0x00007f05fb81c7ec in Py_BytesMain () from /lib64/libpython3.14.so.1.0 #27 0x00007f05fb4120c8 in __libc_start_call_main () from /lib64/libc.so.6 #28 0x00007f05fb41218b in __libc_start_main_impl () from /lib64/libc.so.6 #29 0x0000561011e4f095 in _start ()
I can also reproduce by building https://src.fedoraproject.org/rpms/python-meson-python @ rawhide (90b81c9e8645a4c08f6f74127746eadddd18ae2b) in mock: [python-meson-python (rawhide)]$ fedpkg --release rawhide mockbuild ... =================================== FAILURES =================================== ________________________________ test_local_lib ________________________________ venv = <tests.conftest.VEnv object at 0x7ff3d4b590f0> wheel_link_against_local_lib = PosixPath('/tmp/pytest-of-mockbuild/pytest-0/test0/mesonpy-test-pbruvgfn/link_against_local_lib-1.0.0-cp313-cp313-linux_x86_64.whl') @pytest.mark.skipif(sys.platform not in {'linux', 'darwin'}, reason='Not supported on this platform') def test_local_lib(venv, wheel_link_against_local_lib): venv.pip('install', wheel_link_against_local_lib) > output = venv.python('-c', 'import example; print(example.example_sum(1, 2))') tests/test_wheel.py:160: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ tests/conftest.py:107: in python return subprocess.check_output([self.executable, *args]).decode() /usr/lib64/python3.13/subprocess.py:472: in check_output return run(*popenargs, stdout=PIPE, timeout=timeout, check=True, _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ input = None, capture_output = False, timeout = None, check = True popenargs = (['/tmp/pytest-of-mockbuild/pytest-0/mesonpy-test-venv4/bin/python3', '-c', 'import example; print(example.example_sum(1, 2))'],) kwargs = {'stdout': -1} process = <Popen: returncode: -11 args: ['/tmp/pytest-of-mockbuild/pytest-0/mesonpy-te...> stdout = b'', stderr = None, retcode = -11 def run(*popenargs, input=None, capture_output=False, timeout=None, check=False, **kwargs): """Run command with arguments and return a CompletedProcess instance. The returned instance will have attributes args, returncode, stdout and stderr. By default, stdout and stderr are not captured, and those attributes will be None. 
Pass stdout=PIPE and/or stderr=PIPE in order to capture them, or pass capture_output=True to capture both. If check is True and the exit code was non-zero, it raises a CalledProcessError. The CalledProcessError object will have the return code in the returncode attribute, and output & stderr attributes if those streams were captured. If timeout is given, and the process takes too long, a TimeoutExpired exception will be raised. There is an optional argument "input", allowing you to pass bytes or a string to the subprocess's stdin. If you use this argument you may not also use the Popen constructor's "stdin" argument, as it will be used internally. By default, all communication is in bytes, and therefore any "input" should be bytes, and the stdout and stderr will be bytes. If in text mode, any "input" should be a string, and stdout and stderr will be strings decoded according to locale encoding, or by "encoding" if set. Text mode is triggered by setting any of text, encoding, errors or universal_newlines. The other arguments are the same as for the Popen constructor. """ if input is not None: if kwargs.get('stdin') is not None: raise ValueError('stdin and input arguments may not both be used.') kwargs['stdin'] = PIPE if capture_output: if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None: raise ValueError('stdout and stderr arguments may not be used ' 'with capture_output.') kwargs['stdout'] = PIPE kwargs['stderr'] = PIPE with Popen(*popenargs, **kwargs) as process: try: stdout, stderr = process.communicate(input, timeout=timeout) except TimeoutExpired as exc: process.kill() if _mswindows: # Windows accumulates the output in a single blocking # read() call run on child threads, with the timeout # being done in a join() on those threads. communicate() # _after_ kill() is required to collect that and add it # to the exception. 
exc.stdout, exc.stderr = process.communicate() else: # POSIX _communicate already populated the output so # far into the TimeoutExpired exception. process.wait() raise except: # Including KeyboardInterrupt, communicate handled that. process.kill() # We don't call process.wait() as .__exit__ does that for us. raise retcode = process.poll() if check and retcode: > raise CalledProcessError(retcode, process.args, output=stdout, stderr=stderr) E subprocess.CalledProcessError: Command '['/tmp/pytest-of-mockbuild/pytest-0/mesonpy-test-venv4/bin/python3', '-c', 'import example; print(example.example_sum(1, 2))']' died with <Signals.SIGSEGV: 11>. /usr/lib64/python3.13/subprocess.py:577: CalledProcessError ---------------------------- Captured stdout setup ----------------------------- Initialized empty Git repository in /builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib/.git/ + meson setup /builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib /builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib/.mesonpy-2x5o62cf -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md --native-file=/builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib/.mesonpy-2x5o62cf/meson-python-native-file.ini The Meson build system Version: 1.5.1 Source dir: /builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib Build dir: /builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib/.mesonpy-2x5o62cf Build type: native build Project name: link-against-local-lib Project version: 1.0.0 C compiler for the host machine: gcc (gcc 14.2.1 "gcc (GCC) 14.2.1 20240912 (Red Hat 14.2.1-4)") C linker for the host machine: gcc ld.bfd 2.43.50.20241014 Host machine cpu family: 
x86_64 Host machine cpu: x86_64 Program python found: YES (/usr/bin/python3) Found pkg-config: YES (/usr/bin/pkg-config) 2.3.0 Run-time dependency python found: YES 3.13 WARNING: Please do not define rpath with a linker argument, use install_rpath or build_rpath properties instead. This will become a hard error in a future Meson release. Build targets in project: 2 link-against-local-lib 1.0.0 User defined options Native files: /builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib/.mesonpy-2x5o62cf/meson-python-native-file.ini buildtype : release b_ndebug : if-release b_vscrt : md Found ninja-1.12.1 at /usr/bin/ninja + /usr/bin/ninja [1/5] Compiling C object lib/libexample.so.p/examplelib.c.o [2/5] Linking target lib/libexample.so [3/5] Compiling C object example.cpython-313-x86_64-linux-gnu.so.p/examplemod.c.o [4/5] Generating symbol file lib/libexample.so.p/libexample.so.symbols [5/5] Linking target example.cpython-313-x86_64-linux-gnu.so [1/2] /builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib/.mesonpy-2x5o62cf/lib/libexample.so [2/2] /builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib/.mesonpy-2x5o62cf/example.cpython-313-x86_64-linux-gnu.so =========================== short test summary info ============================ ... FAILED tests/test_wheel.py::test_local_lib - subprocess.CalledProcessError: C... ================== 1 failed, 111 passed, 11 skipped in 51.00s ================== RPM build warnings: RPM build errors: error: Bad exit status from /var/tmp/rpm-tmp.koxgcb (%check)
In Koji, this happens on x86_64 and i686, but not on s390x, aarch64, ppc64le.

https://koji.fedoraproject.org/koji/taskinfo?taskID=125164632 x86_64
https://koji.fedoraproject.org/koji/taskinfo?taskID=125164654 i686

Reproduce with:

[python-meson-python (rawhide)]$ fedpkg build --scratch --arches x86_64
[python-meson-python (rawhide)]$ fedpkg build --scratch --arches i686
FWIW it looks like 2.43.50-1.fc42 to 2.43.50-4.fc42 never passed gating and 2.43.50-5.fc42 landed via https://bodhi.fedoraproject.org/updates/FEDORA-2024-109a9172e1 with the test waived.
binutils-2.43.50-1.fc42.x86_64 is OK
binutils-2.43.50-2.fc42.x86_64 is bad
binutils-2.43.50-4.fc42.x86_64 is bad
binutils-2.43.50-5.fc42.x86_64 is bad
Some comments/suggestions:

* Given that rawhide binutils builds are now based upon snapshots of the upstream binutils development sources, you may have more success reporting this bug upstream.

* Are you building with LTO enabled ? If so, does the problem go away if LTO is not used ? (Not that this is a solution, but it does help to narrow down the problem area).

* Is it possible to create a small stand-alone reproducer for the problem ? Having to build python from scratch and then run its testsuite each time is going to make it very hard to narrow down which commit to the binutils sources introduced the bug.
I'll check the rest, but just a short answer:

> Having to build python from scratch and then run its testsuite...

This is probably a confusion. Building Python from source is not needed. Use Python packaged in Fedora.
Sticking %global _lto_cflags %{nil} into python-meson-python.spec makes no difference. I have not tested using a Python built with LTO disabled (because I am using Python packaged in Fedora).
Smaller reproducer:

# dnf install git-core pip gcc ninja python3-devel
# git clone https://github.com/mesonbuild/meson-python.git
# cd meson-python/tests/packages/link-against-local-lib/
# pip install .
# python3 -c 'import example'
Segmentation fault (core dumped)
It looks like the main unique thing about this particular test is that it uses -Wl,-rpath,custom-rpath. That’s probably relevant. https://github.com/mesonbuild/meson-python/blob/b43ffcd0c64fa9ef97e99c15ac3f1f43d9572324/tests/packages/link-against-local-lib/meson.build#L15C18-L15C41
Built with binutils-2.43.50-1.fc42.x86_64:

$ readelf -d /usr/local/lib64/python3.13/site-packages/example.cpython-313-x86_64-linux-gnu.so

Dynamic section at offset 0x2e28 contains 22 entries:
  Tag                Type            Name/Value
 0x0000000000000001 (NEEDED)         Shared library: [libexample.so]
 0x000000000000001d (RUNPATH)        Library runpath: [$ORIGIN/.link_against_local_lib.mesonpy.libs:custom-rpath]
 0x000000000000000c (INIT)           0x1000
 0x000000000000000d (FINI)           0x11a4
 0x0000000000000019 (INIT_ARRAY)     0x3e10
 0x000000000000001b (INIT_ARRAYSZ)   8 (bytes)
 0x000000000000001a (FINI_ARRAY)     0x3e18
 0x000000000000001c (FINI_ARRAYSZ)   8 (bytes)
 0x000000006ffffef5 (GNU_HASH)       0x2028
 0x0000000000000005 (STRTAB)         0x5000
 0x0000000000000006 (SYMTAB)         0x2050
 0x000000000000000a (STRSZ)          251 (bytes)
 0x000000000000000b (SYMENT)         24 (bytes)
 0x0000000000000003 (PLTGOT)         0x3fe8
 0x0000000000000002 (PLTRELSZ)       120 (bytes)
 0x0000000000000014 (PLTREL)         RELA
 0x0000000000000017 (JMPREL)         0x2310
 0x0000000000000007 (RELA)           0x2208
 0x0000000000000008 (RELASZ)         264 (bytes)
 0x0000000000000009 (RELAENT)        24 (bytes)
 0x000000006ffffff9 (RELACOUNT)      7
 0x0000000000000000 (NULL)           0x0

Built with binutils-2.43.50-5.fc42.x86_64:

$ readelf -d /usr/local/lib64/python3.13/site-packages/example.cpython-313-x86_64-linux-gnu.so

Dynamic section at offset 0x1e28 contains 22 entries:
  Tag                Type            Name/Value
 0x0000000000000001 (NEEDED)         Shared library: [libexample.so]
 0x000000000000001d (RUNPATH)        Library runpath: [$ORIGIN/.link_against_local_lib.mesonpy.libs:custom-rpath]
 0x000000000000000c (INIT)           0x294
 0x000000000000000d (FINI)           0x434
 0x0000000000000019 (INIT_ARRAY)     0x2e10
 0x000000000000001b (INIT_ARRAYSZ)   8 (bytes)
 0x000000000000001a (FINI_ARRAY)     0x2e18
 0x000000000000001c (FINI_ARRAYSZ)   8 (bytes)
 0x000000006ffffef5 (GNU_HASH)       0x1000
 0x0000000000000005 (STRTAB)         0x41d0
 0x0000000000000006 (SYMTAB)         0x1028
 0x000000000000000a (STRSZ)          251 (bytes)
 0x000000000000000b (SYMENT)         24 (bytes)
 0x0000000000000003 (PLTGOT)         0x2fe8
 0x0000000000000002 (PLTRELSZ)       120 (bytes)
 0x0000000000000014 (PLTREL)         RELA
 0x0000000000000017 (JMPREL)         0x12e8
 0x0000000000000007 (RELA)           0x11e0
 0x0000000000000008 (RELASZ)         264 (bytes)
 0x0000000000000009 (RELAENT)        24 (bytes)
 0x000000006ffffff9 (RELACOUNT)      7
 0x0000000000000000 (NULL)           0x0

diff

-Dynamic section at offset 0x2e28 contains 22 entries:
+Dynamic section at offset 0x1e28 contains 22 entries:
   Tag                Type            Name/Value
  0x0000000000000001 (NEEDED)         Shared library: [libexample.so]
  0x000000000000001d (RUNPATH)        Library runpath: [$ORIGIN/.link_against_local_lib.mesonpy.libs:custom-rpath]
- 0x000000000000000c (INIT)           0x1000
- 0x000000000000000d (FINI)           0x11a4
- 0x0000000000000019 (INIT_ARRAY)     0x3e10
+ 0x000000000000000c (INIT)           0x294
+ 0x000000000000000d (FINI)           0x434
+ 0x0000000000000019 (INIT_ARRAY)     0x2e10
  0x000000000000001b (INIT_ARRAYSZ)   8 (bytes)
- 0x000000000000001a (FINI_ARRAY)     0x3e18
+ 0x000000000000001a (FINI_ARRAY)     0x2e18
  0x000000000000001c (FINI_ARRAYSZ)   8 (bytes)
- 0x000000006ffffef5 (GNU_HASH)       0x2028
- 0x0000000000000005 (STRTAB)         0x5000
- 0x0000000000000006 (SYMTAB)         0x2050
+ 0x000000006ffffef5 (GNU_HASH)       0x1000
+ 0x0000000000000005 (STRTAB)         0x41d0
+ 0x0000000000000006 (SYMTAB)         0x1028
  0x000000000000000a (STRSZ)          251 (bytes)
  0x000000000000000b (SYMENT)         24 (bytes)
- 0x0000000000000003 (PLTGOT)         0x3fe8
+ 0x0000000000000003 (PLTGOT)         0x2fe8
  0x0000000000000002 (PLTRELSZ)       120 (bytes)
  0x0000000000000014 (PLTREL)         RELA
- 0x0000000000000017 (JMPREL)         0x2310
- 0x0000000000000007 (RELA)           0x2208
+ 0x0000000000000017 (JMPREL)         0x12e8
+ 0x0000000000000007 (RELA)           0x11e0
  0x0000000000000008 (RELASZ)         264 (bytes)
  0x0000000000000009 (RELAENT)        24 (bytes)
  0x000000006ffffff9 (RELACOUNT)      7
(In reply to Miro Hrončok from comment #10)
> [$ORIGIN/.link_against_local_lib.mesonpy.libs:custom-rpath]
> - 0x000000000000000d (FINI) 0x11a4
> - 0x0000000000000019 (INIT_ARRAY) 0x3e10
> + 0x000000000000000c (INIT) 0x294
> + 0x000000000000000d (FINI) 0x434
> + 0x0000000000000019 (INIT_ARRAY) 0x2e10

This caught my interest. DT_INIT points to non-executable memory (same page as the ELF header, which is not executable with separate-code). The backtrace confirms the crash happens when DT_INIT is invoked:

(gdb) bt
#0  0x00007f51053c3294 in ?? ()
#1  0x00007f51055065d0 in call_init (l=0x5647f6caa7e0, argc=3, argv=0x7ffd85e5d8f8, env=0x7ffd85e5d918) at dl-init.c:60
#2  call_init (l=0x5647f6caa7e0, argc=3, argv=0x7ffd85e5d8f8, env=0x7ffd85e5d918) at dl-init.c:26
#3  0x00007f51055066ed in _dl_init (main_map=0x5647f6caa7e0, argc=3, argv=0x7ffd85e5d8f8, env=0x7ffd85e5d918) at dl-init.c:121

Unlike DT_INIT_ARRAY, DT_INIT points to actual code, so it has to be in an executable segment.

Is there a way to get pip to pass some flags to ninja so that it prints the actual commands executed?
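The DT_INIT observation in the comment above can be checked mechanically. The following is a minimal sketch, not part of the original report, that assumes a little-endian ELF64 file (as on x86_64) and uses only Python's struct module; all function names are made up for illustration. It reads DT_INIT and DT_FINI from the PT_DYNAMIC segment and reports whether each address lies inside a PT_LOAD segment that carries the executable flag.

```python
import struct

PT_LOAD, PT_DYNAMIC, PF_X = 1, 2, 0x1
DT_NULL, DT_INIT, DT_FINI = 0, 12, 13


def program_headers(data):
    """Yield (type, flags, offset, vaddr, filesz, memsz) for a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    (phoff,) = struct.unpack_from("<Q", data, 32)            # e_phoff
    phentsize, phnum = struct.unpack_from("<HH", data, 54)   # e_phentsize, e_phnum
    for i in range(phnum):
        p_type, p_flags, p_offset, p_vaddr, _paddr, p_filesz, p_memsz, _align = (
            struct.unpack_from("<IIQQQQQQ", data, phoff + i * phentsize)
        )
        yield p_type, p_flags, p_offset, p_vaddr, p_filesz, p_memsz


def dynamic_entries(data):
    """Return {tag: value} read from the PT_DYNAMIC segment, stopping at DT_NULL."""
    entries = {}
    for p_type, _flags, offset, _vaddr, filesz, _memsz in program_headers(data):
        if p_type != PT_DYNAMIC:
            continue
        for pos in range(offset, offset + filesz, 16):       # sizeof(Elf64_Dyn) == 16
            tag, value = struct.unpack_from("<qQ", data, pos)
            if tag == DT_NULL:
                break
            entries[tag] = value
    return entries


def in_executable_segment(data, addr):
    """True if addr lies inside a PT_LOAD segment that has the PF_X flag set."""
    return any(
        p_type == PT_LOAD and bool(flags & PF_X) and vaddr <= addr < vaddr + memsz
        for p_type, flags, _off, vaddr, _filesz, memsz in program_headers(data)
    )


def report(path):
    """Print DT_INIT/DT_FINI for the file at path and whether each is executable."""
    with open(path, "rb") as f:
        data = f.read()
    dyn = dynamic_entries(data)
    for name, tag in (("DT_INIT", DT_INIT), ("DT_FINI", DT_FINI)):
        if tag in dyn:
            ok = in_executable_segment(data, dyn[tag])
            print(f"{path}: {name}={dyn[tag]:#x} executable={ok}")
```

Calling report() on the installed example.cpython-313-x86_64-linux-gnu.so is the intended use. If the analysis above holds, the binary from the bad build would be expected to show DT_INIT outside every executable segment, but that has not been verified here.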
I got something out of strace: 606 execve("/usr/bin/ld", ["/usr/bin/ld", "-plugin", "/usr/libexec/gcc/x86_64-redhat-linux/14/liblto_plugin.so", "-plugin-opt=/usr/libexec/gcc/x86_64-redhat-linux/14/lto-wrapper", "-plugin-opt=-fresolution=/tmp/ccK0d8nu.res", "-plugin-opt=-pass-through=-lgcc", "-plugin-opt=-pass-through=-lgcc_s", "-plugin-opt=-pass-through=-lc", "-plugin-opt=-pass-through=-lgcc", "-plugin-opt=-pass-through=-lgcc_s", "--build-id", "--no-add-needed", "--eh-frame-hdr", "--hash-style=gnu", "-m", "elf_x86_64", "-dynamic-linker", "/lib64/ld-linux-x86-64.so.2", "/usr/lib/gcc/x86_64-redhat-linux/14/../../../../lib64/crt1.o", "/usr/lib/gcc/x86_64-redhat-linux/14/../../../../lib64/crti.o", "/usr/lib/gcc/x86_64-redhat-linux/14/crtbegin.o", "-L/usr/lib/gcc/x86_64-redhat-linux/14", "-L/usr/lib/gcc/x86_64-redhat-linux/14/../../../../lib64", "-L/lib/../lib64", "-L/usr/lib/../lib64", "-L/usr/lib/gcc/x86_64-redhat-linux/14/../../..", "--version", "-lgcc", "--push-state", "--as-needed", "-lgcc_s", "--pop-state", "-lc", "-lgcc", "--push-state", "--as-needed", "-lgcc_s", "--pop-state", "/usr/lib/gcc/x86_64-redhat-linux/14/crtend.o", "/usr/lib/gcc/x86_64-redhat-linux/14/../../../../lib64/crtn.o"], 0x7fff9c992c60 /* 24 vars */ <unfinished ...> It looks all harmless. But then I see this: 669 execve("/tmp/pip-build-env-w5i_0ljx/normal/bin/patchelf", ["patchelf", "--print-rpath", "/meson-python/tests/packages/link-against-local-lib/.mesonpy-3adtna1_/lib/libexample.so"], 0x55b6edea9620 /* 17 vars */ <unfinished ...> 669 <... execve resumed>) = 0 670 execve("/tmp/pip-build-env-w5i_0ljx/normal/bin/patchelf", ["patchelf", "--print-rpath", "/meson-python/tests/packages/link-against-local-lib/.mesonpy-3adtna1_/example.cpython-313-x86_64-linux-gnu.so"], 0x55b6edea9620 /* 17 vars */ <unfinished ...> 670 <... 
execve resumed>) = 0 671 execve("/tmp/pip-build-env-w5i_0ljx/normal/bin/patchelf", ["patchelf", "--set-rpath", "$ORIGIN/.link_against_local_lib.mesonpy.libs:custom-rpath", "/meson-python/tests/packages/link-against-local-lib/.mesonpy-3adtna1_/example.cpython-313-x86_64-linux-gnu.so"], 0x55b6edea9620 /> 671 <... execve resumed>) = 0 Is it possible this is a duplicate of bug 2319341? Can you reproduce this without patchelf? At a conceptual level, patchelf cannot work on GNU/Linux, and it definitely should not be used in our standard build tools.
Hi Miro,

  It looks like Florian is onto something with the patchelf discovery.

> Smaller reproducer:
>
> # dnf install git-core pip gcc ninja python3-devel
> # git clone https://github.com/mesonbuild/meson-python.git
> # cd meson-python/tests/packages/link-against-local-lib/
> # pip install .
> # python3 -c 'import example'
> Segmentation fault (core dumped)

How do I specify which linker to use in the steps above ? Say for example I am trying to perform a binary search through the commits to the binutils source tree to find the one that triggers the bug. I checkout the binutils sources up to a specific commit, build a linker with them and then what ? How do I use this built-but-not-installed linker to trigger/not-trigger the bug ?

Also - is it possible to specify some extra linker command line options to be used when building libexample.so ? For example I would like to see if adding -Wl,--no-rosegment makes any difference. (It might be that the changed layout triggered by --rosegment confuses patchelf which then goes on to create a broken binary).

Cheers
  Nick
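The binary search described in the comment above could be sketched as follows. This is only an illustration: first_bad() and the is_bad callback are hypothetical names, and in real use the callback would have to build a linker from the candidate commit and run the reproducer. Here it is driven by the package results already reported in this bug, and it assumes the candidates are ordered with every good one before every bad one.

```python
def first_bad(candidates, is_bad):
    """Return the first candidate for which is_bad() is true.

    Assumes candidates are ordered oldest to newest, that all good
    candidates precede all bad ones, and that the last one is bad.
    """
    if not candidates or not is_bad(candidates[-1]):
        raise ValueError("the newest candidate must be bad")
    lo, hi = 0, len(candidates) - 1  # invariant: the first bad one is in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid          # first bad is at mid or earlier
        else:
            lo = mid + 1      # first bad is after mid
    return candidates[lo]


# Results reported earlier in this bug, used in place of a real build-and-test step.
builds = ["2.43.50-1.fc42", "2.43.50-2.fc42", "2.43.50-4.fc42", "2.43.50-5.fc42"]
reported_bad = {"2.43.50-2.fc42", "2.43.50-4.fc42", "2.43.50-5.fc42"}

print(first_bad(builds, lambda build: build in reported_bad))
```

With the reported results this prints 2.43.50-2.fc42. Bisecting upstream commits would use the same loop with a list of commit hashes instead of package builds.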
The --verbose flag for pip makes pip show the output of the build backend; the -Ccompile-args=-v option passes -v to meson (and hence also to ninja):

$ pip --verbose install -Ccompile-args=-v .
...
+ /usr/bin/ninja -v
[1/5] cc -Ilib/libexample.so.p -Ilib -I../lib -fdiagnostics-color=always -DNDEBUG -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O3 -fPIC -MD -MQ lib/libexample.so.p/examplelib.c.o -MF lib/libexample.so.p/examplelib.c.o.d -o lib/libexample.so.p/examplelib.c.o -c ../lib/examplelib.c
[2/5] cc -o lib/libexample.so lib/libexample.so.p/examplelib.c.o -Wl,--as-needed -Wl,--no-undefined -Wl,-O1 -shared -fPIC -Wl,-soname,libexample.so
[3/5] cc -Iexample.cpython-313-x86_64-linux-gnu.so.p -I. -I.. -I/usr/include/python3.13 -fvisibility=hidden -fdiagnostics-color=always -DNDEBUG -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O3 -fPIC -MD -MQ example.cpython-313-x86_64-linux-gnu.so.p/examplemod.c.o -MF example.cpython-313-x86_64-linux-gnu.so.p/examplemod.c.o.d -o example.cpython-313-x86_64-linux-gnu.so.p/examplemod.c.o -c ../examplemod.c
[4/5] /tmp/pip-build-env-mpu743ej/overlay/bin/meson --internal symbolextractor /meson-python/tests/packages/link-against-local-lib/.mesonpy-bw4dftts lib/libexample.so lib/libexample.so lib/libexample.so.p/libexample.so.symbols
[5/5] cc -o example.cpython-313-x86_64-linux-gnu.so example.cpython-313-x86_64-linux-gnu.so.p/examplemod.c.o -Wl,--as-needed -Wl,--allow-shlib-undefined -Wl,-O1 -shared -fPIC '-Wl,-rpath,$ORIGIN/lib' -Wl,-rpath-link,/meson-python/tests/packages/link-against-local-lib/.mesonpy-bw4dftts/lib lib/libexample.so -Wl,-rpath,custom-rpath
[1/2] /meson-python/tests/packages/link-against-local-lib/.mesonpy-bw4dftts/lib/libexample.so
[2/2] /meson-python/tests/packages/link-against-local-lib/.mesonpy-bw4dftts/example.cpython-313-x86_64-linux-gnu.so

Unfortunately, it does not say anything about patchelf.
> Can you reproduce this without patchelf?

I was not yet able to construct a reproducer that removes meson-python. In ELN, we patch patchelf out of meson-python, but we also skip this test. When unskipped, it simply fails with "patchelf not supported".

> How do I specify which linker to use in the steps above ?

With $LDFLAGS.

$ LDFLAGS=-fuse-ld=gold pip --verbose install -Ccompile-args=-v .
...
...
[2/5] cc -o lib/libexample.so lib/libexample.so.p/examplelib.c.o -Wl,--as-needed -Wl,--no-undefined -Wl,-O1 -shared -fPIC -Wl,-soname,libexample.so -fuse-ld=gold
...
[5/5] cc -o example.cpython-313-x86_64-linux-gnu.so example.cpython-313-x86_64-linux-gnu.so.p/examplemod.c.o -Wl,--as-needed -Wl,--allow-shlib-undefined -Wl,-O1 -shared -fPIC -fuse-ld=gold '-Wl,-rpath,$ORIGIN/lib' -Wl,-rpath-link,/meson-python/tests/packages/link-against-local-lib/.mesonpy-gsg2i47d/lib lib/libexample.so -Wl,-rpath,custom-rpath
...
$ python -c 'import example'
(Does not segfault.)
(In reply to Nick Clifton from comment #12) > Also - is it possible to specify some extra linker command line options to > be used when building libexample.so ? For example I would like to see if > adding -Wl,--no-rosegment makes any difference. Also LDFLAGS. Adding -Wl,--no-rosegment makes a difference (no segfault).
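For anyone scripting these experiments, here is a minimal sketch of driving the same pip command with different LDFLAGS values. The helper names (env_with_ldflags, pip_install_verbose) are made up for illustration; only the pip command line and the two flags (-fuse-ld=gold, -Wl,--no-rosegment) come from the comments above.

```python
import os
import subprocess


def env_with_ldflags(extra_flags, base_env=None):
    """Return a copy of the environment with extra_flags appended to LDFLAGS."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LDFLAGS", "").split()
    env["LDFLAGS"] = " ".join(existing + list(extra_flags))
    return env


def pip_install_verbose(project_dir, extra_flags):
    """Run the verbose pip install from the comments with extra linker flags.

    Hypothetical wrapper: it only changes LDFLAGS, nothing else in the build.
    """
    cmd = ["pip", "--verbose", "install", "-Ccompile-args=-v", project_dir]
    return subprocess.run(cmd, env=env_with_ldflags(extra_flags), check=False)


if __name__ == "__main__":
    # Show the LDFLAGS value that would be used for the --no-rosegment experiment.
    print(env_with_ldflags(["-Wl,--no-rosegment"], base_env={})["LDFLAGS"])
```

Calling pip_install_verbose(".", ["-fuse-ld=gold"]) should correspond to the gold experiment, and pip_install_verbose(".", ["-Wl,--no-rosegment"]) to the one suggested in comment #12.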
Should I still report this to binutils upstream?
(In reply to Miro Hrončok from comment #16) > Should I still report this to binutils upstream? If you can reproduce the problem without patchelf being part of the process then yes. Otherwise it might be worth reporting this bug to both the patchelf *and* the binutils upstream communities.
If I just patch the patchelf --set-rpath call out of the build, I obviously get:

$ python3 -c 'import example; print(example.example_sum(1, 2))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    import example; print(example.example_sum(1, 2))
    ^^^^^^^^^^^^^^
ImportError: libexample.so: cannot open shared object file: No such file or directory

When I set LD_LIBRARY_PATH to compensate, the import works. So this is clearly isolated to patchelf --set-rpath.
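To compare what the dynamic section looks like with and without the patchelf step, something like the following sketch might help. It assumes readelf (from binutils) is on PATH and parses the usual "Library rpath" / "Library runpath" lines of `readelf -d`; the function names are hypothetical.

```python
import re
import subprocess

# Matches lines such as:
#  0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/lib]
_RPATH_RE = re.compile(r"\((RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*)\]")


def parse_rpaths(readelf_output):
    """Extract (tag, [directories]) pairs from `readelf -d` output."""
    entries = []
    for line in readelf_output.splitlines():
        match = _RPATH_RE.search(line)
        if match:
            entries.append((match.group(1), match.group(2).split(":")))
    return entries


def read_rpaths(path):
    """Run `readelf -d` on path (assumes readelf is installed) and parse it."""
    result = subprocess.run(
        ["readelf", "-d", path], check=True, capture_output=True, text=True
    )
    return parse_rpaths(result.stdout)
```

Running read_rpaths() on the extension module from the build directory and on the copy installed from the wheel should show whether the rpath was rewritten.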
(In reply to Miro Hrončok from comment #18)
> If I just patch the patchelf --set-rpath call out of the build, I obviously get:
>
> $ python3 -c 'import example; print(example.example_sum(1, 2))'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>     import example; print(example.example_sum(1, 2))
>     ^^^^^^^^^^^^^^
> ImportError: libexample.so: cannot open shared object file: No such file or directory
>
> When I set LD_LIBRARY_PATH to compensate, the import works. So this is clearly isolated to patchelf --set-rpath.

Right - so is it really necessary to use patchelf ? I presume that ordinary python users are not expected to do so, so wouldn't it be better to change the python test itself to use LD_LIBRARY_PATH instead of patchelf ?
What do you mean by "the python test"? Again, building or testing Python itself is not involved here at all. This is meson-python, which uses patchelf for the "auditwheel" case. The use case is described at https://github.com/mesonbuild/meson-python/blob/0.17.1/mesonpy/__init__.py#L431

Is it necessary to use patchelf? No idea. But using LD_LIBRARY_PATH in the test makes the test useless. The test tests that the built Python extension module is functional.
(In reply to Miro Hrončok from comment #20)
> What do you mean by "the python test"? Again, building or testing Python itself is not involved here at all.

Sorry - I was not clear. I meant the test where a shared library is built but then cannot be loaded by Python.

> Is it necessary to use patchelf? No idea. But using LD_LIBRARY_PATH in the test makes the test useless.

Really ?

> The test tests that the built Python extension module is functional.

And it would be functional if it were to be installed into the correct location. But since that is not something that the test harness wants to do, it makes sense to run the test with LD_LIBRARY_PATH specified. The test still checks that the module loads and behaves correctly, it just uses the system approved method for loading shared objects that are not installed into standard locations.
As I understand it, the point of all this patchelf/rpath stuff is to support building wheels with bundled shared libraries. Obviously we want to avoid that when building Fedora packages, but end users can use the system python3-meson-python package too, and it’s a use case meson-python is designed to support. Ideally this kind of bundling would work as advertised even though it’s not useful for building Fedora packages. Requiring LD_LIBRARY_PATH to be set because patchelf is broken (?) doesn’t help these users build “portable” wheels. Besides, this is not specific to Fedora’s python-meson-python package. As shown in https://bugzilla.redhat.com/show_bug.cgi?id=2321588#c8, it will also affect people using meson-python installed from PyPI in a virtual environment. This does seem awfully similar to bug 2319341 / https://github.com/NixOS/patchelf/issues/568.
(In reply to Ben Beasley from comment #22) > As I understand it, the point of all this patchelf/rpath stuff is to support > building wheels with bundled shared libraries. Obviously we want to avoid > that when building Fedora packages, but end users can use the system > python3-meson-python package too, and it’s a use case meson-python is > designed to support. Ideally this kind of bundling would work as advertised > even though it’s not useful for building Fedora packages. Requiring > LD_LIBRARY_PATH to be set because patchelf is broken (?) doesn’t help these > users build “portable” wheels. But using patchelf to rewrite the binaries that just have been built is really awkward. Why not use -Wl,-rpath=… during the linker invocations? I get that patchelf is sometimes necessary if you can't relink, but this shouldn't really be one of those scenarios.
> But using patchelf to rewrite the binaries that just have been built is really awkward. I don't disagree. But using LD_LIBRARY_PATH in the test defies the purpose of that test.
(In reply to Miro Hrončok from comment #24) > > But using patchelf to rewrite the binaries that just have been built is really awkward. > > I don't disagree. But using LD_LIBRARY_PATH in the test defies the purpose > of that test. From my perspective, the alternative to patchelf is using -Wl,-rpath=… during linking. Not setting LD_LIBRARY_PATH at run time. We can figure out if there is anything we can do on the toolchain side to make this easier for Meson/Python to accomplish. But using patchelf under such circumstances has this “I can't figure out how my build system works” smell, sorry.
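As a rough illustration of the link-time alternative, the sketch below computes an $ORIGIN-relative rpath argument from the directory the extension module will be installed into and the directory holding the bundled library. This is not how meson-python works today; origin_rpath and rpath_link_arg are hypothetical helpers, and the only thing taken from the thread is the -Wl,-rpath,$ORIGIN/lib form visible in the ninja output in an earlier comment.

```python
import posixpath


def origin_rpath(module_dir, lib_dir):
    """Return an $ORIGIN-relative rpath from module_dir to lib_dir.

    Both arguments are final install locations (POSIX paths), not build paths.
    """
    rel = posixpath.relpath(lib_dir, start=module_dir)
    return "$ORIGIN" if rel == "." else "$ORIGIN/" + rel


def rpath_link_arg(module_dir, lib_dir):
    """Build the compiler-driver argument that embeds the rpath at link time."""
    return "-Wl,-rpath," + origin_rpath(module_dir, lib_dir)


if __name__ == "__main__":
    # Hypothetical install layout: the module next to a lib/ directory.
    print(rpath_link_arg("/site-packages", "/site-packages/lib"))
```

When such an argument is passed through a shell it has to be quoted so that $ORIGIN is not expanded. The catch raised in the thread still applies: the build system would need to know the final relative layout at link time.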
Well, I admit, I can't figure out how this build system works. It's pip calling meson-python, calling meson, calling ninja, calling gcc. It's a frontend around a wrapper around a build system which is a wrapper. It's convoluted. Upstream meson-python has decided to use patchelf and telling me that it is wrong does not help me.

Could meson-python tell meson to tell ninja to tell gcc that it will eventually move the shared libraries somewhere else, so it will change the value of -Wl,-rpath=… during linking? Maybe... I don't know. I am no meson expert.

I understand your perspective, but if patchelf is fundamentally broken we should strive to deprecate it and remove it from Fedora rather than just saying it's bad.

I reported what seems like a regression in binutils -- it worked and now it no longer works. Could you please help me with that?

---

I tried git bisecting between c839a44c391 (2.43.50-1) and 1f4aee70ed1 (2.43.50-2) but when I use `git archive` to produce snapshot tarballs for binutils.spec, the builds all fail with:

configure: error: Building GDB requires GMP 4.2+, and MPFR 3.1.0+.
...
make: *** No rule to make target 'all'. Stop.

E.g. https://koji.fedoraproject.org/koji/taskinfo?taskID=125327758
See also this comment from 4 months ago https://src.fedoraproject.org/rpms/python-meson-python/pull-request/7#comment-203299 """ If you or someone else happens to be able to clearly articulate how meson-python could handle this better without having to fix up rpaths after the fact, it wouldn’t hurt to file an issue upstream. I don’t know if they would be receptive—this upstream has been known to act a bit prickly on technical matters—but as it stands upstream doesn’t know anyone is dissatisfied with the use of patchelf. """
(In reply to Miro Hrončok from comment #26)
> I tried git bisecting between c839a44c391 (2.43.50-1) and 1f4aee70ed1 (2.43.50-2) but when I use `git archive` to produce snapshot tarballs for binutils.spec, the builds all fail with:
>
> configure: error: Building GDB requires GMP 4.2+, and MPFR 3.1.0+.
> ...
> make: *** No rule to make target 'all'. Stop.

Try just deleting the gdb/ directory after unpacking the tarball and before running configure. Or installing the mpfr-devel and gmp-devel rpms.
Deleting the gdb/ directory only gets me to a certain point. make[3]: *** No rule to make target '../../sim/../gdb/version.in', needed by 'common/version.c-stamp'. Stop. No idea how the tarballs actually used for the "official" snapshots are created.
(In reply to Miro Hrončok from comment #29) > Deleting the gdb/ directory only gets me to a certain point. > > make[3]: *** No rule to make target '../../sim/../gdb/version.in', needed by > 'common/version.c-stamp'. Stop. OK so maybe delete all of the gdb specific directories. ie gdb, sim, readline, libdecnumber, gnulib, gdbsupport, libbacktrace. > No idea how the tarballs actually used for the "official" snapshots are > created. There is a script at the top level of the combined gdb/binutils source tree which is used to create releases: src-release.sh. For example to create a binutils release in the form of an xz compressed tarball run: ./src-release.sh -x binutils
Thanks. git bisect says:

bf6d7087de0a7351fd1dfd5f41522a7f4f576180 is the first new commit
commit bf6d7087de0a7351fd1dfd5f41522a7f4f576180
Author: Nick Clifton <nickc>
Date:   Thu Sep 19 16:45:30 2024 +0100

    ld: Move the .note.build-id section to near the start of the memory map.

    This helps GDB to locate the debug information associated with a core dump. Core dumps include the first page of an executable's image, and if this page include the .note.build-id section then GDB can find it and then track down a debug info file for that build-id.

 ld/scripttempl/elf.sc | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)
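For readers unfamiliar with what the bisection does, here is a small sketch of the underlying search. It is not git's implementation; it assumes a linear list of commits ordered from oldest to newest, where the first is known "old" (no segfault) and the last is known "new" (segfault), and is_new is a hypothetical callback that builds and tests one commit.

```python
def first_new_commit(commits, is_new):
    """Return the first commit for which is_new() is true.

    Assumes commits[0] is "old", commits[-1] is "new", and that once a
    commit is "new" all later commits are "new" as well.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is old, commits[hi] is new
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_new(commits[mid]):
            hi = mid  # the change happened at mid or earlier
        else:
            lo = mid  # the change happened after mid
    return commits[hi]
```

Each step halves the range, so the number of builds needed grows only logarithmically with the number of commits between the two snapshots.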
Yeah - I am not surprised. But also I am reasonably certain that this is not a bug in the linker, but rather an unexpected change in behaviour which is triggering an incorrect assumption in patchelf. I would urge you to contact the patchelf maintainers and consult with them about this problem. If they can show that the linker is doing something wrong then I will happily fix it. But at the moment I am of the opinion that it is their problem, not ours.
https://koschei.fedoraproject.org/package/pypy3.10 seems to show a similar regression: the build uses:

+ patchelf --set-soname libpypy3.10-c.so.0.1 builddir/pypy3.10-7.3.15/lib64/libpypy3.10-c.so.0.1
+ patchelf --replace-needed libpypy3.10-c.so libpypy3.10-c.so.0.1 builddir/pypy3.10-7.3.15/bin/pypy3.10

And later dies:

+ /builddir/build/BUILD/pypy3.10-7.3.15-build/BUILDROOT/usr/bin/pypy3.10 -c 'import _tkinter'
/var/tmp/rpm-tmp.JLMQ5b: line 186: 14279 Segmentation fault (core dumped) /builddir/build/BUILD/pypy3.10-7.3.15-build/BUILDROOT/usr/bin/pypy3.10 -c 'import _tkinter'

If you can give advice on how to replace patchelf in PyPy, I would appreciate it.
> But at the moment I am of the opinion that it is their problem, not ours. As it happens, this is now *my* problem, caused by a change in binutils that has not been coordinated with the rest of the distribution, which still relies on patchelf. Would you please consider undoing this change for now, until the patchelf situation is resolved?
I also want to stress that even if we manage to figure out a way for meson-python not to use patchelf when building wheels, we don't blow the issue out of the water. Patchelf is needed by anyone building manylinux Python wheels (via auditwheel). Eventually, what is now in Fedora Rawhide will become "old" and will be used for manylinux wheel building. Unless we figure out a new patchelf-less way of doing it, we need patchelf to work. Uncoordinated changes to binutils or other parts of the toolset and "this is your problem, not ours" comments worry me.
> Blocks: 2322939 Does this mean we move this bugzilla back to binutils?
Hi Miro, > Does this mean we move this bugzilla back to binutils? Ah - I had not noticed that. Yes, I think that it makes sense to take this PR back. It is still entirely possible that this is a linker bug. We just need to find out the root cause. (In reply to Miro Hrončok from comment #34) > > But at the moment I am of the opinion that it is their problem, not ours. > > As it happens, this is now *my* problem, caused by a change in binutils that > has not been coordinated with the rest of the distribution, which still > relies on patchelf. Well in our defense there is no way to coordinate changes to the linker before they are made. Well not to the whole of the Fedora ecosystem. We do run some basic gating tests including building and running a kernel using the new linker, but it would be unfeasible to rebuild and retest all of Fedora every time there is a change to the linker. Or the compiler, Or glibc. In essence this is what the rawhide distribution is about. Changes are made and tested here before they are released to the world as a whole. Hence I made a change to the linker - not gratuitously, but in order to improve the security of linked binaries - and we are now testing this change in rawhide. You have encountered a problem, reported it here and we are now trying to solve it. This is how the process works. Anyway the point I was trying to make is not that I am refusing to make a change to the linker, but rather that we need some help from the patchelf maintainers in order to find out what is going wrong. Once we know that we can decide between ourselves the best course of action for solving the problem. To this end I have created a clone of this PR but assigned to the patchelf maintainers: https://bugzilla.redhat.com/show_bug.cgi?id=2322939 > Would you please consider undoing this change for now, until the patchelf > situation is resolved? But when will it be resolved and how will it be tested ? 
Right now we have a convenient testbed - rawhide - that the patchelf people can use to examine how their tool behaves with this, let's call it, experimental linker. If I revert the patch then it will be much harder to test and much easier to just ignore the problem. [[After re-reading this paragraph I realised that just downgrading to one of the -2 .. -6 versions of the rawhide linker would allow testing, so the point is moot. Oh well]].

Plus the problem is still going to exist in the upstream binutils sources, so unless we permanently include a patch to the Fedora binutils rpms the problem is going to keep on resurfacing until it is fixed. And - presumably - this problem is also going to affect other distributions, not just Fedora. So just reverting an upstream change is not going to solve things in the long run.

I do appreciate however that this does not solve your problem right now. So I propose the following:

* On Monday (Nov 4th) I am planning another update of the rawhide binutils to sync it with the latest upstream development sources. There is a small possibility that this update might resolve the problem. Small, but not non-existent. So once that build is available (it will be 2.43.50-7) I will ping you and you can try it out.

* Assuming that the new build does not fix the problem I will then create a patch to revert commit bf6d7087de0a7351fd1dfd5f41522a7f4f576180 and a new scratch build from that. I am assuming that you will be able to install this scratch build somewhere and test it, without me having to make an official update to rawhide. Is this correct ?

* If the patch solves the problem we then have the issue of what to do for GDB. Commit 4f576180 is for them. It addresses an issue that GDB has relied upon the linker putting the .note.build-id section in the first page of an executable's image, so that core dumps can be triaged and connected to debug info files. Of course the linker does not have to layout code in this way, but it has been how the linker has worked for a long time and changing it breaks GDB.

  So in essence we need to go back even further, to the patch that moved the .note.build-id section away from the start of the file and decide what to do there. This patch was part of a code size initiative that attempts to reduce the size of binaries by only creating one read-only loadable segment instead of two.

  At this point things get fuzzy. I am not sure how to balance the needs of code size (which is very important for container builds), GDB and python wheels.

  (Incidentally this does suggest another possible workaround that you could try - linking with "-Wl,--noseparate-code" - this should use the old layout which combines read-only and code sections into one segment. Which is what has been the default in older distributions and which presumably works just fine with patchelf).

* In the meantime we can continue to help/prod the patchelf people to see if we can find out if there is a bad assumption in patchelf or if the linker is really broken.

Does this work for you ?

Cheers
Nick
(In reply to Nick Clifton from comment #37)
> ...You have encountered a problem, reported it here and we are now trying to solve it. This is how the process works.

Indeed. I got the impression that you do not intend to solve this because it's a problem in patchelf. Perhaps I misunderstood.

> > Would you please consider undoing this change for now, until the patchelf situation is resolved?
>
> But when will it be resolved and how will it be tested ? Right now we have a convenient testbed - rawhide - that the patchelf people can use to examine how their tool behaves with this, let's call it, experimental linker. If I revert the patch then it will be much harder to test and much easier to just ignore the problem. [[After re-reading this paragraph I realised that just downgrading to one of the -2 .. -6 versions of the rawhide linker would allow testing, so the point is moot. Oh well]].
>
> Plus the problem is still going to exist in the upstream binutils sources, so unless we permanently include a patch to the Fedora binutils rpms the problem is going to keep on resurfacing until it is fixed. And - presumably - this problem is also going to affect other distributions, not just Fedora. So just reverting an upstream change is not going to solve things in the long run.

I agree. This would of course only be a temporary bandaid.

> I do appreciate however that this does not solve your problem right now. So I propose the following:
>
> * On Monday (Nov 4th) I am planning another update of the rawhide binutils to sync it with the latest upstream development sources. There is a small possibility that this update might resolve the problem. Small, but not non-existent. So once that build is available (it will be 2.43.50-7) I will ping you and you can try it out.

Thanks. If the possible solution exists upstream, I can try it already.
> * Assuming that the new build does not fix the problem I will then create a patch to revert commit bf6d7087de0a7351fd1dfd5f41522a7f4f576180 and a new scratch build from that. I am assuming that you will be able to install this scratch build somewhere and test it, without me having to make an official update to rawhide. Is this correct ?

I can even try that myself, assuming the revert is cleanly applicable. In fact, let's do that now... it is. Will build and test.

> * If the patch solves the problem we then have the issue of what to do for GDB. Commit 4f576180 is for them. It addresses an issue that GDB has relied upon the linker putting the .note.build-id section in the first page of an executable's image, so that core dumps can be triaged and connected to debug info files. Of course the linker does not have to layout code in this way, but it has been how the linker has worked for a long time and changing it breaks GDB.
>
> So in essence we need to go back even further, to the patch that moved the .note.build-id section away from the start of the file and decide what to do there. This patch was part of a code size initiative that attempts to reduce the size of binaries by only creating one read-only loadable segment instead of two.
>
> At this point things get fuzzy. I am not sure how to balance the needs of code size (which is very important for container builds), GDB and python wheels.
>
> (Incidentally this does suggest another possible workaround that you could try - linking with "-Wl,--noseparate-code" - this should use the old layout which combines read-only and code sections into one segment. Which is what has been the default in older distributions and which presumably works just fine with patchelf).

Is this the same as -Wl,--no-rosegment? I was about to ask if building stuff with this flag is dangerous in any way.
> * In the meantime we can continue to help/prod the patchelf people > to see if we can find out if there is a bad assumption in patchelf > or if the linker is really broken. > > Does this work for you ? Yes, thanks.
(In reply to Miro Hrončok from comment #38)
> Indeed. I got the impression that you do not intend to solve this because it's a problem in patchelf. Perhaps I misunderstood.

Well the truth is that I am hoping that this will turn out to be a problem with patchelf and that I will not need to do anything to the linker. But I am not going to cross my arms and refuse to do anything. I will still try to work with you and anyone else to get to the bottom of this matter and solve the problem.

> > (Incidentally this does suggest another possible workaround that you try - linking with "-Wl,--noseparate-code" - this should use the old layout which combines read-only and code sections into one segment. Which is what has been the default in older distributions and which presumably works just fine with patchelf).
>
> Is this the same as -Wl,--no-rosegment?

Not the same, but related. The --separate-code option (enabled by default for x86_64) tells the linker to create separate segments for read-only data and code. (Instead of combining them as it used to do). But this option ends up creating *two* read-only segments, one before the code segment and one after it. Which whilst perfectly legal according to the ELF specification does also mean an increase in file size (because segments in the file image are padded out so that they can just be mmap'ed into memory). And file size increases are a problem for containers.

The --rosegment option is an attempt to address this issue by augmenting the behaviour of the --separate-code option so that only one read-only segment is produced. (If --separate-code is not active then --rosegment has no effect). It was done this way because --separate-code was implemented a few years ago but the container size problem has only recently come to light.

So, using -Wl,--noseparate-code should make the linker go back to its old layout behaviour, which should be something that is familiar to patchelf.
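To see the segment layout described here on a concrete file, a small sketch like the one below lists the PT_LOAD segments and their permissions. It assumes a little-endian 64-bit ELF image (as on x86_64), reads only the fields of the ELF header and program headers it needs, and says nothing about what patchelf itself does; load_segments and count_read_only are names made up for this example.

```python
import struct

PT_LOAD = 1
PF_X, PF_W, PF_R = 1, 2, 4


def load_segments(data):
    """Return [(file_offset, perms)] for every PT_LOAD segment in an ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)  # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    segments = []
    for index in range(e_phnum):
        # ELF64 program header starts with p_type, p_flags, p_offset.
        p_type, p_flags, p_offset = struct.unpack_from(
            "<IIQ", data, e_phoff + index * e_phentsize
        )
        if p_type == PT_LOAD:
            perms = "".join(
                char if p_flags & bit else "-"
                for char, bit in (("r", PF_R), ("w", PF_W), ("x", PF_X))
            )
            segments.append((p_offset, perms))
    return segments


def count_read_only(segments):
    """Count loadable segments that are readable but neither writable nor executable."""
    return sum(1 for _, perms in segments if perms == "r--")
```

Passing the bytes of libexample.so (for example open(path, "rb").read()) to load_segments() and comparing builds made with and without the flags discussed above should show how many read-only segments the linker produced; `readelf -lW` gives the same information.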
> I was about to ask if building stuff with this flag is dangerous in any way.

Maybe... It increases the risk of successful ROP and JOP style attacks:

https://en.wikipedia.org/wiki/Return-oriented_programming

If code is separate from read-only data then the read-only data cannot be mis-interpreted as instructions and executed, thus reducing the attack surface for ROP/JOP attacks. Of course using --noseparate-code does not mean that these attacks will occur, or guarantee that useful sequences of data-that-looks-like-code will be found. But using --separate-code just makes things that much harder for potential bad actors.
binutils master @ 820ebe46a415a348c53bac0306347dd0ec962f20 -- still segfaults On top of that, reverting bf6d7087de0a7351fd1dfd5f41522a7f4f576180 -- no more segfault
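When checking many builds like this, it can help to classify the result of the import test automatically. The sketch below relies on the documented subprocess convention that a negative returncode means the child was terminated by that signal (the original pytest failure reported -11); describe_returncode and check_import are hypothetical helpers, and the Python command is the one used by the test.

```python
import signal
import subprocess


def describe_returncode(returncode):
    """Turn a subprocess returncode into a short human-readable description."""
    if returncode < 0:
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


def check_import(python):
    """Run the same import check as the test with the given interpreter path."""
    code = "import example; print(example.example_sum(1, 2))"
    result = subprocess.run([python, "-c", code], capture_output=True, text=True)
    return describe_returncode(result.returncode)
```

On Linux a returncode of -11 is reported as SIGSEGV, which is the "still segfaults" case; a zero exit status corresponds to "no more segfault".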
The -7 build of the rawhide binutils should be able to generate binaries that can be edited by patchelf.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-6e5201a4fe is stuck on bodhi gating
https://src.fedoraproject.org/rpms/binutils/c/875ad5b26717b98ddf7f74aeb91789382d082cdf?branch=rawhide removed this and the mass rebuild seems to have shipped that.
My attempt to backport a possible fix for patchelf: https://src.fedoraproject.org/rpms/patchelf/pull-request/5
FEDORA-2025-919fee952d (patchelf-0.18.0-8.fc42) has been submitted as an update to Fedora 42. https://bodhi.fedoraproject.org/updates/FEDORA-2025-919fee952d
FEDORA-2025-919fee952d (patchelf-0.18.0-8.fc42) has been pushed to the Fedora 42 stable repository. If problem still persists, please make note of it in this bug report.
With patchelf-0.18.0-8.fc42 this particular issue happens no more, so I leave this closed. That said, I would have appreciated a heads-up about the removal of the workaround :/
*** Bug 2341005 has been marked as a duplicate of this bug. ***