Bug 2321588 - binutils 2.43.50-1.fc42 -> 2.43.50-2.fc42 regression. unexpected Segmentation fault in meson-python test
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: binutils
Version: rawhide
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Nick Clifton
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 2341005 (view as bug list)
Depends On:
Blocks: F42FTBFS 2322939 2341121
 
Reported: 2024-10-24 19:33 UTC by Miro Hrončok
Modified: 2026-01-29 16:58 UTC (History)

Fixed In Version: patchelf-0.18.0-8.fc42
Clone Of:
: 2322939 (view as bug list)
Environment:
Last Closed: 2025-02-03 22:40:52 UTC
Type: Bug
Embargoed:



Links
System                 | ID                                | Private | Priority | Status | Summary                                                | Last Updated
Fedora Package Sources | patchelf pull-request 5           | 0       | None     | None   | None                                                   | 2025-01-31 13:15:00 UTC
Github                 | NixOS patchelf issues 568         | 0       | None     | open   | Patchelf set-rpath fails on Fedora 41 beta             | 2024-10-29 14:05:38 UTC
Github                 | NixOS patchelf pull 544           | 0       | None     | Merged | Allocate PHT & SHT at the end of the *.elf file        | 2025-01-31 13:15:00 UTC
Github                 | mesonbuild meson-python issues 698 | 0      | None     | open   | binutils 2.43.50: Segmentation fault in test_local_lib | 2024-10-24 19:38:21 UTC

Description Miro Hrončok 2024-10-24 19:33:38 UTC
Description of problem:
I see a strange Segmentation fault in test_local_lib of python-meson-python.

At first, I thought this was related to Python 3.14, which is what I was testing, but it happens with older Pythons as well.

The Segmentation fault reproduces with binutils 2.43.50-5.fc42, but not with binutils 2.43.1-2.fc42.

Version-Release number of selected component: 2.43.50-5.fc42


How reproducible:

# dnf install uv git-core cmake python3.13-devel python3.14-devel gcc patchelf gdb
$ git clone https://github.com/mesonbuild/meson-python.git
$ cd meson-python
$ uv venv --python=python3.14 venv  # or python3.13
$ . venv/bin/activate
$ uv pip install ninja .[test]
$ python -m pytest -k test_local_lib
...
============================= test session starts ==============================
platform linux -- Python 3.14.0a1, pytest-8.3.3, pluggy-1.5.0
rootdir: /meson-python
configfile: pyproject.toml
testpaths: tests
plugins: cov-5.0.0, mock-3.14.0
collected 123 items / 122 deselected / 1 selected                              

tests/test_wheel.py F                                                    [100%]

=================================== FAILURES ===================================
________________________________ test_local_lib ________________________________

venv = <tests.conftest.VEnv object at 0x7fb577566f90>
wheel_link_against_local_lib = PosixPath('/tmp/pytest-of-root/pytest-5/test0/mesonpy-test-5tupkd1z/link_against_local_lib-1.0.0-cp314-cp314-linux_x86_64.whl')

    @pytest.mark.skipif(sys.platform not in {'linux', 'darwin'}, reason='Not supported on this platform')
    def test_local_lib(venv, wheel_link_against_local_lib):
        venv.pip('install', wheel_link_against_local_lib)
>       output = venv.python('-c', 'import example; print(example.example_sum(1, 2))')

tests/test_wheel.py:160: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/conftest.py:114: in python
    return subprocess.check_output([self.executable, *args]).decode()
/usr/lib64/python3.14/subprocess.py:472: in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, capture_output = False, timeout = None, check = True
popenargs = (['/tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/bin/python3.14', '-c', 'import example; print(example.example_sum(1, 2))'],)
kwargs = {'stdout': -1}
process = <Popen: returncode: -11 args: ['/tmp/pytest-of-root/pytest-5/mesonpy-test-ve...>
stdout = b'', stderr = None, retcode = -11

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.
    
        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them,
        or pass capture_output=True to capture both.
    
        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.
    
        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.
    
        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.
    
        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.
    
        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE
    
        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE
    
        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['/tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/bin/python3.14', '-c', 'import example; print(example.example_sum(1, 2))']' died with <Signals.SIGSEGV: 11>.

/usr/lib64/python3.14/subprocess.py:577: CalledProcessError
---------------------------- Captured stdout setup -----------------------------
Initialized empty Git repository in /meson-python/tests/packages/link-against-local-lib/.git/
+ meson setup /meson-python/tests/packages/link-against-local-lib /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7 -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md --native-file=/meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7/meson-python-native-file.ini
The Meson build system
Version: 1.6.0
Source dir: /meson-python/tests/packages/link-against-local-lib
Build dir: /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7
Build type: native build
Project name: link-against-local-lib
Project version: 1.0.0
C compiler for the host machine: cc (gcc 14.2.1 "cc (GCC) 14.2.1 20240912 (Red Hat 14.2.1-4)")
C linker for the host machine: cc ld.bfd 2.43.50.20241014
Host machine cpu family: x86_64
Host machine cpu: x86_64
Program python found: YES (/meson-python/venv/bin/python)
Found pkg-config: YES (/usr/bin/pkg-config) 2.3.0
Run-time dependency python found: YES 3.14
WARNING: Please do not define rpath with a linker argument, use install_rpath
or build_rpath properties instead.
This will become a hard error in a future Meson release.

Build targets in project: 2

link-against-local-lib 1.0.0

  User defined options
    Native files: /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7/meson-python-native-file.ini
    b_ndebug    : if-release
    b_vscrt     : md
    buildtype   : release

Found ninja-1.11.1.git.kitware.jobserver-1 at /meson-python/venv/bin/ninja
+ /meson-python/venv/bin/ninja
[1/5] Compiling C object lib/libexample.so.p/examplelib.c.o
[2/5] Linking target lib/libexample.so
[3/5] Compiling C object example.cpython-314-x86_64-linux-gnu.so.p/examplemod.c.o
[4/5] Generating symbol file lib/libexample.so.p/libexample.so.symbols
[5/5] Linking target example.cpython-314-x86_64-linux-gnu.so
[1/2] /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7/lib/libexample.so
[2/2] /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7/example.cpython-314-x86_64-linux-gnu.so
=========================== short test summary info ============================
FAILED tests/test_wheel.py::test_local_lib - subprocess.CalledProcessError: Command '['/tmp/pytest-of-root/pytest-5/meso...
====================== 1 failed, 122 deselected in 3.01s =======================

# /tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/bin/python3.14 -c 'import example; print(example.example_sum(1, 2))'
Segmentation fault (core dumped)

# gdb /tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/bin/python3.14
(gdb) run -c 'import example; print(example.example_sum(1, 2))'
Program received signal SIGSEGV, Segmentation fault.
0x00007f05fbbe1294 in ?? ()
(gdb) bt
#0  0x00007f05fbbe1294 in ?? ()
#1  0x00007f05fbbfc310 in call_init (l=0x56104552bed0, argc=3, 
    argv=0x7fffbdb0b0f8, env=0x7fffbdb0b118) at dl-init.c:60
#2  call_init (l=0x56104552bed0, argc=3, argv=0x7fffbdb0b0f8, 
    env=0x7fffbdb0b118) at dl-init.c:26
#3  0x00007f05fbbfc42d in _dl_init (main_map=0x56104552bed0, argc=3, 
    argv=0x7fffbdb0b0f8, env=0x7fffbdb0b118) at dl-init.c:121
#4  0x00007f05fbbf9562 in __GI__dl_catch_exception (
    exception=exception@entry=0x0, 
    operate=operate@entry=0x7f05fbc030a0 <call_dl_init>, 
    args=args@entry=0x7fffbdb09fc0) at dl-catch.c:215
#5  0x00007f05fbc03039 in dl_open_worker (a=a@entry=0x7fffbdb09fc0)
    at dl-open.c:785
#6  0x00007f05fbbf94c3 in __GI__dl_catch_exception (
    exception=exception@entry=0x7fffbdb09fa0, 
    operate=operate@entry=0x7f05fbc02fb0 <dl_open_worker>, 
    args=args@entry=0x7fffbdb09fc0) at dl-catch.c:241
#7  0x00007f05fbc03424 in _dl_open (
    file=0x7f05fb0d94f0 "/tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/lib64/python3.14/site-packages/example.cpython-314-x86_64-linux-gnu.so", 
    mode=<optimized out>, 
    caller_dlopen=0x7f05fb869e21 <_imp_create_dynamic+929>, 
    nsid=<optimized out>, argc=3, argv=0x7fffbdb0b0f8, env=0x7fffbdb0b118)
    at dl-open.c:860
#8  0x00007f05fb47a9b4 in dlopen_doit () from /lib64/libc.so.6
#9  0x00007f05fbbf94c3 in __GI__dl_catch_exception (
    exception=exception@entry=0x7fffbdb0a1b0, 
    operate=0x7f05fb47a950 <dlopen_doit>, args=0x7fffbdb0a270)
    at dl-catch.c:241
#10 0x00007f05fbbf9619 in _dl_catch_error (objname=0x7fffbdb0a218, 
    errstring=0x7fffbdb0a220, mallocedp=0x7fffbdb0a217, 
    operate=<optimized out>, args=<optimized out>) at dl-catch.c:260
#11 0x00007f05fb47a4a3 in _dlerror_run () from /lib64/libc.so.6
#12 0x00007f05fb47aa6f in dlopen.5 () from /lib64/libc.so.6
#13 0x00007f05fb869e21 in _imp_create_dynamic ()
   from /lib64/libpython3.14.so.1.0
#14 0x00007f05fb77dccb in cfunction_vectorcall_FASTCALL ()
   from /lib64/libpython3.14.so.1.0
#15 0x00007f05fb75b05a in _PyEval_EvalFrameDefault ()
   from /lib64/libpython3.14.so.1.0
#16 0x00007f05fb77aec2 in object_vacall () from /lib64/libpython3.14.so.1.0
#17 0x00007f05fb7b441e in PyObject_CallMethodObjArgs ()
   from /lib64/libpython3.14.so.1.0
#18 0x00007f05fb7b35bd in PyImport_ImportModuleLevelObject ()
   from /lib64/libpython3.14.so.1.0
#19 0x00007f05fb75d7c9 in _PyEval_EvalFrameDefault ()
   from /lib64/libpython3.14.so.1.0
#20 0x00007f05fb82d3bb in PyEval_EvalCode () from /lib64/libpython3.14.so.1.0
#21 0x00007f05fb852050 in run_eval_code_obj () from /lib64/libpython3.14.so.1.0
#22 0x00007f05fb84af83 in run_mod () from /lib64/libpython3.14.so.1.0
#23 0x00007f05fb83d8ee in _PyRun_StringFlagsWithName.constprop.0 ()
   from /lib64/libpython3.14.so.1.0
#24 0x00007f05fb83d798 in _PyRun_SimpleStringFlagsWithName ()
   from /lib64/libpython3.14.so.1.0
#25 0x00007f05fb8647e4 in Py_RunMain () from /lib64/libpython3.14.so.1.0
#26 0x00007f05fb81c7ec in Py_BytesMain () from /lib64/libpython3.14.so.1.0
#27 0x00007f05fb4120c8 in __libc_start_call_main () from /lib64/libc.so.6
#28 0x00007f05fb41218b in __libc_start_main_impl () from /lib64/libc.so.6
#29 0x0000561011e4f095 in _start ()
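
The `retcode = -11` in the pytest output above is how Python's subprocess module reports a child that was killed by a signal on POSIX: the return code is the negated signal number. A minimal sketch of that decoding follows; the helper name `describe_returncode` is illustrative and not part of meson-python or its test suite.

```python
import signal


def describe_returncode(retcode):
    """Translate a subprocess return code into a readable description."""
    if retcode < 0:
        # On POSIX, death by signal is reported as the negated signal number.
        return f"killed by {signal.Signals(-retcode).name}"
    return f"exited with status {retcode}"


# The value seen in the failing test run.
print(describe_returncode(-11))  # on Linux this prints "killed by SIGSEGV"
```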

Comment 1 Miro Hrončok 2024-10-24 19:38:05 UTC
I can also reproduce by building https://src.fedoraproject.org/rpms/python-meson-python @ rawhide (90b81c9e8645a4c08f6f74127746eadddd18ae2b) in mock:

[python-meson-python (rawhide)]$ fedpkg --release rawhide mockbuild
...
=================================== FAILURES ===================================
________________________________ test_local_lib ________________________________

venv = <tests.conftest.VEnv object at 0x7ff3d4b590f0>
wheel_link_against_local_lib = PosixPath('/tmp/pytest-of-mockbuild/pytest-0/test0/mesonpy-test-pbruvgfn/link_against_local_lib-1.0.0-cp313-cp313-linux_x86_64.whl')

    @pytest.mark.skipif(sys.platform not in {'linux', 'darwin'}, reason='Not supported on this platform')
    def test_local_lib(venv, wheel_link_against_local_lib):
        venv.pip('install', wheel_link_against_local_lib)
>       output = venv.python('-c', 'import example; print(example.example_sum(1, 2))')

tests/test_wheel.py:160: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/conftest.py:107: in python
    return subprocess.check_output([self.executable, *args]).decode()
/usr/lib64/python3.13/subprocess.py:472: in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, capture_output = False, timeout = None, check = True
popenargs = (['/tmp/pytest-of-mockbuild/pytest-0/mesonpy-test-venv4/bin/python3', '-c', 'import example; print(example.example_sum(1, 2))'],)
kwargs = {'stdout': -1}
process = <Popen: returncode: -11 args: ['/tmp/pytest-of-mockbuild/pytest-0/mesonpy-te...>
stdout = b'', stderr = None, retcode = -11

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.
    
        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them,
        or pass capture_output=True to capture both.
    
        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.
    
        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.
    
        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.
    
        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.
    
        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE
    
        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE
    
        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['/tmp/pytest-of-mockbuild/pytest-0/mesonpy-test-venv4/bin/python3', '-c', 'import example; print(example.example_sum(1, 2))']' died with <Signals.SIGSEGV: 11>.

/usr/lib64/python3.13/subprocess.py:577: CalledProcessError
---------------------------- Captured stdout setup -----------------------------
Initialized empty Git repository in /builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib/.git/
+ meson setup /builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib /builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib/.mesonpy-2x5o62cf -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md --native-file=/builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib/.mesonpy-2x5o62cf/meson-python-native-file.ini
The Meson build system
Version: 1.5.1
Source dir: /builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib
Build dir: /builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib/.mesonpy-2x5o62cf
Build type: native build
Project name: link-against-local-lib
Project version: 1.0.0
C compiler for the host machine: gcc (gcc 14.2.1 "gcc (GCC) 14.2.1 20240912 (Red Hat 14.2.1-4)")
C linker for the host machine: gcc ld.bfd 2.43.50.20241014
Host machine cpu family: x86_64
Host machine cpu: x86_64
Program python found: YES (/usr/bin/python3)
Found pkg-config: YES (/usr/bin/pkg-config) 2.3.0
Run-time dependency python found: YES 3.13
WARNING: Please do not define rpath with a linker argument, use install_rpath
or build_rpath properties instead.
This will become a hard error in a future Meson release.

Build targets in project: 2

link-against-local-lib 1.0.0

  User defined options
    Native files: /builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib/.mesonpy-2x5o62cf/meson-python-native-file.ini
    buildtype   : release
    b_ndebug    : if-release
    b_vscrt     : md

Found ninja-1.12.1 at /usr/bin/ninja
+ /usr/bin/ninja
[1/5] Compiling C object lib/libexample.so.p/examplelib.c.o
[2/5] Linking target lib/libexample.so
[3/5] Compiling C object example.cpython-313-x86_64-linux-gnu.so.p/examplemod.c.o
[4/5] Generating symbol file lib/libexample.so.p/libexample.so.symbols
[5/5] Linking target example.cpython-313-x86_64-linux-gnu.so
[1/2] /builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib/.mesonpy-2x5o62cf/lib/libexample.so
[2/2] /builddir/build/BUILD/python-meson-python-0.17.0-build/meson_python-0.17.0/tests/packages/link-against-local-lib/.mesonpy-2x5o62cf/example.cpython-313-x86_64-linux-gnu.so
=========================== short test summary info ============================
...
FAILED tests/test_wheel.py::test_local_lib - subprocess.CalledProcessError: C...
================== 1 failed, 111 passed, 11 skipped in 51.00s ==================

RPM build warnings:

RPM build errors:
error: Bad exit status from /var/tmp/rpm-tmp.koxgcb (%check)

Comment 2 Miro Hrončok 2024-10-24 19:44:44 UTC
In Koji, this happens on x86_64 and i686, but not on s390x, aarch64, ppc64le.

https://koji.fedoraproject.org/koji/taskinfo?taskID=125164632 x86_64
https://koji.fedoraproject.org/koji/taskinfo?taskID=125164654 i686

Reproduce with:

[python-meson-python (rawhide)]$ fedpkg build --scratch --arches x86_64
[python-meson-python (rawhide)]$ fedpkg build --scratch --arches i686

Comment 3 Miro Hrončok 2024-10-28 08:15:16 UTC
FWIW it looks like 2.43.50-1.fc42 to 2.43.50-4.fc42 never passed gating and 2.43.50-5.fc42 landed via https://bodhi.fedoraproject.org/updates/FEDORA-2024-109a9172e1 with the test waived.

Comment 4 Miro Hrončok 2024-10-28 08:23:42 UTC
binutils-2.43.50-1.fc42.x86_64 is OK
binutils-2.43.50-2.fc42.x86_64 is bad
binutils-2.43.50-4.fc42.x86_64 is bad
binutils-2.43.50-5.fc42.x86_64 is bad
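
The good/bad list above is the result of a manual bisection over package builds. The same search can be sketched as a small binary search, assuming the builds are ordered oldest to newest and there is a single good-to-bad transition. The `observed` table only replays the results reported in this bug; in practice `is_bad` would be a hypothetical callback that installs the build and runs the reproducer.

```python
def first_bad(builds, is_bad):
    """Return the first build for which is_bad() is true.

    Assumes builds[0] is good, builds[-1] is bad, and that there is
    exactly one good-to-bad transition in between.
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] good, builds[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]


# Results reported in this bug (Description and Comment 4): True means "segfaults".
observed = {
    "2.43.1-2.fc42": False,
    "2.43.50-1.fc42": False,
    "2.43.50-2.fc42": True,
    "2.43.50-4.fc42": True,
    "2.43.50-5.fc42": True,
}
builds = list(observed)  # dicts preserve insertion order
print(first_bad(builds, observed.__getitem__))
```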

Comment 5 Nick Clifton 2024-10-28 12:28:02 UTC
Some comments/suggestions:

* Given that rawhide binutils builds are now based upon snapshots of the upstream binutils development sources, you may have more success reporting this bug upstream.

* Are you building with LTO enabled ?  If so, does the problem go away if LTO is not used ?  (Not that this is a solution, but it does help to narrow down the problem area).

* Is it possible to create a small stand-alone reproducer for the problem ?  Having to build python from scratch and then run its testsuite each time is going to make it very hard to narrow down which commit to the binutils sources introduced the bug.

Comment 6 Miro Hrončok 2024-10-28 16:05:05 UTC
I'll check the rest, but just a short answer:

> Having to build python from scratch and then run its testsuite...

There is probably some confusion here. Building Python from source is not needed. Use the Python packaged in Fedora.

Comment 7 Miro Hrončok 2024-10-28 16:21:25 UTC
Sticking %global _lto_cflags %{nil} into python-meson-python.spec makes no difference.

I have not tested using a Python built with LTO disabled (because I am using Python packaged in Fedora).

Comment 8 Miro Hrončok 2024-10-28 16:31:57 UTC
Smaller reproducer:

# dnf install git-core pip gcc ninja python3-devel
# git clone https://github.com/mesonbuild/meson-python.git
# cd meson-python/tests/packages/link-against-local-lib/
# pip install .
# python3 -c 'import example'
Segmentation fault (core dumped)

Comment 9 Ben Beasley 2024-10-28 18:32:48 UTC
It looks like the main unique thing about this particular test is that it uses -Wl,-rpath,custom-rpath. That’s probably relevant.

https://github.com/mesonbuild/meson-python/blob/b43ffcd0c64fa9ef97e99c15ac3f1f43d9572324/tests/packages/link-against-local-lib/meson.build#L15C18-L15C41

Comment 10 Miro Hrončok 2024-10-29 10:19:43 UTC
Built with binutils-2.43.50-1.fc42.x86_64:

$ readelf -d /usr/local/lib64/python3.13/site-packages/example.cpython-313-x86_64-linux-gnu.so 

Dynamic section at offset 0x2e28 contains 22 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libexample.so]
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/.link_against_local_lib.mesonpy.libs:custom-rpath]
 0x000000000000000c (INIT)               0x1000
 0x000000000000000d (FINI)               0x11a4
 0x0000000000000019 (INIT_ARRAY)         0x3e10
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x3e18
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x2028
 0x0000000000000005 (STRTAB)             0x5000
 0x0000000000000006 (SYMTAB)             0x2050
 0x000000000000000a (STRSZ)              251 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000003 (PLTGOT)             0x3fe8
 0x0000000000000002 (PLTRELSZ)           120 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x2310
 0x0000000000000007 (RELA)               0x2208
 0x0000000000000008 (RELASZ)             264 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffff9 (RELACOUNT)          7
 0x0000000000000000 (NULL)               0x0


Built with binutils-2.43.50-5.fc42.x86_64:

$ readelf -d /usr/local/lib64/python3.13/site-packages/example.cpython-313-x86_64-linux-gnu.so 

Dynamic section at offset 0x1e28 contains 22 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libexample.so]
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/.link_against_local_lib.mesonpy.libs:custom-rpath]
 0x000000000000000c (INIT)               0x294
 0x000000000000000d (FINI)               0x434
 0x0000000000000019 (INIT_ARRAY)         0x2e10
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x2e18
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x1000
 0x0000000000000005 (STRTAB)             0x41d0
 0x0000000000000006 (SYMTAB)             0x1028
 0x000000000000000a (STRSZ)              251 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000003 (PLTGOT)             0x2fe8
 0x0000000000000002 (PLTRELSZ)           120 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x12e8
 0x0000000000000007 (RELA)               0x11e0
 0x0000000000000008 (RELASZ)             264 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffff9 (RELACOUNT)          7
 0x0000000000000000 (NULL)               0x0

diff
-Dynamic section at offset 0x2e28 contains 22 entries:
+Dynamic section at offset 0x1e28 contains 22 entries:
   Tag        Type                         Name/Value
  0x0000000000000001 (NEEDED)             Shared library: [libexample.so]
  0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/.link_against_local_lib.mesonpy.libs:custom-rpath]
- 0x000000000000000c (INIT)               0x1000
- 0x000000000000000d (FINI)               0x11a4
- 0x0000000000000019 (INIT_ARRAY)         0x3e10
+ 0x000000000000000c (INIT)               0x294
+ 0x000000000000000d (FINI)               0x434
+ 0x0000000000000019 (INIT_ARRAY)         0x2e10
  0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
- 0x000000000000001a (FINI_ARRAY)         0x3e18
+ 0x000000000000001a (FINI_ARRAY)         0x2e18
  0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
- 0x000000006ffffef5 (GNU_HASH)           0x2028
- 0x0000000000000005 (STRTAB)             0x5000
- 0x0000000000000006 (SYMTAB)             0x2050
+ 0x000000006ffffef5 (GNU_HASH)           0x1000
+ 0x0000000000000005 (STRTAB)             0x41d0
+ 0x0000000000000006 (SYMTAB)             0x1028
  0x000000000000000a (STRSZ)              251 (bytes)
  0x000000000000000b (SYMENT)             24 (bytes)
- 0x0000000000000003 (PLTGOT)             0x3fe8
+ 0x0000000000000003 (PLTGOT)             0x2fe8
  0x0000000000000002 (PLTRELSZ)           120 (bytes)
  0x0000000000000014 (PLTREL)             RELA
- 0x0000000000000017 (JMPREL)             0x2310
- 0x0000000000000007 (RELA)               0x2208
+ 0x0000000000000017 (JMPREL)             0x12e8
+ 0x0000000000000007 (RELA)               0x11e0
  0x0000000000000008 (RELASZ)             264 (bytes)
  0x0000000000000009 (RELAENT)            24 (bytes)
  0x000000006ffffff9 (RELACOUNT)          7
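
The comparison above can also be done mechanically. The following is a rough sketch that parses the text printed by `readelf -d` and reports the tags whose values differ; the function names are illustrative, and the sample strings are an excerpt of the two dumps in this comment. Tags that occur more than once (for example several NEEDED entries) would overwrite each other in this simple dictionary.

```python
import re

# Matches lines such as " 0x000000000000000c (INIT)               0x1000".
ENTRY = re.compile(r"^\s*0x[0-9a-fA-F]+\s+\((\w+)\)\s+(.*\S)\s*$")


def parse_dynamic(text):
    """Map dynamic tag names to their printed values from `readelf -d` text."""
    entries = {}
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            entries[match.group(1)] = match.group(2)
    return entries


def changed_tags(before, after):
    """Return {tag: (old_value, new_value)} for every tag that differs."""
    old, new = parse_dynamic(before), parse_dynamic(after)
    return {
        tag: (old.get(tag), new.get(tag))
        for tag in sorted(set(old) | set(new))
        if old.get(tag) != new.get(tag)
    }


# Excerpts from the two dumps above.
good = """
 0x000000000000000c (INIT)               0x1000
 0x000000000000000d (FINI)               0x11a4
 0x000000000000000a (STRSZ)              251 (bytes)
"""
bad = """
 0x000000000000000c (INIT)               0x294
 0x000000000000000d (FINI)               0x434
 0x000000000000000a (STRSZ)              251 (bytes)
"""
for tag, (old_value, new_value) in changed_tags(good, bad).items():
    print(f"{tag}: {old_value} -> {new_value}")
```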

Comment 11 Florian Weimer 2024-10-29 10:48:03 UTC
(In reply to Miro Hrončok from comment #10)
> [$ORIGIN/.link_against_local_lib.mesonpy.libs:custom-rpath]
> - 0x000000000000000d (FINI)               0x11a4
> - 0x0000000000000019 (INIT_ARRAY)         0x3e10
> + 0x000000000000000c (INIT)               0x294
> + 0x000000000000000d (FINI)               0x434
> + 0x0000000000000019 (INIT_ARRAY)         0x2e10

This caught my interest. DT_INIT points to non-executable memory (same page as the ELF header, which is not executable with separate-code). The backtrace confirms the crash happens when DT_INIT is invoked:

(gdb) bt
#0  0x00007f51053c3294 in ?? ()
#1  0x00007f51055065d0 in call_init (l=0x5647f6caa7e0, argc=3, 
    argv=0x7ffd85e5d8f8, env=0x7ffd85e5d918) at dl-init.c:60
#2  call_init (l=0x5647f6caa7e0, argc=3, argv=0x7ffd85e5d8f8, 
    env=0x7ffd85e5d918) at dl-init.c:26
#3  0x00007f51055066ed in _dl_init (main_map=0x5647f6caa7e0, argc=3, 
    argv=0x7ffd85e5d8f8, env=0x7ffd85e5d918) at dl-init.c:121

Unlike DT_INIT_ARRAY, DT_INIT points to actual code, so it has to be in an executable segment.
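
The condition described here can be checked without running the module: read the program headers, find DT_INIT in the dynamic section, and test whether that address lies inside a PT_LOAD segment with the execute flag set. The sketch below assumes a little-endian ELF64 file (as on x86_64) and uses illustrative function names; it is not how glibc or patchelf implement anything.

```python
import struct

PT_LOAD, PT_DYNAMIC = 1, 2  # program header types
PF_X = 0x1                  # "executable" segment flag
DT_NULL, DT_INIT = 0, 12    # dynamic section tags


def program_headers(data):
    """Yield (p_type, p_flags, p_offset, p_vaddr, p_filesz, p_memsz)."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr, _paddr,
         p_filesz, p_memsz, _align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        yield p_type, p_flags, p_offset, p_vaddr, p_filesz, p_memsz


def dt_init(data):
    """Return the DT_INIT address, or None if the tag is absent."""
    for p_type, _flags, p_offset, _vaddr, p_filesz, _memsz in program_headers(data):
        if p_type != PT_DYNAMIC:
            continue
        # Each Elf64_Dyn entry is a signed 64-bit tag plus a 64-bit value.
        for off in range(p_offset, p_offset + p_filesz, 16):
            d_tag, d_val = struct.unpack_from("<qQ", data, off)
            if d_tag == DT_NULL:
                break
            if d_tag == DT_INIT:
                return d_val
    return None


def init_is_executable(data):
    """True if DT_INIT lies in an executable PT_LOAD segment, None if no DT_INIT."""
    addr = dt_init(data)
    if addr is None:
        return None
    return any(
        p_type == PT_LOAD and bool(p_flags & PF_X)
        and p_vaddr <= addr < p_vaddr + p_memsz
        for p_type, p_flags, _off, p_vaddr, _fsz, p_memsz in program_headers(data)
    )
```

Usage would be along the lines of `init_is_executable(open(path, "rb").read())` on the installed `example.cpython-*.so`. Given the analysis above, one would expect `False` for the crashing file, although its segment layout is not shown in this report.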

Is there a way to get pip to pass some flags to ninja so that it prints the actual commands executed?

I got something out of strace:

606   execve("/usr/bin/ld", ["/usr/bin/ld", "-plugin", "/usr/libexec/gcc/x86_64-redhat-linux/14/liblto_plugin.so", "-plugin-opt=/usr/libexec/gcc/x86_64-redhat-linux/14/lto-wrapper", "-plugin-opt=-fresolution=/tmp/ccK0d8nu.res", "-plugin-opt=-pass-through=-lgcc", "-plugin-opt=-pass-through=-lgcc_s", "-plugin-opt=-pass-through=-lc", "-plugin-opt=-pass-through=-lgcc", "-plugin-opt=-pass-through=-lgcc_s", "--build-id", "--no-add-needed", "--eh-frame-hdr", "--hash-style=gnu", "-m", "elf_x86_64", "-dynamic-linker", "/lib64/ld-linux-x86-64.so.2", "/usr/lib/gcc/x86_64-redhat-linux/14/../../../../lib64/crt1.o", "/usr/lib/gcc/x86_64-redhat-linux/14/../../../../lib64/crti.o", "/usr/lib/gcc/x86_64-redhat-linux/14/crtbegin.o", "-L/usr/lib/gcc/x86_64-redhat-linux/14", "-L/usr/lib/gcc/x86_64-redhat-linux/14/../../../../lib64", "-L/lib/../lib64", "-L/usr/lib/../lib64", "-L/usr/lib/gcc/x86_64-redhat-linux/14/../../..", "--version", "-lgcc", "--push-state", "--as-needed", "-lgcc_s", "--pop-state", "-lc", "-lgcc", "--push-state", "--as-needed", "-lgcc_s", "--pop-state", "/usr/lib/gcc/x86_64-redhat-linux/14/crtend.o", "/usr/lib/gcc/x86_64-redhat-linux/14/../../../../lib64/crtn.o"], 0x7fff9c992c60 /* 24 vars */ <unfinished ...>

It all looks harmless.

But then I see this:

669   execve("/tmp/pip-build-env-w5i_0ljx/normal/bin/patchelf", ["patchelf", "--print-rpath", "/meson-python/tests/packages/link-against-local-lib/.mesonpy-3adtna1_/lib/libexample.so"], 0x55b6edea9620 /* 17 vars */ <unfinished ...>
669   <... execve resumed>)             = 0
670   execve("/tmp/pip-build-env-w5i_0ljx/normal/bin/patchelf", ["patchelf", "--print-rpath", "/meson-python/tests/packages/link-against-local-lib/.mesonpy-3adtna1_/example.cpython-313-x86_64-linux-gnu.so"], 0x55b6edea9620 /* 17 vars */ <unfinished ...>
670   <... execve resumed>)             = 0
671   execve("/tmp/pip-build-env-w5i_0ljx/normal/bin/patchelf", ["patchelf", "--set-rpath", "$ORIGIN/.link_against_local_lib.mesonpy.libs:custom-rpath", "/meson-python/tests/packages/link-against-local-lib/.mesonpy-3adtna1_/example.cpython-313-x86_64-linux-gnu.so"], 0x55b6edea9620 />
671   <... execve resumed>)             = 0
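
To pull just the patchelf invocations out of a longer strace log, something like the following could be used. This is only a sketch: the regular expressions assume the execve line format shown above and will not cope with argv strings that contain a closing square bracket. The function name is made up for illustration.

```python
import re

# Matches: execve("/path/to/prog", ["argv0", "argv1", ...]
EXECVE_RE = re.compile(r'execve\("([^"]+)", \[(.*?)\]')
# Matches one double-quoted argv entry, allowing backslash escapes.
ARG_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')


def execve_calls(strace_lines, program):
    """Yield argv lists for execve() calls whose path basename is program."""
    for line in strace_lines:
        match = EXECVE_RE.search(line)
        if match is None:
            continue  # e.g. the "<... execve resumed>" lines
        path, raw_args = match.groups()
        if path.rsplit("/", 1)[-1] == program:
            yield ARG_RE.findall(raw_args)


sample = [
    '669   execve("/tmp/pip-build-env-w5i_0ljx/normal/bin/patchelf", ["patchelf", "--print-rpath", "/meson-python/tests/packages/link-against-local-lib/.mesonpy-3adtna1_/lib/libexample.so"], 0x55b6edea9620 /* 17 vars */ <unfinished ...>',
    '669   <... execve resumed>)             = 0',
]
for argv in execve_calls(sample, "patchelf"):
    print(" ".join(argv))
```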

Is it possible this is a duplicate of bug 2319341?

Can you reproduce this without patchelf? At a conceptual level, patchelf cannot work on GNU/Linux, and it definitely should not be used in our standard build tools.

Comment 12 Nick Clifton 2024-10-29 10:54:46 UTC
Hi Miro,

It looks like Florian is onto something with the patchelf discovery.

> Smaller reproducer:
> 
> # dnf install git-core pip gcc ninja python3-devel
> # git clone https://github.com/mesonbuild/meson-python.git
> # cd meson-python/tests/packages/link-against-local-lib/
> # pip install .
> # python3 -c 'import example'
> Segmentation fault (core dumped)

How do I specify which linker to use in the steps above ?

Say for example I am trying to perform a binary search through the commits
to the binutils source tree to find the one that triggers the bug.  I
checkout the binutils sources up to a specific commit, build a linker with
them and then what ?  How do I use this built-but-not-installed linker to
trigger/not-trigger the bug ?

Also - is it possible to specify some extra linker command line options to 
be used when building libexample.so ?  For example I would like to see if
adding -Wl,--no-rosegment makes any difference.

(It might be that the changed layout triggered by --rosegment confuses 
patchelf which then goes on to create a broken binary).

Cheers
  Nick

Comment 13 Miro Hrončok 2024-10-29 11:06:07 UTC
The --verbose flag for pip makes pip show the output of the build backend, the -Ccompile-args=-v option passes -v to meson (and hence also to ninja):

$ pip --verbose install -Ccompile-args=-v . 
...
  + /usr/bin/ninja -v
  [1/5] cc -Ilib/libexample.so.p -Ilib -I../lib -fdiagnostics-color=always -DNDEBUG -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O3 -fPIC -MD -MQ lib/libexample.so.p/examplelib.c.o -MF lib/libexample.so.p/examplelib.c.o.d -o lib/libexample.so.p/examplelib.c.o -c ../lib/examplelib.c
  [2/5] cc  -o lib/libexample.so lib/libexample.so.p/examplelib.c.o -Wl,--as-needed -Wl,--no-undefined -Wl,-O1 -shared -fPIC -Wl,-soname,libexample.so
  [3/5] cc -Iexample.cpython-313-x86_64-linux-gnu.so.p -I. -I.. -I/usr/include/python3.13 -fvisibility=hidden -fdiagnostics-color=always -DNDEBUG -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O3 -fPIC -MD -MQ example.cpython-313-x86_64-linux-gnu.so.p/examplemod.c.o -MF example.cpython-313-x86_64-linux-gnu.so.p/examplemod.c.o.d -o example.cpython-313-x86_64-linux-gnu.so.p/examplemod.c.o -c ../examplemod.c
  [4/5] /tmp/pip-build-env-mpu743ej/overlay/bin/meson --internal symbolextractor /meson-python/tests/packages/link-against-local-lib/.mesonpy-bw4dftts lib/libexample.so lib/libexample.so lib/libexample.so.p/libexample.so.symbols
  [5/5] cc  -o example.cpython-313-x86_64-linux-gnu.so example.cpython-313-x86_64-linux-gnu.so.p/examplemod.c.o -Wl,--as-needed -Wl,--allow-shlib-undefined -Wl,-O1 -shared -fPIC '-Wl,-rpath,$ORIGIN/lib' -Wl,-rpath-link,/meson-python/tests/packages/link-against-local-lib/.mesonpy-bw4dftts/lib lib/libexample.so -Wl,-rpath,custom-rpath
  [1/2] /meson-python/tests/packages/link-against-local-lib/.mesonpy-bw4dftts/lib/libexample.so
  [2/2] /meson-python/tests/packages/link-against-local-lib/.mesonpy-bw4dftts/example.cpython-313-x86_64-linux-gnu.so


Unfortunately, it does not say anything about patchelf.
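
For comparing link steps between builds, the -Wl options can be extracted from the ninja -v output with a few lines of Python. A rough sketch, assuming the "[n/m] command" line format shown above; the helper name is hypothetical:

```python
import re
import shlex

# A ninja -v step looks like "[2/5] cc -o ...".
STEP_RE = re.compile(r"^\s*\[\d+/\d+\]\s+(.*)$")


def linker_flags(ninja_output):
    """Map each -shared link step's output file to its -Wl,... arguments."""
    result = {}
    for line in ninja_output.splitlines():
        match = STEP_RE.match(line)
        if match is None:
            continue
        argv = shlex.split(match.group(1))
        if "-shared" not in argv or "-o" not in argv:
            continue  # compile steps and non-compiler steps
        output = argv[argv.index("-o") + 1]
        result[output] = [arg for arg in argv if arg.startswith("-Wl,")]
    return result


log = (
    "  [2/5] cc  -o lib/libexample.so lib/libexample.so.p/examplelib.c.o "
    "-Wl,--as-needed -Wl,--no-undefined -Wl,-O1 -shared -fPIC "
    "-Wl,-soname,libexample.so\n"
)
print(linker_flags(log))
```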

Comment 14 Miro Hrončok 2024-10-29 11:15:02 UTC
> Can you reproduce this without patchelf?

I was not yet able to construct a reproducer that removes meson-python. In ELN, we patch patchelf out of meson-python, but we also skip this test. When unskipped, it simply fails with "patchelf not supported".


> How do I specify which linker to use in the steps above ?

With $LDFLAGS.

$ LDFLAGS=-fuse-ld=gold pip --verbose install -Ccompile-args=-v .
...
  ...
  [2/5] cc  -o lib/libexample.so lib/libexample.so.p/examplelib.c.o -Wl,--as-needed -Wl,--no-undefined -Wl,-O1 -shared -fPIC -Wl,-soname,libexample.so -fuse-ld=gold
  ...
  [5/5] cc  -o example.cpython-313-x86_64-linux-gnu.so example.cpython-313-x86_64-linux-gnu.so.p/examplemod.c.o -Wl,--as-needed -Wl,--allow-shlib-undefined -Wl,-O1 -shared -fPIC -fuse-ld=gold '-Wl,-rpath,$ORIGIN/lib' -Wl,-rpath-link,/meson-python/tests/packages/link-against-local-lib/.mesonpy-gsg2i47d/lib lib/libexample.so -Wl,-rpath,custom-rpath
...
$ python -c 'import example'
(Does not segfault.)
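
When scripting several such runs (gold, --no-rosegment, and so on), the environment can be prepared like this. A small sketch only; the helper name is invented here, and the pip command is the one from the transcript above:

```python
import os


def env_with_ldflags(extra, base=None):
    """Return a copy of the environment with extra appended to LDFLAGS."""
    env = dict(os.environ if base is None else base)
    current = env.get("LDFLAGS", "").split()
    env["LDFLAGS"] = " ".join(current + list(extra))
    return env


# The pip invocation used above; pass env=... to subprocess.run() to use it.
PIP_COMMAND = ["pip", "--verbose", "install", "-Ccompile-args=-v", "."]

env = env_with_ldflags(["-fuse-ld=gold"], base={"LDFLAGS": "-Wl,-O1"})
print(env["LDFLAGS"])
```

A run would then be subprocess.run(PIP_COMMAND, env=env_with_ldflags(["-Wl,--no-rosegment"])), executed from the test package directory.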

Comment 15 Miro Hrončok 2024-10-29 11:17:18 UTC
(In reply to Nick Clifton from comment #12)
> Also - is it possible to specify some extra linker command line options to 
> be used when building libexample.so ?  For example I would like to see if
> adding -Wl,--no-rosegment makes any difference.

Also LDFLAGS. Adding -Wl,--no-rosegment makes a difference (no segfault).

Comment 16 Miro Hrončok 2024-10-29 12:09:36 UTC
Should I still report this to binutils upstream?

Comment 17 Nick Clifton 2024-10-29 12:17:20 UTC
(In reply to Miro Hrončok from comment #16)
> Should I still report this to binutils upstream?

If you can reproduce the problem without patchelf being part of the process then yes.  

Otherwise it might be worth reporting this bug to both the patchelf *and* the binutils upstream communities.

Comment 18 Miro Hrončok 2024-10-29 12:33:23 UTC
If I just patch the patchelf --set-rpath call out of the build, I obviously get:

$ python3 -c 'import example; print(example.example_sum(1, 2))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    import example; print(example.example_sum(1, 2))
    ^^^^^^^^^^^^^^
ImportError: libexample.so: cannot open shared object file: No such file or directory


When I set LD_LIBRARY_PATH to compensate, the import works. So this is clearly isolated to patchelf --set-rpath.
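
The two outcomes (ImportError versus segfault) can be told apart from a script by looking at the child's return code. A sketch, assuming a POSIX system; the function names are made up, and any library directory passed in is whatever applies to your own build tree:

```python
import os
import signal
import subprocess
import sys


def describe_returncode(returncode):
    """Translate a subprocess return code into a short label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        return "killed by " + signal.Signals(-returncode).name
    return "exit status %d" % returncode


def try_import(module, library_path=None):
    """Import module in a child interpreter, optionally with LD_LIBRARY_PATH set."""
    env = dict(os.environ)
    if library_path is not None:
        env["LD_LIBRARY_PATH"] = library_path
    proc = subprocess.run(
        [sys.executable, "-c", "import " + module],
        env=env,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return describe_returncode(proc.returncode)


print(describe_returncode(-signal.SIGSEGV))
```

With the patchelf call removed, try_import("example") would be expected to report an exit status (the ImportError), while the patchelf-edited module reports a signal.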

Comment 19 Nick Clifton 2024-10-29 12:54:34 UTC
(In reply to Miro Hrončok from comment #18)
> If I just patch the patchelf --set-rpath call out of the build, I obviously
> get:
> 
> $ python3 -c 'import example; print(example.example_sum(1, 2))'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>     import example; print(example.example_sum(1, 2))
>     ^^^^^^^^^^^^^^
> ImportError: libexample.so: cannot open shared object file: No such file or
> directory
> 
> 
> When I set LD_LIBRARY_PATH to compensate, the import works. So this is
> clearly isolated to patchelf --set-rpath.

Right - so is it really necessary to use patchelf ?  I presume that ordinary
python users are not expected to do so, so wouldn't it be better to change
the python test itself to use LD_LIBRARY_PATH instead of patchelf ?

Comment 20 Miro Hrončok 2024-10-29 13:01:18 UTC
What do you mean by "the python test"? Again, building or testing Python itself is not involved here at all.

This is meson-python, which uses patchelf for the "auditwheel" case. The use case is described at https://github.com/mesonbuild/meson-python/blob/0.17.1/mesonpy/__init__.py#L431

Is it necessary to use patchelf? No idea. But using LD_LIBRARY_PATH in the test makes the test useless. The test tests that the built Python extension module is functional.

Comment 21 Nick Clifton 2024-10-29 13:26:11 UTC
(In reply to Miro Hrončok from comment #20)
> What do you mean by "the python test"? Again, building or testing Python
> itself is not involved here at all.

Sorry - I was not clear.  I meant the test where a shared library is built but then cannot be loaded by Python.

> Is it necessary to use patchelf? No idea. But using LD_LIBRARY_PATH in the
> test makes the test useless. 

Really ?  

> The test tests that the built Python extension module is functional.

And it would be functional if it were to be installed into the correct location.
But since that is not something that the test harness wants to do, it makes
sense to run the test with LD_LIBRARY_PATH specified.  The test still checks
that the module loads and behaves correctly, it just uses the system approved
method for loading shared objects that are not installed into standard locations.

Comment 22 Ben Beasley 2024-10-29 14:03:06 UTC
As I understand it, the point of all this patchelf/rpath stuff is to support building wheels with bundled shared libraries. Obviously we want to avoid that when building Fedora packages, but end users can use the system python3-meson-python package too, and it’s a use case meson-python is designed to support. Ideally this kind of bundling would work as advertised even though it’s not useful for building Fedora packages. Requiring LD_LIBRARY_PATH to be set because patchelf is broken (?) doesn’t help these users build “portable” wheels.

Besides, this is not specific to Fedora’s python-meson-python package. As shown in https://bugzilla.redhat.com/show_bug.cgi?id=2321588#c8, it will also affect people using meson-python installed from PyPI in a virtual environment.

This does seem awfully similar to bug 2319341 / https://github.com/NixOS/patchelf/issues/568.

Comment 23 Florian Weimer 2024-10-29 14:08:04 UTC
(In reply to Ben Beasley from comment #22)
> As I understand it, the point of all this patchelf/rpath stuff is to support
> building wheels with bundled shared libraries. Obviously we want to avoid
> that when building Fedora packages, but end users can use the system
> python3-meson-python package too, and it’s a use case meson-python is
> designed to support. Ideally this kind of bundling would work as advertised
> even though it’s not useful for building Fedora packages. Requiring
> LD_LIBRARY_PATH to be set because patchelf is broken (?) doesn’t help these
> users build “portable” wheels.

But using patchelf to rewrite the binaries that just have been built is really awkward. Why not use -Wl,-rpath=… during the linker invocations? I get that patchelf is sometimes necessary if you can't relink, but this shouldn't really be one of those scenarios.
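
To make the suggestion concrete: the rpath that patchelf was asked to set (see the strace output earlier in this bug) could in principle be expressed as link-time arguments instead. This is only an illustration of the idea, not how meson-python works today, and the helper names are hypothetical:

```python
def split_rpath(rpath):
    """Split a colon-separated rpath string, as passed to patchelf --set-rpath."""
    return [entry for entry in rpath.split(":") if entry]


def rpath_link_args(rpath_entries):
    """Build one -Wl,-rpath,<dir> compiler-driver argument per rpath entry."""
    return ["-Wl,-rpath," + entry for entry in rpath_entries]


# The value patchelf was asked to set in the strace output.
wanted = "$ORIGIN/.link_against_local_lib.mesonpy.libs:custom-rpath"
print(rpath_link_args(split_rpath(wanted)))
```

If these arguments go through a shell, $ORIGIN has to be quoted so that it reaches the linker literally, as ninja does in the link command shown in comment 13.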

Comment 24 Miro Hrončok 2024-10-29 14:42:04 UTC
> But using patchelf to rewrite the binaries that just have been built is really awkward.

I don't disagree. But using LD_LIBRARY_PATH in the test defeats the purpose of that test.

Comment 25 Florian Weimer 2024-10-29 15:49:15 UTC
(In reply to Miro Hrončok from comment #24)
> > But using patchelf to rewrite the binaries that just have been built is really awkward.
> 
> > I don't disagree. But using LD_LIBRARY_PATH in the test defeats the purpose
> of that test.

From my perspective, the alternative to patchelf is using -Wl,-rpath=… during linking. Not setting LD_LIBRARY_PATH at run time.

We can figure out if there is anything we can do on the toolchain side to make this easier for Meson/Python to accomplish. But using patchelf under such circumstances has this “I can't figure out how my build system works” smell, sorry.

Comment 26 Miro Hrončok 2024-10-29 16:09:22 UTC
Well, I admit, I can't figure out how this build system works. It's pip calling meson-python, calling meson, calling ninja, calling gcc. It's a frontend around a wrapper around a build system which is a wrapper. It's convoluted.

Upstream meson-python has decided to use patchelf and telling me that it is wrong does not help me.
Could meson-python tell meson to tell ninja to tell gcc that it will eventually move the shared libraries somewhere else, so it will change the value of -Wl,-rpath=… during linking? Maybe... I don't know. I am no meson expert.

I understand your perspective, but if patchelf is fundamentally broken we should strive to deprecate it and remove it from Fedora rather than just saying it's bad.

I reported what seems like a regression in binutils -- it worked and now it no longer works. Could you please help me with that?

---

I tried git bisecting between c839a44c391 (2.43.50-1) and 1f4aee70ed1 (2.43.50-2) but when I use `git archive` to produce snapshot tarballs for binutils.spec, the builds all fail with:

  configure: error: Building GDB requires GMP 4.2+, and MPFR 3.1.0+.
  ...
  make: *** No rule to make target 'all'.  Stop.

E.g. https://koji.fedoraproject.org/koji/taskinfo?taskID=125327758

Comment 27 Miro Hrončok 2024-10-29 16:14:44 UTC
See also this comment from 4 months ago https://src.fedoraproject.org/rpms/python-meson-python/pull-request/7#comment-203299

"""
If you or someone else happens to be able to clearly articulate how meson-python could handle this better without having to fix up rpaths after the fact, it wouldn’t hurt to file an issue upstream. I don’t know if they would be receptive—this upstream has been known to act a bit prickly on technical matters—but as it stands upstream doesn’t know anyone is dissatisfied with the use of patchelf.
"""

Comment 28 Nick Clifton 2024-10-29 17:07:21 UTC
(In reply to Miro Hrončok from comment #26)

> I tried git bisecting between c839a44c391 (2.43.50-1) and 1f4aee70ed1
> (2.43.50-2) but when I use `git archive` to produce snapshot tarballs for
> binutils.spec, the builds all fail with:
> 
>   configure: error: Building GDB requires GMP 4.2+, and MPFR 3.1.0+.
>   ...
>   make: *** No rule to make target 'all'.  Stop.
 
Try just deleting the gdb/ directory after unpacking the tarball and before running configure.

Or installing the mpfr-devel and gmp-devel rpms.

Comment 29 Miro Hrončok 2024-10-29 17:43:01 UTC
Deleting the gdb/ directory only gets me to a certain point.

make[3]: *** No rule to make target '../../sim/../gdb/version.in', needed by 'common/version.c-stamp'.  Stop.


No idea how the tarballs actually used for the "official" snapshots are created.

Comment 30 Nick Clifton 2024-10-30 08:55:20 UTC
(In reply to Miro Hrončok from comment #29)
> Deleting the gdb/ directory only gets me to a certain point.
> 
> make[3]: *** No rule to make target '../../sim/../gdb/version.in', needed by
> 'common/version.c-stamp'.  Stop.

OK so maybe delete all of the gdb specific directories.  ie gdb, sim, readline, 
libdecnumber, gnulib, gdbsupport, libbacktrace.


> No idea how the tarballs actually used for the "official" snapshots are
> created.

There is a script at the top level of the combined gdb/binutils source tree 
which is used to create releases: src-release.sh.  For example to create a
binutils release in the form of an xz compressed tarball run:

  ./src-release.sh -x binutils

Comment 31 Miro Hrončok 2024-10-30 14:29:45 UTC
Thanks. git bisect says


bf6d7087de0a7351fd1dfd5f41522a7f4f576180 is the first new commit
commit bf6d7087de0a7351fd1dfd5f41522a7f4f576180
Author: Nick Clifton <nickc>
Date:   Thu Sep 19 16:45:30 2024 +0100

    ld: Move the .note.build-id section to near the start of the memory map.
    
    This helps GDB to locate the debug information associated with a core dump.
    Core dumps include the first page of an executable's image, and if this
    page include the .note.build-id section then GDB can find it and then track
    down a debug info file for that build-id.

 ld/scripttempl/elf.sc | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)
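
For anyone repeating this, the per-commit decision can be automated with `git bisect run`, which treats exit code 0 as good (old), 125 as "skip this commit" and other codes from 1 to 127 as bad (new). The sketch below shows one way to map a build-and-import step onto those codes; it is not the procedure actually used for the result above, and the function name is invented:

```python
import signal

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def bisect_verdict(build_ok, import_returncode):
    """Map one bisection step to a `git bisect run` exit code.

    import_returncode is the subprocess return code of the reproducer
    (negative when the child was killed by a signal).
    """
    if not build_ok:
        return SKIP  # cannot test this commit, let git pick another one
    if import_returncode == -signal.SIGSEGV:
        return BAD   # the reproducer segfaulted
    if import_returncode == 0:
        return GOOD
    return SKIP      # some unrelated failure; do not blame this commit


print(bisect_verdict(True, -signal.SIGSEGV))
```

Returning 125 for build failures matters here, since some intermediate snapshots may not build at all.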

Comment 32 Nick Clifton 2024-10-30 16:07:39 UTC
Yeah - I am not surprised.

But also I am reasonably certain that this is not a bug in the linker,
but rather an unexpected change in behaviour which is triggering an
incorrect assumption in patchelf.

I would urge you to contact the patchelf maintainers and consult with
them about this problem.  If they can show that the linker is doing
something wrong then I will happily fix it.  But at the moment I am
of the opinion that it is their problem, not ours.

Comment 33 Miro Hrončok 2024-10-30 17:54:04 UTC
https://koschei.fedoraproject.org/package/pypy3.10 seems to show a similar regression: the build uses:

+ patchelf --set-soname libpypy3.10-c.so.0.1 builddir/pypy3.10-7.3.15/lib64/libpypy3.10-c.so.0.1
+ patchelf --replace-needed libpypy3.10-c.so libpypy3.10-c.so.0.1 builddir/pypy3.10-7.3.15/bin/pypy3.10

And later dies:

+ /builddir/build/BUILD/pypy3.10-7.3.15-build/BUILDROOT/usr/bin/pypy3.10 -c 'import _tkinter'
/var/tmp/rpm-tmp.JLMQ5b: line 186: 14279 Segmentation fault      (core dumped) /builddir/build/BUILD/pypy3.10-7.3.15-build/BUILDROOT/usr/bin/pypy3.10 -c 'import _tkinter'

If you can give advice on how to replace patchelf in PyPy, I would appreciate it.

Comment 34 Miro Hrončok 2024-10-30 17:59:55 UTC
> But at the moment I am of the opinion that it is their problem, not ours.

As it happens, this is now *my* problem, caused by a change in binutils that has not been coordinated with the rest of the distribution, which still relies on patchelf.

Would you please consider undoing this change for now, until the patchelf situation is resolved?

Comment 35 Miro Hrončok 2024-10-31 09:42:35 UTC
I also want to stress that even if we manage to figure out a way for meson-python not to use patchelf when building wheels, we don't blow the issue out of the water.

Patchelf is needed by anyone building manylinux Python wheels (via auditwheel). Eventually, what is now in Fedora Rawhide will become "old" and will be used for manylinux wheel building. Unless we figure out a new patchelf-less way of doing it, we need patchelf to work.

Uncoordinated changes to binutils or other parts of the toolset and "this is your problem, not ours" comments worry me.

Comment 36 Miro Hrončok 2024-10-31 13:59:07 UTC
> Blocks: 2322939

Does this mean we move this bugzilla back to binutils?

Comment 37 Nick Clifton 2024-10-31 14:19:59 UTC
Hi Miro,

> Does this mean we move this bugzilla back to binutils?

Ah - I had not noticed that.  Yes, I think that it makes sense to take this
PR back.  It is still entirely possible that this is a linker bug.  We just
need to find out the root cause.


(In reply to Miro Hrončok from comment #34)
> > But at the moment I am of the opinion that it is their problem, not ours.
> 
> As it happens, this is now *my* problem, caused by a change in binutils that
> has not been coordinated with the rest of the distribution, which still
> relies on patchelf.

Well in our defense there is no way to coordinate changes to the linker before
they are made.  Well not to the whole of the Fedora ecosystem.  We do run some
basic gating tests including building and running a kernel using the new linker,
but it would be unfeasible to rebuild and retest all of Fedora every time there
is a change to the linker.  Or the compiler.  Or glibc.

In essence this is what the rawhide distribution is about.  Changes are made and 
tested here before they are released to the world as a whole.  Hence I made a 
change to the linker - not gratuitously, but in order to improve the security 
of linked binaries - and we are now testing this change in rawhide.  You have 
encountered a problem, reported it here and we are now  trying to solve it.  
This is how the process works.

Anyway the point I was trying to make is not that I am refusing to make a 
change to the linker, but rather that we need some help from the patchelf 
maintainers in order to find out what is going wrong.  Once we know that
we can decide between ourselves the best course of action for solving the
problem.

To this end I have created a clone of this PR but assigned to the patchelf
maintainers:

  https://bugzilla.redhat.com/show_bug.cgi?id=2322939


> Would you please consider undoing this change for now, until the patchelf
> situation is resolved?

But when will it be resolved and how will it be tested ?  Right now we have
a convenient testbed - rawhide - that the patchelf people can use to examine
how their tool behaves with this, let's call it, experimental linker.  If I
revert the patch then it will be much harder to test and much easier to just
ignore the problem.   [[After re-reading this paragraph I realised that just
downgrading to one of the -2 .. -6 versions of the rawhide linker would allow
testing, so the point is moot.  Oh well]].

Plus the problem is still going to exist in the upstream binutils sources, so
unless we permanently include a patch to the Fedora binutils rpms the problem
is going to keep on resurfacing until it is fixed.  And - presumably - this
problem is also going to affect other distributions, not just Fedora.  So
just reverting an upstream change is not going to solve things in the long
run.

I do appreciate however that this does not solve your problem right now.
So I propose the following:

  * On Monday (Nov 4th) I am planning another update of the rawhide 
    binutils to sync it with the latest upstream development sources.
    There is a small possibility that this update might resolve the
    problem.  Small, but not non-existent.  So once that build is
    available (it will be 2.43.50-7) I will ping you and you can try
    it out.

  * Assuming that the new build does not fix the problem I will then
    create a patch to revert commit bf6d7087de0a7351fd1dfd5f41522a7f4f576180 
    and a new scratch build from that.  I am assuming that you will
    be able to install this scratch build somewhere and test it,
    without me having to make an official update to rawhide.  Is this
    correct ?  

  * If the patch solves the problem we then have the issue of what
    to do for GDB.  Commit 4f576180 is for them.  It addresses an issue
    that GDB has relied upon the linker putting the .note.build-id 
    section in the first page of an executable's image, so that core
    dumps can be triaged and connected to debug info files.  Of course
    the linker does not have to lay out code in this way, but it has been
    how the linker has worked for a long time and changing it breaks
    GDB.  

    So in essence we need to go back even further, to the patch that
    moved the .note.build-id section away from the start of the file
    and decide what to do there.  This patch was part of a code size
    initiative that attempts to reduce the size of binaries by only
    creating one read-only loadable segment instead of two.

    At this point things get fuzzy.  I am not sure how to balance the
    needs of code size (which is very important for container builds),
    GDB and python wheels.

   (Incidentally this does suggest another possible workaround that
    you could try - linking with "-Wl,--noseparate-code" - this should use
    the old layout which combines read-only and code sections into
    one segment.  Which is what has been the default in older 
    distributions and which presumably works just fine with patchelf).

  * In the meantime we can continue to help/prod the patchelf people
    to see if we can find out if there is a bad assumption in patchelf
    or if the linker is really broken.

Does this work for you ?

Cheers
  Nick

Comment 38 Miro Hrončok 2024-10-31 14:34:37 UTC
(In reply to Nick Clifton from comment #37)
> ...You have 
> encountered a problem, reported it here and we are now  trying to solve it.  
> This is how the process works.

Indeed. I got the impression that you do not intend to solve this because it's a problem in patchelf. Perhaps I misunderstood.


> > Would you please consider undoing this change for now, until the patchelf
> > situation is resolved?
> 
> But when will it be resolved and how will it be tested ?  Right now we have
> a convenient testbed - rawhide - that the patchelf people can use to examine
> how their tool behaves with this, let's call it, experimental linker.  If I
> revert the patch then it will be much harder to test and much easier to just
> ignore the problem.   [[After re-reading this paragraph I realised that just
> downgrading to one of the -2 .. -6 versions of the rawhide linker would allow
> testing, so the point is moot.  Oh well]].
> 
> Plus the problem is still going to exist in the upstream binutils sources, so
> unless we permanently include a patch to the Fedora binutils rpms the problem
> is going to keep on resurfacing until it is fixed.  And - presumably - this
> problem is also going to affect other distributions, not just Fedora.  So
> just reverting an upstream change is not going to solve things in the long
> run.

I agree. This would of course only be a temporary bandaid.

> I do appreciate however that this does not solve your problem right now.
> So I propose the following:
> 
>   * On Monday (Nov 4th) I am planning another update of the rawhide 
>     binutils to sync it with the latest upstream development sources.
>     There is a small possibility that this update might resolve the
>     problem.  Small, but not non-existent.  So once that build is
>     available (it will be 2.43.50-7) I will ping you and you can try
>     it out.

Thanks. If the possible solution exists upstream, I can try it already.

>   * Assuming that the new build does not fix the problem I will then
>     create a patch to revert commit bf6d7087de0a7351fd1dfd5f41522a7f4f576180 
>     and a new scratch build from that.  I am assuming that you will
>     be able to install this scratch build somewhere and test it,
>     without me having to make an official update to rawhide.  Is this
>     correct ?  

I can even try that myself, assuming the revert is cleanly applicable. In fact, let's do that now... it is. Will build and test.

>   * If the patch solves the problem we then have the issue of what
>     to do for GDB.  Commit 4f576180 is for them.  It addresses an issue
>     that GDB has relied upon the linker putting the .note.build-id 
>     section in the first page of an executable's image, so that core
>     dumps can be triaged and connected to debug info files.  Of course
>     the linker does not have to lay out code in this way, but it has been
>     how the linker has worked for a long time and changing it breaks
>     GDB.  
> 
>     So in essence we need to go back even further, to the patch that
>     moved the .note.build-id section away from the start of the file
>     and decide what to do there.  This patch was part of a code size
>     initiative that attempts to reduce the size of binaries by only
>     creating one read-only loadable segment instead of two.
> 
>     At this point things get fuzzy.  I am not sure how to balance the
>     needs of code size (which is very important for container builds),
>     GDB and python wheels.
> 
>    (Incidentally this does suggest another possible workaround that
>     you could try - linking with "-Wl,--noseparate-code" - this should use
>     the old layout which combines read-only and code sections into
>     one segment.  Which is what has been the default in older 
>     distributions and which presumably works just fine with patchelf).

Is this the same as -Wl,--no-rosegment? I was about to ask if building stuff with this flag is dangerous in any way.

>   * In the meantime we can continue to help/prod the patchelf people
>     to see if we can find out if there is a bad assumption in patchelf
>     or if the linker is really broken.
> 
> Does this work for you ?

Yes, thanks.

Comment 39 Nick Clifton 2024-10-31 15:28:55 UTC
(In reply to Miro Hrončok from comment #38)

> Indeed. I got the impression that you do not intend to solve this because
> it's a problem in patchelf. Perhaps I misunderstood.

Well the truth is that I am hoping that this will turn out to be a problem
with patchelf and that I will not need to do anything to the linker.  But
I am not going to cross my arms and refuse to do anything.  I will still try
to work with you and anyone else to get to the bottom of this matter and
solve the problem.

 
  
> >    (Incidentally this does suggest another possible workaround that
> >     you could try - linking with "-Wl,--noseparate-code" - this should use
> >     the old layout which combines read-only and code sections into
> >     one segment.  Which is what has been the default in older 
> >     distributions and which presumably works just fine with patchelf).
> 
> Is this the same as -Wl,--no-rosegment? 

Not the same, but related.  The --separate-code option (enabled by default
for x86_64) tells the linker to create separate segments for read-only data
and code.  (Instead of combining them as it used to do).  But this option
ends up creating *two* read-only segments, one before the code segment and
one after it.  Which whilst perfectly legal according to the ELF specification
does also mean an increase in file size (because segments in the file image
are padded out so that they can just be mmap'ed into memory).  And file
size increases are a problem for containers.

The --rosegment option is an attempt to address this issue by augmenting
the behaviour of the --separate-code option so that only one read-only
segment is produced.  (If --separate-code is not active then --rosegment
has no effect).  It was done this way because --separate-code was implemented
a few years ago but the container size problem has only recently come to
light.

So, using -Wl,--noseparate-code should make the linker go back to its
old layout behaviour, which should be something that is familiar to patchelf.
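
As a rough way to tell the three layouts apart, one can count the read-only PT_LOAD segments in a binary. The sketch below takes the segment flags as plain integers (as they could be read from the program headers) and labels the layout according to the description above. It is a heuristic for illustration only; the flag lists in the example are hypothetical, not real readelf output:

```python
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4  # ELF program header permission flags


def count_read_only_segments(load_flags):
    """Count PT_LOAD segments that are readable but neither writable nor executable."""
    return sum(1 for flags in load_flags if flags == PF_R)


def describe_layout(load_flags):
    """Rough label for a list of PT_LOAD flags, following the description above."""
    read_only = count_read_only_segments(load_flags)
    if read_only == 0:
        return "code and read-only data combined (--noseparate-code style)"
    if read_only == 1:
        return "one read-only segment (--separate-code with --rosegment style)"
    return "%d read-only segments (--separate-code style)" % read_only


# Hypothetical flag lists, in file order.
print(describe_layout([PF_R | PF_X, PF_R | PF_W]))
print(describe_layout([PF_R, PF_R | PF_X, PF_R | PF_W]))
print(describe_layout([PF_R, PF_R | PF_X, PF_R, PF_R | PF_W]))
```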


> I was about to ask if building stuff
> with this flag is dangerous in any way.
 
Maybe...  It increases the risk of successful ROP and JOP style attacks:

  https://en.wikipedia.org/wiki/Return-oriented_programming

If code is separate from read-only data then the read-only data cannot
be mis-interpreted as instructions and executed, thus reducing the 
attack surface for ROP/JOP attacks.

Of course using --noseparate-code does not mean that these attacks
will occur, or guarantee that useful sequences of data-that-looks-
like-code will be found.  But using --separate-code just makes things
that much harder for potential bad actors.

Comment 40 Miro Hrončok 2024-10-31 16:26:06 UTC
binutils master @ 820ebe46a415a348c53bac0306347dd0ec962f20 -- still segfaults

On top of that, reverting bf6d7087de0a7351fd1dfd5f41522a7f4f576180 -- no more segfault

Comment 41 Nick Clifton 2024-11-12 14:47:21 UTC
The -7 build of the rawhide binutils should be able to generate binaries that can be edited by patchelf.

Comment 42 Miro Hrončok 2024-11-13 10:55:38 UTC
https://bodhi.fedoraproject.org/updates/FEDORA-2024-6e5201a4fe is stuck on bodhi gating

Comment 43 Miro Hrončok 2025-01-31 13:00:32 UTC
https://src.fedoraproject.org/rpms/binutils/c/875ad5b26717b98ddf7f74aeb91789382d082cdf?branch=rawhide removed this and the mass rebuild seems to have shipped that.

Comment 44 Miro Hrončok 2025-01-31 13:15:01 UTC
My attempt to backport a possible fix for patchelf: https://src.fedoraproject.org/rpms/patchelf/pull-request/5

Comment 45 Fedora Update System 2025-02-03 19:44:00 UTC
FEDORA-2025-919fee952d (patchelf-0.18.0-8.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-919fee952d

Comment 46 Fedora Update System 2025-02-03 22:40:52 UTC
FEDORA-2025-919fee952d (patchelf-0.18.0-8.fc42) has been pushed to the Fedora 42 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 47 Miro Hrončok 2025-02-04 11:32:13 UTC
With patchelf-0.18.0-8.fc42 this particular issue no longer happens, so I am leaving this closed.

That said, I would have appreciated a heads-up about the removal of the workaround :/

Comment 48 Petr Pisar 2026-01-29 16:57:21 UTC
*** Bug 2341005 has been marked as a duplicate of this bug. ***

