On COPR ppc64le builders – but not on real Fedora builders, and not in qemu-user-static emulation – importing numpy from numpy-2.2.1-1.fc42 crashes with:

Illegal instruction (core dumped)

I first observed this while impact-checking a flatbuffers update for Rawhide, but I was able to reproduce the problem without involving any of the flatbuffers code.

Reproducible: Always

Steps to Reproduce:
Adjust a working spec file – it shouldn’t matter what package – to include "BuildRequires: %{py3_dist numpy}" and to include "python3 -c 'import numpy'" as the first line of %build. Then, build the package in COPR for ppc64le.

Actual Results:
+ python3 -c 'import numpy'
RPM build errors:
/var/tmp/rpm-tmp.j2ixdL: line 47: 581 Illegal instruction (core dumped) PYTHONFAULTHANDER=1 python3 -c 'import numpy'

Expected Results:
+ python3 -c 'import numpy'
[no error output, build continues…]

I tried exporting PYTHONFAULTHANDLER=1 in the reproducer above, but didn’t get anything useful. However, when I tried exporting PYTHONFAULTHANDLER=1 in %check in the flatbuffers package, I did get something a little more useful:

+ PYTHONFAULTHANDLER=1
+ ./tests/PythonTest.sh
Testing with interpreter: /usr/bin/python3
Fatal Python error: Illegal instruction

Current thread 0x00007fff9a6d3c20 (most recent call first):
  File "/usr/lib64/python3.13/site-packages/numpy/_core/getlimits.py", line 181 in _register_known_types
  File "/usr/lib64/python3.13/site-packages/numpy/__init__.py", line 298 in <module>
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1026 in exec_module
  File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
  File "/builddir/build/BUILD/flatbuffers-24.3.25-build/BUILDROOT/usr/lib/python3.13/site-packages/flatbuffers/compat.py", line 71 in import_numpy
  File "/builddir/build/BUILD/flatbuffers-24.3.25-build/BUILDROOT/usr/lib/python3.13/site-packages/flatbuffers/number_types.py", line 21 in <module>
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1026 in exec_module
  File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1415 in _handle_fromlist
  File "/builddir/build/BUILD/flatbuffers-24.3.25-build/BUILDROOT/usr/lib/python3.13/site-packages/flatbuffers/builder.py", line 15 in <module>
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1026 in exec_module
  File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
  File "/builddir/build/BUILD/flatbuffers-24.3.25-build/BUILDROOT/usr/lib/python3.13/site-packages/flatbuffers/__init__.py", line 15 in <module>
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1026 in exec_module
  File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
  File "/builddir/build/BUILD/flatbuffers-24.3.25-build/flatbuffers-24.3.25/tests/py_test.py", line 27 in <module>

I’m not sure if the illegal instruction is always at

  File "/usr/lib64/python3.13/site-packages/numpy/_core/getlimits.py", line 181 in _register_known_types

but it’s something.
OK, so I thought that since simply importing numpy at the beginning of %build reproduced the issue, this *had* to be a general issue. However, I just tried my reproduction instructions with python-fast-simplification and everything worked fine. I’m going to reassign this to flatbuffers until I can figure out why I can’t yet reproduce it elsewhere.
(In reply to Ben Beasley from comment #0)
> RPM build errors:
> /var/tmp/rpm-tmp.j2ixdL: line 47: 581 Illegal instruction (core
> dumped) PYTHONFAULTHANDER=1 python3 -c 'import numpy'

I'm pretty sure you copied the output above, not typing it over. Why does it say PYTHONFAULTHANDER? Note the missing 'L' in 'HANDER'. Does that come from the spec file? It would explain:

> I tried exporting PYTHONFAULTHANDLER=1 in the reproducer above, but didn’t
> get anything useful.
(In reply to Sandro from comment #2)
> I'm pretty sure you copied the output above, not typing it over. Why does it
> say PYTHONFAULTHANDER? Note the missing 'L' in 'HANDER'. Does that come from
> the spec file? It would explain:

Great catch! I’ll try it again.

Also, I missed something the first time I tried to reproduce this with python-fast-simplification. It turns out that simply building its spec file from Rawhide in COPR does reproduce this, as numpy is imported in %generate_buildrequires and the illegal instruction is invoked:

https://copr.fedorainfracloud.org/coprs/music/numpy-ppc64le/build/8447676/

I’m therefore comfortable reassigning this back to numpy.
I crafted a minimal spec file to reproduce this:

---- numpy-ppc64le-demo.spec:
Name: numpy-ppc64le-demo
Version: 0
Release: 1%{?dist}
Summary: Demonstrates failure to import numpy on ppc64le COPR builders
License: WTFPL

BuildRequires: python3-devel
BuildRequires: python3dist(numpy)

%description
%{summary}.

%build
PYTHONFAULTHANDLER=1 python3 -c 'import numpy'

%changelog
* Wed Dec 25 2024 Benjamin A. Beasley <code> - 0-1
- Initial spec file
----

I built this in COPR,

https://copr.fedorainfracloud.org/coprs/music/numpy-ppc64le/build/8447681/

and got (on ppc64le only):

+ PYTHONFAULTHANDLER=1
+ python3 -c 'import numpy'
Fatal Python error: Illegal instruction

Current thread 0x00007fff8a893c20 (most recent call first):
  File "/usr/lib64/python3.13/site-packages/numpy/_core/getlimits.py", line 181 in _register_known_types
  File "/usr/lib64/python3.13/site-packages/numpy/__init__.py", line 298 in <module>
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1026 in exec_module
  File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
  File "<string>", line 1 in <module>

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg (total: 2)
/var/tmp/rpm-tmp.q9gCBz: line 46: 41 Illegal instruction (core dumped) PYTHONFAULTHANDLER=1 python3 -c 'import numpy'
Does a scratch build on koji have the same problem?
(In reply to Gwyn Ciesla from comment #5)
> Does a scratch build on koji have the same problem?

No, I can only reproduce this in COPR. For example, the scratch build for https://src.fedoraproject.org/rpms/flatbuffers/pull-request/12 is just fine.

It would seem that the compiled numpy code probably includes an instruction that the koji builder hardware supports but the COPR builder hardware doesn’t. (I’m not at a computer, but my recollection is that COPR builders claim to be POWER8, and real koji builders claim to be POWER9.) I haven’t put much time into investigating which instruction, or what our ppc64le architectural baseline should be, or what the root cause is.
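One way to check the POWER8-versus-POWER9 recollection on each builder is to read the `cpu` field that ppc64le kernels expose in /proc/cpuinfo. This is only a minimal sketch: it assumes the usual `cpu : POWER9, altivec supported` layout of that field, and the helper name `power_generation` is made up for illustration.

```python
import re
from pathlib import Path


def power_generation(cpuinfo_text):
    """Return the POWER generation (e.g. 8 or 9) found in /proc/cpuinfo text, or None."""
    # Assumed field layout on ppc64le: "cpu\t\t: POWER9, altivec supported"
    match = re.search(r"^cpu\s*:\s*POWER(\d+)", cpuinfo_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print("POWER generation:", power_generation(text))
```

Running it once in a COPR chroot and once in a koji scratch build would show whether the two environments really report different CPU generations.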
Spec line 155:

#fix flags for ELN ppc64le
%ifarch ppc64le && 0%{?rhel} >= 10
find . -type f -print0 | xargs -0 sed -i s/mcpu=power8/mcpu=power9/
%endif

Is there another conditional we can check to NOT do this on copr?
Copr sets some custom macros: https://docs.pagure.org/copr.copr/user_documentation.html#rpm-macros

That page suggests:

%if 0%{?copr_projectname:1}
# This happens only in Copr
%endif
(In reply to Gwyn Ciesla from comment #7)
> Spec line 155:
>
> #fix flags for ELN ppc64le
> %ifarch ppc64le && 0%{?rhel} >= 10
> find . -type f -print0 | xargs -0 sed -i s/mcpu=power8/mcpu=power9/
> %endif

I'm not sure this will help, though. Looking at the most recent Rawhide build, the build log has `-mcpu=power8 -mtune=power8`, which should satisfy the Copr builders. The above "hack" is for ELN, which uses a different baseline. Even when numpy is built on Copr, which claims to be POWER8, so that the build-time and run-time CPUs match, the issue is still present.
(In reply to Gwyn Ciesla from comment #7)
> Spec line 155:
>
> #fix flags for ELN ppc64le
> %ifarch ppc64le && 0%{?rhel} >= 10
> find . -type f -print0 | xargs -0 sed -i s/mcpu=power8/mcpu=power9/
> %endif
>
>
> Is there another conditional we can check to NOT do this on copr?

Additionally, I would note that I am testing this in Rawhide chroots on COPR, not RHEL10, so this conditional is false and the package is theoretically being built for a POWER8 baseline.
Numpy appears to support runtime dispatch of VSX3 (POWER9) and VSX4 (POWER10) instructions. Maybe something is wrong with the dispatcher.

I wish I could run https://numpy.org/doc/stable/reference/generated/numpy.lib.introspect.opt_func_info.html,

$ python3 -c 'import numpy; print(numpy.lib.introspect.opt_func_info())'

on the COPR builders, but since numpy explodes during import, I guess I can’t do that.
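Since the crash happens during import, one hedged way to probe the dispatcher from the outside is to import numpy in a child process and look at how the child exits. The sketch below is a probe under stated assumptions, not a confirmed diagnosis: it relies on NumPy's documented `NPY_DISABLE_CPU_FEATURES` environment variable, the feature names `VSX3` and `VSX4` are a guess at what this build dispatches, and `import_numpy_with` and `describe_exit` are hypothetical helper names.

```python
import os
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        return "killed by " + signal.Signals(-returncode).name
    return "exit status %d" % returncode


def import_numpy_with(disabled_features=()):
    """Import numpy in a child process, optionally disabling dispatched features."""
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    if disabled_features:
        # Assumption: these names match features the installed build dispatches.
        env["NPY_DISABLE_CPU_FEATURES"] = " ".join(disabled_features)
    result = subprocess.run([sys.executable, "-c", "import numpy"], env=env)
    return describe_exit(result.returncode)


if __name__ == "__main__":
    print("default dispatch:", import_numpy_with())
    print("VSX3/VSX4 disabled:", import_numpy_with(("VSX3", "VSX4")))
```

If the import only succeeds with the features disabled, that would point toward a dispatched kernel; if it still dies with SIGILL, the offending instruction is more likely in code built for the baseline.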
You could run this on Koji, though. Since Koji builds with POWER8 (`mcpu` and `mtune`), it shouldn't show any POWER9 instructions. Unless...
(In reply to Sandro from comment #12)
> You could run this on Koji, though. Since Koji builds with POWER8 (`mcpu`
> and `mtune`), it shouldn't show any POWER9 instructions. Unless...

Well, ideally it would show POWER9 implementations as “current,” and POWER8 and POWER10 implementations as “available,” because it’s supposed to support *runtime* dispatch.
I thought limiting the optimizations during build would make anything above the baseline unavailable. I guess enabling core dumps and inspecting those may be the only option then. Since Copr allows SSH access to the builders, this should be doable.
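For the core-dump route, a small wrapper can raise the core-file size limit just for the child that imports numpy. This is a rough sketch, assuming a Linux builder where the hard `RLIMIT_CORE` limit is non-zero; where the dump ends up depends on the builder's `kernel.core_pattern` setting, which is not known here. The helper names are illustrative only.

```python
import resource
import subprocess
import sys


def raise_core_limit():
    """Raise the soft core-file size limit to the hard limit for this process."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def run_with_core_dumps(argv):
    """Run argv with core dumps enabled in the child; return its return code."""
    # preexec_fn runs in the child just before exec, so the parent's
    # limits are left untouched.
    return subprocess.run(argv, preexec_fn=raise_core_limit).returncode


if __name__ == "__main__":
    code = run_with_core_dumps([sys.executable, "-c", "import numpy"])
    # A negative return code means the child was killed by a signal.
    print("return code:", code)
```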
I tinkered with it on a Copr ppc64le builder. It gets more interesting.

First, I tried importing numpy in order to get a core dump. Nope. No error. No core dump. Then I thought, let's see what above introspect has to say:

# python3 -c 'import numpy; print(numpy.lib.introspect.opt_func_info())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    import numpy; print(numpy.lib.introspect.opt_func_info())
                        ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/site-packages/numpy/lib/__init__.py", line 90, in __getattr__
    raise AttributeError("module {!r} has no attribute "
                         "{!r}".format(__name__, attr))
AttributeError: module 'numpy.lib' has no attribute 'introspect'
(In reply to Sandro from comment #15)
> AttributeError: module 'numpy.lib' has no attribute 'introspect'

Are you sure this is Rawhide / numpy 2? I don’t think numpy 1 has numpy.lib.introspect.
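A version guard makes this kind of mix-up easier to spot before calling the introspection function. The following is a minimal sketch, assuming that `numpy.lib.introspect` first shipped in NumPy 2.0; the helper name `has_opt_func_info` is hypothetical.

```python
import importlib


def has_opt_func_info(version):
    """Return True if a NumPy version string is 2.0 or newer."""
    # Assumption: numpy.lib.introspect first shipped in NumPy 2.0.
    major = int(version.split(".")[0])
    return major >= 2


if __name__ == "__main__":
    try:
        np = importlib.import_module("numpy")
    except ImportError:
        print("numpy is not installed")
    else:
        if has_opt_func_info(np.__version__):
            introspect = importlib.import_module("numpy.lib.introspect")
            print(introspect.opt_func_info())
        else:
            print("numpy", np.__version__, "has no numpy.lib.introspect")
```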
:facepalm: My bad. You are right. The host (the actual builder) is not rawhide. It's F41. Let me try again.
I need to dedicate some more time to it. I may also have to bug Copr folks. I'll give it another shot another time unless someone else beats me to it or discovers the cause of the issue before that. It will definitely be next year. ;)
Here is the backtrace from the core dump:

#0 HALF_exp2 (args=<optimized out>, dimensions=<optimized out>, steps=<optimized out>, __NPY_UNUSED_TAGGEDdata=<optimized out>) at ../numpy/_core/src/umath/loops_umath_fp.dispatch.c.src:182
#1 0x00007fffbcbf615c in generic_wrapped_legacy_loop (__NPY_UNUSED_TAGGEDcontext=<optimized out>, data=<optimized out>, dimensions=<optimized out>, strides=<optimized out>, auxdata=<optimized out>) at ../numpy/_core/src/umath/legacy_array_method.c:98
#2 0x00007fffbcc0d2f0 in try_trivial_single_output_loop (context=0x7fffc0b6f0c0, op=0x7fffc0b6f7e0, order=<optimized out>, errormask=<optimized out>) at ../numpy/_core/src/umath/ufunc_object.c:969
#3 PyUFunc_GenericFunctionInternal (ufunc=<optimized out>, ufuncimpl=<optimized out>, operation_descrs=0x7fffc0b6f3e0, op=0x7fffc0b6f7e0, casting=NPY_SAME_KIND_CASTING, order=<optimized out>, wheremask=0x0) at ../numpy/_core/src/umath/ufunc_object.c:2237
#4 ufunc_generic_fastcall (ufunc=<optimized out>, args=<optimized out>, len_args=<optimized out>, kwnames=<optimized out>, outer=<optimized out>) at ../numpy/_core/src/umath/ufunc_object.c:4530
#5 0x00007fffbd7c9e30 in _PyObject_VectorcallTstate (tstate=0x7fffbdcbe2b8 <_PyRuntime+282952>, callable=0x7fffbcffc440, args=0x7fffbd120580, nargsf=9223372036854775809, kwnames=0x0) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Include/internal/pycore_call.h:168
#6 PyObject_Vectorcall (callable=0x7fffbcffc440, args=0x7fffbd120580, nargsf=9223372036854775809, kwnames=0x0) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Objects/call.c:327
#7 0x00007fffbd7e7000 in _PyEval_EvalFrameDefault (tstate=0x7fffbdcbe2b8 <_PyRuntime+282952>, frame=0x7fffbd120580, throwflag=<optimized out>) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Python/generated_cases.c.h:813
#8 0x00007fffbd9124ac in PyEval_EvalCode (co=0x10039852eb0, globals=<optimized out>, locals=0x7fffbd0500c0) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Python/ceval.c:601
#9 0x00007fffbd9412b4 in builtin_exec_impl (module=<optimized out>, source=0x10039852eb0, globals=0x7fffbd0500c0, locals=0x7fffbd0500c0, closure=<optimized out>) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Python/bltinmodule.c:1145
#10 builtin_exec (module=<optimized out>, args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Python/clinic/bltinmodule.c.h:556
#11 0x00007fffbd806dfc in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0x7fffbd24dd50, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Objects/methodobject.c:441
#12 0x00007fffbd92a514 in _PyVectorcall_Call (tstate=0x7fffbdcbe2b8 <_PyRuntime+282952>, func=<optimized out>, callable=0x7fffbd24dd50, tuple=0x7fffbd0b7a00, kwargs=<optimized out>) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Objects/call.c:273
#13 _PyObject_Call (tstate=0x7fffbdcbe2b8 <_PyRuntime+282952>, callable=0x7fffbd24dd50, args=0x7fffbd0b7a00, kwargs=<optimized out>) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Objects/call.c:348
#14 0x00007fffbd7e83bc in PyObject_Call (callable=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Objects/call.c:373
#15 PyCFunction_Call (callable=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Objects/call.c:381
#16 _PyEval_EvalFrameDefault (tstate=0x7fffbdcbe2b8 <_PyRuntime+282952>, frame=0x1, throwflag=<optimized out>) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Python/generated_cases.c.h:1355
#17 0x00007fffbd812f68 in _PyObject_VectorcallTstate (tstate=0x7fffbdcbe2b8 <_PyRuntime+282952>, callable=0x7fffbd280400, args=0x7fffc0b74a40, nargsf=<optimized out>, kwnames=0x0) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Include/internal/pycore_call.h:168
#18 object_vacall (tstate=0x7fffbdcbe2b8 <_PyRuntime+282952>, base=<optimized out>, callable=0x7fffbd280400, vargs=<optimized out>, vargs@entry=0x7fffc0b74b30 "\340\250\t\275\377\177") at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Objects/call.c:819
#19 0x00007fffbd8665c0 in PyObject_CallMethodObjArgs (obj=<optimized out>, name=<optimized out>) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Objects/call.c:880
#20 0x00007fffbd8640b4 in import_find_and_load (tstate=0x7fffbdcbe2b8 <_PyRuntime+282952>, abs_name=0x7fffbd09a8e0) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Python/import.c:3692
#21 PyImport_ImportModuleLevelObject (name=0x7fffbd09a8e0, globals=<optimized out>, locals=<optimized out>, fromlist=0x7fffbdc4e0f0 <_Py_NoneStruct>, level=<optimized out>) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Python/import.c:3774
#22 0x00007fffbd7eabe8 in import_name (tstate=<optimized out>, frame=<optimized out>, name=<optimized out>, fromlist=<optimized out>, level=<optimized out>) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Python/ceval.c:2698
#23 _PyEval_EvalFrameDefault (tstate=0x7fffbdcbe2b8 <_PyRuntime+282952>, frame=0x7fffbdc7c7d8 <_PyRuntime+13928>, throwflag=<optimized out>) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Python/generated_cases.c.h:3201
#24 0x00007fffbd9124ac in PyEval_EvalCode (co=0x7fffbd0a5990, globals=<optimized out>, locals=0x7fffbd050040) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Python/ceval.c:601
#25 0x00007fffbd950ce0 in run_eval_code_obj (tstate=tstate@entry=0x7fffbdcbe2b8 <_PyRuntime+282952>, co=co@entry=0x7fffbd0a5990, globals=globals@entry=0x7fffbd050040, locals=locals@entry=0x7fffbd050040) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Python/pythonrun.c:1337
#26 0x00007fffbd946314 in run_mod (mod=mod@entry=0x1003984a318, filename=filename@entry=0x7fffbd050130, globals=globals@entry=0x7fffbd050040, locals=locals@entry=0x7fffbd050040, flags=flags@entry=0x7fffc0b75138, arena=arena@entry=0x7fffbd1cbcd0, interactive_src=interactive_src@entry=0x7fffbd0501b0, generate_new_source=generate_new_source@entry=0) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Python/pythonrun.c:1422
#27 0x00007fffbd92f8b8 in _PyRun_StringFlagsWithName (str=str@entry=0x7fffbd09a840 "import numpy\n", name=name@entry=0x7fffbd050130, start=start@entry=257, globals=globals@entry=0x7fffbd050040, locals=locals@entry=0x7fffbd050040, flags=flags@entry=0x7fffc0b75138, generate_new_source=0) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Python/pythonrun.c:1221
#28 0x00007fffbd92f748 in _PyRun_SimpleStringFlagsWithName (command=0x7fffbd09a840 "import numpy\n", name=name@entry=0x7fffbda7fae8 "<string>", flags=flags@entry=0x7fffc0b75138) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Python/pythonrun.c:547
#29 0x00007fffbd99091c in pymain_run_command (command=<optimized out>) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Modules/main.c:253
#30 pymain_run_python (exitcode=0x7fffc0b75104) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Modules/main.c:687
#31 Py_RunMain () at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Modules/main.c:775
#32 0x00007fffbd8f7174 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Modules/main.c:829
#33 0x0000000116840918 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/python3.13-3.13.1-2.fc42.ppc64le/Programs/python.c:15

And the innermost frames:

#0 HALF_exp2 (args=<optimized out>, dimensions=<optimized out>, steps=<optimized out>, __NPY_UNUSED_TAGGEDdata=<optimized out>) at ../numpy/_core/src/umath/loops_umath_fp.dispatch.c.src:182
182 *((npy_half *)op1) = npy_float_to_half(npy_@intrin@f(in1));
#1 0x00007fffbcbf615c in generic_wrapped_legacy_loop (__NPY_UNUSED_TAGGEDcontext=<optimized out>, data=<optimized out>, dimensions=<optimized out>, strides=<optimized out>, auxdata=<optimized out>) at ../numpy/_core/src/umath/legacy_array_method.c:98
98 ldata->loop((char **)data, dimensions, strides, ldata->user_data);
#2 0x00007fffbcc0d2f0 in try_trivial_single_output_loop (context=0x7fffc0b6f0c0, op=0x7fffc0b6f7e0, order=<optimized out>, errormask=<optimized out>) at ../numpy/_core/src/umath/ufunc_object.c:969
969 int res = strided_loop(context, data, &count, fixed_strides, auxdata);
#3 PyUFunc_GenericFunctionInternal (ufunc=<optimized out>, ufuncimpl=<optimized out>, operation_descrs=0x7fffc0b6f3e0, op=0x7fffc0b6f7e0, casting=NPY_SAME_KIND_CASTING, order=<optimized out>, wheremask=0x0) at ../numpy/_core/src/umath/ufunc_object.c:2237
2237 int retval = try_trivial_single_output_loop(&context,

I'm not very familiar with debugging beyond the basics. But I'd happily run some more commands on request.
This showed up in the impact check for a python-hypothesis update: https://src.fedoraproject.org/rpms/python-hypothesis/pull-request/31#comment-238001
Btw, building NumPy without optimization with either `-Dbuildtype=debug` or `-Ddisable-optimization=true` makes the issue go away. I tried this hoping for more information in the backtrace. Alas, the core dump no longer occurred.
This is probably a duplicate of bug 2336127, and fixed in the latest build?
I can confirm that NumPy no longer explodes on import in Copr. Though, I have my doubts, for now, that the PR mentioned in bug 2336127 actually solved it.
I'm going to close this as fixed since https://src.fedoraproject.org/rpms/numpy/pull-request/51 did fix the issue. Though, as mentioned in bug 2336127 comment 7, I believe there's a bug in NumPy as well. If someone could confirm that, I'd be happy to report it upstream. If we could reproduce it outside mock, even better.