On 2024-12-21, we started to see an MLIR test failure in the LLVM daily snapshots running on Rawhide on Power8. We can only reproduce this issue on Rawhide. After investigating, I found a numpy function using a Power9/Power ISA 3.0 instruction (mtvsrws):

Disassembly:

(gdb) disas
Dump of assembler code for function HALF_exp2(char**, npy_intp const*, npy_intp const*, void*):
...
   0x00007fffe3bad210 <+160>:  addis   r9,r9,14336
   0x00007fffe3bad214 <+164>:  add     r7,r7,r9
=> 0x00007fffe3bad218 <+168>:  mtvsrws vs1,r7
   0x00007fffe3bad21c <+172>:  xscvspdpn vs1,vs1
   0x00007fffe3bad220 <+176>:  bl      0x7fffe38b3580 <0000001a.plt_call.exp2f@@GLIBC_2.27>

Backtrace:

(gdb) bt
#0  HALF_exp2 (args=<optimized out>, dimensions=<optimized out>, steps=<optimized out>, __NPY_UNUSED_TAGGEDdata=<optimized out>) at ../numpy/_core/src/umath/loops_umath_fp.dispatch.c.src:182
#1  0x00007fffe3af615c in generic_wrapped_legacy_loop (__NPY_UNUSED_TAGGEDcontext=<optimized out>, data=<optimized out>, dimensions=<optimized out>, strides=<optimized out>, auxdata=<optimized out>) at ../numpy/_core/src/umath/legacy_array_method.c:98
#2  0x00007fffe3b0d2f0 in try_trivial_single_output_loop (context=0x7fffffff8410, op=0x7fffffff8b30, order=<optimized out>, errormask=<optimized out>) at ../numpy/_core/src/umath/ufunc_object.c:969
#3  PyUFunc_GenericFunctionInternal (ufunc=<optimized out>, ufuncimpl=<optimized out>, operation_descrs=0x7fffffff8730, op=0x7fffffff8b30, casting=NPY_SAME_KIND_CASTING, order=<optimized out>, wheremask=0x0) at ../numpy/_core/src/umath/ufunc_object.c:2237
#4  ufunc_generic_fastcall (ufunc=<optimized out>, args=<optimized out>, len_args=<optimized out>, kwnames=<optimized out>, outer=<optimized out>) at ../numpy/_core/src/umath/ufunc_object.c:4530
#5  0x00007ffff79e9e30 in PyObject_Vectorcall () from /lib64/libpython3.13.so.1.0

Reproducible: Always
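For completeness, a minimal Python sketch that should exercise the affected loop (an assumption on my part, based on the HALF_exp2 frame in the backtrace, i.e. the float16 exp2 ufunc):

import numpy as np

# On an affected build running on Power8, this is expected to reach HALF_exp2
# and die with SIGILL on the Power9-only mtvsrws instruction shown above.
x = np.arange(8, dtype=np.float16)
print(np.exp2(x))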
I proposed a fix here: https://src.fedoraproject.org/rpms/numpy/pull-request/51
FEDORA-2025-adaf2943f9 (numpy-2.2.1-2.fc42) has been submitted as an update to Fedora 42. https://bodhi.fedoraproject.org/updates/FEDORA-2025-adaf2943f9
FEDORA-2025-adaf2943f9 (numpy-2.2.1-2.fc42) has been pushed to the Fedora 42 stable repository. If problem still persists, please make note of it in this bug report.
(In reply to Tulio Magno Quites Machado Filho from comment #1)
> I proposed a fix here:
> https://src.fedoraproject.org/rpms/numpy/pull-request/51

This doesn't make a difference. If you grep through `build.log` for `mcpu=power`, you will only find `-mcpu=power8` for the Fedora Rawhide ppc64le build and only `-mcpu=power9` for the ELN ppc64le build - both before and after that PR was merged. Or is there more going on behind the scenes?

Interestingly, it does appear to resolve bug 2334097, which makes the question "how so?" all the more intriguing.
The prior build was not using verbose compilation, so you wouldn't see the patched compile arguments. All you are seeing is the default build flags being set before the build.

I also don't think this patching is correct, if only because it is just a blunt-force sed. NumPy uses CPU dispatching, so forcing a file to Power9 when it was attempting to build that file as Power8 doesn't make sense. You'll just get two files that require Power9, with NumPy thinking one of them should be Power8 (hence the crashing).

If you don't want it to even try to dispatch to Power8, then you should set the cpu-baseline option:
https://numpy.org/doc/stable/reference/simd/build-options.html
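As a runtime sanity check, something along these lines should show what a given NumPy build treats as baseline features versus dispatched-only features (just a sketch; np.show_config() is the documented interface, while the _multiarray_umath attributes are internal and may change between releases):

import numpy as np

# Documented: the "SIMD Extensions" section lists baseline/found/not-found features.
np.show_config()

# Internal attributes: features the build assumes unconditionally vs. features
# it only uses when they are detected at runtime.
from numpy._core._multiarray_umath import __cpu_baseline__, __cpu_dispatch__
print("baseline:", __cpu_baseline__)   # e.g. ['VSX', 'VSX2'] on a Power8 baseline build
print("dispatch:", __cpu_dispatch__)   # e.g. ['VSX3', 'VSX4']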
Indeed. Enabling verbose output during compilation shows many more occurrences of `mcpu=power`. Looking more closely at the latest ppc64le Rawhide `build.log` [1], I still notice six occurrences of `-mcpu=power9` and another five of `-mcpu=power10`. I'm not sure where that leaves us.

I'll be running a few test builds with `cpu-baseline` and `cpu-dispatch` and comparing the results to what we have now. At least for Fedora, the default of 'baseline: min+detect' appears to do the right thing: it selects VSX and VSX2 as the baseline and dispatches VSX3 and VSX4. I suppose the power9/power10 occurrences noted above are related to the dispatched optimizations.

[1] I wasn't looking very closely before, when I stated that nothing had changed. I blame low caffeine levels.
(In reply to Elliott Sales de Andrade from comment #5)
> I also don't think this patching is correct, if only because it is just a
> blunt-force sed. NumPy uses CPU dispatching, so forcing a file to Power9
> when it was attempting to build that file as Power8 doesn't make sense.
> You'll just get two files that require Power9, with NumPy thinking one of
> them should be Power8 (hence the crashing).
> 
> If you don't want it to even try to dispatch to Power8, then you should set
> the cpu-baseline option:
> https://numpy.org/doc/stable/reference/simd/build-options.html

Having run a few test builds and having played with `cpu-baseline` and `cpu-dispatch`, I have come to the conclusion that the applied patch is correct - at least in our build environment and considering the results below.

On RHEL >= 10, `-mcpu=power9 -mtune=power10` is set in the build flags. According to the build options that you linked in comment 5, I should be able to achieve the same by defining `-Csetup-args=-Dcpu-baseline="vsx3"`. I tried just that and it fails for both Fedora and ELN with the same error, thrown in two places. Looking at the output of the ELN build, which uses `-mcpu=power9` and `-mtune=power10` by default, you can observe that NumPy is throwing in a `-mcpu=power8` which overrides the settings from the build flags, exactly as assumed in bug 2332211 comment 2:

[91/342] g++ -Inumpy/_core/libhighway_qsort_16bit.dispatch.h_VSX2.a.p -Inumpy/_core -I../numpy/_core -Inumpy/_core/include -I../numpy/_core/include -I../numpy/_core/src/common -I../numpy/_core/src/multiarray -I../numpy/_core/src/npymath -I../numpy/_core/src/umath -I../numpy/_core/src/highway -I/usr/include/python3.12 -I/builddir/build/BUILD/numpy-2.2.0/.mesonpy-wkjt5lbq/meson_cpu -fdiagnostics-color=always -DNDEBUG -Wall -Winvalid-pch -std=c++17 -O3 -mcpu=power9 -DNPY_HAVE_VSX -DNPY_HAVE_VSX_ASM -DNPY_HAVE_VSX3 -DNPY_HAVE_VSX3_HALF_DOUBLE -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mcpu=power9 -mtune=power10 -fasynchronous-unwind-tables -fstack-clash-protection -fPIC -DNPY_INTERNAL_BUILD -DHAVE_NPY_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -fno-exceptions -fno-rtti -O3 -DNPY_HAVE_VSX2 -mcpu=power8 -DNPY_MTARGETS_CURRENT=VSX2 -MD -MQ numpy/_core/libhighway_qsort_16bit.dispatch.h_VSX2.a.p/src_npysort_highway_qsort_16bit.dispatch.cpp.o -MF numpy/_core/libhighway_qsort_16bit.dispatch.h_VSX2.a.p/src_npysort_highway_qsort_16bit.dispatch.cpp.o.d -o numpy/_core/libhighway_qsort_16bit.dispatch.h_VSX2.a.p/src_npysort_highway_qsort_16bit.dispatch.cpp.o -c ../numpy/_core/src/npysort/highway_qsort_16bit.dispatch.cpp
FAILED: numpy/_core/libhighway_qsort_16bit.dispatch.h_VSX2.a.p/src_npysort_highway_qsort_16bit.dispatch.cpp.o
g++ -Inumpy/_core/libhighway_qsort_16bit.dispatch.h_VSX2.a.p -Inumpy/_core -I../numpy/_core -Inumpy/_core/include -I../numpy/_core/include -I../numpy/_core/src/common -I../numpy/_core/src/multiarray -I../numpy/_core/src/npymath -I../numpy/_core/src/umath -I../numpy/_core/src/highway -I/usr/include/python3.12 -I/builddir/build/BUILD/numpy-2.2.0/.mesonpy-wkjt5lbq/meson_cpu -fdiagnostics-color=always -DNDEBUG -Wall -Winvalid-pch -std=c++17 -O3 -mcpu=power9 -DNPY_HAVE_VSX -DNPY_HAVE_VSX_ASM -DNPY_HAVE_VSX3 -DNPY_HAVE_VSX3_HALF_DOUBLE -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mcpu=power9 -mtune=power10 -fasynchronous-unwind-tables -fstack-clash-protection -fPIC -DNPY_INTERNAL_BUILD -DHAVE_NPY_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -fno-exceptions -fno-rtti -O3 -DNPY_HAVE_VSX2 -mcpu=power8 -DNPY_MTARGETS_CURRENT=VSX2 -MD -MQ numpy/_core/libhighway_qsort_16bit.dispatch.h_VSX2.a.p/src_npysort_highway_qsort_16bit.dispatch.cpp.o -MF numpy/_core/libhighway_qsort_16bit.dispatch.h_VSX2.a.p/src_npysort_highway_qsort_16bit.dispatch.cpp.o.d -o numpy/_core/libhighway_qsort_16bit.dispatch.h_VSX2.a.p/src_npysort_highway_qsort_16bit.dispatch.cpp.o -c ../numpy/_core/src/npysort/highway_qsort_16bit.dispatch.cpp
In file included from ../numpy/_core/src/common/common.hpp:10,
                 from ../numpy/_core/src/npysort/highway_qsort.hpp:6,
                 from ../numpy/_core/src/npysort/highway_qsort_16bit.dispatch.cpp:1:
../numpy/_core/src/common/half.hpp: In member function ‘np::Half::operator float() const’:
../numpy/_core/src/common/half.hpp:95:54: error: ‘__builtin_vsx_vextract_fp_from_shorth’ requires the ‘-mcpu=power9’ and ‘-mvsx’ options
   95 |         return vec_extract(vec_extract_fp_from_shorth(vec_splats(bits_)), 0);
      |                                                      ^
../numpy/_core/src/common/half.hpp:95:54: note: overloaded builtin ‘__builtin_vec_vextract_fp_from_shorth’ is implemented by builtin ‘__builtin_vsx_vextract_fp_from_shorth’

The output above is without the RHEL tweak, but with a VSX3 baseline. It fails for Fedora the same way, except that the output shows Fedora's build flags before NumPy's overriding flags.

I'd appreciate a second pair of eyes, but it looks to me like this is a bug in NumPy, since NumPy enforces power8 where it needs power9. It should be easy for upstream to reproduce by simply passing `-Csetup-args=-Dcpu-baseline="vsx3"`.
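As an aside, with NumPy >= 2.0 it should also be possible to check at runtime which dispatched target the half-precision exp2 loop actually resolved to (a sketch; np.lib.introspect.opt_func_info is new in 2.0 and I haven't verified the exact output format):

import numpy as np

# Reports, per dtype signature, the currently selected dispatch target for the
# exp2 ufunc; on a Power8 machine a correctly built NumPy should not select a
# VSX3 (Power9) target for float16.
info = np.lib.introspect.opt_func_info(func_name="exp2", signature="float16")
print(info)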
Sandro, IMHO it's OK for a project to have some of its files built with processor-specific compiler flags. I could not spot any wrong usage of those flags yet, but I'm not an expert in numpy.

I have confirmed that the patch I proposed did fix the issue we were seeing on LLVM/MLIR. Log of the build:
https://copr.fedorainfracloud.org/coprs/g/fedora-llvm-team/llvm-snapshots-big-merge-20250113/build/8507960/

Could you elaborate on the issue you're seeing?
(In reply to Tulio Magno Quites Machado Filho from comment #8)
> Could you elaborate on the issue you're seeing?

The only issue I'm seeing is that what upstream suggests doing is not working.

https://numpy.org/doc/stable/reference/simd/build-options.html

Upstream documents the use of build options for enabling / limiting processor-specific optimizations. Elliott suggested using those instead of overwriting the build flags. I kind of agree. At the same time, I found out that the mechanism upstream suggests does not work. I think the package maintainer, or someone knowledgeable enough, should report that upstream. I'd be willing to do that if someone is able to confirm my findings.

Downstream the issue is solved for now. We could revisit the solution if upstream's build options offer a working alternative.