Description of problem:
I'm testing builds of openmpi 3.1 in a COPR. I'm seeing many tests fail on Fedora Rawhide x86_64 with "Illegal instruction" errors.
Backtrace:
/lib64/libpsm2.so.2(+0x46c14)[0x7f013b9afc14]
/lib64/libpsm2.so.2(+0x46eb5)[0x7f013b9afeb5]
/lib64/libpsm2.so.2(+0x4bbcb)[0x7f013b9b4bcb]
/lib64/libpsm2.so.2(psm2_init+0x221)[0x7f013b98be61]
/lib64/libfabric.so.1(+0xc846f)[0x7f013b27846f]
/lib64/libfabric.so.1(fi_getinfo+0x296)[0x7f013b1c79c6]
/usr/lib64/openmpi/lib/openmpi/mca_mtl_ofi.so(+0x579a)[0x7f013b8e479a]
/usr/lib64/openmpi/lib/libmpi.so.40(ompi_mtl_base_select+0xa4)[0x7f01423bffb4]
/usr/lib64/openmpi/lib/openmpi/mca_pml_cm.so(+0x5cee)[0x7f013ba9acee]
/usr/lib64/openmpi/lib/libmpi.so.40(mca_pml_base_select+0x1e4)[0x7f01423c88e4]
/usr/lib64/openmpi/lib/libmpi.so.40(ompi_mpi_init+0x6ba)[0x7f01423561fa]
/usr/lib64/openmpi/lib/libmpi.so.40(MPI_Init+0x72)[0x7f0142385a72]
../pddrive(+0xfe4c)[0x55cf75f2ce4c]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7f0141e3cee3]
../pddrive(+0x103fe)[0x55cf75f2d3fe]
That address appears to contain an AVX2 instruction:
46c14: c5 f9 ef c0 vpxor %xmm0,%xmm0,%xmm0
Is this a bug in libpsm2 incorrectly trying to call AVX2 code, or perhaps libfabric incorrectly trying to use libpsm2 on non-AVX2-capable hardware? Or something else?
Version-Release number of selected component (if applicable):
libpsm2-11.2.23-1.fc30.x86_64
Additional info:
I'm unable to reproduce this error outside of COPR, so perhaps it's triggered by something specific about the COPR hardware, which seems to be:
model name : Intel(R) Xeon(R) CPU X5690 @ 3.47GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl cpuid pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes hypervisor lahf_lm pti tpr_shadow vnmi flexpriority ept vpid tsc_adjust arat
Can you please provide a reproducer?
If you make use of https://copr.fedorainfracloud.org/coprs/g/scitech/openmpi3.1/ on Fedora Rawhide and then try to build superlu_dist, that is what the above backtrace is from. In %check it runs:
mpirun -n 4 ../pddrive -r 2 -c 2 g20.rua
which fails. If running on fewer than 4 cores, you'll need to add:
export OMPI_MCA_rmaps_base_oversubscribe=1
If you apply for membership in the FAS group scitech, you can submit builds in the COPR in case you cannot reproduce locally (as I could not).
I see that libpsm2-11.2.23-1.fc30.x86_64 is built with -mavx2. That seems wrong for a general purpose x86_64 library.
https://kojipkgs.fedoraproject.org//packages/libpsm2/11.2.23/1.fc30/data/logs/x86_64/build.log
The source incorrectly assumes that the run-time hardware will be at least as capable as the compile-time hardware. buildflags.mak (included from Makefile):
    #
    # test if compiler supports 32B(AVX2)/64B(AVX512F) move instruction.
    #
    ifeq (${CC},icc)
      MAVX2=-march=core-avx2 -DPSM_AVX512
    else
      MAVX2=-mavx2
    endif
    RET := $(shell echo "int main() {}" | ${CC} ${MAVX2} -E -dM -xc - 2>&1 | grep -q AVX2 ; echo $$?)
    ifeq (0,${RET})
      BASECFLAGS += ${MAVX2}
    else
      $(error Compiler does not support AVX2 )
    endif
Fix: delete all those lines, and also the lines which test for -mavx512f.
Indeed, compiling with non-Fedora-mandated compiler flags should be avoided and needs justification. Assuming the code doesn't support runtime CPU detection, you could try building twice, once for vanilla x86_64 (without -march) and a second time with -mavx2/-mavx512f, putting the AVX-enabled binaries in /usr/lib64/haswell/ or /usr/lib64/haswell/avx512_1/. See: https://clearlinux.org/blogs/transparent-use-library-packages-optimized-intel-architecture
It looks like the CPU in your test environment is quite old and does not support AVX2 instructions. The Intel Omni-Path program does not support CPUs without AVX2, which is why AVX2 is enabled by default at compile time. You can disable the use of AVX2 instructions at build time by setting PSM_DISABLE_AVX2=1.
Packages built for Fedora need to run without modification on all supported hardware. If psm2 is going to be used by any Fedora packages it will need to be built in a way to support non-AVX2 hardware, or else packages will need to drop psm2 support.
You can export PSM_DISABLE_AVX2=1 in the specfile (libpsm2.spec.in) and build for all architectures. That should allow the psm2 included in Fedora to work on all supported hardware.
Setting PSM_DISABLE_AVX2=1 with 11.2.68 simply replaces -mavx2 with -mavx, which is not sufficient. But it appears that -mavx is required to build, as without it I get:
gcc -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -pthread -Wall -Werror -D_DEFAULT_SOURCE -D_SVID_SOURCE -D_BSD_SOURCE -O3 -g3 -fpic -fPIC -D_GNU_SOURCE -funwind-tables -Wno-strict-aliasing -Wformat-security -I/builddir/build/BUILD/libpsm2-11.2.68/include -I/builddir/build/BUILD/libpsm2-11.2.68/mpspawn -I/builddir/build/BUILD/libpsm2-11.2.68/include/linux-x86_64 -I/usr/include/uapi -I/builddir/build/BUILD/libpsm2-11.2.68 -I/builddir/build/BUILD/libpsm2-11.2.68/ptl_ips -I/builddir/build/BUILD/libpsm2-11.2.68/build_release -I/builddir/build/BUILD/libpsm2-11.2.68/opa/.. -I/builddir/build/BUILD/libpsm2-11.2.68/opa/../ptl_ips -c /builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c -o /builddir/build/BUILD/libpsm2-11.2.68/build_release/opa/opa_dwordcpy-x86_64.o
gcc -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -pthread -Wall -Werror -D_DEFAULT_SOURCE -D_SVID_SOURCE -D_BSD_SOURCE -O3 -g3 -fpic -fPIC -D_GNU_SOURCE -funwind-tables -Wno-strict-aliasing -Wformat-security -I/builddir/build/BUILD/libpsm2-11.2.68/include -I/builddir/build/BUILD/libpsm2-11.2.68/mpspawn -I/builddir/build/BUILD/libpsm2-11.2.68/include/linux-x86_64 -I/usr/include/uapi -I/builddir/build/BUILD/libpsm2-11.2.68 -I/builddir/build/BUILD/libpsm2-11.2.68/ptl_ips -I/builddir/build/BUILD/libpsm2-11.2.68/build_release -I/builddir/build/BUILD/libpsm2-11.2.68/opa/.. -I/builddir/build/BUILD/libpsm2-11.2.68/opa/../ptl_ips -c /builddir/build/BUILD/libpsm2-11.2.68/opa/opa_sysfs.c -o /builddir/build/BUILD/libpsm2-11.2.68/build_release/opa/opa_sysfs.o
gcc -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -pthread -Wall -Werror -D_DEFAULT_SOURCE -D_SVID_SOURCE -D_BSD_SOURCE -O3 -g3 -fpic -fPIC -D_GNU_SOURCE -funwind-tables -Wno-strict-aliasing -Wformat-security -I/builddir/build/BUILD/libpsm2-11.2.68/include -I/builddir/build/BUILD/libpsm2-11.2.68/mpspawn -I/builddir/build/BUILD/libpsm2-11.2.68/include/linux-x86_64 -I/usr/include/uapi -I/builddir/build/BUILD/libpsm2-11.2.68 -I/builddir/build/BUILD/libpsm2-11.2.68/ptl_ips -I/builddir/build/BUILD/libpsm2-11.2.68/build_release -I/builddir/build/BUILD/libpsm2-11.2.68/opa/.. -I/builddir/build/BUILD/libpsm2-11.2.68/opa/../ptl_ips -c /builddir/build/BUILD/libpsm2-11.2.68/opa/opa_syslog.c -o /builddir/build/BUILD/libpsm2-11.2.68/build_release/opa/opa_syslog.o
gcc -g3 -fpic -c /builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64-fast.S -o /builddir/build/BUILD/libpsm2-11.2.68/build_release/opa/opa_dwordcpy-x86_64-fast.o
/builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c: In function 'hfi_pio_blockcpy_256':
/builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:206:12: error: AVX vector return without AVX enabled changes the ABI [-Werror=psabi]
   __m256i tmp0 = _mm256_load_si256(sp);
           ^~~~
In file included from /usr/lib/gcc/x86_64-redhat-linux/8/include/immintrin.h:41,
                 from /builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:57:
/usr/lib/gcc/x86_64-redhat-linux/8/include/avxintrin.h:913:1: error: inlining failed in call to always_inline '_mm256_store_si256': target specific option mismatch
 _mm256_store_si256 (__m256i *__P, __m256i __A)
 ^~~~~~~~~~~~~~~~~~
/builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:209:4: note: called from here
   _mm256_store_si256((__m256i *)(dp + 1), tmp1);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-redhat-linux/8/include/immintrin.h:41,
                 from /builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:57:
/usr/lib/gcc/x86_64-redhat-linux/8/include/avxintrin.h:913:1: error: inlining failed in call to always_inline '_mm256_store_si256': target specific option mismatch
 _mm256_store_si256 (__m256i *__P, __m256i __A)
 ^~~~~~~~~~~~~~~~~~
/builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:208:4: note: called from here
   _mm256_store_si256((__m256i *)dp, tmp0);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-redhat-linux/8/include/immintrin.h:41,
                 from /builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:57:
/usr/lib/gcc/x86_64-redhat-linux/8/include/avxintrin.h:907:1: error: inlining failed in call to always_inline '_mm256_load_si256': target specific option mismatch
 _mm256_load_si256 (__m256i const *__P)
 ^~~~~~~~~~~~~~~~~
/builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:207:19: note: called from here
   __m256i tmp1 = _mm256_load_si256(sp + 1);
                  ^~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-redhat-linux/8/include/immintrin.h:41,
                 from /builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:57:
/usr/lib/gcc/x86_64-redhat-linux/8/include/avxintrin.h:907:1: error: inlining failed in call to always_inline '_mm256_load_si256': target specific option mismatch
 _mm256_load_si256 (__m256i const *__P)
 ^~~~~~~~~~~~~~~~~
/builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:206:19: note: called from here
   __m256i tmp0 = _mm256_load_si256(sp);
                  ^~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-redhat-linux/8/include/immintrin.h:41,
                 from /builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:57:
/usr/lib/gcc/x86_64-redhat-linux/8/include/avxintrin.h:913:1: error: inlining failed in call to always_inline '_mm256_store_si256': target specific option mismatch
 _mm256_store_si256 (__m256i *__P, __m256i __A)
 ^~~~~~~~~~~~~~~~~~
/builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:217:4: note: called from here
   _mm256_store_si256((__m256i *)(dp + 1), tmp1);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-redhat-linux/8/include/immintrin.h:41,
                 from /builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:57:
/usr/lib/gcc/x86_64-redhat-linux/8/include/avxintrin.h:913:1: error: inlining failed in call to always_inline '_mm256_store_si256': target specific option mismatch
 _mm256_store_si256 (__m256i *__P, __m256i __A)
 ^~~~~~~~~~~~~~~~~~
/builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:216:4: note: called from here
   _mm256_store_si256((__m256i *)dp, tmp0);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-redhat-linux/8/include/immintrin.h:41,
                 from /builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:57:
/usr/lib/gcc/x86_64-redhat-linux/8/include/avxintrin.h:919:1: error: inlining failed in call to always_inline '_mm256_loadu_si256': target specific option mismatch
 _mm256_loadu_si256 (__m256i_u const *__P)
 ^~~~~~~~~~~~~~~~~~
/builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:215:19: note: called from here
   __m256i tmp1 = _mm256_loadu_si256(sp + 1);
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-redhat-linux/8/include/immintrin.h:41,
                 from /builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:57:
/usr/lib/gcc/x86_64-redhat-linux/8/include/avxintrin.h:919:1: error: inlining failed in call to always_inline '_mm256_loadu_si256': target specific option mismatch
 _mm256_loadu_si256 (__m256i_u const *__P)
 ^~~~~~~~~~~~~~~~~~
/builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c:214:19: note: called from here
   __m256i tmp0 = _mm256_loadu_si256(sp);
                  ^~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[1]: Leaving directory '/builddir/build/BUILD/libpsm2-11.2.68/opa'
So, where do we go from here?
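One possible way around the "target specific option mismatch" errors in the log above, sketched here under assumptions and not taken from the PSM2 source, is to enable AVX for a single function with the GCC/Clang target attribute and pick the routine at run time with __builtin_cpu_supports. The rest of the file then builds with the distribution's baseline flags. The names copy64, copy64_avx, and copy64_plain are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define DEMO_X86_DISPATCH 1

/* AVX is enabled for this one function only; no -mavx on the command line. */
__attribute__((target("avx")))
static void copy64_avx(void *dst, const void *src)
{
    __m256i lo = _mm256_loadu_si256((const __m256i *)src);
    __m256i hi = _mm256_loadu_si256((const __m256i *)src + 1);
    _mm256_storeu_si256((__m256i *)dst, lo);
    _mm256_storeu_si256((__m256i *)dst + 1, hi);
}
#endif

/* Baseline fallback that is safe on any x86_64 CPU. */
static void copy64_plain(void *dst, const void *src)
{
    memcpy(dst, src, 64);
}

/* Copy one 64-byte block, using AVX only if the running CPU reports it. */
void copy64(void *dst, const void *src)
{
#ifdef DEMO_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) {
        copy64_avx(dst, src);
        return;
    }
#endif
    copy64_plain(dst, src);
}
```

Because the AVX code is reachable only behind the run-time check, a binary built this way should not raise SIGILL on a CPU without AVX, assuming no other translation unit is compiled with -mavx or -mavx2.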
Why is RHEL putting and testing OmniPath into machines that are not supported? The original public release notes for OmniPath, from March 2016: https://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_OP_Software_10_0_RN_J16607_v3_0.pdf state that Haswell or newer CPUs are required. Is it common to put incompatible hardware together and require it to operate?
(In reply to russell.w.mcguire from comment #10)
> Why is RHEL putting and testing OmniPath into machines that are not
> supported.
No, RHEL will not ship OmniPath for machines with old CPUs. This bug is against Fedora, not RHEL.
However, I would suggest closing this bug as NOTABUG because the old CPU is unsupported.
(In reply to Honggang LI from comment #11)
> (In reply to russell.w.mcguire from comment #10)
> > Why is RHEL putting and testing OmniPath into machines that are not
> > supported.
>
> No, RHEL will not ship OmniPath for machines with old CPUs. This bug is
> against Fedora, not RHEL.
>
> However, I could suggest close this bug as NOTABUG because of unsupported
> old CPU.
No. Fedora still supports plain x86_64 (i.e. SSE2-only), so failing to run on such (admittedly old) hardware is still a bug. If you disagree, feel free to open a FESCo ticket.
(In reply to Orion Poplawski from comment #9)
> Setting PSM_DISABLE_AVX2=1 with 11.2.68 simply replaces -mavx2 with -mavx,
> which is not sufficient.
>
> But it appears that -mavx is required to build as without it I get:
<snip>
> /builddir/build/BUILD/libpsm2-11.2.68/opa/opa_dwordcpy-x86_64.c: In function
> 'hfi_pio_blockcpy_256': <======================
> So, where do we go from here?
Looking at the code, these hfi_pio_blockcpy_XXXXX functions implement the "PIO block copying routine". When the CPU supports a "higher" vector instruction set, a "higher" copying routine is selected; see psm_hal_gen1/psm_hal_gen1_spio.c for example:
https://github.com/intel/opa-psm2/blob/8a12e84dc7e3a89eb81f7d0d2fba13c5d9d9c484/psm_hal_gen1/psm_hal_gen1_spio.c#L160
These lines first define the ctrl->spio_blockcpy_routines[i] methods, then call get_cpuid (L172) to determine which spio_blockcpy_routines[] method can actually be used, and store it in ctrl->spio_blockcpy_selected.
As hfi_pio_blockcpy_64() is written in "pure C", I guess we can assume it can always be used as ctrl->spio_blockcpy_selected. (Or maybe we can fix the selection method that determines ctrl->spio_blockcpy_selected. Ideally, if the CPU does not actually support AVX, hfi_pio_blockcpy_64() should be correctly selected _even if_ hfi_pio_blockcpy_256 or similar is enabled *at compilation time*.)
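The selection idea described in this comment can be sketched as a table of routines, each tagged with the CPU features it needs, from which the widest supported entry is picked. This is an illustration only and not the code in psm_hal_gen1_spio.c: the capability bits, the struct, and the demo_blockcpy_* routines (plain memcpy stand-ins) are all hypothetical, and the feature required by each width is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical capability bits for this sketch. */
enum { CAP_AVX = 1u << 0, CAP_AVX512F = 1u << 1 };

typedef void (*blockcpy_fn)(void *dst, const void *src, size_t nblocks);

/* Stand-ins: a real build would put vectorized bodies behind these names. */
static void demo_blockcpy_64(void *d, const void *s, size_t n)  { memcpy(d, s, n * 64); }
static void demo_blockcpy_256(void *d, const void *s, size_t n) { memcpy(d, s, n * 64); }
static void demo_blockcpy_512(void *d, const void *s, size_t n) { memcpy(d, s, n * 64); }

struct blockcpy_routine {
    const char *name;
    unsigned required_caps;   /* every bit must be present on the CPU */
    blockcpy_fn fn;
};

/* Ordered from most portable to most demanding. */
static const struct blockcpy_routine routines[] = {
    { "blockcpy_64",  0,           demo_blockcpy_64  },
    { "blockcpy_256", CAP_AVX,     demo_blockcpy_256 },
    { "blockcpy_512", CAP_AVX512F, demo_blockcpy_512 },
};

/* Return the last routine whose requirements are met; entry 0 always is. */
const struct blockcpy_routine *select_blockcpy(unsigned caps)
{
    const struct blockcpy_routine *best = &routines[0];
    for (size_t i = 1; i < sizeof routines / sizeof routines[0]; i++)
        if ((routines[i].required_caps & caps) == routines[i].required_caps)
            best = &routines[i];
    return best;
}

/* Query the running CPU; reports no capabilities on non-x86 builds. */
unsigned detect_caps(void)
{
    unsigned caps = 0;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))     caps |= CAP_AVX;
    if (__builtin_cpu_supports("avx512f")) caps |= CAP_AVX512F;
#endif
    return caps;
}
```

A caller would run select_blockcpy(detect_caps()) once at init and keep the returned function pointer. With caps equal to 0, as on the X5690 in the original report, the pure C routine is chosen even though the wider routines were compiled in.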
So is the method in psm_hal_gen1_spio.c for detecting the supported instruction set wrong for the Intel(R) Xeon(R) series?
> No. Fedora still supports plain x86_64 (i.e. SSE2-only), so failing to run
> on such (admittedly old) hardware is still a bug. If you disagree, feel free
> to open a FESCo ticket.
I think I see another, combined issue that has caused this to arise now and not in the past. libfabric is being used here, and this likely came recently as a new default within OpenMPI. libfabric will attempt to initialize ALL providers even if their hardware is not present, in effect forcing execution of libpsm2 on unsupported hardware. Technically, one solution is NOT building libfabric with libpsm2 for THIS older machine configuration, as the libfabric on this machine is incompatible with its hardware. However, I don't like the idea of removing libpsm2, as this test case is unique to this machine configuration and Intel wants libpsm2 to remain enabled by default within libfabric.
So a real question here: does this test platform actually have OmniPath hardware present, so that the code paths being executed are the result of a real init taking place? Or is there no OmniPath hardware present, and this is rudimentary init code that would normally just return an error, but can't because some variation of memcpy() is being invoked with AVX instructions? My goal here is to understand the environment.
One solution might be to simply ensure that the libpsm2 init paths are clean and run only SSE4.2 instructions (say, with some #pragmas) and leave the rest of the program stack unaffected. Removing AVX2, or even falling back to AVX1, will have a negative impact on performance for HPC customers. It would be best to keep the instructions in the code, but just clean up the init paths for unsupported machines.
Bottom line: this older platform is NOT compatible, so we need to clean up enough to keep it in the distro while maintaining performance (which is the entire reason for purchasing a 100Gbps card). Thoughts?
This is probably the more fruitful approach - to get openmpi and/or libfabric to avoid calling into psm2 when not needed. The machine(s) in question at the moment are the COPR builders - really no idea what hardware they have.
I think there were additions to the psm2 provider in libfabric recently to avoid calling into libpsm2, and thus psm2_init(), if the hfi1 OmniPath driver is not actually running on the machine (i.e. checking for the presence of /dev/hfi1_<N>). If this patch can be pulled in, it should resolve this issue. Perhaps we can find the version of libfabric and the psm2 provider being tested and see whether it contains this patch?
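A guard of the kind described here could look roughly like the following. This is a sketch that only assumes an active hfi1 driver exposes /dev/hfi1_<N> nodes, as stated in the comment; it is not the libfabric psm2 provider's code, and count_matching_paths and hfi1_device_present are hypothetical names. It uses POSIX glob(3).

```c
#include <glob.h>
#include <stddef.h>
#include <string.h>

/* Count filesystem entries matching a glob pattern; 0 on no match or error. */
size_t count_matching_paths(const char *pattern)
{
    glob_t g;
    size_t n = 0;

    memset(&g, 0, sizeof g);
    if (glob(pattern, GLOB_NOSORT, NULL, &g) == 0)
        n = g.gl_pathc;
    globfree(&g);
    return n;
}

/* Assumption from the comment: an active hfi1 driver exposes /dev/hfi1_<N>. */
int hfi1_device_present(void)
{
    return count_matching_paths("/dev/hfi1_*") > 0;
}
```

A provider could call hfi1_device_present() before psm2_init() and report itself as unavailable when it returns 0, so that no libpsm2 code runs on machines without the hardware.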
I looked at the code in libfabric v1.6.0 (in prov/psm2/src/psmx2_init.c). psmx2_unit_active() does check for the presence of an active unit at fi_getinfo() time and, if none is present, it errors out (you may need to set FI_LOG_LEVEL=info to see a relevant error message). (https://github.com/ofiwg/libfabric/blob/master/prov/psm2/src/psmx2_init.c#L269)
So the question now is whether any real hardware is present on the COPR builders. As Russ mentioned in comment #10, the Release Notes state Haswell or newer CPUs are required. [ Just FYI, a link to a newer version of the release notes document: https://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_OP_Fabric_Software_10_8_RN_K21143_v3_0.pdf ]
If there is OPA hardware on the systems, could you please remove it and retry? (With the Open MPI OFI MTL, you may also have to set the "-mca mtl_ofi_provider_include sockets" parameter on the command line.)
I'm pretty sure I'm hitting this issue. Here's the machine's processor:
model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts spec_ctrl intel_stibp flush_l1d
It has InfiniBand installed, and does not have OPA hardware installed. This same OS image gets used on a machine which does have OPA, which is why we have psm2 installed. Do you know which commits or versions of libfabric have the /dev/hfi1_* check in them? I would like to help move this forward if I can.
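When reading a flags line like the one above, note that "avx" is a prefix of "avx2", so a plain substring search can mislead. The following is a small sketch of a whole-token check; flags_contain is a hypothetical helper, not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` appears as a whole whitespace-delimited token in
 * a /proc/cpuinfo-style flags string, else 0. */
int flags_contain(const char *flags, const char *name)
{
    size_t len = strlen(name);
    const char *p = flags;

    if (len == 0)
        return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[len] == '\0' || p[len] == ' ' ||
                   p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return 1;
        p++;   /* keep searching after a partial match such as "avx2" */
    }
    return 0;
}
```

Applied to the E5-2660 flags above, this reports "avx" as present and "avx2" as absent, which matches a build with -mavx2 faulting on that machine.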
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle. Changing version to '31'.
This message is a reminder that Fedora 31 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 31 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.
This is still an issue in Fedora 33. I'm seeing SIGILL in Fedora-using containers under `libfabric`:
Stack trace (most recent call last):
#12 Object "", at 0xffffffffffffffff, in
#11 Object "/builds/gitlab-kitware-sciviz-ci/build/tests/kd-tree-test2", at 0x40c1ad, in _start
#10 Object "/usr/lib64/libc-2.32.so", at 0x7f8b991e01e1, in __libc_start_main
#9 Object "/builds/gitlab-kitware-sciviz-ci/build/tests/kd-tree-test2", at 0x40ad8f, in main
#8 Object "/usr/lib64/openmpi/lib/libmpi.so.40.20.5", at 0x7f8b9960744a, in PMPI_Init_thread
#7 Object "/usr/lib64/openmpi/lib/libmpi.so.40.20.5", at 0x7f8b99666f94, in ompi_mpi_init
#6 Object "/usr/lib64/openmpi/lib/libmpi.so.40.20.5", at 0x7f8b99627fca, in mca_bml_base_init
#5 Object "/usr/lib64/openmpi/lib/openmpi/mca_bml_r2.so", at 0x7f8b9616f177, in mca_bml_r2_component_init
#4 Object "/usr/lib64/openmpi/lib/libopen-pal.so.40.20.5", at 0x7f8b98f7c988, in mca_btl_base_select
#3 Object "/usr/lib64/openmpi/lib/openmpi/mca_btl_usnic.so", at 0x7f8b9615a32f, in usnic_component_init
#2 Object "/usr/lib64/libfabric.so.1.15.1", at 0x7f8b95f5357c, in fi_getinfo
#1 Object "/usr/lib64/libfabric.so.1.15.1", at 0x7f8b95f4fc26, in fi_ini
#0 Object "/usr/lib64/libfabric.so.1.15.1", at 0x7f8b9605a270, in fi_psm3_ini
Illegal instruction (Illegal operand [0x7f8b9605a270])
I downgraded to libfabric-1.11.2-1.fc33.x86_64 and I get no error if libfabric is called with valid arguments. Also, 1.12.0-0.1 is working for me; only libfabric-1.12.1-1 is broken. Let me know if I can test anything to help with this.
(In reply to david08741 from comment #25)
> I downgraded to libfabric-1.11.2-1.fc33.x86_64 and I get no error if
> libfabric is called with valid arguments.
>
> Also 1.12.0-0.1 is working for me, only libfabric-1.12.1-1 is broken.
libfabric-1.12.1-1 is the first release that supports psm3.
This message is a reminder that Fedora 33 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '33'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 33 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 34 still has 256-bit AVX instructions in `libpsm2`; from inspecting the disassembly alone, I can't tell whether they're guarded by runtime checks.
This message is a reminder that Fedora Linux 34 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '34'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 34 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07. Fedora Linux 34 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. Thank you for reporting this bug and we are sorry it could not be fixed.