Bug 1965692 - openmpi generates illegal instruction on Celeron N4000
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: openmpi
Version: 34
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Orion Poplawski
QA Contact: Fedora Extras Quality Assurance
Reported: 2021-05-28 20:10 UTC by Susi Lehtola
Modified: 2022-06-08 06:31 UTC (History)
Last Closed: 2022-06-08 06:31:35 UTC
Type: Bug

Links
System: Github
ID: open-mpi ompi issues 9022
Private: 0
Priority: None
Status: open
Summary: Need a way to disable use of -mcx16 compile flag
Last Updated: 2021-05-30 22:11:10 UTC

Description Susi Lehtola 2021-05-28 20:10:03 UTC
I recently acquired a cheap wintel toy tablet/laptop (a Kano PC), which has a Celeron N4000 processor. I tried running HPL benchmarks on it to figure out how fast the thing is. However, xhpl_openmpi crashed with an illegal instruction. I then installed the MPICH version of HPL, and xhpl_mpich runs fine. The issue is thus in OpenMPI.

Looking at the build logs of openmpi-4.1.0-5.fc34.x86_64, I see that it has been compiled with the -mcx16 flag, which configure detected. According to the GCC man page:

           This option enables GCC to generate "CMPXCHG16B" instructions in
           64-bit code to implement compare-and-exchange operations on 16-byte
           aligned 128-bit objects.  This is useful for atomic updates of data
           structures exceeding one machine word in size.  The compiler uses
           this instruction to implement __sync Builtins.  However, for
           __atomic Builtins operating on 128-bit integers, a library call is
           always used.

However, because the instruction is not available on all x86_64 processors, the use of the flag should be disabled in OpenMPI.
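
One way to check this premise on a given machine is to read the feature flags that Linux exposes in /proc/cpuinfo, where CMPXCHG16B support is listed as `cx16`. The following is a minimal sketch, assuming an x86 Linux system; the function names `parse_cpu_flags` and `read_cpu_flags` are made up for illustration and are not part of OpenMPI or any Fedora tool.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        # Each cpuinfo line has the form "key<TAB>: value".
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No 'flags' line found (for example on a non-x86 system).
    return set()


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read and parse the flags of the running machine (Linux only)."""
    with open(path) as handle:
        return parse_cpu_flags(handle.read())
```

On the affected machine, `"cx16" in read_cpu_flags()` would then report whether the processor advertises the instruction.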

Comment 1 Orion Poplawski 2021-05-30 22:11:14 UTC
We're going to need upstream support here to do this.  Filed an issue upstream.

Comment 2 Orion Poplawski 2021-06-04 03:22:18 UTC
Can you give this build a try when it completes? https://koji.fedoraproject.org/koji/taskinfo?taskID=69275729

Comment 3 david08741 2021-06-04 10:26:17 UTC
Do you have a backtrace?

It might well be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1961113 if the CPU doesn't support AVX ...
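
Short of a backtrace, a rough way to see which of the two candidate instruction families a library contains is to scan its disassembly. The sketch below assumes text in the default output format of `objdump -d` and counts lines that use `cmpxchg16b` or a 256-bit `ymm` register; the function name is hypothetical. It is only a heuristic: AVX instructions that operate on `xmm` registers are not counted, and a hit shows that the instruction exists in the binary, not that it is the one being executed when the crash occurs.

```python
import re

# Matches a 256-bit register operand such as %ymm0 (AT&T) or ymm0 (Intel).
YMM_OPERAND = re.compile(r"\bymm\d+\b")


def count_suspect_instructions(disassembly):
    """Count cmpxchg16b and ymm-register instructions in objdump -d text."""
    counts = {"cmpxchg16b": 0, "ymm": 0}
    for line in disassembly.splitlines():
        # Instruction lines look like "  addr:<TAB>bytes<TAB>mnemonic operands".
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0].strip().endswith(":"):
            continue  # symbol headers, section headers, blank lines
        instruction = fields[-1]
        if "cmpxchg16b" in instruction:
            counts["cmpxchg16b"] += 1
        if YMM_OPERAND.search(instruction):
            counts["ymm"] += 1
    return counts
```

The input could be captured with `subprocess.run(["objdump", "-d", path], capture_output=True, text=True).stdout`, where `path` points at the installed OpenMPI shared library; the exact path is not stated in this report.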

Comment 4 Ben Cotton 2022-05-12 16:58:04 UTC
This message is a reminder that Fedora Linux 34 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '34'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 34 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 5 Ben Cotton 2022-06-08 06:31:35 UTC
Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07.

Fedora Linux 34 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release.

Thank you for reporting this bug and we are sorry it could not be fixed.

