Bug 1961113 - SIGILL on x86_64v1
Summary: SIGILL on x86_64v1
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: libfabric
Version: 34
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Orion Poplawski
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-17 09:46 UTC by david08741
Modified: 2022-06-08 00:25 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-06-08 00:25:05 UTC
Type: Bug
Embargoed:


Description david08741 2021-05-17 09:46:45 UTC
Description of problem:
libfabric crashes with SIGILL on x86_64v1 CPUs.

Version-Release number of selected component (if applicable):
libfabric-1.12.1-1.fc33.x86_64

How reproducible:
Always, on an old CPU.

Steps to Reproduce:
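1. Call MPI_Init from openmpi

OR:
1. Compile:
#include <rdma/fabric.h>
int main(void) {
  struct fi_info *info = NULL;
  /* Provider initialization (fi_ini) happens inside fi_getinfo(). */
  fi_getinfo(FI_VERSION(1, 12), NULL, NULL, 0, NULL, &info);
  return 0;
}
2. with gcc <file> -lfabric
3. Run ./a.out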

Actual results:
SIGILL

Thread 1 "python3" received signal SIGILL, Illegal instruction.
0x00007fffe542a270 in fi_psm3_ini () from /lib64/libfabric.so.1
(gdb) bt
#0  0x00007fffe542a270 in fi_psm3_ini () from /lib64/libfabric.so.1
#1  0x00007fffe531fc27 in fi_ini () from /lib64/libfabric.so.1
#2  0x00007fffe532357d in fi_getinfo () from /lib64/libfabric.so.1
#3  0x00007fffe554c330 in usnic_component_init () from /usr/lib64/openmpi/lib/openmpi/mca_btl_usnic.so
#4  0x00007fffe69d5989 in mca_btl_base_select () from /usr/lib64/openmpi/lib/libopen-pal.so.40
#5  0x00007fffe56ca178 in mca_bml_r2_component_init () from /usr/lib64/openmpi/lib/openmpi/mca_bml_r2.so


Expected results:
Fall back to safe instructions and run without error.

Additional info:
CPU Info:
model name	: Pentium(R) Dual-Core  CPU      E5300  @ 2.60GHz
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm xsave lahf_lm pti tpr_shadow vnmi flexpriority vpid dtherm
vmx flags	: vnmi flexpriority tsc_offset vtpr vapic
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit

Comment 1 Honggang LI 2021-05-17 10:20:37 UTC
(In reply to david08741 from comment #0)

> model name	: Pentium(R) Dual-Core  CPU      E5300  @ 2.60GHz

                  ^^^^^^^^^^^^^^^^^^^^^^^^^

Well, it is a really old CPU.

> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
> pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx lm
> constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64
> monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm xsave lahf_lm pti tpr_shadow
> vnmi flexpriority vpid dtherm

I did not look into the backtrace dump, but it seems the CPU does not support AVX/AVX2 instructions.

It is likely a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1659852 .

Comment 2 Honggang LI 2021-05-17 10:22:53 UTC
Anton, could you please have a look?

Comment 3 Anton Bodner 2021-05-17 12:09:18 UTC
At first pass, I would agree with Honggang. I've added Adam G. to take a closer look. Keeping NEED INFO flag.

Comment 4 Adam Goldman 2021-05-17 12:53:09 UTC
Agreed with Anton and Honggang: PSM3 is only meant to be run on CPUs with AVX or higher.

Comment 5 Anton Bodner 2021-05-21 11:22:38 UTC
Clearing need info flag for anton.bodner

Comment 6 Anton Bodner 2021-05-21 11:23:23 UTC
Sorry, 2nd attempt

Comment 7 david08741 2021-05-21 12:07:19 UTC
So is this a bug in openmpi, as it shouldn't call `fi_getinfo` on non-AVX CPUs, or is this a bug in libfabric?

I assume the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1659852 for the PSM2 could also be applied for PSM3?

Comment 8 Honggang LI 2021-05-24 02:14:37 UTC
(In reply to david08741 from comment #7)
> So is this a bug in openmpi, as it shouldn't call `fi_getinfo` on non-AVX
> CPUs, or is this a bug in libfabric?

I think it is a libfabric bug.

> I assume the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1659852 for
> the PSM2 could also be applied for PSM3?

Yes, you are right. But it is unlikely to be fixed for old CPUs. As you can see, bz1659852 has been open for 2.5+ years without a fix.

Comment 9 Ben Cotton 2021-11-04 13:55:11 UTC
This message is a reminder that Fedora 33 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 33 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 12 david08741 2021-11-05 14:25:09 UTC
Still present in F34

Comment 13 Ben Cotton 2022-05-12 16:21:30 UTC
This message is a reminder that Fedora Linux 34 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '34'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 34 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 14 Ben Cotton 2022-06-08 00:25:05 UTC
Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07.

Fedora Linux 34 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release.

Thank you for reporting this bug and we are sorry it could not be fixed.

