Bug 2155197 - elpa testsuite crashes on i686 under OpenMPI
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: elpa
Version: 38
Hardware: i686
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Orphan Owner
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 2142304
Blocks:
 
Reported: 2022-12-20 11:23 UTC by Dominik 'Rathann' Mierzejewski
Modified: 2024-01-13 05:14 UTC (History)
3 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2024-01-13 05:14:03 UTC
Type: Bug
Embargoed:



Description Dominik 'Rathann' Mierzejewski 2022-12-20 11:23:59 UTC
Description of problem:
ELPA testsuite crashes on i686 under OpenMPI.

Version-Release number of selected component (if applicable):
elpa-devel-2022.05.001-1.fc38

How reproducible:
Always.

Steps to Reproduce:
1. koji build --scratch rawhide elpa-2022.05.001-1.fc38.src.rpm

Actual results:
...
FAIL: validate_double_instance_openmp_default.sh
================================================

--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.

  Local host:  66acb71cd12b466092e59ed2a896bd91
  System call: mmap(2) 
  Error:       Bad address (errno 14)
--------------------------------------------------------------------------

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0xf7219505 in ???
#1  0xf721862d in ???
#2  0xf7f7756f in ???
#3  0xf13429f5 in ???
#4  0xf1340b76 in ???
#5  0xf1146e2b in ???
#6  0xf1147766 in ???
#7  0xf114adfc in ???
#8  0xf74e62c5 in ???
#9  0xf74eb72f in ???
#10  0xf74e7f5a in ???
#11  0xf69dd6cf in ???
#12  0xf69ee454 in ???
#13  0xf74ecf3f in ???
#14  0xf74ef289 in ???
#15  0xf75242a2 in ???
#16  0xf763ea2c in ???
#17  0xf762cd07 in ???
#18  0xf762d0ba in ???
#19  0x5665b821 in set_up_blacsgrid_f
	at test/shared/test_blacs_infrastructure.F90:103
#20  0x5665b821 in test_interface
	at test/Fortran/elpa2/double_instance.F90:266
#21  0x566567e2 in main
	at test/Fortran/elpa2/double_instance.F90:62
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 0 on node 66acb71cd12b466092e59ed2a896bd91 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[66acb71cd12b466092e59ed2a896bd91:190993] 1 more process has sent help message help-opal-shmem-mmap.txt / sys call fail
[66acb71cd12b466092e59ed2a896bd91:190993] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
FAIL validate_double_instance_openmp_default.sh (exit status: 139)

FAIL: validate_real_2stage_banded_openmp_default.sh
===================================================

--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.

  Local host:  66acb71cd12b466092e59ed2a896bd91
  System call: mmap(2) 
  Error:       Bad address (errno 14)
--------------------------------------------------------------------------

Standard eigenvalue problem - REAL version

Matrix size=150, Number of eigenvectors=50, Block size=16
Number of processor rows=2, cols=1, total=2


Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0xf7219505 in ???
#1  0xf721862d in ???
#2  0xf7f5c56f in ???
#3  0xf1b249f5 in ???
#4  0xf1b22b76 in ???
#5  0xf112ee2b in ???
#6  0xf112f766 in ???
#7  0xf1132dfc in ???
#8  0xf74e62c5 in ???
#9  0xf74eb72f in ???
#10  0xf74e7f5a in ???
#11  0xf69dd6cf in ???
#12  0xf69ee454 in ???
#13  0xf74ecf3f in ???
#14  0xf74ef289 in ???
#15  0xf75242a2 in ???
#16  0xf763ea2c in ???
#17  0xf762cd07 in ???
#18  0xf762d0ba in ???
#19  0x565a6de2 in set_up_blacsgrid_f
	at test/shared/test_blacs_infrastructure.F90:103
#20  0x565a6de2 in test_real2_double_banded
	at test/Fortran/elpa2/real_2stage_banded.F90:320
#21  0x565a5702 in main
	at test/Fortran/elpa2/real_2stage_banded.F90:101
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 0 on node 66acb71cd12b466092e59ed2a896bd91 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[66acb71cd12b466092e59ed2a896bd91:191037] 1 more process has sent help message help-opal-shmem-mmap.txt / sys call fail
[66acb71cd12b466092e59ed2a896bd91:191037] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
FAIL validate_real_2stage_banded_openmp_default.sh (exit status: 139)

FAIL: validate_complex_2stage_banded_openmp_default.sh
======================================================

--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.

  Local host:  66acb71cd12b466092e59ed2a896bd91
  System call: mmap(2) 
  Error:       Bad address (errno 14)
--------------------------------------------------------------------------

Standard eigenvalue problem - COMPLEX version

Matrix size=150, Number of eigenvectors=50, Block size=16
Number of processor rows=2, cols=1, total=2


Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0xf7419505 in ???
#1  0xf741862d in ???
#2  0xf7fc456f in ???
#3  0xf1d6f9f5 in ???
#4  0xf1d6db76 in ???
#5  0xf1046e2b in ???
#6  0xf1047766 in ???
#7  0xf104adfc in ???
#8  0xf70f82c5 in ???
#9  0xf70fd72f in ???
#10  0xf70f9f5a in ???
#11  0xf6a296cf in ???
#12  0xf6a3a454 in ???
#13  0xf70fef3f in ???
#14  0xf7101289 in ???
#15  0xf71362a2 in ???
#16  0xf783ea2c in ???
#17  0xf782cd07 in ???
#18  0xf782d0ba in ???
#19  0x5659455f in set_up_blacsgrid_f
	at test/shared/test_blacs_infrastructure.F90:103
#20  0x5659455f in test_complex2_double_banded
	at test/Fortran/elpa2/complex_2stage_banded.F90:323
#21  0x56592722 in main
	at test/Fortran/elpa2/complex_2stage_banded.F90:99
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 0 on node 66acb71cd12b466092e59ed2a896bd91 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[66acb71cd12b466092e59ed2a896bd91:191081] 1 more process has sent help message help-opal-shmem-mmap.txt / sys call fail
[66acb71cd12b466092e59ed2a896bd91:191081] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
FAIL validate_complex_2stage_banded_openmp_default.sh (exit status: 139)

Expected results:
All tests PASS.

Additional info:
Likely related to bug 2142304.
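The aggregated help messages above hide the per-rank mmap errors. A sketch of re-running one failing test with OpenMPI's help aggregation disabled (the parameter named in the log); the test binary name and its arguments (matrix size, number of eigenvectors, block size, taken from the log output) are assumptions about the local build tree:

```shell
# Disable OpenMPI help-message aggregation so each rank's
# "sys call fail" message is printed separately.
export OMPI_MCA_orte_base_help_aggregate=0

# Hypothetical invocation of one failing test from the build directory;
# 150/50/16 mirror the matrix size, eigenvector count and block size
# reported in the log.
mpiexec -n 2 ./validate_double_instance_openmp_default 150 50 16
```

The same parameter can be passed on the command line instead, via `mpiexec --mca orte_base_help_aggregate 0 ...`.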

Comment 1 Ben Cotton 2023-02-07 15:03:16 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 38 development cycle.
Changing version to 38.

Comment 2 Fedora Admin user for bugzilla script actions 2024-01-05 12:16:32 UTC
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.

Comment 3 Orion Poplawski 2024-01-13 05:14:03 UTC
openmpi has dropped i686 at this point, so this is moot.

