Bug 1639646 - Presumed memory leak errors from OpenMPI on x86
Summary: Presumed memory leak errors from OpenMPI on x86
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: openmpi
Version: rawhide
Hardware: i686
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Jarod Wilson
QA Contact: Fedora Extras Quality Assurance
URL: https://github.com/open-mpi/ompi/issu...
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-10-16 09:42 UTC by Antonio T. (sagitter)
Modified: 2018-12-02 20:46 UTC (History)
6 users

Fixed In Version: openmpi-2.1.6-0.1.rc1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-30 03:28:19 UTC
Type: Bug
Embargoed:


Attachments
OpenMPI Sundials test failure (590.58 KB, text/plain)
2018-10-20 17:07 UTC, Antonio T. (sagitter)

Description Antonio T. (sagitter) 2018-10-16 09:42:50 UTC
Description of problem:
The test suite of PETSc-3.10.2 is failing on x86 architectures only. The errors look attributable to OpenMPI:

+ export
LD_LIBRARY_PATH=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir/i386/lib
+
LD_LIBRARY_PATH=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir/i386/lib
+ export PETSC_DIR=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir
+ PETSC_DIR=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir
+ export PETSC_ARCH=i386
+ PETSC_ARCH=i386
+ export MPI_INTERFACE_HOSTNAME=localhost
+ MPI_INTERFACE_HOSTNAME=localhost
+ export 'PETSCVALGRIND_OPTIONS= --tool=memcheck --leak-check=yes
--track-origins=yes'
+ PETSCVALGRIND_OPTIONS=' --tool=memcheck --leak-check=yes
--track-origins=yes'
+ export 'CFLAGS=-O0 -g -Wl,-z,now -fPIC'
+ CFLAGS='-O0 -g -Wl,-z,now -fPIC'
+ export 'CXXFLAGS=-O0 -g -Wl,-z,now -fPIC'
+ CXXFLAGS='-O0 -g -Wl,-z,now -fPIC'
+ export 'FFLAGS=-O0 -g -Wl,-z,now -fPIC -I/usr/lib/gfortran/modules'
+ FFLAGS='-O0 -g -Wl,-z,now -fPIC -I/usr/lib/gfortran/modules'
+ make -C buildopenmpi_dir test
'MPIEXEC=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir/lib/petsc/bin/petscmpiexec
-valgrind'
make: Entering directory
'/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir'
Running test examples to verify correct installation
Using PETSC_DIR=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir and
PETSC_ARCH=i386
Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI
process
See http://www.mcs.anl.gov/petsc/documentation/faq.html
==25868== Conditional jump or move depends on uninitialised value(s)
==25868==    at 0x8E2CCA3: opal_value_unload (in
/usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x6088607: ompi_proc_complete_init (in
/usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x608C845: ompi_mpi_init (in
/usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x60B2A97: PMPI_Init_thread (in
/usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x49E0B0F: PetscInitialize (pinit.c:875)
==25868==    by 0x8049643: main (ex19.c:106)
==25868==  Uninitialised value was created by a stack allocation
==25868==    at 0x6088593: ompi_proc_complete_init (in
/usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==
lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
Number of SNES iterations = 2
==25868== 10 bytes in 1 blocks are definitely lost in loss record 11 of 189
==25868==    at 0x40356A4: malloc (vg_replace_malloc.c:299)
==25868==    by 0xA890F6F: ??? (in
/usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0xA88FEFD: pmix_bfrop_unpack_buffer (in
/usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0xA8901D0: ??? (in
/usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0xA89BED4: ??? (in
/usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0xA899C7C: ??? (in
/usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0x8E63AFD: opal_libevent2022_event_base_loop (in
/usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0xA897C32: ??? (in
/usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0x61635DD: start_thread (in
/usr/lib/libpthread-2.28.9000.so)
==25868==    by 0x626A699: clone (in /usr/lib/libc-2.28.9000.so)
==25868==
==25868== 10 bytes in 1 blocks are definitely lost in loss record 12 of 189
==25868==    at 0x40356A4: malloc (vg_replace_malloc.c:299)
==25868==    by 0x6201519: strdup (in /usr/lib/libc-2.28.9000.so)
==25868==    by 0x8E4B0FE: mca_base_var_enum_create_flag (in
/usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E5DDA5: ??? (in
/usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E4CC34: mca_base_framework_register (in
/usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E4CCE0: mca_base_framework_open (in
/usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x60DD028: ??? (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x8E4CD68: mca_base_framework_open (in
/usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x608C410: ompi_mpi_init (in
/usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x60B2A97: PMPI_Init_thread (in
/usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x49E0B0F: PetscInitialize (pinit.c:875)
==25868==    by 0x8049643: main (ex19.c:106)
==25868==
==25868== 17 bytes in 1 blocks are definitely lost in loss record 79 of 189
==25868==    at 0x40356A4: malloc (vg_replace_malloc.c:299)
==25868==    by 0x6201519: strdup (in /usr/lib/libc-2.28.9000.so)
==25868==    by 0x8E4B0FE: mca_base_var_enum_create_flag (in
/usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E5DDC0: ??? (in
/usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E4CC34: mca_base_framework_register (in
/usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E4CCE0: mca_base_framework_open (in
/usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x60DD028: ??? (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x8E4CD68: mca_base_framework_open (in
/usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x608C410: ompi_mpi_init (in
/usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x60B2A97: PMPI_Init_thread (in
/usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x49E0B0F: PetscInitialize (pinit.c:875)
==25868==    by 0x8049643: main (ex19.c:106)
...


Version-Release number of selected component (if applicable):
openmpi-2.1.5

How reproducible:
Always on i686 and armv7hl

Additional info:
Discussion with upstream: https://lists.mcs.anl.gov/pipermail/petsc-dev/2018-October/023619.html

Comment 1 Antonio T. (sagitter) 2018-10-20 17:07:16 UTC
Created attachment 1495980 [details]
OpenMPI Sundials test failure

Other OpenMPI tests fail on i686 and armv7hl only, this time with Sundials (PETSc support disabled).

See attached file.

Comment 2 Orion Poplawski 2018-11-29 15:39:56 UTC
Antonio - can you retry with openmpi 2.1.6rc1 now in rawhide?  Thanks.

Comment 3 Orion Poplawski 2018-11-30 03:28:19 UTC
Appears to be resolved now.

Comment 4 Antonio T. (sagitter) 2018-12-02 20:46:18 UTC
(In reply to Orion Poplawski from comment #3)
> Appears to be resolved now.

Thank you Orion.

