Bug 1639646

Summary: Presumed memory leak errors in OpenMPI on x86
Product: Fedora
Reporter: Antonio T. (sagitter) <anto.trande>
Component: openmpi
Assignee: Jarod Wilson <jarod>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: unspecified
Version: rawhide
CC: anto.trande, dakingun, dledford, hladky.jiri, honli, orion
Hardware: i686
OS: Linux
URL: https://github.com/open-mpi/ompi/issues/5932
Fixed In Version: openmpi-2.1.6-0.1.rc1
Last Closed: 2018-11-30 03:28:19 UTC
Type: Bug
Attachments: OpenMPI Sundials test failure

Description Antonio T. (sagitter) 2018-10-16 09:42:50 UTC
Description of problem:
Testing of PETSc-3.10.2 fails on x86 (32-bit) architectures only. The errors look attributable to OpenMPI:

+ export LD_LIBRARY_PATH=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir/i386/lib
+ LD_LIBRARY_PATH=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir/i386/lib
+ export PETSC_DIR=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir
+ PETSC_DIR=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir
+ export PETSC_ARCH=i386
+ PETSC_ARCH=i386
+ export MPI_INTERFACE_HOSTNAME=localhost
+ MPI_INTERFACE_HOSTNAME=localhost
+ export 'PETSCVALGRIND_OPTIONS= --tool=memcheck --leak-check=yes --track-origins=yes'
+ PETSCVALGRIND_OPTIONS=' --tool=memcheck --leak-check=yes --track-origins=yes'
+ export 'CFLAGS=-O0 -g -Wl,-z,now -fPIC'
+ CFLAGS='-O0 -g -Wl,-z,now -fPIC'
+ export 'CXXFLAGS=-O0 -g -Wl,-z,now -fPIC'
+ CXXFLAGS='-O0 -g -Wl,-z,now -fPIC'
+ export 'FFLAGS=-O0 -g -Wl,-z,now -fPIC -I/usr/lib/gfortran/modules'
+ FFLAGS='-O0 -g -Wl,-z,now -fPIC -I/usr/lib/gfortran/modules'
+ make -C buildopenmpi_dir test 'MPIEXEC=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir/lib/petsc/bin/petscmpiexec -valgrind'
make: Entering directory '/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir'
Running test examples to verify correct installation
Using PETSC_DIR=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir and PETSC_ARCH=i386
Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI process
See http://www.mcs.anl.gov/petsc/documentation/faq.html
==25868== Conditional jump or move depends on uninitialised value(s)
==25868==    at 0x8E2CCA3: opal_value_unload (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x6088607: ompi_proc_complete_init (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x608C845: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x60B2A97: PMPI_Init_thread (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x49E0B0F: PetscInitialize (pinit.c:875)
==25868==    by 0x8049643: main (ex19.c:106)
==25868==  Uninitialised value was created by a stack allocation
==25868==    at 0x6088593: ompi_proc_complete_init (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==
lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
Number of SNES iterations = 2
==25868== 10 bytes in 1 blocks are definitely lost in loss record 11 of 189
==25868==    at 0x40356A4: malloc (vg_replace_malloc.c:299)
==25868==    by 0xA890F6F: ??? (in /usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0xA88FEFD: pmix_bfrop_unpack_buffer (in /usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0xA8901D0: ??? (in /usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0xA89BED4: ??? (in /usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0xA899C7C: ??? (in /usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0x8E63AFD: opal_libevent2022_event_base_loop (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0xA897C32: ??? (in /usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0x61635DD: start_thread (in /usr/lib/libpthread-2.28.9000.so)
==25868==    by 0x626A699: clone (in /usr/lib/libc-2.28.9000.so)
==25868==
==25868== 10 bytes in 1 blocks are definitely lost in loss record 12 of 189
==25868==    at 0x40356A4: malloc (vg_replace_malloc.c:299)
==25868==    by 0x6201519: strdup (in /usr/lib/libc-2.28.9000.so)
==25868==    by 0x8E4B0FE: mca_base_var_enum_create_flag (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E5DDA5: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E4CC34: mca_base_framework_register (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E4CCE0: mca_base_framework_open (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x60DD028: ??? (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x8E4CD68: mca_base_framework_open (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x608C410: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x60B2A97: PMPI_Init_thread (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x49E0B0F: PetscInitialize (pinit.c:875)
==25868==    by 0x8049643: main (ex19.c:106)
==25868==
==25868== 17 bytes in 1 blocks are definitely lost in loss record 79 of 189
==25868==    at 0x40356A4: malloc (vg_replace_malloc.c:299)
==25868==    by 0x6201519: strdup (in /usr/lib/libc-2.28.9000.so)
==25868==    by 0x8E4B0FE: mca_base_var_enum_create_flag (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E5DDC0: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E4CC34: mca_base_framework_register (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E4CCE0: mca_base_framework_open (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x60DD028: ??? (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x8E4CD68: mca_base_framework_open (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x608C410: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x60B2A97: PMPI_Init_thread (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x49E0B0F: PetscInitialize (pinit.c:875)
==25868==    by 0x8049643: main (ex19.c:106)
...


Version-Release number of selected component (if applicable):
openmpi-2.1.5

How reproducible:
Always on i686 and armv7hl

Additional info:
Discussion with upstream: https://lists.mcs.anl.gov/pipermail/petsc-dev/2018-October/023619.html
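MPI runtimes are known to trip memcheck on allocations that are intentionally retained until finalization; Open MPI ships a suppression file for this. As a sketch of how one might separate such known noise from real leaks when reproducing locally (paths and the suppression-file location vary by distribution and are assumptions here):

```shell
# Re-run the failing PETSc example under valgrind with Open MPI's
# shipped suppression file; the path below is an assumption and may
# differ on a given install (check under the Open MPI share/ tree).
SUPP=/usr/share/openmpi/openmpi-valgrind.supp
mpirun -np 1 valgrind --tool=memcheck --leak-check=yes \
       --track-origins=yes --suppressions="$SUPP" ./ex19
```

Leaks that survive the suppression file, like the mca_base_var_enum_create_flag records above, are the ones worth reporting upstream.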

Comment 1 Antonio T. (sagitter) 2018-10-20 17:07:16 UTC
Created attachment 1495980 [details]
OpenMPI Sundials test failure

Other OpenMPI tests failed on i686 and armv7hl only, this time with Sundials (PETSc support disabled).

See attached file.

Comment 2 Orion Poplawski 2018-11-29 15:39:56 UTC
Antonio - can you retry with openmpi 2.1.6rc1 now in rawhide?  Thanks.

Comment 3 Orion Poplawski 2018-11-30 03:28:19 UTC
Appears to be resolved now.

Comment 4 Antonio T. (sagitter) 2018-12-02 20:46:18 UTC
(In reply to Orion Poplawski from comment #3)
> Appears to be resolved now.

Thank you Orion.