Bug 1746564 - incorrect MPI_TAG_UB, throws "'boost::wrapexcept<boost::mpi::exception>' what(): MPI_Recv: MPI_ERR_TAG: invalid tag"
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: openmpi
Version: 31
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Philip Kovacs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1728057
 
Reported: 2019-08-28 19:04 UTC by Jean-Noël Grad
Modified: 2019-08-31 20:05 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-31 20:05:54 UTC
Type: Bug


Attachments
Minimum working example to reproduce the bug, standard error output, Dockerfiles (2.93 KB, application/gzip)
2019-08-28 19:04 UTC, Jean-Noël Grad
Minimum working example without boost (892 bytes, text/x-csrc)
2019-08-29 15:28 UTC, Jean-Noël Grad


Links
System ID Private Priority Status Summary Last Updated
Github open-mpi ompi issues 6940 0 None None None 2019-08-29 04:01:03 UTC

Description Jean-Noël Grad 2019-08-28 19:04:56 UTC
Created attachment 1609107
Minimum working example to reproduce the bug, standard error output, Dockerfiles

Description of problem:
The value of MPI_TAG_UB is 8388608 (2^23) instead of 2147483647 (2^31-1). C++ code that calls boost::mpi::reduce() and is built against the openmpi and boost-openmpi libraries throws an "invalid tag" exception when run with 2 or more MPI processes.

Version-Release number of selected component (if applicable):
openmpi-devel 4.0.1

How reproducible:
always

Steps to Reproduce:
1. install openmpi-devel boost-devel boost-openmpi-devel
2. compile the attached minimum working example (sample.cpp)
3. run the output binary with two MPI processes (e.g. mpiexec -n 2)

Actual results:
The program crashes with error message "terminate called after throwing an instance of 'boost::wrapexcept<boost::mpi::exception>' what():  MPI_Recv: MPI_ERR_TAG: invalid tag"

Expected results:
The program should print the string "The result is zero one"

Additional info:
When compiling openmpi 4.0.1 and boost 1.69 from sources, the MPI_TAG_UB has the correct value and the code sample does not throw an exception. The correct behavior is also observed in Fedora 30, which has the same boost version but openmpi 3.1.4. This bug was investigated using the fedora:31 Docker image. The Dockerfiles, code sample, bash commands and error output are attached. Our analysis of this bug (https://github.com/espressomd/espresso/issues/2985#issuecomment-523062014) shows it is the root cause for Bug 1728057.

Comment 1 Philip Kovacs 2019-08-29 03:29:02 UTC
I took some time to look at this -- the problem is somewhere in the ucx layer.  Now I don't have f31,
but I do have f32 rawhide and I did observe the max tag = 8388608 problem using the get_tag.cc sample
provided.

I reconfigured openmpi 4.0.1 without ucx and that resolves the problem:

mpirun -np 4 ./get_tag

MPI_TAG_UB = 2147483647
MPI_TAG_UB = 2147483647
MPI_TAG_UB = 2147483647
MPI_TAG_UB = 2147483647

Cheers,

Phil

Comment 2 Nathan Hjelm 2019-08-29 05:28:44 UTC
Not a bug in Open MPI. Read the MPI standard. The MPI implementation is free to pick any upper bound to the tag range. pml/ucx just happens to have a smaller UB than pml/ob1. If boost is using a tag outside the allowed range then this is a bug in boost.

Comment 3 Philip Kovacs 2019-08-29 06:22:37 UTC
At least we understand now the reason the OP is seeing this starting with f31: the addition of ucx support reduced the UB and revealed a bug elsewhere.  The path seems clear now to finding the root cause either in boost upstream or Fedora's packaging of same downstream.

Comment 4 Jean-Noël Grad 2019-08-29 15:28:28 UTC
Created attachment 1609468
Minimum working example without boost

Comment 5 Jean-Noël Grad 2019-08-29 15:29:08 UTC
Thanks to both of you for these clarifications! I can now reproduce the bug when compiling ucx + openmpi + boost from sources, which finally allowed me to work on a clean debug build in GDB.

The incorrect tag used by boost::mpi::reduce() has value -8388608 and is received from the MPI_Status object generated by an MPI_Recv() call (https://github.com/boostorg/mpi/blob/48879409552179b2d830740d0f44dcbfa8890aec/src/point_to_point.cpp#L88-L90). It is immediately used as the tag argument of another MPI_Recv() call (https://github.com/boostorg/mpi/blob/48879409552179b2d830740d0f44dcbfa8890aec/src/point_to_point.cpp#L94-L97) which returns an error code, causing boost::mpi to throw the exception.

When replacing the body of the boost::mpi::environment::max_tag() method (https://github.com/boostorg/mpi/blob/48879409552179b2d830740d0f44dcbfa8890aec/src/environment.cpp#L169-L183) by `return 8388602;`, the sample program doesn't throw an exception anymore and the value of MPI_Status.MPI_TAG is the expected 8388603 (8388602 + num_reserved_tags). This suggests using MPI_TAG_UB is not possible in openmpi 4.0.1 with ucx, even though the MPI standard 3.1 specifically states MPI_TAG_UB is a valid tag. The send_recv.cpp file attached shows the bare minimum of MPI communication, and fails when the tag is MPI_TAG_UB (note how the received tag got a sign flip):
[user@300292f4c8e3 ~]$ mpicxx -std=c++11 send_tag.cpp
[user@300292f4c8e3 ~]$ mpiexec -n 2 a.out
MPI_TAG_UB = 8388608
Sent 7 with error 0 and MPI tag 8388608
Received 7 with error 0 and MPI tag -8388608

This has all the characteristics of integer overflow on a signed 24-bit integer type, which is undefined behavior. This bug already has a fix in 4.0.2 (https://github.com/open-mpi/ompi/pull/6792). When compiling ucx + openmpi + boost from sources with this 4.0.2 fix applied as a patch to the openmpi 4.0.1 sources, I obtain the expected behavior for my sample.

Comment 6 Philip Kovacs 2019-08-29 16:00:57 UTC
So it was an openmpi bug.  That means we need to bump openmpi to 4.0.2 if we intend to continue building with ucx.

Comment 7 Nathan Hjelm 2019-08-29 16:20:39 UTC
Indeed. UCX devs had an off-by-one error. We (Open MPI) should have test coverage for sending the max tag. Apparently we do not. It is fixed in 4.0.2, so that is the best path forward.

Comment 8 Jean-Noël Grad 2019-08-29 16:23:24 UTC
Erratum: the bash commands in Comment 5 should read send_recv.cpp instead of send_tag.cpp

Fixing this bug should also fix Bug 1728057: I've just compiled the espresso package with the patched openmpi 4.0.1 library, and it passed all the tests in a Docker container with Fedora 31.

Comment 9 Christoph Junghans 2019-08-29 20:58:14 UTC
@Nathan, when can we expect to see openmpi-4.0.2 in rawhide?

Comment 10 Philip Kovacs 2019-08-29 21:05:22 UTC
I have 4.0.2rc1 done already for rawhide.  There is an unrelated problem in rawhide with one of the arches, aarch64, but I can push nevertheless.

Comment 11 Philip Kovacs 2019-08-30 03:59:44 UTC
OK, 4.0.2rc1 is now in rawhide.

Comment 12 Christoph Junghans 2019-08-30 16:38:07 UTC
Thanks!

Comment 13 Christoph Junghans 2019-08-30 16:47:50 UTC
Huh?

DEBUG util.py:585:  BUILDSTDERR: Error: transaction check vs depsolve:
DEBUG util.py:585:  BUILDSTDERR: libc.so.6(GLIBC_PRIVATE)(64bit) is needed by openmpi-4.0.2-0.1.rc1.fc32.x86_64
DEBUG util.py:585:  BUILDSTDERR: To diagnose the problem, try running: 'rpm -Va --nofiles --nodigest'.
DEBUG util.py:585:  BUILDSTDERR: You probably have corrupted RPMDB, running 'rpm --rebuilddb' might fix the issue.
DEBUG util.py:734:  Child return code was: 1

Comment 14 Philip Kovacs 2019-08-30 16:59:30 UTC
Odd, I see it.  Perhaps the machine it built on had a problem.  Let me look.

Comment 15 Christoph Junghans 2019-08-30 18:11:10 UTC
Can we get a bump for f31 as well?

Comment 16 Philip Kovacs 2019-08-30 18:21:01 UTC
I see the problem.  The issue appears to be that this open mpi compilation unit below is using glibc's private function __mmap:

openmpi-4.0.2rc1/opal/mca/memory/patcher/memory_patcher_component.c: result = __mmap (start, length, prot, flags, fd, offset)   <== ouch we can't do that!

The configure checks are probably doing compile checks but not link checks and not seeing that __mmap is a private glibc function.

The outcome is a libopen-pal.so.40.20.2 with an unusable symbol:

readelf -s libopen-pal.so.40.20.2 | grep PRIVATE
    20: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __mmap@GLIBC_PRIVATE (4)      <=== No! We mustn't do this.

The ompi folks may have already caught this problem in their source, but the 4.0.2rc1 tarball I am using from their download page either has to be
patched or updated.

Comment 17 Philip Kovacs 2019-08-30 18:33:58 UTC
I think the upstream commit for https://github.com/open-mpi/ompi/issues/6853 may help.  I'm going to try that.

Comment 18 Philip Kovacs 2019-08-30 20:42:24 UTC
I'm building the fix now in rawhide.

Comment 19 Philip Kovacs 2019-08-30 21:37:04 UTC
openmpi 4.0.2-0.2.rc1.fc32 is now in rawhide.  It should have no private symbol problems.  Try it when it hits the mirrors and let me know how it goes.

Comment 20 Philip Kovacs 2019-08-31 20:05:54 UTC
4.0.2 is now in an acceptable state, closing this.  I will merge and build for 31 and pass it to Zbignew so he can replace 4.0.1 in bodhi with 4.0.2.

