Bug 1744603 - lammps-openmpi on yp client node fails to run
Keywords:
Status: MODIFIED
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: lammps
Version: epel7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Richard Berger
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-22 14:07 UTC by Thomas J. Baker
Modified: 2022-12-09 00:09 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-23 03:02:31 UTC
Type: Bug
Embargoed:



Description Thomas J. Baker 2019-08-22 14:07:11 UTC
Description of problem:

The lammps-openmpi component no longer works on our CentOS 7 cluster, which uses yp for authentication, after I did a cluster-wide update of all RPM packages (basically from pre-CentOS 7.5 to 7.6, as of two weeks ago). It had previously worked fine. I don't know whether the problem is with lammps, openmpi, or some other part of the OS, but I'm hoping the maintainer might be able to give me some advice on how to track it down. Web searches have been fruitless, and I'm just a lowly sysadmin, not well versed in MPI or lammps. The closest thing I've found has to do with openmpi and getpwuid, but since other simple MPI tests seem to work, I'm not sure that's the cause.

Version-Release number of selected component (if applicable):

premise> rpm -q openmpi lammps-openmpi
openmpi-1.10.7-2.el7.x86_64
lammps-openmpi-20190605-4.el7.x86_64
premise> 

How reproducible:

Consistently fails on all yp client nodes, but works fine on the cluster head node, which is the yp master and uses its /etc files as the yp source.

Steps to Reproduce:
1.
2.
3.

Actual results:

mpirun lmp_openmpi -sf omp -pk omp 12
do_ypcall: clnt_call: RPC: Server can't decode arguments
do_ypcall: clnt_call: RPC: Server can't decode arguments
YPBINDPROC_DOMAIN: Domain not bound
YPBINDPROC_DOMAIN: Domain not bound
do_ypcall: clnt_call: RPC: Server can't decode arguments
do_ypcall: clnt_call: RPC: Server can't decode arguments
do_ypcall: clnt_call: RPC: Server can't decode arguments
YPBINDPROC_DOMAIN: Domain not bound
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
YPBINDPROC_DOMAIN: Domain not bound
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[3382,1],0]
  Exit code:    64
--------------------------------------------------------------------------
Expected results:


Additional info:

Any help appreciated.

Comment 1 Christoph Junghans 2019-08-22 21:39:26 UTC
Can you try to compile and run a simple "hello world" mpi program? (see https://mpitutorial.com/tutorials/mpi-hello-world/).

Comment 2 Thomas J. Baker 2019-08-26 11:42:54 UTC
Yes, I tried several of those exact examples and they all worked fine.

Comment 3 Christoph Junghans 2020-02-15 02:59:31 UTC
Does this still persist?

Comment 4 Thomas J. Baker 2020-07-15 18:37:13 UTC
Sorry, I did not receive an email about your reply. Coming back around to the problem: we are now running 7.7 and the problem persists.

Comment 5 Thomas J. Baker 2020-07-18 12:47:53 UTC
We've got lammps-20190807-2.el7.x86_64 installed, and I just discovered that the openmpi3 version works as expected while the openmpi one fails as described. Both produce the "Domain not bound" messages, but the openmpi3 one still runs.

Comment 6 Fedora Admin user for bugzilla script actions 2022-12-09 00:09:21 UTC
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.

