Description of problem:
The lammps-openmpi component no longer works on our CentOS 7, YP-authenticated cluster since I did a cluster-wide update of all RPM packages two weeks ago (roughly from a pre-7.5 release to CentOS 7.6). It had previously worked fine. I don't know whether the problem lies in LAMMPS, Open MPI, or some other part of the OS, but I'm hoping the maintainer can give me some advice on how to track it down. Web searches have been fruitless, and I'm just a lowly sysadmin, not well versed in MPI or LAMMPS. The closest thing I've found involves Open MPI and getpwuid, but since other simple MPI tests work, I'm not sure that's the cause.

Version-Release number of selected component (if applicable):
premise> rpm -q openmpi lammps-openmpi
openmpi-1.10.7-2.el7.x86_64
lammps-openmpi-20190605-4.el7.x86_64
premise>

How reproducible:
Consistently. It fails on all YP client nodes but works fine on the cluster head node, which is the YP master and uses /etc files as its YP source.

Steps to Reproduce:
1. On a YP client node, run: mpirun lmp_openmpi -sf omp -pk omp 12

Actual results:
mpirun lmp_openmpi -sf omp -pk omp 12
do_ypcall: clnt_call: RPC: Server can't decode arguments
do_ypcall: clnt_call: RPC: Server can't decode arguments
YPBINDPROC_DOMAIN: Domain not bound
YPBINDPROC_DOMAIN: Domain not bound
do_ypcall: clnt_call: RPC: Server can't decode arguments
do_ypcall: clnt_call: RPC: Server can't decode arguments
do_ypcall: clnt_call: RPC: Server can't decode arguments
YPBINDPROC_DOMAIN: Domain not bound
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
YPBINDPROC_DOMAIN: Domain not bound
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[3382,1],0]
  Exit code:    64
--------------------------------------------------------------------------

Expected results:
LAMMPS runs normally, as it did before the update and as it still does on the head node.

Additional info:
Any help appreciated.
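One way to test the getpwuid idea mentioned in the description, independently of LAMMPS, is to check on each node whether the current user's uid resolves through the C library's passwd lookup. That lookup follows /etc/nsswitch.conf, so on the client nodes it should consult YP if configured that way. The following is a minimal, hypothetical sketch: the function name check_passwd_lookup and the file name check_pw.py are illustrative, not part of any package. It relies only on Python's standard pwd module, whose getpwuid wraps the libc call of the same name, and it is written to run under both the Python 2.7 shipped with CentOS 7 and Python 3.

```python
# Hypothetical diagnostic (check_pw.py): does the current uid resolve
# through NSS (and therefore YP, if nsswitch.conf lists nis) on this node?
import os
import pwd
import socket


def check_passwd_lookup():
    """Return (ok, message) for a passwd lookup of the current uid."""
    uid = os.getuid()
    host = socket.gethostname()
    try:
        # pwd.getpwuid wraps the C library's getpwuid(), so it goes
        # through the same NSS configuration a C program would use.
        entry = pwd.getpwuid(uid)
    except KeyError:
        # pwd raises KeyError when no passwd entry is found for the uid.
        return False, "%s: getpwuid(%d) found no entry" % (host, uid)
    return True, "%s: uid %d -> %s (home %s)" % (
        host, uid, entry.pw_name, entry.pw_dir)


if __name__ == "__main__":
    # One line per process, so output from many ranks stays readable.
    print(check_passwd_lookup()[1])
```

Launched with, for example, "mpirun python check_pw.py", each rank prints one line naming its host. A rank that reports "found no entry" would point at the NSS/YP setup on that node rather than at LAMMPS. Running "getent passwd <username>" directly on a node performs a similar lookup from the shell. A successful lookup does not rule out other YP calls failing inside the openmpi build; it only narrows the search.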
Can you try to compile and run a simple "hello world" MPI program? (See https://mpitutorial.com/tutorials/mpi-hello-world/.)
Yes, I tried several of those exact examples and they all worked fine.
Does this problem still occur?
Sorry, I did not receive an email notification about your reply. Coming back around to the problem: we are now running 7.7, and the problem persists.
We have lammps-20190807-2.el7.x86_64 installed, and I just discovered that the openmpi3 build works as expected while the openmpi build fails as described. Both print the "YPBINDPROC_DOMAIN: Domain not bound" messages, but the openmpi3 one works.
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.