Bug 1321154 - numa enabled torque don't work
Summary: numa enabled torque don't work
Status: ON_QA
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: torque
Version: el6
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Assignee: David Brown
QA Contact: Fedora Extras Quality Assurance
Reported: 2016-03-24 19:33 UTC by nucleo
Modified: 2017-10-17 02:33 UTC

Fixed In Version: torque-4.2.10-10.fc24 torque-4.2.10-10.el7 torque-4.2.10-10.fc23 torque-4.2.10-10.fc22
Doc Type: Bug Fix
Last Closed: 2016-05-16 14:55:46 UTC


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1231148 None None None Never

Internal Links: 1231148

Description nucleo 2016-03-24 19:33:31 UTC
Description of problem:
After updating from torque-4.2.10-5.el6 to torque-4.2.10-9.el6, the pbs_mom service doesn't start.

Version-Release number of selected component (if applicable):
torque-mom-4.2.10-9.el6.x86_64

Actual results:
pbs_mom.9607;Svr;pbs_mom;LOG_ERROR::No such file or directory (2) in read_layout_file, Unable to read the layout file in /var/lib/torque/mom_priv/mom.layout

If I create an empty file /var/lib/torque/mom_priv/mom.layout, the pbs_mom service starts but never connects to the torque server, so the node is shown as down.

Expected results:
pbs_mom service should start and work correctly after update without creating any additional files such as mom.layout.

Additional info:
After downgrading to torque-4.2.10-5.el6 pbs_mom works fine without mom.layout file.

Comment 1 Chen Chen 2016-03-25 02:56:15 UTC
According to the documentation[1] from Adaptive Computing, mom.layout is created either manually or by using the contrib perl script "mom_gencfg".
Maybe we should find a way to make the single-node and NUMA configurations coexist.

[1] http://docs.adaptivecomputing.com/torque/4-2-10/help.htm#topics/1-installConfig/buildingWithNUMA.htm%3FTocPath%3D1.0%2520Installation%2520and%2520configuration|1.7%2520TORQUE%2520on%2520NUMA%2520systems|_____2

Comment 2 Chen Chen 2016-03-25 03:03:04 UTC
Also, if NUMA support breaks EPEL's "minor update friendly" philosophy, I suggest removing it.

Comment 3 Gavin Nelson 2016-04-06 17:04:57 UTC
Please remove the NUMA support from this package group, or create an alternate package group.   My cluster has been dead for almost 2 weeks and the scientists are getting cranky.  This feature does not play well with the MAUI scheduler and, apparently, not at all with the built-in scheduler (http://www.clusterresources.com/pipermail/torqueusers/2013-September/016136.html). 
Requiring this feature means having to introduce a whole host of changes to the Torque environment as well as forcing recompile of OpenMPI (last I checked epel version of openmpi does not have Torque support) and MAUI, which then means recompiling all the analysis applications, etc...
I've tried...I really have.  I even tried rebuilding the package group from the src rpm, but when I remove the enable-numa switch from the torque.spec file it still builds with numa support (not sure what I'm missing there).

My first (anguished) post here, so please excuse my noobness.

Comment 4 David Brown 2016-04-06 17:21:50 UTC
Okay, so I didn't realize you were having this much of an issue. If you have a version of the torque package that you know works, there's a plugin for yum called yum-plugin-versionlock; that package has some configs you can set up to lock torque at the version you know works for you.

Also, you can yum downgrade torque* to get to a previous version that hopefully works better for you.
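
The two suggestions above could be scripted roughly as follows. This is a minimal sketch rather than a tested procedure: the `run` and `pin_torque` helper names and the `APPLY` variable are made up for illustration, and it assumes the older torque packages are still available to yum for the downgrade. By default it only prints the commands it would run.

```shell
#!/bin/bash
# Hypothetical helper: print the command by default, run it only when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Downgrade the torque packages, then lock them at the installed version.
pin_torque() {
    run yum -y install yum-plugin-versionlock
    run yum -y downgrade 'torque*'
    run yum versionlock add 'torque*'
}
```

Calling `pin_torque` previews the three commands; `APPLY=1 pin_torque` as root would execute them. The lock is added last so that it pins whichever version the downgrade left installed.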

I'm working through some tests to try and reproduce the situation you are trying to describe but it is taking some time as I'm volunteering most of my time for this and my virtual environments where I test didn't take this situation into account.

Comment 5 Gavin Nelson 2016-04-06 17:35:15 UTC
Thanks for working on this David!

yum downgrade was the first thing I tried but all I get is "Only Upgrade available on package", etc...  Not sure what I messed up there.

I'll look into the plugin you mentioned.

The "problem" (for me anyway) got ugly (i.e. after I worked out the various mom.layout and cpuset issues) when openmpi started barfing on the various shared memory configuration issues.  I thought I had worked through most of those over the last few days, but now I just can't get MAUI to send jobs to more than one physical node; all jobs run on a single node regardless of how it's specified in the PBS script.  MPI is doing the allocation correctly, but then MAUI (and/or the pbs_mom process) just ignores it...

Comment 6 nucleo 2016-04-06 17:47:00 UTC
NUMA support was enabled in 4.2.10-6, so the last working version is 4.2.10-5.
It can be downloaded here:
https://kojipkgs.fedoraproject.org//packages/torque/4.2.10/5.el6/

Older packages for other EPEL and Fedora releases can be found here:
https://kojipkgs.fedoraproject.org//packages/torque/4.2.10/

Comment 7 Gavin Nelson 2016-04-06 17:55:26 UTC
Thanks nucleo!  Very educational.

Comment 8 David Brown 2016-04-09 02:32:03 UTC
Okay, I got some time to test things out.

Just for reference for everyone involved (I think I mentioned this on another bug and on the torque users mailing list): I use chef to build a virtual cluster and set up torque for testing, see https://github.com/dmlb2000/torque-cookbook. Check out the templates directory; there are several files that need to be rendered correctly to make things work. For NUMA support I had to change the server's nodes file, and each mom got a mom.layout file.

I've tested multiple CPUs with multiple nodes (2x2) and am able to run MPI jobs just fine. However, the RHEL/CentOS version of openmpi is built without torque support. This means that you have to set up your hostfile and specify the `-np` option to mpirun in order to use OpenMPI in a run.

#PBS -l nodes=2:ppn=2
mpirun -hostfile hostfile -np 4 ./mpi_hello
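
For a job script that does not hard-code the process count, the host list and the `-np` value can be derived from the node file Torque provides. A minimal sketch, assuming `$PBS_NODEFILE` contains one line per allocated slot and that Open MPI accepts repeated hostnames in a hostfile as additional slots; the `slot_count` function name is made up for illustration, and `mpi_hello` is the test program from the lines above.

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=2

# Hypothetical helper: count the lines of a node file.
# One line per allocated slot is assumed, so the count is the -np value.
slot_count() {
    grep -c '' "$1"
}

# Inside a job, Torque sets PBS_NODEFILE; outside a job nothing is run.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    mpirun -hostfile "$PBS_NODEFILE" -np "$(slot_count "$PBS_NODEFILE")" ./mpi_hello
fi
```

With `nodes=2:ppn=2` the node file would be expected to hold four lines, giving the same `-np 4` as the hand-written command.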

As MAUI is not in EPEL, I can't really set up and support a configuration of it, and I consider it out of scope of support from EPEL's point of view. As I don't have a version of MAUI to target, I can't ensure interoperability between the two pieces of software.

If you are having issues building and running torque with MAUI or MOAB you should ask the user mailing list as well to get help.

As to the status of the original bug I could include a basic mom.layout file. The one from the chef cookbook for example. However, this would have to be changed for most installations as that just flattens the cores on the node.

Comment 9 Fedora Update System 2016-04-09 18:19:16 UTC
torque-4.2.10-10.el6 has been submitted as an update to Fedora EPEL 6. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-7a55539098

Comment 10 Fedora Update System 2016-04-09 18:19:48 UTC
torque-4.2.10-10.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-596ccc8373

Comment 11 Fedora Update System 2016-04-09 18:20:32 UTC
torque-4.2.10-10.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-43b6ce44b3

Comment 12 Fedora Update System 2016-04-09 18:21:04 UTC
torque-4.2.10-10.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-b21f08b188

Comment 13 Fedora Update System 2016-04-09 18:21:43 UTC
torque-4.2.10-10.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2016-830fdb2304

Comment 14 Fedora Update System 2016-04-09 21:22:03 UTC
torque-4.2.10-10.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-43b6ce44b3

Comment 15 Fedora Update System 2016-04-10 15:19:50 UTC
torque-4.2.10-10.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-596ccc8373

Comment 16 Fedora Update System 2016-04-10 15:20:00 UTC
torque-4.2.10-10.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-830fdb2304

Comment 17 Fedora Update System 2016-04-10 15:20:11 UTC
torque-4.2.10-10.el6 has been pushed to the Fedora EPEL 6 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-7a55539098

Comment 18 Fedora Update System 2016-04-10 15:48:25 UTC
torque-4.2.10-10.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-b21f08b188

Comment 19 nucleo 2016-04-10 16:00:54 UTC
After installing the update torque-4.2.10-10.el6.x86_64, the pbs_mom service starts but the node is shown as down.
pbs_server log:
PBS_Server.6045;Svr;PBS_Server;LOG_ERROR::get_numa_from_str, Node isn't declared to be NUMA, but mom is reporting

Comment 20 Chen Chen 2016-04-11 09:15:40 UTC
I solved the error by changing "/var/lib/torque/server_priv/nodes" into:

HOSTNAME np=NP num_node_boards=1
(substitute the correct hostname and processor count for HOSTNAME and NP)

Then stop and restart pbs_server and pbs_mom. This nodes file makes the host look like a NUMA node with only one subnode.
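
The entry described above can be generated rather than typed by hand. A minimal sketch: the `nodes_line` function name and the `node01`/`8` values are hypothetical placeholders, and the snippet only prints the line; it does not edit /var/lib/torque/server_priv/nodes.

```shell
#!/bin/bash
# Hypothetical helper: print a nodes-file entry that declares the host
# as a NUMA node with a single node board.
nodes_line() {
    local host="$1" np="$2"
    printf '%s np=%s num_node_boards=1\n' "$host" "$np"
}

# Example entry for a host named node01 with 8 processors (placeholders).
nodes_line node01 8
```

The printed line would still need to be placed in the nodes file, followed by the restart of pbs_server and pbs_mom mentioned above.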

Comment 21 Troels Arvin 2016-04-12 14:18:53 UTC
Using the packages in epel-testing (4.2.10-10) means that Torque/PBS works on my newly installed system. Before, with version 4.2.10-9, pbs_mom did not start.

I suggest that 4.2.10-10 be pushed to the main EPEL repos.

Comment 22 Steve D 2016-04-18 21:11:41 UTC
I found using the epel-testing packages ver 4.2.10-10 & the numa configuration settings for non-numa nodes (<nodename> np=4 num_node_boards=1) allowed submitting jobs but pbsnodes listed nodes as ncpus=0 and each node would only run one job concurrently.

Downgrading to 4.2.10-5 solved this issue, and I can run concurrent jobs on nodes.

Please fork the server builds into:
torque-server
torque-server-numa

if it is not possible to provide proper support for both configurations in one package.

Comment 23 Kevin Fenzi 2016-04-20 18:09:38 UTC
So, could we use the config from comment #20 to make the default/existing non numa case work and those that want numa could then adjust that file for their needs? 

IMHO it's much better for the people wanting the new functionality to have to edit config than for people with existing working installs to have to.

Comment 24 Chen Chen 2016-04-21 04:09:24 UTC
(In reply to Steve D from comment #22)
> I found using the epel-testing packages ver 4.2.10-10 & the numa
> configuration settings for non-numa nodes (<nodename> np=4
> num_node_boards=1) allowed submitting jobs but pbsnodes listed nodes as
> ncpus=0 and each node would only run one job concurrently.

Hmm... I reproduced it, so my workaround is invalid.

> Please fork the server builds into:
> torque-server
> torque-server-numa

Agreed. This is the best solution. Make the -numa and vanilla packages conflict with each other, with both providing torque-server. However, I don't know how to fit both builds into a single spec file, or whether EPEL permits new packages rolling in.

(In reply to Kevin Fenzi from comment #23)
> So, could we use the config from comment #20 to make the default/existing
> non numa case work and those that want numa could then adjust that file for
> their needs? 

Whether the build is NUMA is decided when --enable-numa is passed to ./configure. The two probably cannot coexist in the current version.

Comment 25 Chen Chen 2016-04-21 04:17:01 UTC
On second thought, the package that needs to be forked is torque-mom, since
> Node isn't declared to be NUMA, but mom is reporting

Comment 26 nucleo 2016-05-03 18:03:00 UTC
Looks like more and more people have problems with torque packages built with enabled NUMA:

http://www.supercluster.org/pipermail/torqueusers/2016-May/018658.html

> I recently upgraded packages and the torque packages were updated to the latest rpm versions. However, I am unable to get the nodes to active state.

Comment 27 David Brown 2016-05-13 15:38:07 UTC
Okay, after some long deliberation in my head about what to do with this, and some digging into how to support both configurations in the various environments, here are my suggestions...

Forking the build for torque is bad for a couple of reasons
 A. Torque is designed to be a single build for the entire cluster of machines, having multiple builds for sched, mom, etc invites more confusion on users and would result in just more issues. Users would need to know that all torque-*-numa packages should be installed not anything else on every machine in their cluster.
 B. Torque has many different options that make builds incompatible, forking based on numa just invites forking on blcr, hwloc, pam, readline, tcl/tk, etc... and the combinations just explode...

Most of the issues seem to be around EL6 and not EL7 ... Would supporting the numa build in EL7 and reverting EL6 be palatable for everyone?

The only other option I would see is more management overhead for EPEL as it involves multiple repositories with various builds that have different upgrade, configuration and change policies. However, this is higher than I can reach right now.

Comment 28 Troels Arvin 2016-05-13 17:29:15 UTC
I'm seeing the NUMA-trouble in an EL7 setting, so I'm not particularly fond of David Brown's suggestion in comment 27.

Comment 29 Steve D 2016-05-13 20:16:08 UTC
I concur with Comment 28, running EL7 here.

It seems that since NUMA was not included in previous EL builds, that NUMA configuration options should be removed from current builds for consistency (support for Comment 2). Also if Comment 3 is accurate in the amount of packages needing rebuilding and re-configured this seems like a deal-breaker anyway.

This way if there is a lot of feedback from the community that NUMA support needs to be included, a separate set of packages will be built for it if the additional capacity is available on maintainers side.

Lastly, I wonder if the long-term solution should actually be to request upstream to support a run time configuration setting to disable NUMA support for nodes rather than as a compile only option. But this would still require the build changes mentioned in Comment 3 for other non-torque specific packages.

Comment 30 David Brown 2016-05-13 23:54:24 UTC
(In reply to Steve D from comment #29)
> I concur with Comment 28, running EL7 here.

Damn, I was hoping...

> It seems that since NUMA was not included in previous EL builds, that NUMA
> configuration options should be removed from current builds for consistency
> (support for Comment 2). Also if Comment 3 is accurate in the amount of
> packages needing rebuilding and re-configured this seems like a deal-breaker
> anyway.

Under the current infrastructure and support policies this seems to be the only option...

> This way if there is a lot of feedback from the community that NUMA support
> needs to be included, a separate set of packages will be built for it if the
> additional capacity is available on maintainers side.

I'd rather put my effort toward pushing a different model of support for these kind of packages. The idea being it would allow me to support things in EPEL more like the way packages flow through Fedora into RHEL.

> Lastly, I wonder if the long-term solution should actually be to request
> upstream to support a run time configuration setting to disable NUMA support
> for nodes rather than as a compile only option. But this would still require
> the build changes mentioned in Comment 3 for other non-torque specific
> packages.

The issue with that (for at least torque) is the maintainers have moved on and are doing major development on torque 6 (yes, two major versions) rather than making feature requests for this old version... 

I'm currently playing around in copr to see if I can set up a system to support all workflows without forking the build into multiple packages with different names. This would be something to discuss with other EPEL folks on the mailing list, to see if there are other EPEL packages that could take advantage of the model.

Comment 31 Fedora Update System 2016-05-14 23:31:59 UTC
torque-4.2.10-10.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 32 nucleo 2016-05-14 23:40:08 UTC
Bug is not actually fixed, so reopening.

Comment 33 Fedora Update System 2016-05-15 02:42:38 UTC
torque-4.2.10-10.el7 has been pushed to the Fedora EPEL 7 stable repository. If problems still persist, please make note of it in this bug report.

Comment 34 David Brown 2016-05-15 04:33:07 UTC
Not really closed as some haven't accepted the fix...

Comment 35 Fedora Update System 2016-05-15 05:32:14 UTC
torque-4.2.10-10.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 36 Fedora Update System 2016-05-16 14:55:39 UTC
torque-4.2.10-10.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.

Comment 37 Gerben Roest 2016-05-31 09:25:00 UTC
I can confirm comment 22 on CentOS 7: Using a single-cpu compute node with "nodes=1" in mom.layout and "np=8  num_node_boards=1" in server_priv/nodes, the machine shows up in "pbsnodes -a" as having "ncpus=0".
However, I can run 8 jobs (sleep 10) concurrently on this np=8 node. If you want to send the job to a specific node you have to specify the name (including -0) as shown with "pbsnodes -a"

Comment 38 Fedora Update System 2017-08-17 17:05:21 UTC
torque-4.2.10-11.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-6658d64670

Comment 39 Fedora Update System 2017-08-18 20:23:36 UTC
torque-4.2.10-11.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-6658d64670

Comment 40 antofthy 2017-10-06 06:25:29 UTC
I installed torque-4.2.10-11.el7 from the EPEL testing repo.

However, I found that all the nodes were 'down' even though everything appeared to be running, with no errors in the error logs.

After a lot of trial, error and research, I eventually (on a whim) decided to remove the "num_node_boards=1" entry from the "torque/server_priv/nodes" file and restart the server & scheduler. Suddenly the nodes were "free" and my initial test job ran.

Perhaps the EPEL testing Torque 4.2.10-11 does not contain NUMA?

All later tests (with OpenMPI, RHEL SRPM 1.10.6-2 re-compiled "--with-tm") now respond to the Torque node allocation correctly and no longer simply run all the jobs on the first node.

That is, $PBS_NODEFILE, "pbsdsh hostname" and "mpirun hostname" are all in agreement.
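
A check along these lines can be run inside a job to confirm that agreement. A minimal sketch: the `host_counts` helper name is hypothetical, and it assumes `pbsdsh` and an `mpirun` built with Torque support are on the job's PATH and that all three sources report hostnames in the same form (short or fully qualified).

```shell
#!/bin/bash
# Hypothetical helper: reduce a list of hostnames on stdin to sorted
# "count hostname" lines so that two listings can be compared.
host_counts() {
    sort | uniq -c | awk '{print $1, $2}'
}

# Only meaningful inside a Torque job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    expected=$(host_counts < "$PBS_NODEFILE")
    [ "$(pbsdsh hostname | host_counts)" = "$expected" ] || echo "pbsdsh disagrees with PBS_NODEFILE"
    [ "$(mpirun hostname | host_counts)" = "$expected" ] || echo "mpirun disagrees with PBS_NODEFILE"
fi
```

If all jobs were landing on the first node, the mpirun listing would show a single hostname and the second comparison would report a mismatch.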

Phew...

Comment 41 Jesse 2017-10-13 13:06:40 UTC
I had tried installing torque using the 4.2.10-10 version and had the same issue where pbs_mom would not bring the nodes up because of numa detected. I found this page, enabled epel-testing and did a yum update torque* on the frontend and backend nodes, presto everything is working again happily with maui/torque/munge/trqauth/kickstart, as it was before my leap from Centos5 to Centos7. Now I'm going to give this a go on a RHEL7 Stacki cluster!  

Thanks!

Comment 42 Jesse 2017-10-17 02:03:49 UTC
Strange thing: I'm having a problem with 4.2.10-11 where ulimit -l returns 'unlimited' from the terminal, the way I set it in limits.conf on all my nodes, yet if I run a qsub job that echoes ulimit -l into a txt file, it gives me '64'. So torque ignores whatever ulimit is set via pam.d and keeps the max locked memory value at 64! As a result, my jobs fail with:

ipath_userinit: mmap of rcvhdrq failed: Resource temporarily unavailable
--------------------------------------------------------------------------
PSM was unable to open an endpoint. 

Why is torque insisting on this value? A few forums say the fix is to set ulimit -l unlimited inside the /etc/init.d/pbs_mom script, but what's the equivalent in CentOS 7? Is there a pbs_mom file in my /var/lib/torque where I can set this on all nodes?

Comment 43 antofthy 2017-10-17 02:33:40 UTC
In CentOS 7 (and Red Hat) the launcher is the systemd service file
  /usr/lib/systemd/system/pbs_mom.service
This defines when the daemon is started and what it needs before systemd starts it,
specifically that the syslog, networking, and trqauthd daemons are running.

Then all it does is run /usr/sbin/pbs_mom  -- nothing special.


If nothing else, you could wrap pbs_mom in a script that sets the ulimit before exec'ing the real pbs_mom.
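
Such a wrapper might look like the sketch below. It is an illustration only: the `run_with_memlock` function name and the `/usr/sbin/pbs_mom.real` path (the renamed original binary) are assumptions, not part of the package, and raising the limit to unlimited needs root.

```shell
#!/bin/bash
# Hypothetical helper: set the locked-memory limit in a subshell, then
# replace that subshell with the given command so it inherits the limit.
run_with_memlock() {
    local limit="$1"
    shift
    ( ulimit -l "$limit" && exec "$@" )
}

# As a wrapper installed in place of pbs_mom (assumed layout), the last
# line of the script would be:
#   run_with_memlock unlimited /usr/sbin/pbs_mom.real "$@"
```

Jobs started by pbs_mom inherit its limits, which is why setting the value before the daemon starts would be expected to change what "ulimit -l" reports inside a job.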


However, this has NOTHING to do with the bug, and probably should have been posted on some other forum.

