Bug 676879 (mpiexec)
Summary: | Review Request: mpiexec - MPI job launcher that uses the PBS task interface directly | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Christos Triantafyllidis <christos.triantafyllidis>
Component: | Package Review | Assignee: | Nobody's working on this, feel free to take it <nobody>
Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | medium | Priority: | unspecified
Version: | rawhide | CC: | ctrianta, dledford, fedora-package-review, fenlason, mrunge, notting, sergiobelkin, steve.traylen, susi.lehtola, tomspur
Hardware: | All | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2013-04-30 16:32:51 UTC
Bug Blocks: | 201449 | |
Description
Christos Triantafyllidis
2011-02-11 16:51:04 UTC
Hm... this is my first package so (if accepted) i'll need sponsorship...

I suggest that you read these two documents and follow the advice therein:
http://fedoraproject.org/wiki/Join_the_package_collection_maintainers
http://fedoraproject.org/wiki/How_to_get_sponsored_into_the_packager_group

Hi Christos,

I am (still) not a member of the packaging group, however I hope you find the following points useful:

1. Fix the Group.
2. Fix the Source URL.
3. Fix the license: Enter the right "Short Name" as is listed on http://fedoraproject.org/wiki/Licensing:Main#Good_Licenses.

Greets

Many thanks Sergio!

(In reply to comment #3)
> Hi Christos,
>
> I am (still) not a member of the packaging group, however I hope you find the
> following points useful:
>
> 1. Fix the Group.

Fixed! I searched for what others are using and "Applications/Engineering" seems to be the most popular for mpiexec.

> 2. Fix the Source URL.

Done.

> 3. Fix the license: Enter the right "Short Name" as is listed on
> http://fedoraproject.org/wiki/Licensing:Main#Good_Licenses.

Ooops... Done.

> Greets

I've also updated it to the latest upstream version.

The updated spec is in the same location:
http://svn.hellasgrid.gr/svn/code.grid.auth.gr/mpiexec/trunk/mpiexec.spec

Updated SRPM:
http://koji.afroditi.hellasgrid.gr/packages/mpiexec/0.84/1_torque_2.3.13.el5/src/mpiexec-0.84-1_torque_2.3.13.el5.src.rpm

Many thanks once more for reviewing this.

BTW what is the best way to note that this is built against torque 2.3.13 (currently i'm using the release for this)? I guess that one could build it against other queuing systems (if they are in fedora/epel repos).

Christos

How does this relate to

mpich2 which contains:
/usr/lib/mpich2/bin/mpiexec
and
openmpi which contains
/usr/lib/openmpi/bin/mpiexec

http://fedoraproject.org/wiki/PackagingDrafts/MPI may have some clues.
As for your last question about signifying which torque version you build against: this is implicit, since the package is in a particular EPEL or Fedora release and so is built against whichever torque is in that platform. You don't need the "torque_XXX" in the release.

Also the resulting requires

$ rpm -qp --requires mpiexec-0.84-1_torque_2.3.13.fc14.x86_64.rpm | grep torque
libtorque.so.2()(64bit)

should be enough to tie the version.

The pbs location changes between EPEL5 and 6 and in Fedora so I recommend some conditionals:
http://fedoraproject.org/wiki/DistTag#Conditionals
However this review is by default for rawhide so it must be correct for that.

I see there is both a GPL LICENSE file but also a "LICENSE.mvapich" file which is probably BSD. Is this dual licensed or something? Can you clarify?

Steve.

Hey Steve,

thanks for getting this! Find my answers/comments inline.

(In reply to comment #5)
> How does this relate to
>
> mpich2 which contains:
> /usr/lib/mpich2/bin/mpiexec
> and
> openmpi
> which contains
> /usr/lib/openmpi/bin/mpiexec
>
> http://fedoraproject.org/wiki/PackagingDrafts/MPI
>
> may have some clues.

I would say that this is yet another mpiexec :). The benefit of using this package is that it integrates with PBS, which means that it takes advantage of features like correct accounting for MPI jobs. Another issue that this solves is that it is compatible with many MPI implementations (including openmpi and mpich2).

Check this, as the vendor explains why there are so many mpiexecs:
http://www.osc.edu/~djohnson/mpiexec/index.php#Too_many_mpiexecs

> As for your last question about signifying which torque version you build
> against: this is implicit, since the package is in a particular EPEL or Fedora
> release and so is built against whichever torque is in that platform. You
> don't need the "torque_XXX" in the release.
>
> Also the resulting requires
> $ rpm -qp --requires mpiexec-0.84-1_torque_2.3.13.fc14.x86_64.rpm | grep torque
> libtorque.so.2()(64bit)
>
> should be enough to tie version.

Actually my concern here is that this mpiexec supports other PBS implementations (i.e. OpenPBS or PBSPro). Currently they are not in fedora/epel but what if they appear later? Is it safe/OK to remove this from the release for now and think about it if/when they may appear?

> The pbs location changes between EPEL5 and 6 and in Fedora so I recommend some
> conditionals:
> http://fedoraproject.org/wiki/DistTag#Conditionals
> However this review is by default for rawhide so it must be correct for that.

This sounds trivial. Is it sufficient to distinguish between EPEL5, EPEL6 and Fedora or should i also distinguish the torque release? Even better, is there any way that i can get this path from the currently installed torque version?

> I see there is both a GPL LICENSE file but also a "LICENSE.mvapich" file which
> is probably BSD. Is this dual licensed or something ? Can you clarify.

According to the last line of the vendor's description page:
http://www.osc.edu/~djohnson/mpiexec/index.php#Description

"Mpiexec is free software and is licensed for use under the GNU General Public License, version 2."

Is this sufficient or should i clarify this with vendor?

> Steve.

Many thanks once more for reviewing this.

Christos

(In reply to comment #6)
> > I see there is both a GPL LICENSE file but also a "LICENSE.mvapich" file which
> > is probably BSD. Is this dual licensed or something ? Can you clarify.
>
> According to the last line of vendor's description page:
> http://www.osc.edu/~djohnson/mpiexec/index.php#Description
>
> "Mpiexec is free software and is licensed for use under the GNU General Public
> License, version 2."
>
> Is this sufficient or should i clarify this with vendor?

A closer look revealed that the latest vendor version (0.84) includes files under a different license (BSD no advertising, 3 clause). Given that these sources cannot be split into a different package, i'll change the license to the multiple-license scenario (GPLv2 and BSD).

I've also found that the vendor doesn't install the license files (as well as changelog and readme) in the documents area, which i'll add in the next release.

So i'm waiting for your reply on the naming (whether to add "torque" somewhere or not) and i'll build the new pkg/spec.

Christos

I don't think torque should be present in the name anywhere, this is implicit with the library requires. The other significant batch system is of course gridengine in Fedora.

I also think it may be worth contacting the MPI group. Should this mpiexec use the "modules" technology that the other mpiexecs use? This was my real question. Maybe worth adding the maintainers of those packages to comment.

Steve.

The updated spec is in the same location:
http://svn.hellasgrid.gr/svn/code.grid.auth.gr/mpiexec/trunk/mpiexec.spec

Updated SRPM:
http://koji.afroditi.hellasgrid.gr/packages/mpiexec/0.84/2.el5/src/mpiexec-0.84-2.el5.src.rpm

Changes since last release:
- Updated License (GPLv2 and BSD)
- Added documentation files (including licenses)
- Fixed the pbs path (it should refer to the installation path, not the var folder, thus it is not related to the distribution)
- Removed the torque version from the release number

The only standing issue is whether to use /usr/bin for installation or something like /usr/libexec/mpiexec/ and include a module file to load it in PATH.

I've contacted people who already maintain an MPI library package (either mpich2 or openmpi) in Fedora/EPEL in order to get their opinion on this.

Christos

It's more than a week later and still no answer from the people maintaining other MPI library packages (namely mpich2 and openmpi). Given that the only (?) standing issue was the lack of usage of environment-modules, i added it.
The new SPEC file is at the usual location:
http://svn.hellasgrid.gr/svn/code.grid.auth.gr/mpiexec/trunk/mpiexec.spec

Updated SRPM can be found at:
http://koji.afroditi.hellasgrid.gr/packages/mpiexec/0.84/3.el5/src/mpiexec-0.84-3.el5.src.rpm

Changes since last release:
- Added use of environment modules
- bin and man files are no longer in default locations.

Additionally i did some scratch builds at Fedora's koji:
dist-f14: http://koji.fedoraproject.org/koji/taskinfo?taskID=2885049
dist-f15: http://koji.fedoraproject.org/koji/taskinfo?taskID=2885064
dist-rawhide: http://koji.fedoraproject.org/koji/taskinfo?taskID=2885067
dist-5E-epel: http://koji.fedoraproject.org/koji/taskinfo?taskID=2885142

Finally i ran rpmlint on the SPEC and SRPMs:

[ctria@toolbox SPECS]$ rpmlint mpiexec.spec
0 packages and 1 specfiles checked; 0 errors, 0 warnings.
[ctria@toolbox SRPMS]$ rpmlint mpiexec-0.84-3.*
mpiexec.src: W: spelling-error %description -l en_US mpirun -> Empirin, umpire, Epirus
mpiexec.src: W: spelling-error %description -l en_US rsh -> rah, rs, sh
mpiexec.src: W: spelling-error %description -l en_US mpirun -> Empirin, umpire, Epirus
mpiexec.src: W: spelling-error %description -l en_US rsh -> rah, rs, sh
mpiexec.src: W: spelling-error %description -l en_US mpirun -> Empirin, umpire, Epirus
mpiexec.src: W: spelling-error %description -l en_US rsh -> rah, rs, sh
mpiexec.src: W: spelling-error %description -l en_US mpirun -> Empirin, umpire, Epirus
mpiexec.src: W: spelling-error %description -l en_US rsh -> rah, rs, sh
4 packages and 0 specfiles checked; 0 errors, 8 warnings.

Which i think are perfectly fine.

I hope that i haven't missed anything else, but if so please shoot.

Christos

Sorry to take so long, but sometimes you're just busy.

So, here's what I see from reading up on the mpiexec website. First, this doesn't necessarily work with lots of batch systems. It works with OpenPBS, PBS Pro, and Torque.
However, these are all three forks of OpenPBS where OpenPBS appears to be dead, PBS Pro is (I guess) some group taking OpenPBS and delivering ongoing service and support around it, and Torque is an actively developed fork of OpenPBS.

Second, it's not intended to be multi-MPI friendly. Not really anyway. I know the web page talks about a lot of MPI packages, but in truth, there are only a few MPI families, with lots of forks along those families. There is the mpich family, which includes mpich, mpich2, mvapich, mvapich2, intel mpi, etc. Then there is the lam family, which includes lam and openmpi. I don't know of any other open source mpi families that are still alive. There might be other closed source mpis out there, but we don't care about those.

In any case, the mpiexec website basically calls out that lam/openmpi get things right on their own so there is no need to use mpiexec there and recommends that you don't (side note: this does not surprise me in the least, the mpich family of job starting daemons has always been nothing more than a bunch of scripts calling rsh, hardly what I would call robust or well designed, more like a quick and dirty job to get things running in the early days and then they never went back and did things right later).

So, for all the talk on the web site about how many mpis this supports, and how many PBSes this supports, it really only supports one mpi family and one pbs family. Given that, this *absolutely* does not belong in the main path. I know that's already been fixed, but I'm putting this here so that someone doesn't get the idea in the future to undo that fix.

Now, what's more, is I'm not entirely certain that this will work transparently with different mpich mpis from a single build. You have to specify the default communication method in the configure script, as well as a few other options.
I haven't looked into it, but I know that mvapich and mvapich2 are different enough from mpich and mpich2 that I'm not certain that the same build parameters will work with both. If they don't, then you might have to build it more than once in the spec file with different options, create an mpiexec base package, and then mpiexec-%{mpiname} sub packages that have the files specific to that particular mpi implementation.

As an example, if needed, you could create an mpiexec shell script and place it in the directory specified in your environment modules file. This shell script could then execute %{mpidir}/bin/mpiexec-pbs. You would then place the mpiexec binary this build process spits out into %{mpidir}/bin/mpiexec-pbs for each of the mpis you intend to support and place the files into subpackages of mpiexec specific to each mpi implementation. That way, if you need different options for different mpis, it can be done.

However, as I haven't tried to use this program, I don't know if this is even necessary. Before the package goes into Fedora as is though, this needs to be tested. It's much easier to fix this before it hits repos than it is after.

Hi Doug,

i would agree with most of the things you are writing but i have the following comments:

a) As stated in the package description, this package supports only the PBS batch system family. Given that Fedora has only torque in its repositories i would say that this package supports all PBS batch systems available.

b) Regarding MPI libraries i totally agree that different implementations use different parameters. For the packages that are available in fedora at the moment we have the following libraries:
- MPICH2
- OpenMPI
Of those, OpenMPI (as already commented) provides an mpiexec that works as long as OpenMPI is built against the batch system (namely torque). So i find it much more convenient to have as default the parameters that MPICH needs, as this is actually the one that creates the need for "yet another mpiexec".
Note that this is the DEFAULT communication method, meaning that it can be overridden at any time. I may not know the Fedora packaging internals that well but i clearly find it a bad idea to have the same code installed multiple times just to avoid the usage of a parameter. Are you sure this is actually needed to include it in the distribution?

c) The installation should be totally transparent to anyone who doesn't intend to use it, as it is enabled by the relevant environment module. For those who intend to use it, i guess it is not that inconvenient to add their runtime parameters if needed.

d) This build (specifically the 0.83 el5 one) has been used extensively in production on our Grid cluster without a single error report.

Finally, i'm not a mpiexec fan... we (as a grid site) are using it and with the current config this seems to make our users/community happy. Given that i'm building this package for our infrastructure i thought it may be useful for others. If the resulting package is not usable for us i don't see any reason to maintain it for fedora :).

Regards,
Christos

Hi,

it's been long since the last reply on this bug. I understand that there is no interest/need for this package from the Fedora/EPEL community. Should I close this bug as "won't fix" or leave it open till someone may start to like it?

Christos

Christos, are you still interested?

If yes, you should do some informal reviews and list the bugzilla numbers here (as a reference for your potential sponsor). You need to do something to prove your packaging knowledge to a sponsor. The corresponding links are in comment 2 and still valid.

Hi Matthias,

i'm still interested although i'm not sure if the latest version here is still working given the major changes on the torque side (upgrade from 2.3 to 2.5). Given that my setup is still at 2.3 i don't see any urgent reason to put effort into updating this if there is no reviewer for it.
If you are willing to review it and sponsor me i would happily make the needed changes (if any) and rebuild it.

I have done a few reviews or simply commented on other package requests in the past:
https://bugzilla.redhat.com/show_bug.cgi?id=682553
https://bugzilla.redhat.com/show_bug.cgi?id=683587
https://bugzilla.redhat.com/show_bug.cgi?id=772485

And i have submitted another package too:
https://bugzilla.redhat.com/show_bug.cgi?id=772406

Let me know if you think that i have to do more reviews.

Regards,
Christos

Christos,

Ok, great. Since I'm no sponsor, we need to wait (or ask sponsors) to see if your comments in those reviews are sufficient for them. What's your fedora account name?

Hi Matthias,

i'll try to do some more reviews (time permitting) in the meanwhile.

My FAS username is ctria.

Christos

Hmm, this seems a rather nontrivial review, at least if properly done.

As noted above by Doug in comment #11, the applicability of the package is rather limited. I find it hard to imagine that anyone would run a cluster with current Fedora (or RHEL) and use an antique queueing system that doesn't have out-of-the-box support for MPI, as there are other queue managers (and MPI libraries) which handle this without problems.

This being said, if there is enough interest, the package can of course be included in Fedora. But for a good review to take place, the reviewer should also test if the program works (which, admittedly, doesn't always happen). Although I'm a heavy user of clusters and queue systems (and have had a hand in writing the Fedora MPI guidelines), I lack the knowledge to properly review this package.

IMHO, this package is a curiosity, and can be said to be obsolete. If you (or someone else) end up packaging this in Fedora, the name needs to be changed: 'mpiexec' is just too general. To reflect the use case of the package, something like pbs-mpiexec or torque-mpiexec (or -mpich) would be far more suitable and less prone to cause problems.

Hi Jussi,

thanks for your comments.
Well i'm not as interested in the fedora branches as i am in the EPEL ones. I totally understand your point but i can definitely say that there are many (grid) sites that are running torque, thus there would be some use for them. And fedora still ships this "obsolete" torque.

Regarding whether this package is obsolete: as long as torque and MPICH are maintained and used in production i wouldn't say that this is the case.

Now regarding naming i really don't care, functionality matters on my side. If you think that renaming it to "something"-mpiexec or mpiexec-"something" will help, i'm with you. I just used mpiexec as this is the name that the vendor used. But for sure "something"-mpiexec will not be the first guess of someone who wants to install this package, as the vendor uses plain "mpiexec" (wrongly in my opinion too).

Anyway i see that there is no will to push this forward, although there is no policy (that i'm aware of at least) against it, so feel free to close this ticket.

Regards,
Christos
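
For reference, the wrapper-script idea raised earlier in this thread (an mpiexec shell script in the module directory that runs the per-MPI %{mpidir}/bin/mpiexec-pbs binary) might look roughly like the sketch below. It is only an illustration and was not part of any submitted spec: it assumes the loaded MPI environment module exports MPI_BIN pointing at that MPI's bin directory, and the function name find_mpiexec_pbs is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper sketch: locate the per-MPI build of the PBS mpiexec.
# Assumption: the loaded MPI environment module exports MPI_BIN (that MPI's
# bin directory). "mpiexec-pbs" is the binary name suggested in the thread.

find_mpiexec_pbs() {
    # Print the path of the per-MPI launcher, or fail with a message on stderr.
    if [ -z "${MPI_BIN:-}" ]; then
        echo "no MPI environment module loaded (MPI_BIN is unset)" >&2
        return 1
    fi
    if [ ! -x "${MPI_BIN}/mpiexec-pbs" ]; then
        echo "no executable mpiexec-pbs found in ${MPI_BIN}" >&2
        return 1
    fi
    printf '%s\n' "${MPI_BIN}/mpiexec-pbs"
}

# The wrapper itself would end by handing over to the launcher it found:
#   target=$(find_mpiexec_pbs) && exec "$target" "$@"
```

With a layout like this, each mpiexec-%{mpiname} subpackage would only need to ship its own mpiexec-pbs binary, and the wrapper would pick whichever one matches the currently loaded module.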