Bug 675495 - Review Request: parallel - Shell tool for executing jobs in parallel
Summary: Review Request: parallel - Shell tool for executing jobs in parallel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Susi Lehtola
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-02-06 00:13 UTC by Golo Fuchert
Modified: 2011-10-03 18:04 UTC

Fixed In Version: parallel-20110722-2.fc16
Clone Of:
Environment:
Last Closed: 2011-10-03 18:04:47 UTC
Type: ---
Embargoed:
susi.lehtola: fedora-review+
gwync: fedora-cvs+



Description Golo Fuchert 2011-02-06 00:13:33 UTC
SPEC URL: http://golotop.de/parallel.spec
SRPM URL: http://golotop.de/parallel-20110205-1.fc14.src.rpm
Description:
GNU Parallel is a shell tool for executing jobs in parallel using one or more
machines. A job is typically a single command or a small script that has to be
run for each of the lines in the input. The typical input is a list of files, a
list of hosts, a list of users, or a list of tables.

If you use xargs today you will find GNU Parallel very easy to use. If you
write loops in shell, you will find GNU Parallel may be able to replace most of
the loops and make them run faster by running jobs in parallel. If you use ppss
or pexec you will find GNU Parallel will often make the command easier to read.

GNU Parallel also makes sure output from the commands is the same output as you
would get had you run the commands sequentially. This makes it possible to use
output from GNU Parallel as input for other programs.

Koji scratch build:
http://koji.fedoraproject.org/koji/taskinfo?taskID=2764450

Comment 1 Ville Skyttä 2011-02-06 09:22:42 UTC
This would conflict with moreutils:

$ repoquery -f /usr/bin/parallel
moreutils-0:0.40-1.fc14.x86_64

See "Binary name conflicts" and "Approaching upstream" at https://fedoraproject.org/wiki/Packaging:Conflicts

/usr/bin/parallel was added in moreutils 0.36, July 2009 according to debian/changelog in its tarball.  The oldest GNU parallel release tarball I could find is 20100424 (that's when I gather it became official GNU software) but the first commit in Savannah git for it is from 2007-09-10.

Comment 2 Golo Fuchert 2011-02-06 11:41:29 UTC
Thanks for the feedback, obviously I missed that package.
However, according to [1] it is not the same program.
Since both upstream projects seem to be aware of the "confusion" [1,2] already, I wouldn't expect that they will change the naming right now.
I will contact both projects for the latest status. If they refuse, is there anything that can be done other than waiting?

Thank you!

[1] http://www.gnu.org/software/parallel/history.html
[2] http://kitenet.net/~joey/code/moreutils/discussion/

Comment 3 Ville Skyttä 2011-02-06 12:34:49 UTC
The only two sane options I can think of right now are either to wait for the upstreams to resolve the issue, or to ship GNU parallel with the executable, the man page, and the references to the executable in the man page renamed to something else.  If renaming, I suggest gparallel because that's already done in the wild to some extent (see below), and many GNU utilities have the g prefix anyway to distinguish them from other similar tools with the same name.

I did some research into what other distributions do:

Gentoo takes the gparallel approach:
http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/sys-process/parallel/parallel-20110122.ebuild?revision=1.1&view=markup

FreeBSD splits parallel from moreutils into moreutils-parallel:
http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/parallel/Makefile?rev=1.8
http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/moreutils-parallel/Makefile?rev=1.1
http://www.freebsd.org/cgi/query-pr.cgi?pr=152973

Debian, Ubuntu, Mandriva and openSUSE don't seem to have GNU parallel packages available.

Comment 4 Golo Fuchert 2011-02-08 20:51:03 UTC
This is a tough one. What I learned so far:
- The developer of GNU parallel says that the first version of his program
  dates back to 2001, but version control was introduced much later. He
  argues that renaming on a per-distribution basis would cause problems between
  machines running different distributions, since GNU parallel runs itself on
  remote machines while distributing the workload.
- The problem for the parallel of moreutils seems to be that a lot of other
  things depend on it, so they do not want to rename it either.
- The Fedora packager of moreutils thinks that splitting moreutils into a
  core package and a parallel sub-package is a bad idea, since on a multiuser
  system both packages might need to be in use.
- Personally, I really don't like the idea of a conflict between GNU parallel
  and the moreutils package, since moreutils is a collection of different tools
  and I can think of users who want to use tools from it together with the GNU
  implementation of parallel.

So, what is the least bad solution? I don't know yet...

Comment 5 Golo Fuchert 2011-02-08 21:31:40 UTC
Just to prevent confusion: in the third point (splitting up), by "both packages" I meant GNU parallel and moreutils parallel. Using them together would only be possible by renaming.

Comment 6 Golo Fuchert 2011-02-10 18:46:26 UTC
Finally, we may have found a solution!
The starting point is that there won't be an upstream renaming soon. There were good arguments against splitting the moreutils package or renaming on a per-distribution basis (see comment #4). In a discussion with the developer of GNU parallel the following proposal came up:
The moreutils package is split into moreutils and moreutils-parallel. The GNU parallel package would then conflict with the moreutils-parallel package.
To prevent any problems with the wrong version of parallel being installed, GNU parallel would implement a compatibility mode in which it is called exactly like moreutils' parallel. Thus, every script written for moreutils' implementation would still run as expected. This compatibility mode would be turned _on_ by default in Fedora, so the user has to make parallel "incompatible" deliberately by editing a config file. The compatibility flag could be set system-wide (/etc/parallel/config) and could be overridden by each user (~/.parallel/config). So, to break compatibility _by chance_, the user would have to 1.) install the GNU parallel package and 2.) deactivate the compatibility mode. This seems very unlikely to me.
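
A minimal sketch of the precedence described above, assuming that both config files simply list command-line options one per line and that the per-user file is read last and therefore wins. The option names --tollef (moreutils-compatible mode) and --gnu (native mode) are taken from GNU parallel releases of that period, not from this ticket, so treat them as assumptions. compat_mode is a hypothetical helper and not part of either package.

```shell
# Hypothetical helper: report which calling convention would be in effect.
# $1 = system-wide config (e.g. /etc/parallel/config)
# $2 = per-user config    (e.g. ~/.parallel/config)
# Assumption: one option per line; --tollef selects the moreutils-compatible
# mode and --gnu switches back to the native GNU behaviour.
compat_mode() {
    mode=gnu                      # upstream default when nothing is configured
    for cfg in "$1" "$2"; do      # system file first, user file last
        [ -r "$cfg" ] || continue
        grep -qx -- '--tollef' "$cfg" && mode=moreutils
        grep -qx -- '--gnu' "$cfg" && mode=gnu
    done
    printf '%s\n' "$mode"
}
```

With only a system-wide file containing --tollef the helper prints moreutils. A user who adds --gnu to the per-user file gets gnu, which mirrors the "deliberate opt-out" idea of the proposal.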

The packager of moreutils agreed to this proposal. So, if there are no objections I will wait for the next version of GNU parallel with the compatibility mode included, change the spec file to conflict with the then existing moreutils-parallel and post the new files in this ticket. I guess it would be best to make those changes for rawhide then, leaving F13 and F14 unaffected.

In any case, I would like to thank Ole Tange from GNU parallel and Marc Bradshaw (Fedora packager for moreutils) for their efforts.

Comment 7 Ville Skyttä 2011-02-10 19:43:43 UTC
Sounds like that would work.

Comment 8 Marc Bradshaw 2011-03-11 00:34:37 UTC
I am wondering (as maintainer for moreutils) how the split for moreutils-parallel should be done.

Most importantly, the upgrade should be seamless for current users: moreutils-parallel should be installed automatically by yum update.

However, users should then be free to switch to GNU parallel if they desire, and possibly even to uninstall parallel entirely.

Comment 9 Ville Skyttä 2011-03-11 19:34:48 UTC
Similar cases have in the past been successfully taken care of by making the new subpackage obsolete the old main package.  So something like this in the -parallel subpackage should do the trick, assuming 0.40-2%{?dist} is the last moreutils that included parallel:

Obsoletes: %{name} < 0.40-3
Requires: %{name} = %{version}-%{release}

The Requires or the "= %{version}-%{release}" part of it might be superfluous if the -parallel subpackage doesn't actually require the main package. Please check with yum whether the upgrade works as intended without it, i.e. that the moreutils main package doesn't get uninstalled.
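
The bound in the Obsoletes line is worth a closer look: "< 0.40-3" is used rather than "<= 0.40-2" because the installed release carries a dist tag (for example 0.40-2.fc15), which rpm sorts above a bare 0.40-2 but below 0.40-3. A rough sketch of that comparison follows, assuming GNU sort -V is close enough to rpm's ordering for simple strings like these. rpm itself (or rpmdev-vercmp from rpmdevtools) remains the authority, and evr_lt is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when version-release $1 sorts strictly below $2.
# Assumption: GNU `sort -V` approximates rpm's ordering for these inputs;
# it does not handle epochs or every corner case of rpm's algorithm.
evr_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example: would an installed moreutils be caught by "Obsoletes: moreutils < 0.40-3"?
if evr_lt "0.40-2" "0.40-3"; then
    echo "obsoleted by the -parallel subpackage"
fi
```

A moreutils build at 0.40-3 or later is not matched, so the split package never obsoletes its own main package.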

Comment 10 Golo Fuchert 2011-05-22 12:02:46 UTC
Update:
- moreutils-parallel is in devel
- GNU parallel features the compatibility mode
- new spec file and srpm for Fedora with Conflicts: moreutils-parallel, new
  version of GNU parallel, and compatibility mode turned on by default. I added
  a comment about the compatibility to the description, so that users know what
  to install when searching for parallel with yum.

Just as a reminder: It is not planned to push parallel to the currently stable branches, but to rawhide only to test if everything works as expected.

SPEC URL: http://golotop.de/parallel.spec
SRPM URL: http://golotop.de/parallel-20110522-1.fc14.src.rpm

Comment 11 Golo Fuchert 2011-08-22 21:10:06 UTC
Update:
- Updated to newest version 20110722

SPEC URL: http://golotop.de/parallel.spec
SRPM URL: http://golotop.de/parallel-20110722-1.fc15.src.rpm

Comment 12 Susi Lehtola 2011-09-08 21:00:11 UTC
Sources and patches are conventionally prefixed with %{name} so that they don't get mixed up in the rpm buildroot. Although this is no longer really an issue, thanks to mock, you might consider a rename.

**

Your BuildRoot tag is obsolete. Please upgrade to a current version listed in
http://fedoraproject.org/wiki/EPEL/GuidelinesAndPolicies#BuildRoot_tag

Or, if you are not intending to ship on EPEL-4 or EPEL-5, you can just get rid of
- the BuildRoot tag
- the rm -rf at the beginning of %install
- the %clean section
- defattr() clauses in %files
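
The four items above can be looked for mechanically. The following is only a sketch: obsolete_spec_items is a hypothetical helper, and the patterns are illustrative and will not catch every spelling (for instance, they do not tell an rm -rf in %install apart from one in %clean).

```shell
# Hypothetical helper: list the constructs from the review that are still
# present in a spec file. $1 = path to the spec file.
obsolete_spec_items() {
    spec=$1
    grep -q '^BuildRoot:' "$spec" && echo 'BuildRoot tag'
    grep -Eq '^[[:space:]]*rm -rf[[:space:]]+(\$RPM_BUILD_ROOT|%\{buildroot\})' "$spec" &&
        echo 'rm -rf of the buildroot'
    grep -q '^%clean' "$spec" && echo '%clean section'
    grep -q '^%defattr' "$spec" && echo '%defattr clause'
    return 0    # finding nothing is not an error
}
```

Running obsolete_spec_items parallel.spec prints one line per leftover construct and nothing at all for a spec that has been cleaned up.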

**

rpmlint output:
parallel.noarch: W: spelling-error %description -l en_US xargs -> Argos
parallel.noarch: W: spelling-error %description -l en_US ppss -> poss, piss, pass
parallel.noarch: W: spelling-error %description -l en_US pexec -> exec, p exec, expect
parallel.noarch: W: spelling-error %description -l en_US moreutils -> mutilators
parallel.src: W: name-repeated-in-summary C parallel
parallel.src: W: spelling-error %description -l en_US xargs -> Argos
parallel.src: W: spelling-error %description -l en_US ppss -> poss, piss, pass
parallel.src: W: spelling-error %description -l en_US pexec -> exec, p exec, expect
parallel.src: W: spelling-error %description -l en_US moreutils -> mutilators
parallel.src: W: strange-permission parallel.spec 0600L
2 packages and 0 specfiles checked; 0 errors, 11 warnings.

Well, these are all spurious.

**

MUST: The spec file for the package is legible and macros are used consistently. NEEDSWORK
- You are mixing macro styles: $RPM_BUILD_ROOT vs %{buildroot}. Please choose a style and stick with it.

MUST: The package must be named according to the Package Naming Guidelines. OK
MUST: The spec file name must match the base package %{name}. OK
MUST: The package must be licensed with a Fedora approved license and meet the Licensing Guidelines. OK
MUST: The License field in the package spec file must match the actual license. OK

MUST: The sources used to build the package must match the upstream source, as provided in the spec URL. OK
- License is GPLv3+.

MUST: The package MUST successfully compile and build into binary rpms. OK
MUST: The spec file MUST handle locales properly. N/A
MUST: Optflags are used and time stamps preserved. OK
MUST: Packages containing shared library files must call ldconfig. N/A
MUST: A package must own all directories that it creates or require the package that owns the directory. OK
MUST: Files only listed once in %files listings. OK
MUST: Debuginfo package is complete. N/A
MUST: Permissions on files must be set properly. OK
MUST: Large documentation files must go in a -doc subpackage. N/A
MUST: All relevant items are included in %doc. Items in %doc do not affect runtime of application. OK
MUST: Header files must be in a -devel package. N/A
MUST: Static libraries must be in a -static package. N/A
MUST: If a package contains library files with a suffix then library files ending in .so must go in a -devel package. N/A
MUST: In the vast majority of cases, devel packages must require the base package using a fully versioned, architecture dependent dependency. N/A
MUST: Packages do not contain any .la libtool archives. N/A
MUST: Desktop files are installed properly. N/A

MUST: No file conflicts with other packages and no general names. OK
- Although, I am a bit bothered by %{_bindir}/sql...

SHOULD: %{?dist} tag is used in release. OK
SHOULD: If the package does not include license text(s) as separate files from upstream, the packager should query upstream to include it. OK
SHOULD: The package builds in mock. OK
EPEL: Clean section exists. OK
EPEL: Buildroot cleaned before install. OK
EPEL: Packages containing pkgconfig(.pc) files must 'Requires: pkgconfig'. N/A
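
The NEEDSWORK item at the top of this checklist (mixing $RPM_BUILD_ROOT and %{buildroot}) can be checked with a small sketch like the one below. buildroot_style is a hypothetical helper, and it only looks for these two exact spellings, so variants such as ${RPM_BUILD_ROOT} or %buildroot are not covered.

```shell
# Hypothetical helper: classify which buildroot style a spec file uses.
# $1 = path to the spec file. Prints mixed, shell-variable, macro or none.
buildroot_style() {
    shell_var=no
    macro=no
    grep -qF '$RPM_BUILD_ROOT' "$1" && shell_var=yes
    grep -qF '%{buildroot}' "$1" && macro=yes
    case "$shell_var/$macro" in
        yes/yes) echo mixed ;;
        yes/no)  echo shell-variable ;;
        no/yes)  echo macro ;;
        *)       echo none ;;
    esac
}
```

A result of mixed corresponds to the style issue flagged in the review. Either of the two single-style results would satisfy it.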

**

Please fix the style issue before git import. This package has been

APPROVED


PS. Since you took the time to list all the files in %{_bindir}, I'd appreciate it if you did the same thing for their man pages.

Comment 13 Hans Ecke 2011-09-08 22:50:24 UTC
(In reply to comment #12)
> Or, if you are not intending to ship on EPEL-4 or EPEL-5

As an interested bystander: I'd encourage you to add this package to EPEL 4 and 5 as well. There is a huge number of RHEL4/5 machines out there, and it would be nice if this package were available for them as well.

Thank you for your work

Hans

Comment 14 Golo Fuchert 2011-09-09 22:21:43 UTC
Thank you very much for the review Jussi!
I applied all the changes you suggested.

Concerning packaging for RHEL: Right now this package should only be included in Fedora, since I think there will be the same problems with moreutils, but I will have a look at it.

Newest versions:
SPEC URL: http://golotop.de/parallel.spec
SRPM URL: http://golotop.de/parallel-20110722-2.fc15.src.rpm

Comment 15 Golo Fuchert 2011-09-09 22:30:00 UTC
New Package SCM Request
=======================
Package Name: parallel
Short Description: Shell tool for executing jobs in parallel
Owners: golfu
Branches: f16
InitialCC:

Comment 16 Gwyn Ciesla 2011-09-10 14:57:32 UTC
Git done (by process-git-requests).

Comment 17 Fedora Update System 2011-09-16 21:57:31 UTC
parallel-20110722-2.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/parallel-20110722-2.fc16

Comment 18 Fedora Update System 2011-09-17 19:35:01 UTC
parallel-20110722-2.fc16 has been pushed to the Fedora 16 testing repository.

Comment 19 Pavel Alexeev 2011-09-19 08:36:10 UTC
It has conflicts:

Transaction Check Error:
  file /usr/bin/parallel from install of parallel-20110722-2.fc17.noarch conflicts with file from package moreutils-0.40-2.fc15.i686
  file /usr/share/man/man1/parallel.1.gz from install of parallel-20110722-2.fc17.noarch conflicts with file from package moreutils-0.40-2.fc15.i686

Comment 20 Golo Fuchert 2011-09-19 16:23:26 UTC
What happened there? Why is parallel f17 checked against moreutils f15?
The moreutils package was changed because of that very issue (see above), and it should be resolved for f16+. That is the reason why there won't be a GNU parallel package for f15. Is this a false positive, then, or does it require any action?

Comment 21 Pavel Alexeev 2011-09-21 19:03:19 UTC
Sorry, it is my fault. I had not read the discussion.

Comment 22 Fedora Update System 2011-10-03 18:04:37 UTC
parallel-20110722-2.fc16 has been pushed to the Fedora 16 stable repository.

