Bug 1424482

Summary: Review Request: sge - Son of Grid Engine - Distributed Computing Management software
Product: [Fedora] Fedora
Reporter: Orion Poplawski <orion>
Component: Package Review
Assignee: Nobody's working on this, feel free to take it <nobody>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: medium
Version: rawhide
CC: bcotton, dave.love, gedetil, julien.nicoulaud, orion, package-review, sidney
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-02-15 00:45:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 201449    
Attachments:
Description: alternate sge-rpath.patch that more fully fixes library path problem
Flags: none

Description Orion Poplawski 2017-02-17 17:17:21 UTC
Spec URL: https://www.cora.nwra.com/~orion/fedora/sge.spec
SRPM URL: https://www.cora.nwra.com/~orion/fedora/sge-8.1.9-2.fc26.src.rpm
Description:
In a typical network that does not have distributed resource management
software, workstations and servers are used from 5% to 20% of the time.
Even technical servers are generally less than fully utilized. This
means that there are a lot of cycles that can be used productively if
only users know where they are, can capture them, and put them to work.

Son of Grid Engine finds a pool of idle resources and harnesses it
productively, so an organization gets as much as five to ten times the
usable power out of systems on the network. That can increase utilization
to as much as 98%.

Son of Grid Engine software aggregates available compute resources and
delivers compute power as a network service.

Fedora Account System Username: orion

https://koji.fedoraproject.org/koji/taskinfo?taskID=17919214

This is replacing the current gridengine package.

Comment 1 Orion Poplawski 2017-04-13 21:30:30 UTC
Spec URL: https://www.cora.nwra.com/~orion/fedora/sge.spec
SRPM URL: https://www.cora.nwra.com/~orion/fedora/sge-8.1.9-4.el7.src.rpm

* Thu Apr 13 2017 Orion Poplawski <orion.com> - 8.1.9-4
- Use templated service files

* Tue Apr 4 2017 Orion Poplawski <orion.com> - 8.1.9-3
- Build gss utils

Comment 2 Dave Love 2017-04-17 16:50:34 UTC
Sorry, I'd missed this.  I should take it, but I probably can't be very responsive in the near future.

Why is this a separate project from gridengine?  I'd have thought that should just be based on newer sources, and I think an old installation could be upgraded.

I need to try to get a release out with accumulated changes, but there's at least one regression in 8.1.9 which probably deserves patching in the packaging.  I've long been meaning to merge the Fedora spec with the /opt-based one too.

Comment 3 Dave Love 2017-06-01 13:31:04 UTC
Could you say why this is a new package, not an update of gridengine?
There's some stuff from fedora-review I don't understand, but it seems worth establishing the name and possible update recipe first.

Comment 4 Sidney Markowitz 2017-08-05 02:12:12 UTC
(In reply to Dave Love from comment #3)
gridengine is based on Open Grid Scheduler http://gridscheduler.sourceforge.net/ which forked the last open release of Sun Grid Engine (6.2u5, released in 2009) and has not been updated since version 2011.11.p1 was released in 2012. The gridengine rpms for Fedora all use the original 2011.11.p1 source tarball plus a few patches.

This package is based on Son of Grid Engine https://arc.liv.ac.uk/trac/SGE which forked from the last open release of Univa (8.0.0, released 2012) which itself forked from Sun Grid Engine 6.2u5. Son of Grid Engine started their version numbering from the 8.0.0 used by Univa when they forked. The current version of Son of Grid Engine, whose source tarball is in this package, is 8.1.9.

They have proceeded separately from the original 2009 fork and they have different names and version numbering.

In my opinion it would be extremely confusing to try to keep this package with the same name and versioning as gridengine-2011.11p1 when it is being built from the sources of SGE 8.1.9.

Comment 5 Julien Nicoulaud 2017-08-05 11:18:34 UTC
I was able to test the package and saw no issues with it.

In particular, I appreciate:
 - Clean management for multiple cells, which is not the case in other SGE RPMs I have seen
 - A libdrmaa.so symlink is provided. Some other RPM packaging provides only libdrmaa.so.1.0, which breaks the JGDI API because the Java code explicitly looks for libdrmaa.so. Please keep it that way.

Some small details:
 - In /etc/profile.d/sge.*, is it possible to export SGE_QMASTER_PORT as well? This is needed by tools that interface with SGE via its API.
 - Some stuff is named "sge" (e.g. service files, profile.d files), some "gridengine" (e.g. root dir, spool dir, lib dir, sysconfig). Maybe it would be more consistent to just name everything "sge" (since environment variables start with SGE_ anyway).
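To illustrate why an exported SGE_QMASTER_PORT helps client tools, here is a minimal Python sketch of how such a tool might locate the qmaster port. The function name, the fallback order, and the use of 6444 as a last-resort default (the port registered for sge_qmaster in the services database) are assumptions for illustration, not code taken from SGE or from this package.

```python
import os
import socket

# Assumed last-resort default: 6444 is the port registered for sge_qmaster.
DEFAULT_QMASTER_PORT = 6444


def resolve_qmaster_port(environ=None):
    """Return the qmaster port a client tool should connect to.

    Hypothetical lookup order: the SGE_QMASTER_PORT environment variable,
    then the local services database, then the assumed default.
    """
    env = os.environ if environ is None else environ
    value = env.get("SGE_QMASTER_PORT")
    if value:
        return int(value)
    try:
        return socket.getservbyname("sge_qmaster", "tcp")
    except OSError:
        # No services entry on this host.
        return DEFAULT_QMASTER_PORT
```

With the variable exported from /etc/profile.d/sge.*, the first branch is taken and the tool does not depend on the host's services database.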

About the naming, maybe this package could have been named "soge" to avoid any sort of confusion. But since SoGE is the only remaining alive open source fork of SGE, "sge" is also fine.

Comment 6 Sidney Markowitz 2017-08-06 03:51:11 UTC
Created attachment 1309545 [details]
alternate sge-rpath.patch that more fully fixes library path problem

I was not able to run the inst_sge configuration after installing this rpm. The problem is that sge-rpath.patch is an incomplete fix for the library path problem. The error I get is

./utilbin/lx-amd64/spooldefaults: error while loading shared libraries: libspoolc.so: cannot open shared object file: No such file or directory

and ldd spooldefaults confirms that it is not linked properly to let it find libspoolc.so.

This is related to https://arc.liv.ac.uk/trac/SGE/ticket/1494, which is marked as closed with a more complete patch. Although that patch was committed earlier, it is somehow not included in the 8.1.9 tarball.

It could be because that patch has a bug, which I have now mentioned in a comment there.

If I replace sge-rpath.patch in the src rpm with the changeset from that bug report without the buggy section, I am able to install and run it. I am attaching the sge-rpath.patch file that I used.
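The ldd check described above can be scripted. The following Python sketch lists the shared libraries that ldd reports as "not found" for a binary. The helper names are hypothetical, and it assumes the usual glibc ldd output format of "name => path (address)" or "name => not found".

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd output marks as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        if sep and target.startswith("not found"):
            missing.append(name)
    return missing


def check_binary(path):
    """Run ldd on a binary and return its unresolved libraries."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libraries(result.stdout)
```

For example, calling check_binary("./utilbin/lx-amd64/spooldefaults") on an affected install would be expected to include libspoolc.so in the returned list.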

Comment 7 Orion Poplawski 2017-08-30 01:14:07 UTC
*** Bug 1469764 has been marked as a duplicate of this bug. ***

Comment 8 Dave Love 2017-09-13 14:09:03 UTC
(In reply to Sidney Markowitz from comment #4)
> In my opinion it would be extremely confusing to try to keep this package
> with the same name and versioning as gridengine-2011.11p1 when it is being
> built from the sources of SGE 8.1.9.

I've managed to bury comments while I was on holiday. I'm quite
familiar with the history, but I wanted comments from Orion
specifically as an FPC person. I don't see why it is confusing to
change the basis any more than it was from the Sun distribution. As
far as I know, SGE is sufficiently compatible that you could do an
online upgrade, although there's a fundamental problem with upgrading
tightly coupled distributed systems like this. If that's not so, it's
probably possible to fix the issue simply.

The current distribution should at least be retired for security
reasons. SoGE is basically dead too, but I can probably still address
at least security problems.

Comment 9 Dave Love 2017-09-13 14:14:14 UTC
(In reply to Julien Nicoulaud from comment #5)
>  - A libdrmaa.so symlink is provided, some other RPM packaging only provide
> libdrmaa.so.1.0, which breaks the JGDI API as the Java code explicitly looks
> for libdrmaa.so. Please keep it that way.

I don't know what packaging that is, but it's required.

> Some small details:
>  - in /etc/profile.d/sge.*, is it possible to export SGE_QMASTER_PORT as
> well ? This is needed by tools that interface with SGE via its API.

That's in my source, but there's a hook to set things anyhow.

>  - Some stuff is named "sge" (eg: service files, profile.d files), some
> "gridengine" (eg: root dir, spool dir, lib dir, sysconfig). Maybe it would
> be more consistent to just name everything "sge" (since environment
> variables start with SGE_ anyway).

I don't see a particular reason to change it, especially if it's going to allow upgrades.

Comment 10 Dave Love 2017-09-13 14:16:46 UTC
(In reply to Sidney Markowitz from comment #6)
> Created attachment 1309545 [details]
> alternate sge-rpath.patch that more fully fixes library path problem
> 
> I was not able to run the inst_sge configuration after installing this rpm.
> The problem was that sge-rpath.patch is incomplete fix to its problem. The
> error I get is
> 
> ./utilbin/lx-amd64/spooldefaults: error while loading shared libraries:
> libspoolc.so: cannot open shared object file: No such file or directory
> 
> and ldd spooldefaults confirms that it is not linked properly to let it find
> libspoolc.so.
> 
> This is related to https://arc.liv.ac.uk/trac/SGE/ticket/1494 which is
> marked as closed with a more complete patch, but the patch even though
> committed earlier somehow is not included in the 8.1.9 tarball.

There's a non-trivial issue with sorting out an upgrade path.
The internal rpath is actually fine anyhow, according to the Fedora packaging standard.

Comment 11 Dave Love 2017-09-13 14:19:11 UTC
By the way, the source tarballs are no longer available from Liverpool, and I haven't been able to get that fixed.
I'll put source up at Sourceforge soon, and will have to put everything else there, unfortunately.

Comment 12 Sidney Markowitz 2018-02-25 01:18:09 UTC
It has been quite a while and this is still open waiting for review, but I have found another bug in the rpm package under review.

There is a bug fix in gridengine-2011.11p1-22.fc20.src.rpm  that fixes a race condition which can cause the qmaster service to fail to start up.

The bug is https://bugzilla.redhat.com/show_bug.cgi?id=1082129

The equivalent fix for it in this RPM is, in qmaster.service, insert a line after the ExecStart line

      PIDFile=/var/spool/gridengine/%I/qmaster/qmaster.pid

I tested this on a system that consistently failed due to this race condition and that fixed it.
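The edit described above can be sketched as a small Python helper that adds the PIDFile line directly after the first ExecStart line of a unit file. The function name is hypothetical, and the unit file path and ExecStart value in a real package may differ; only the PIDFile line itself comes from this comment.

```python
# PIDFile line quoted from the comment above; %I is the systemd instance name.
PIDFILE_LINE = "PIDFile=/var/spool/gridengine/%I/qmaster/qmaster.pid"


def add_pidfile(unit_text, pidfile_line=PIDFILE_LINE):
    """Insert a PIDFile= line after the first ExecStart= line.

    Returns the text unchanged if a PIDFile= line is already present.
    """
    lines = unit_text.splitlines()
    if any(line.strip().startswith("PIDFile=") for line in lines):
        return unit_text
    out = []
    inserted = False
    for line in lines:
        out.append(line)
        if not inserted and line.strip().startswith("ExecStart="):
            out.append(pidfile_line)
            inserted = True
    return "\n".join(out) + "\n"
```

The helper is idempotent, so running it twice over the same unit text does not add a second PIDFile line.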

Any word on when this will be reviewed?

Comment 13 Orion Poplawski 2020-04-25 03:31:48 UTC
It is a new package for a couple of reasons:

- avoid epoch for new versioning scheme
- it is a fork with a different name - people do refer to it as SGE
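The epoch point can be illustrated with a simplified version comparison. The Python sketch below is an assumption-laden approximation of RPM-style segment comparison, not the real rpmvercmp: it ignores epoch, release, tilde and caret handling. It shows only that 2011.11p1 sorts newer than 8.1.9, so keeping the gridengine name would need an Epoch bump for upgrades to work.

```python
import re
from itertools import zip_longest


def segments(version):
    """Split a version string into runs of digits or letters."""
    return re.findall(r"\d+|[A-Za-z]+", version)


def simple_vercmp(a, b):
    """Return 1 if a is newer than b, -1 if older, 0 if equal (simplified)."""
    for x, y in zip_longest(segments(a), segments(b)):
        if x is None:
            return -1  # a ran out of segments first, so b is newer
        if y is None:
            return 1
        if x.isdigit() and y.isdigit():
            xi, yi = int(x), int(y)
            if xi != yi:
                return 1 if xi > yi else -1
        elif x.isdigit():
            return 1  # a numeric segment sorts newer than an alphabetic one
        elif y.isdigit():
            return -1
        elif x != y:
            return 1 if x > y else -1
    return 0
```

Here simple_vercmp("2011.11p1", "8.1.9") returns 1 because the first segments compare as 2011 against 8.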

For posterity if nothing else:

Spec URL: https://www.cora.nwra.com/~orion/fedora/sge.spec
SRPM URL: https://www.cora.nwra.com/~orion/fedora/sge-8.1.9-2.fc26.src.rpm

* Fri Apr 24 2020 Orion Poplawski <orion> - 8.1.9-12
- Add PIDFile to sge_qmaster@.service
- Add patch to fix multiple definition errors
- Fix jni jar symlink

* Mon Apr  6 2020 Orion Poplawski <orion> - 8.1.9-11
- Add Restart=on-failure to service units

* Tue Sep 17 2019 Orion Poplawski <orion> - 8.1.9-10
- Fixup permissions

* Thu Jun 28 2018 Orion Poplawski <orion> - 8.1.9-9
- Start services after autofs.service and remote-fs.target

* Fri Apr 27 2018 Orion Poplawski <orion> - 8.1.9-8
- Drop interix loadsensor and ksh dep
- Do not restart services - kills jobs

* Wed Apr 25 2018 Orion Poplawski <orion> - 8.1.9-7
- Setup cgroup directory in sge_execd@.service

* Wed Mar 14 2018 Orion Poplawski <orion> - 8.1.9-6
- Add patch to not call ldd from util/arch

* Wed May 24 2017 Orion Poplawski <orion.com> - 8.1.9-5
- Move ruby stuff to drmaa4ruby sub-package

Comment 14 Orion Poplawski 2020-04-25 03:45:22 UTC
Spec URL: https://www.cora.nwra.com/~orion/fedora/sge.spec
SRPM URL: https://www.cora.nwra.com/~orion/fedora/sge-8.1.9-13.fc33.src.rpm

* Fri Apr 24 2020 Orion Poplawski <orion> - 8.1.9-13
- export SGE_QMASTER_PORT
- Update rpath patch

Comment 15 Package Review 2021-04-26 00:45:21 UTC
This is an automatic check from review-stats script.

This review request ticket hasn't been updated for some time, but it seems
that you are still working on the review. If this is right, please
respond to this comment, clearing the NEEDINFO flag, and try to reach out to the
submitter to proceed with the review.

If you're not interested in reviewing this ticket anymore, please clear the
fedora-review flag and reset the assignee, so that a new reviewer can take
this ticket.

Without any reply, this request will shortly be reset.

Comment 16 Package Review 2021-06-04 00:46:04 UTC
This is an automatic action taken by review-stats script.

The ticket reviewer failed to clear the NEEDINFO flag in a month.
As per https://fedoraproject.org/wiki/Policy_for_stalled_package_reviews
we reset the status and the assignee of this ticket.

Comment 17 Ben Cotton 2022-01-14 15:12:30 UTC
Orion, this review request has been waiting a long time. Are you still interested in adding Son of Grid Engine to Fedora? If you are, I'll pick up the review. If not, we can close this request.

Comment 18 Package Review 2022-02-15 00:45:17 UTC
This is an automatic action taken by review-stats script.

The ticket submitter failed to clear the NEEDINFO flag in a month.
As per https://fedoraproject.org/wiki/Policy_for_stalled_package_reviews
we consider this ticket as DEADREVIEW and proceed to close it.