Bug 1082129 - sge_qmaster may fail when started via systemd
Summary: sge_qmaster may fail when started via systemd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: gridengine
Version: 19
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Orion Poplawski
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-03-28 18:08 UTC by Mike Grant
Modified: 2014-09-27 09:49 UTC (History)

Fixed In Version: gridengine-2011.11p1-22.fc19
Clone Of:
Environment:
Last Closed: 2014-09-27 09:42:33 UTC
Type: Bug
Embargoed:



Description Mike Grant 2014-03-28 18:08:24 UTC
Description of problem:

The gridengine sge_qmaster does not create a PID file on startup (e.g. /run/sgemaster.pid).  The current systemd configuration runs it as a SysV-style forking daemon (Type=forking in sgemaster.service).  Without a PID file, systemd tries to guess the main PID of a forking service, but this guess is not reliable (search GuessMainPID in http://www.freedesktop.org/software/systemd/man/systemd.service.html).

On one of our servers, sge_qmaster reliably segfaulted (others work OK).  After significant effort tracing the error, I found that systemd had identified the wrong main PID, concluded that the daemon had finished, and sent a SIGTERM to the remaining processes to tidy up.  Unfortunately this signal arrived during initialisation and the daemon blew up with a segfault due to insufficiently defensive coding.

This is a combination of a misidentification of the main PID and a race condition on receiving a SIGTERM during initialisation.  I'm unsure of the systemd algorithm for guessing the main PID, but it probably just waits for a couple of forks or a short time period, then picks the current top process, so this may be racy too.  The guessing process is explicitly described as unreliable in the systemd documentation.


Version-Release number of selected component (if applicable):
 gridengine-qmaster-2011.11p1-15.fc19.x86_64

How reproducible:
 100% on one server, 0% on another!

Steps to Reproduce:
1. create a default Grid Engine master:
  dnf install gridengine-qmaster
  cd /usr/share/gridengine/
  ./install_qmaster  # pretty much just take defaults
2. systemctl start sgemaster.service

Actual results:
 Depends on luck.  The start can hit the race condition (segfault), end in a controlled shutdown due to misidentification of the main process (master is stopped cleanly), or succeed.

Expected results:
 Daemon starts up.

Additional info:
 Aside from being luckier, I found two fixes to the problem.

One can patch the daemons to write out a PID file and add a PIDFile option to the systemd service unit file.  I can provide an example patch for sge_qmaster if that's helpful.
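
As a rough illustration of the PID file approach (not the actual patch, and not code from the Grid Engine source), the daemon side could look something like the sketch below.  The helper name write_pid_file is hypothetical; it would need to be called from the final daemon process after the last fork, with a path matching the PIDFile= line in the unit file (e.g. /run/sgemaster.pid).

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of the Grid Engine source: write the
 * calling process's PID to `path`. Returns 0 on success, -1 on failure.
 * Assumption: this runs in the final daemon process, after the last fork,
 * so the recorded PID is the one systemd should track. */
int write_pid_file(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        return -1;
    }
    if (fprintf(fp, "%ld\n", (long)getpid()) < 0) {
        fclose(fp);
        return -1;
    }
    /* fclose flushes the buffer; treat a failed flush as an error too. */
    return fclose(fp) == 0 ? 0 : -1;
}
```

With a file like this in place, systemd would read the main PID from it instead of guessing.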

Less invasively, one can add the following lines to /etc/sysconfig/gridengine, which prevent the daemons from forking:
------
# prevent SGE from daemonising qmaster, shadowd, execd
# required for systemd to control this as a "Type=simple" service
# see bugzilla #????
SGE_ND=true
-----
and change the unit file type to "Type=simple", where systemd expects the process to continue in the foreground rather than daemonising.  This does result in minor spam to /var/log/messages (about 20 lines every 3 mins on my system).  One has to change the unit files for sgemaster.service, sge_shadowd.service and sge_execd.service.

Lennart Poettering recommends the foreground approach in another discussion here - http://lists.freedesktop.org/archives/systemd-devel/2011-June/002677.html

Comment 1 Orion Poplawski 2014-07-14 19:56:24 UTC
Does:

PIDFile=/var/spool/gridengine/${SGE_CELL}/qmaster/qmaster.pid

work?  Not sure if you can use environment variables in PIDFile.  If not I may just put the default there and a note to change it if needed in /etc/sysconfig/gridengine.

Modifying /etc/sysconfig/gridengine doesn't seem viable as it is %config(noreplace), and I don't like stuff being sent regularly to the log. It's a bummer that they assume SGE_ND is only for debugging.

Comment 2 Orion Poplawski 2014-07-14 20:07:10 UTC
Hmm, no, looks like PIDFile can't parse variables.

Comment 3 Mike Grant 2014-07-15 11:50:43 UTC
The log spam is a bit annoying.  I took a peek at the code and it seems it's not possible to tune it down much with $SGE_DEBUG_LEVEL, as the regular messages seem to depend purely on $SGE_ND.  Some patching could solve this, of course.

It might be possible to relatively easily patch in an extra environment variable to separate out the forking from the debug.  The critical function (for the qmaster) seems to be sge_daemonize_qmaster() around line 180 in SOURCES/GE2011.11p1/source/daemons/qmaster/sge_qmaster_threads.c and would simply need to check for the existence of something like $SGE_DONT_FORK and return.  I'm not sure what the other consequences of this might be ;)  It's also a little bit nasty since arguably that's what SGE_ND (no daemonize) is supposed to mean!  This would still require a change to the sysconfig file though.
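
A minimal sketch of the kind of check I mean, written standalone rather than against the real source.  Both function names below are hypothetical, $SGE_DONT_FORK is only the suggested variable name, and the callback stands in for the existing daemonising code; I have not checked how this would fit into sge_daemonize_qmaster() itself.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical: true when $SGE_DONT_FORK exists in the environment.
 * Only existence is tested, so an empty value also counts as set. */
bool sge_dont_fork_requested(void)
{
    return getenv("SGE_DONT_FORK") != NULL;
}

/* Hypothetical wrapper: `do_daemonize` stands in for the existing
 * fork/detach logic. When the variable is set, return early and stay
 * in the foreground, as a Type=simple unit expects. */
int daemonize_unless_disabled(int (*do_daemonize)(void))
{
    if (sge_dont_fork_requested()) {
        return 0;
    }
    return do_daemonize();
}
```

The point is only that forking is skipped without touching $SGE_ND, so the debug output tied to $SGE_ND stays off.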

Comment 4 Fedora Update System 2014-09-05 01:29:55 UTC
gridengine-2011.11p1-22.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/gridengine-2011.11p1-22.fc20

Comment 5 Fedora Update System 2014-09-05 01:32:32 UTC
gridengine-2011.11p1-22.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/gridengine-2011.11p1-22.fc19

Comment 6 Fedora Update System 2014-09-09 22:09:39 UTC
Package gridengine-2011.11p1-22.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing gridengine-2011.11p1-22.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-10362/gridengine-2011.11p1-22.fc20
then log in and leave karma (feedback).

Comment 7 Fedora Update System 2014-09-27 09:42:33 UTC
gridengine-2011.11p1-22.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 8 Fedora Update System 2014-09-27 09:49:25 UTC
gridengine-2011.11p1-22.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

