Bug 801287
Summary: | service cumin start missing pid file | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Stanislav Graf <sgraf> |
Component: | cumin | Assignee: | Trevor McKay <tmckay> |
Status: | CLOSED ERRATA | QA Contact: | Peter Belanyi <pbelanyi> |
Severity: | unspecified | Docs Contact: | |
Priority: | low | ||
Version: | Development | CC: | athomas, ltoscano, matt, mkudlej, tmckay |
Target Milestone: | 2.3 | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | cumin-0.1.5388-2 | Doc Type: | Bug Fix |
Doc Text: |
Cause
Cumin did not make use of a pid file.
Consequence
The missing pid file makes determining the true status of the cumin service more difficult.
Fix
Cumin now uses the /var/run/cumin.pid file.
Result
The pid file is created when the service is started and deleted when the service is stopped by initd. If the cumin service is not running but /var/run/cumin.pid exists, it is evidence of a program crash.
|
Last Closed: | 2013-03-06 18:42:32 UTC | Type: | --- |
Description
Stanislav Graf
2012-03-08 07:58:51 UTC
Fixed in revision 5382.

Cumin now uses a pidfile (/var/run/cumin.pid). On service start, the initd script creates a blank pid file owned by the 'cumin' user. The cumin master script fills in the pid value when it starts if the "--p" option is passed. The pidfile is deleted by the initd script on 'service stop' after the cumin master script exits.

To handle synchronization between initd and /usr/bin/cumin, a new /usr/sbin/cumin-checkpid executable has been added along with $CUMIN_HOME/log/.*.init files for all of the cumin processes (named by config section). Cumin processes write startup status to the $CUMIN_HOME/log/.*.init files. The /usr/bin/cumin-checkpid script checks for the pid value in /var/run/cumin.pid and a status value in the $CUMIN_HOME/log/.master.init file. The cumin master script writes its status value after all of the subprocesses have passed their init checks and written their own files, or there has been a failure. The initd script calls /usr/bin/cumin-checkpid to find out the status of the service start and to wait for the cumin process to end on a service stop.

The double-start of cumin for init checks has been eliminated. The initd script will delete the pidfile on a failed startup after the cumin master script has exited. Therefore, if the cumin service is not running and there is a pidfile left over, it is the result of an unexpected crash (and not normal startup checks).

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause
Cumin did not make use of a pid file.
Consequence
The missing pid file makes determining the true status of the cumin service more difficult.
Fix
Cumin now uses the /var/run/cumin.pid file.
Result
The pid file is created when the service is started and deleted when the service is stopped by initd.
If the cumin service is not running but /var/run/cumin.pid exists, it is evidence of a program crash.

A note on testing:

The /etc/sysconfig/cumin file can be used to create startup failures in the master script and in the cumin child processes (cumin-data, cumin-web, and cumin-report). Setting bad options and arguments here will cause them to exit during init checks so that creation/deletion of the pidfile can be seen (and synchronization of initd and the service exit).

The /etc/sysconfig/cumin file may contain a line like this. It defines the options passed to /usr/bin/cumin:

CUMIN_OPTIONS="--web-options='--section=web4'"

In this case, we are defining --web-options to be passed as extra options to every cumin-web instance. Note the use of quotes, which is very important. A line like the one above can be used to make the cumin-web instances fail init checks because of a missing config section. Here are a few more:

# make the data instances error
CUMIN_OPTIONS="--data-options='--some_bad_option'"

# make the report instance fail (if running)
CUMIN_OPTIONS="--report-options='--some_bad_option'"

# make the master script itself error
CUMIN_OPTIONS="--some_bad_option"
CUMIN_OPTIONS="extra_args"

fyi, here is an interesting case. It took cumin-web too long to shut down, so eventually cumin-web did a sysexit and everything stopped. But the init script timed out and left the pidfile. "service status" indicates that maybe shutdown didn't happen correctly. This is the expected behavior in this scenario.

# service cumin stop
Stopping cumin: [ OK ]
Timed out, cumin may not have stopped completely.
# more /var/run/cumin.pid
6023
# service cumin status
cumin dead but pid file exists

from web.log:
-------------
6026 2012-10-03 14:04:28,223 INFO Shutdown thread timed out, exiting

from master.log (note the timestamps on the last two entries)
---------------
6023 2012-10-03 13:47:48,326 INFO Started subprocess (pid 6026): cumin-web --section=web --es=exit --tm=5 --daemon
6023 2012-10-03 13:47:48,344 INFO Started subprocess (pid 6027): cumin-data --section=data.grid --es=exit --tm=5 --daemon
6023 2012-10-03 13:47:48,357 INFO Started subprocess (pid 6028): cumin-data --section=data.grid-slots --es=exit --tm=5 --daemon
6023 2012-10-03 13:47:48,363 INFO Started subprocess (pid 6029): cumin-data --section=data.grid-submissions --es=exit --tm=5 --daemon
6023 2012-10-03 13:47:48,501 INFO Started subprocess (pid 6030): cumin-data --section=data.sesame --es=exit --tm=5 --daemon
6023 2012-10-03 14:04:23,220 INFO Write termination string to all children
6023 2012-10-03 14:04:23,471 INFO Subprocess (6028) exited
6023 2012-10-03 14:04:24,223 INFO Subprocess (6030) exited
6023 2012-10-03 14:04:24,474 INFO Subprocess (6027) exited
6023 2012-10-03 14:04:24,474 INFO Subprocess (6029) exited
6023 2012-10-03 14:04:28,483 INFO Subprocess (6026) exited
6023 2012-10-03 14:04:28,483 INFO All children exited

Re Comment 3, if /etc/sysconfig/cumin is used to create one of the startup failure scenarios listed there (one is enough), then a tight bash loop can be used to check for the existence of /var/run/cumin.pid. It will come and go.
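For anyone who would rather script the check than loop by hand, here is a rough Python stand-in for the tight loop described above. It is only a sketch and not what cumin-checkpid or the init script actually does: the function names and the state labels are made up for illustration (only "dead but pid file exists" mirrors the service status message), and it assumes a POSIX system where signal 0 can be used to probe for a process.

```python
import os
import time

PIDFILE = "/var/run/cumin.pid"  # path named in this bug report


def pid_alive(pid):
    """Probe with signal 0: checks that the process exists without signalling it."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def classify(path=PIDFILE):
    """Return a hypothetical state label for the pid file at `path`."""
    try:
        with open(path) as f:
            text = f.read().strip()
    except FileNotFoundError:
        return "no pid file"
    try:
        pid = int(text)
    except ValueError:
        # Assumption: a blank file is the one created by the init script
        # before the master script has filled in its pid.
        return "pid file present, no pid yet"
    if pid_alive(pid):
        return "running"
    return "dead but pid file exists"


def watch(path=PIDFILE, interval=0.05, duration=10.0):
    """Poll `path` for `duration` seconds and report each change of state."""
    seen = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        state = classify(path)
        if not seen or seen[-1] != state:
            seen.append(state)
            print(time.strftime("%H:%M:%S"), state)
        time.sleep(interval)
    return seen
```

Calling watch() as root in one window while running "service cumin start" in another, with one of the failure settings from Comment 3 in /etc/sysconfig/cumin, should show the pid file appearing and then disappearing again. Reading the file in a single open() call, rather than testing for existence first, avoids misreporting when the file is deleted between the two steps.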
I did it by hand with multiple windows and command history :)

By the way, here is expected output for an init-check failure of cumin-web (starting with an empty log directory)

(set this in /etc/sysconfig/cumin to make it fail)
CUMIN_OPTIONS="--web-options='--somebadoption'"

# service cumin start
Starting cumin: [FAILED]

# more /var/log/cumin/master.log
7277 2012-10-03 14:30:14,589 INFO Started subprocess (pid 7280): cumin-web --section=web --es=exit --tm=5 --daemon --somebadoption
7277 2012-10-03 14:30:14,613 INFO Started subprocess (pid 7281): cumin-data --section=data.grid --es=exit --tm=5 --daemon
7277 2012-10-03 14:30:14,714 INFO Started subprocess (pid 7282): cumin-data --section=data.grid-slots --es=exit --tm=5 --daemon
7277 2012-10-03 14:30:14,720 INFO Started subprocess (pid 7283): cumin-data --section=data.grid-submissions --es=exit --tm=5 --daemon
7277 2012-10-03 14:30:14,973 INFO Started subprocess (pid 7284): cumin-data --section=data.sesame --es=exit --tm=5 --daemon
7277 2012-10-03 14:30:15,474 ERROR Subprocess (7280) failed init checks with status 3 (parse error), error in options, arguments, or config values
7277 2012-10-03 14:30:15,474 INFO Subprocess logs may contain more details.
7277 2012-10-03 14:30:15,474 INFO Stopping cumin
7277 2012-10-03 14:30:15,475 INFO Write termination string to all children
7277 2012-10-03 14:30:15,730 INFO Subprocess (7283) exited
7277 2012-10-03 14:30:15,981 INFO Subprocess (7281) exited
7277 2012-10-03 14:30:15,981 INFO Subprocess (7282) exited
7277 2012-10-03 14:30:15,982 INFO Subprocess (7284) exited
7277 2012-10-03 14:30:15,982 INFO All children exited

# more /var/log/cumin/web.stderr
usage: cumin-web [options]
cumin-web: error: no such option: --somebadoption
7280 2012-10-03 14:30:15,183 ERROR Error in options

# more /var/log/cumin/.*.init
::::::::::::::
.data.grid.init
::::::::::::::
0
::::::::::::::
.data.grid-slots.init
::::::::::::::
0
::::::::::::::
.data.grid-submissions.init
::::::::::::::
0
::::::::::::::
.data.sesame.init
::::::::::::::
0
::::::::::::::
.master.init
::::::::::::::
3 exit
::::::::::::::
.web.init
::::::::::::::

I was able to reproduce on cumin-0.1.5192-4

Verified on RHEL5 and RHEL6, both i386 and x86_64, with cumin-0.1.5648-1

--
During the verification I also tried the cases mentioned in comment 3. When using this option:

# make the report instance fail (if running)
CUMIN_OPTIONS="--report-options='--some_bad_option'"

starting cumin failed on RHEL6, but it was successful on RHEL5. The reason is that this bad option should be parsed by the cumin-report subprocess, but on RHEL5 it is not started at all. As far as I know this is the expected behaviour, so setting this bz as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0564.html