Description of problem:
problem:
[root@hateya-rhevm ~]# kill -9 `pgrep java`
[root@hateya-rhevm ~]# /etc/init.d/ovirt-engine start
The engine PID file "/var/run/ovirt-engine.pid" already exists.
mitigation:
[root@hateya-rhevm ~]# rm -rf /var/run/ovirt-engine.pid
[root@hateya-rhevm ~]# /etc/init.d/ovirt-engine start
Started engine process 11798.
expected results:
behave like any other app and allow user to start the service.
Actually, this is a sign that the engine went down uncleanly (a 'dirty bit'). We may need to run a consistency check on the DB, or something similar, before we delete the PID file and start the service.
The change suggested for alternative 1 is available here:
http://gerrit.ovirt.org/7175
It changes the service script so that it sends the following message to syslog (/var/log/messages):
Aug 14 15:49:46 f17vm engine-service[18877]: The engine PID file "/var/run/ovirt-engine.pid" contains 18713 but that process doesn't exist. This means that the engine crashed or was killed. You will need to stop and start it again.
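For illustration, the check described above could look roughly like the following. This is only a minimal sketch and not the code from the gerrit change: the function names check_pid_file and log_msg are hypothetical, and the PID_FILE override exists only to make the sketch easy to try outside /var/run. It assumes the script runs as root, since kill -0 can fail with a permission error for processes owned by another user.

```shell
#!/bin/sh
# Hypothetical sketch of a stale PID file check that warns instead of starting.
# PID_FILE defaults to the path from this bug; the override is an assumption.
PID_FILE="${PID_FILE:-/var/run/ovirt-engine.pid}"

# Send a message to syslog if logger is available, otherwise to stderr.
log_msg() {
    if command -v logger >/dev/null 2>&1; then
        logger -t engine-service "$1"
    else
        echo "$1" >&2
    fi
}

# Returns 0 when there is no PID file (safe to start).
# Returns 1 when the PID file exists, whether the process is alive or not.
check_pid_file() {
    [ -e "$PID_FILE" ] || return 0
    pid=$(cat "$PID_FILE")
    # kill -0 sends no signal; it only tests whether the process exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "The engine process $pid is already running." >&2
        return 1
    fi
    log_msg "The engine PID file \"$PID_FILE\" contains $pid but that process doesn't exist. This means that the engine crashed or was killed. You will need to stop and start it again."
    return 1
}
```

A start action would call check_pid_file first and exit with a non-zero status when it returns 1, leaving the PID file in place until an explicit stop.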
If you are absolutely sure that the concern in Comment #1 is a non-issue, then alternative 1 may be an option. However:
1. Is the message also presented on the command line when running restart?
2. What happens if the server has crashed? This means that power-cycle fencing will never be able to recover the RHEV Manager, right? This may be unacceptable to some customers (unless /var/run/*.pid is cleaned on boot).
I am not absolutely sure, there can be other issues, but I am not aware of them, that is why I prefer to not start the service automatically but warn the user instead.
The message goes to syslog, not to the terminal. In the terminal the user will see only this:
# service ovirt-engine start
Starting engine-service: [FAILED]
# echo $?
1
The /var/run directory is cleaned during boot, so a power cycle will most probably recover the service.
I don't think this is very problematic, as the typical routine of any system administrator will be something like this:
# service ovirt-engine start
Starting engine-service: [FAILED]
# service ovirt-engine status
The engine process 1080 is not running.
# tail /var/log/messages
Aug 14 15:49:46 f17vm engine-service[18877]: The engine PID file "/var/run/ovirt-engine.pid" contains 1080 but that process doesn't exist. This means that the engine crashed or was killed. You will need to stop and start it again.
# service ovirt-engine stop
Stopping engine-service: [ OK ]
# service ovirt-engine start
Starting engine-service: [ OK ]
# service ovirt-engine status
The engine process 1082 is running.
The proposed change has been merged upstream.
Merged downstream, https://gerrit.eng.lab.tlv.redhat.com/gitweb?p=ovirt-engine.git;a=commit;h=c41f7a859942d3565aa637f9bce0e4d445ce2097
[root@aqua-rhel ovirt-engine]# kill -9 `pgrep java`
[root@aqua-rhel ovirt-engine]# service ovirt-engine start
Starting engine-service: [FAILED]
## /var/log/messages
Aug 29 11:33:34 aqua-rhel engine-service[23375]: The engine PID file "/var/run/ovirt-engine.pid" contains 23196 but that process doesn't exist. This means that the engine crashed or was killed. You need to explicitly run 'service ovirt-engine stop' and then 'service ovirt-engine start' to enable it again.
[root@aqua-rhel ovirt-engine]# service ovirt-engine restart
Stopping engine-service: [ OK ]
Starting engine-service: [ OK ]
Verified si15.1
Just a follow-up from the future... There is no reason to prevent the user from starting a daemon because an old PID file was left behind, as the process is surely not running. Telling the user to perform a stop and then a start is a void statement, just like (-1 + 1 = 0). I suggest removing this non-standard behavior of our daemon, per [1].
[1] http://gerrit.ovirt.org/#/c/13415/
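The behavior argued for in this comment can be sketched as follows: if the PID recorded in the file no longer belongs to a live process, drop the file and let the start proceed. This is a hedged illustration only, not the content of the linked patch; the function name clear_stale_pid_file and the PID_FILE override are hypothetical, and the sketch assumes it runs as root so that kill -0 is a reliable liveness test.

```shell
#!/bin/sh
# Hypothetical sketch: remove a stale PID file instead of refusing to start.
# PID_FILE defaults to the path from this bug; the override is an assumption.
PID_FILE="${PID_FILE:-/var/run/ovirt-engine.pid}"

# Returns 0 when it is safe to start (no PID file, or a stale one was removed).
# Returns 1 when the recorded process is still alive.
clear_stale_pid_file() {
    [ -e "$PID_FILE" ] || return 0
    pid=$(cat "$PID_FILE")
    # kill -0 sends no signal; it only tests whether the process exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    # The recorded process is gone, so the file is left over from a crash or kill.
    rm -f "$PID_FILE"
    return 0
}
```

With this approach a start after kill -9 needs no manual stop, at the cost of skipping any consistency check that an explicit stop and start might have prompted.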
Per Juan's suggestion, I am reopening this bug to allow further discussion. As I wrote in comment #19, forcing the user to stop an inactive service is not the expected behavior, as comment #0 rightly points out, and that was the reason for opening this bug.
Alon, as you wrote the patch, please assign the bug to yourself.
Modified per future rebase.
Fixed, 3.3/is4
1. kill -9 `pgrep java`
2. service ovirt-engine start
Starting oVirt Engine: [ OK ]
This bug is currently attached to errata RHEA-2013:15231. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.
Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:
* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')
Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.
For further details on the Cause, Consequence, Fix, Result format please refer to:
https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes
Thanks in advance.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2014-0038.html