Bug 838527 - [rhevm] unable to start ovirt-engine if service crash and pid is left
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-setup
Version: 3.1.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Alon Bar-Lev
QA Contact: Pavel Stehlik
Whiteboard: integration
Keywords: Reopened
Depends On: 952297
Blocks:
Reported: 2012-07-09 07:34 EDT by Haim
Modified: 2015-09-22 09 EDT
CC List: 11 users

See Also:
Fixed In Version: is1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-21 12:28:09 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:




External Trackers
Tracker       ID     Priority  Status  Summary  Last Updated
oVirt gerrit  13415  None      None    None     Never

Description Haim 2012-07-09 07:34:44 EDT
Description of problem:

problem:

[root@hateya-rhevm ~]# kill -9 `pgrep java`
[root@hateya-rhevm ~]# /etc/init.d/ovirt-engine start 
The engine PID file "/var/run/ovirt-engine.pid" already exists.

mitigation:


[root@hateya-rhevm ~]# rm -rf /var/run/ovirt-engine.pid
[root@hateya-rhevm ~]# /etc/init.d/ovirt-engine start 
Started engine process 11798.

expected results: the service should behave like any other application and let the user start it.
Comment 1 Yaniv Kaul 2012-07-09 07:37:04 EDT
Actually, a leftover PID file is a sign that the engine went down uncleanly (a 'dirty bit'). We may need to run a consistency check on the DB, or something similar, before we delete the PID file and start the service.
Comment 9 Juan Hernández 2012-08-14 09:56:45 EDT
The change suggested for alternative 1 is available here:

http://gerrit.ovirt.org/7175

It changes the service script so that it will send the following message to syslog (/var/log/messages):

Aug 14 15:49:46 f17vm engine-service[18877]: The engine PID file "/var/run/ovirt-engine.pid" contains 18713 but that process doesn't exist. This means that the engine crashed or was killed. You will need to stop and start it again.
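The check described above can be sketched in bash roughly as follows. This is a minimal illustration assuming a SysV-style init script that runs as root; the function name check_pidfile_before_start is hypothetical, and the code is not taken from gerrit change 7175. It treats a PID file whose process no longer exists as stale, sends the message to syslog with logger, and refuses to start.

```shell
#!/bin/bash
# Hypothetical sketch of "alternative 1": refuse to start when a stale PID
# file is found and report the reason to syslog. Not the actual
# ovirt-engine service script.

PIDFILE="${PIDFILE:-/var/run/ovirt-engine.pid}"

# Returns 0 when a start may proceed, 1 when it must be refused.
check_pidfile_before_start() {
    local pid msg

    # No PID file: nothing was left behind, so starting is safe.
    [ -f "$PIDFILE" ] || return 0

    pid=$(cat "$PIDFILE")

    # kill -0 delivers no signal; it only tests whether the process exists.
    # Assumes the script runs as root, so "permission denied" cannot occur.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "The engine is already running as process $pid." >&2
        return 1
    fi

    # The recorded process is gone: the engine crashed or was killed.
    msg="The engine PID file \"$PIDFILE\" contains $pid but that process doesn't exist. This means that the engine crashed or was killed. You will need to stop and start it again."
    # Send the message to syslog; fall back to stderr if logger fails.
    logger -t engine-service "$msg" 2>/dev/null || echo "$msg" >&2
    return 1
}
```

The function would be called at the top of the start action, e.g. `check_pidfile_before_start || exit 1`. One limitation of the kill -0 test: if the old PID has since been reused by an unrelated process, the script wrongly concludes that the engine is still running.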
Comment 10 Simon Grinberg 2012-08-14 10:09:12 EDT
If you are absolutely sure that Comment #1 is a non-issue, then alternative 1 may be an option. However:

1. Is the message also shown on the command line when running restart?
2. What happens if the server itself has crashed? That would mean power-cycle fencing can never recover the RHEV Manager, right? This may be unacceptable to some customers (unless /var/run/*.pid is cleaned on boot).
Comment 11 Juan Hernández 2012-08-14 11:30:30 EDT
I am not absolutely sure; there may be other issues that I am not aware of. That is why I prefer not to start the service automatically, but to warn the user instead.

The message goes to syslog, not to the terminal. In the terminal the user will see only this:

# service ovirt-engine start
Starting engine-service:                                    [FAILED]
# echo $?
1

The /var/run directory is cleaned during boot, so a power cycle will most probably recover the service.

I don't think this is very problematic, as the typical routine of any system administrator will be something like this:

# service ovirt-engine start
Starting engine-service:                                    [FAILED]

# service ovirt-engine status
The engine process 1080 is not running.

# tail /var/log/messages
Aug 14 15:49:46 f17vm engine-service[18877]: The engine PID file "/var/run/ovirt-engine.pid" contains 1080 but that process doesn't exist. This means that the engine crashed or was killed. You will need to stop and start it again.

# service ovirt-engine stop
Stopping engine-service:                                    [  OK  ]

# service ovirt-engine start
Starting engine-service:                                    [  OK  ]

# service ovirt-engine status
The engine process 1082 is running.
Comment 16 Juan Hernández 2012-08-17 08:43:19 EDT
The proposed change has been merged upstream.
Comment 18 Oded Ramraz 2012-08-29 04:35:22 EDT
[root@aqua-rhel ovirt-engine]# kill -9 `pgrep java`
[root@aqua-rhel ovirt-engine]# service ovirt-engine start
Starting engine-service:                                    [FAILED]

## /var/log/messages 

Aug 29 11:33:34 aqua-rhel engine-service[23375]: The engine PID file "/var/run/ovirt-engine.pid" contains 23196 but that process doesn't exist. This means that the engine crashed or was killed. You need to explicitly run 'service ovirt-engine stop' and then 'service ovirt-engine start' to enable it again.

[root@aqua-rhel ovirt-engine]# service ovirt-engine restart
Stopping engine-service:                                    [  OK  ]
Starting engine-service:                                    [  OK  ]

Verified si15.1
Comment 19 Alon Bar-Lev 2013-04-01 05:51:08 EDT
Just a follow-up from the future...

There is no reason to prevent the user from starting a daemon just because an old PID file was left behind, as the process surely is not running.

Telling the user to stop and then start the service is an empty operation, just like the arithmetic statement:
  (-1 + 1 = 0)

I suggest removing this non-standard behavior of our daemon, per [1]:

[1] http://gerrit.ovirt.org/#/c/13415/
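The behavior proposed here, where a leftover PID file is simply discarded when its process is gone, might look like the following bash sketch. It assumes a root-run SysV-style init script; the function name prepare_pidfile_for_start is hypothetical, and this is not the content of gerrit change 13415.

```shell
#!/bin/bash
# Hypothetical sketch: treat a stale PID file the way most daemons do,
# i.e. remove it and start normally. Not the actual ovirt-engine script.

PIDFILE="${PIDFILE:-/var/run/ovirt-engine.pid}"

# Returns 0 when a start may proceed (after removing a stale PID file),
# 1 only when the recorded process is genuinely still running.
prepare_pidfile_for_start() {
    local pid

    # No PID file: nothing to clean up.
    [ -f "$PIDFILE" ] || return 0

    pid=$(cat "$PIDFILE")

    # kill -0 only tests for existence (assumes the script runs as root).
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "The engine is already running as process $pid." >&2
        return 1
    fi

    # The process is gone, so the PID file is stale: drop it and carry on.
    rm -f "$PIDFILE"
    return 0
}
```

Called at the top of the start action (e.g. `prepare_pidfile_for_start || exit 1`), this makes `kill -9` followed by `service ovirt-engine start` behave like the expected result in comment #0. The same PID-reuse caveat as any kill -0 check applies.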
Comment 20 Alon Bar-Lev 2013-04-01 06:11:23 EDT
Per Juan's suggestion, I am reopening this bug to allow further discussion.

As I wrote in comment #19, forcing the user to stop an inactive service is not the expected behavior described in comment #0, which was the reason for opening this bug.
Comment 21 Juan Hernández 2013-04-01 06:12:59 EDT
Alon, as you wrote the patch, please assign the bug to yourself.
Comment 22 Alon Bar-Lev 2013-04-15 11:53:29 EDT
Modified per future rebase.
Comment 23 David Botzer 2013-07-04 00:57:17 EDT
Fixed, 3.3/is4
1. kill -9 `pgrep java`
2. service ovirt-engine start
       Starting oVirt Engine:     [  OK  ]
Comment 24 Charlie 2013-11-27 19:24:33 EST
This bug is currently attached to errata RHEA-2013:15231. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.
Comment 28 errata-xmlrpc 2014-01-21 12:28:09 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0038.html
