Description of problem:
Suppose you send a signal that terminates the engine service script (a Python script) and then restart the engine. The engine process that is already running isn't stopped, because the service is already considered down, so the result is two engines. Only one of them can receive requests, of course, but both run many internal processes.

Version-Release number of selected component (if applicable):
3.6.4

How reproducible:
Happened once on a local RHEV environment. Didn't try to reproduce.

Steps to Reproduce:
Longer description; the steps to reproduce can easily be derived from it.

* The engine was started on Mar 20, at 12:40:12, with PID 22749 for the engine and PID 22747 for the service script.

* On Mar 21, around 13:01, someone tried to take a thread dump of the engine, maybe because it wasn't working correctly. They tried to use "kill -3", but used the wrong PID:

  # kill -3 22747

  This killed the service script, but left the engine running.

* On Mar 22, around 13:33, someone restarted the engine, but this didn't stop the engine that was already running, because the service script was already gone. As a result, a new engine was started. This second engine failed to listen on its ports, as they were in use by the old one, but it still deployed the applications, connected to the database, and started trying to manage the hosts. Note that the old engine was still running and servicing UI and API requests.

Actual results:
The signal wasn't propagated from the service script to the engine.

Expected results:
The signal should propagate to the engine. IMO on -3 the service script should keep running as well.
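To illustrate the expected behaviour, here is a minimal sketch of a wrapper that relays signals to its child instead of dying on them. It is not the actual engine service script: the function name `run_with_forwarding` and its `forwarded` parameter are hypothetical, and it assumes a POSIX system and Python 3.

```python
import signal
import subprocess


def run_with_forwarding(cmd, forwarded=(signal.SIGTERM, signal.SIGQUIT)):
    """Start cmd as a child process and relay the listed signals to it.

    Hypothetical helper for illustration only; not taken from the
    engine service script.
    """
    child = subprocess.Popen(cmd)

    def relay(signum, frame):
        # Pass the signal on to the child instead of letting the
        # default action terminate the wrapper itself.
        if child.poll() is None:
            child.send_signal(signum)

    # Install the relay handler and remember the previous handlers.
    previous = {sig: signal.signal(sig, relay) for sig in forwarded}
    try:
        # The wrapper keeps waiting until the child really exits.
        return child.wait()
    finally:
        # Restore whatever handlers were installed before.
        for sig, handler in previous.items():
            signal.signal(sig, handler)
```

With a wrapper like this, "kill -3" sent to the wrapper's PID by mistake would reach the child (a JVM prints a thread dump on SIGQUIT) and the wrapper would stay alive, so a later restart could still find and stop the running child.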
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.
oVirt 4.0 beta has been released, moving to RC milestone.
If you kill the process, it's up to you to make sure it is indeed dead. Closing as won't fix.