Description of problem:
When starting katello, we can get an "OK" even if the thin process(es) do not start up correctly. For example, if something is already running on port 5001 (such as a leftover thin process from a previous bad shutdown) and the katello service is started, we'll still go green. Symptoms may include spurious oauth issues in the UI and when pinging the server. I am not 100% sure of a repro case that triggers the oauth errors, but by following the steps below you should be able to get a "green" response on katello server startup even if a thin process cannot be loaded successfully.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start the katello service. Make sure you have at least two 'thin' processes running (probably on ports 5000, 5001, etc.).
2. Run 'service katello stop'. After the first server (5000) has shut down cleanly, hit ^C to simulate an unclean shutdown.
3. Run 'ps ax|grep thin' to make sure a secondary thin server is still running.
4. Attempt to restart katello.

Actual results:
The katello service restarts, apparently green, even though the supplementary thin processes obviously cannot start with the port already in use.

Expected results:
We should fail, or warn, if not everything starts up correctly.

Additional info:
As background, I discovered this because I kept seeing intermittent oauth errors during sessions, roughly on every other page load or every other server ping. I would shut down all services and start them back up: same problem. I ran the oauth reset script, lather, rinse, repeat: same issue. It wasn't until I looked at the log of one of the thin processes that I saw a traceback showing it was unable to start. Apparently, during an unclean shutdown, a thin process was left running in the background. Again, I am not sure what might cause an unclean shutdown, and I am not sure how to reproduce the oauth errors.
This might have happened because this thin process kept running throughout a removal and reinstall of katello. In any case, I have seen it twice in the last few weeks. So I guess there are two issues:
1) We're not always shutting down cleanly. I'm not sure what is causing this, but I don't think we can ever catch all the edge cases anyway.
2) After an unclean shutdown, we're catching such problems on startup. /This/ is the root goal of this bug.
errr... 2) After an unclean shutdown, we're NOT catching such problems on startup.
Investigating.
This is rather a feature of systemd. On Fedora 15/16 our script actually never gets executed, because systemd already sees the instance as running. Try this:

systemctl status katello.service
    Loaded: loaded (/etc/rc.d/init.d/katello)
    Active: active (running) since Tue, 15 Nov 2011 12:09:08 +0100; 3min 26s ago
    Process: 2297 ExecStop=/etc/rc.d/init.d/katello stop (code=exited, status=0/SUCCESS)
    Process: 2317 ExecStart=/etc/rc.d/init.d/katello start (code=exited, status=0/SUCCESS)
    Main PID: 1648 (code=exited, status=0/SUCCESS)
    CGroup: name=systemd:/system/katello.service
      ├ 2328 thin server (0.0.0.0:5000)
      ...
      └ 2336 thin server (0.0.0.0:5001)
kill 2336
systemctl status katello.service

It still reports that it is running. I guess we would need to write our own systemd units to take care of that, but I am really not sure it would help. Giving it low priority, as we can live with this until V1.
http://fedoraproject.org/wiki/Packaging:Systemd

I have modified your init script and enhanced the "status" call: it now properly checks all PID files. On top of that, I found that we were running thin via the daemon command. This is not necessary, since thin already daemonizes its processes, so I removed it. I also removed the unnecessary check_permission call, because thin reports an error when the PID file is not writable; that check was a leftover from the previous server (WEBrick).

7d1da2e 752863 - katello service will return "OK" on error

@Corey - Can you please try it again with my patch on Fedora and RHEL? If they both give the same result, I suggest we postpone this one. Please set it back to MODIFIED once you are done with testing, thanks.
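The enhanced "status" call itself is not shown in this thread. As a rough illustration of the idea (report "running" only when every thin instance has a PID file pointing at a live process), here is a minimal POSIX shell sketch. The function name check_all_thin_pids, the thin.<port>.pid file naming and the consecutive port layout are assumptions made for illustration, not details of the real katello init script.

```sh
#!/bin/sh
# Sketch: succeed only if every thin instance has a live PID.
# ASSUMPTIONS (hypothetical, not taken from the katello init script):
#   $1 = directory holding one PID file per instance, named thin.<port>.pid
#   $2 = port of the first instance; instances use consecutive ports
#   $3 = number of instances (THIN_INSTANCES)
check_all_thin_pids() {
    local pid_dir="$1" base_port="$2" count="$3"
    local i=0 failed=0 port pid_file pid
    while [ "$i" -lt "$count" ]; do
        port=$((base_port + i))
        pid_file="$pid_dir/thin.$port.pid"
        if [ ! -f "$pid_file" ]; then
            echo "thin on port $port: no PID file ($pid_file)"
            failed=1
        else
            pid=$(cat "$pid_file")
            # kill -0 sends no signal; it only checks that the PID exists
            # and may be signalled by the caller (run this as root)
            if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
                echo "thin on port $port: running (pid $pid)"
            else
                echo "thin on port $port: stale PID file, process '$pid' not running"
                failed=1
            fi
        fi
        i=$((i + 1))
    done
    # 0 = all instances alive, 1 = at least one missing or dead
    return "$failed"
}
```

In a status) branch it might be called as check_all_thin_pids "$PID_DIR" 5000 "$THIN_INSTANCES", where PID_DIR is a hypothetical variable standing in for wherever the script really writes its PID files. This only improves the init script's own status report; it does not change what systemctl status shows.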
I have talked to the systemd developers, and it seems systemd can't handle this. Adding bug 754127 as a dependency for this one.
[root@qetello02 ~]# service katello stop
Stopping server on 0.0.0.0:5000 ...
^CStopping server on 0.0.0.0:5001 ...
Sending QUIT signal to process 19109 ...
>> Exiting!
Stopping server on 0.0.0.0:5002 ...
Sending QUIT signal to process 19115 ...
>> Exiting!
Stopping server on 0.0.0.0:5003 ...
Sending QUIT signal to process 19147 ...
>> Exiting!
Stopping server on 0.0.0.0:5004 ...
Sending QUIT signal to process 19161 ...
>> Exiting!
[root@qetello02 ~]# ps ax | grep thin
26050 pts/2 R+ 0:00 grep thin
[root@qetello02 ~]# service katello start
Starting katello: Starting server on 0.0.0.0:5000 ...
Starting server on 0.0.0.0:5001 ...
Starting server on 0.0.0.0:5002 ...
Starting server on 0.0.0.0:5003 ...
Starting server on 0.0.0.0:5004 ...
[ OK ]
Okay, I was NOT able to VERIFY this one.

1) stop katello
2) occupy one port, e.g. with nc -l 5001 &
3) start katello

It reports OK, but one instance (5001) was not started. There is not much we can do about it: thin returns OK in this case. Maybe we could add an explicit check for open ports before starting, or an explicit check after we start (something like pgrep):

    if [ "$(pgrep thin | wc -l)" -eq "$THIN_INSTANCES" ]; then
        echo OK
    else
        echo FAIL
    fi

It would be better to have some support upstream. Let's wait for them: https://github.com/macournoyer/thin/issues/93

Setting back to NEW.
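For the "explicit check for open ports before starting" idea, one possible sketch is a helper that reads a listening-socket table on stdin and prints which of the requested ports are already taken. It assumes the netstat -ltn layout, where the fourth column is the local address (for example 0.0.0.0:5001); common versions of ss -ltn put it in the same column, but that is worth checking before relying on it. The function name check_ports_free is hypothetical. Keeping the parsing separate from the tool makes it easy to test with canned output.

```sh
#!/bin/sh
# Sketch of a pre-start check: are any of thin's ports already in use?
# ASSUMPTION: stdin is a listening-socket table in `netstat -ltn` layout,
# where field 4 of each row is the local address (e.g. 0.0.0.0:5001).
# check_ports_free is a hypothetical helper, not part of thin or katello.
check_ports_free() {
    local listing status=0 port
    # Read the table once so it can be scanned for every requested port.
    listing=$(cat)
    for port in "$@"; do
        # Compare the text after the last ':' of field 4 with the port;
        # this also covers IPv6 forms such as :::5001 or [::]:5001.
        if printf '%s\n' "$listing" | awk -v p="$port" '
            { n = split($4, a, ":"); if (a[n] == p) found = 1 }
            END { exit (found ? 0 : 1) }'; then
            echo "$port"    # report the busy port
            status=1
        fi
    done
    # 0 = all requested ports free, 1 = at least one is taken
    return "$status"
}
```

Called as netstat -ltn | check_ports_free 5000 5001 5002 before launching thin, it would let the start action print FAIL and exit non-zero when, as in the nc -l 5001 test above, a port is already occupied. It does not replace a post-start check such as the pgrep count: a port can still be taken between this check and thin binding to it.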
Fixed in https://bugzilla.redhat.com/show_bug.cgi?id=758651#c5
These bugs have been resolved in upstream projects for a period of months so I'm mass-closing them as CLOSED:UPSTREAM. If this is a mistake feel free to re-open.