Bug 752863 - katello service will return "OK" even if all thin threads do not start correctly.
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: API
Version: 6.0.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: Unspecified
Assignee: Miroslav Suchý
QA Contact: Katello QA List
URL:
Whiteboard:
Depends On: 754127 758651
Blocks: katello-blockers
Reported: 2011-11-10 16:08 UTC by Corey Welton
Modified: 2019-09-26 13:23 UTC (History)

Fixed In Version: katello-1.1.10
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-19 18:15:11 UTC
Target Upstream Version:
Embargoed:



Description Corey Welton 2011-11-10 16:08:28 UTC
Description of problem:

When starting katello, we can get an "OK" even if the thin process(es) do not start up correctly. For example, if something is already running on port 5001 (such as a preexisting thin service left over from a previous bad shutdown) and the katello service is started, we'll still go green.

Symptoms may include spurious oauth issues in the UI and when pinging the server. I am not 100% sure of a repro case that causes the oauth issues, but by following the steps below, you should(?) be able to get a "green" response on katello server startup even if a thin process cannot be started successfully.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.  Start the katello service. Make sure you have at least two 'thin' threads running (probably on ports 5000, 5001, etc.)
2.  Run 'service katello stop'. After the first service (5000) has shut down cleanly, hit ^C to simulate an unclean shutdown
3.  Run 'ps ax|grep thin' to confirm that a secondary thin service is still running
4.  Attempt to restart katello
 
Actual results:

The katello service restarts, apparently green, even though the supplementary thin processes obviously cannot start with the port already in use.

Expected results:
We should fail, or warn, if not everything starts up just right.

Additional info:

As background, I discovered this because I kept seeing intermittent oauth errors during sessions, on something like every other page load or every other server ping. I would shut down all services and start them back up: same problem. Run the oauth reset script, lather, rinse, repeat: same issue. It wasn't until I looked at the logs for one of the thin processes that I saw a traceback caused by its inability to start. Apparently, during an unclean shutdown, a thin process was left running in the background.

Again, I am not sure what might cause an unclean shutdown, and I am not sure what to do to make the oauth errors occur. This might have happened because this thin process kept running throughout a removal and reinstall of katello. In any case, I have seen it twice in the last few weeks.

So I guess there are two issues:
1) We're not shutting down cleanly all the time.  Not sure what is causing this, but I don't think you can ever catch all edge cases either
2) After an unclean shutdown, we're catching such problems on startup. /This/ is the root goal in this bug.

Comment 1 Corey Welton 2011-11-10 16:10:37 UTC
errr...

2) After an unclean shutdown, we're NOT catching such problems on startup.

Comment 2 Lukas Zapletal 2011-11-15 10:31:42 UTC
Investigating.

Comment 3 Lukas Zapletal 2011-11-15 12:57:44 UTC
This is rather a feature of systemd. On Fedora (15/16) our script actually never gets executed, because systemd already sees that the instance is running. Try this:

systemctl status katello.service

	  Loaded: loaded (/etc/rc.d/init.d/katello)
	  Active: active (running) since Tue, 15 Nov 2011 12:09:08 +0100; 3min 26s ago
	 Process: 2297 ExecStop=/etc/rc.d/init.d/katello stop (code=exited, status=0/SUCCESS)
	 Process: 2317 ExecStart=/etc/rc.d/init.d/katello start (code=exited, status=0/SUCCESS)
	Main PID: 1648 (code=exited, status=0/SUCCESS)
	  CGroup: name=systemd:/system/katello.service
		  ├ 2328 thin server (0.0.0.0:5000)                                       ...
		  └ 2336 thin server (0.0.0.0:5001)

kill 2336
systemctl status katello.service

It still reports it is running.

I guess we need to write our own systemd units for taking care of that. I am really not sure if it helps. Giving it low priority as we can live with that until V1.

http://fedoraproject.org/wiki/Packaging:Systemd

I have modified your init script and enhanced the "status" call; it now properly checks all PID files.

On top of that, I found that we were running thin through a daemon command. This is not necessary since thin already daemonizes its processes, so I removed that.

I also removed the unnecessary check_permission call, because thin reports an error when the PID file is not writable. This test was there because of the previous server (WEBrick).
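
As a rough illustration of the idea behind a status call that checks all PID files (this is not the code from the commit below), a minimal sketch follows. The helper name check_pid_files and the PID file paths in the usage comment are assumptions made up for this example.

```shell
# Sketch only: check_pid_files is a hypothetical helper, not the real init script.
# Returns 0 only if every given PID file exists and names a live process.
# Assumes it runs with enough privileges to signal the processes (e.g. as root).
check_pid_files() {
    for pidfile in "$@"; do
        # A missing or unreadable PID file means that instance is not running.
        [ -r "$pidfile" ] || return 1
        pid=$(cat "$pidfile" 2>/dev/null)
        # Reject empty or non-numeric content.
        case "$pid" in
            ''|*[!0-9]*) return 1 ;;
        esac
        # kill -0 sends no signal; it only tests that the process exists.
        kill -0 "$pid" 2>/dev/null || return 1
    done
    return 0
}

# Hypothetical usage (the paths are assumptions):
# check_pid_files /var/run/katello/thin.5000.pid /var/run/katello/thin.5001.pid \
#     && echo OK || echo FAIL
```

With a check like this, one stale or missing PID file is enough to make the overall status fail instead of reporting success for the whole service.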

7d1da2e 752863 - katello service will return "OK" on error

@Corey - Can you please try it again with my patch on Fedora and RHEL? If they both give the same result, I suggest to postpone this one. Please set back to MODIFIED once you are done with the testing, thanks.

Comment 4 Lukas Zapletal 2011-11-15 13:53:25 UTC
I have talked to the systemd guys and it seems systemd can't handle this. Adding dependency 754127 for this one.

Comment 6 Og Maciel 2012-02-29 15:29:41 UTC
[root@qetello02 ~]# service katello stop
Stopping server on 0.0.0.0:5000 ... 
^CStopping server on 0.0.0.0:5001 ... 
Sending QUIT signal to process 19109 ... 
>> Exiting!
Stopping server on 0.0.0.0:5002 ... 
Sending QUIT signal to process 19115 ... 
>> Exiting!
Stopping server on 0.0.0.0:5003 ... 
Sending QUIT signal to process 19147 ... 
>> Exiting!
Stopping server on 0.0.0.0:5004 ... 
Sending QUIT signal to process 19161 ... 
>> Exiting!
[root@qetello02 ~]# ps ax | grep thin
26050 pts/2    R+     0:00 grep thin
[root@qetello02 ~]# service katello start
Starting katello: Starting server on 0.0.0.0:5000 ... 
Starting server on 0.0.0.0:5001 ... 
Starting server on 0.0.0.0:5002 ... 
Starting server on 0.0.0.0:5003 ... 
Starting server on 0.0.0.0:5004 ... 
                                                           [  OK  ]

Comment 7 Lukas Zapletal 2012-02-29 16:12:08 UTC
Okay, I was NOT able to VERIFY this one.

1) stop katello
2) occupy one port, e.g. with

nc -l 5001 &

3) start katello

It reports OK but one instance (5001) was not started.

We cannot do much about it. Thin returns OK in this case. Maybe we could add an explicit check for open ports before starting, or an explicit check after we start (something like pgrep)...

# Compare the number of running thin processes with the expected count.
if [ "$(pgrep thin | wc -l)" -eq "$THIN_INSTANCES" ]; then
  echo OK
else
  echo FAIL
fi
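
The other option mentioned above, an explicit check for open ports before starting, could be sketched roughly as follows. This is only an illustration under assumptions: the helper names port_in_use and ports_free are made up, the check relies on bash's /dev/tcp redirection (so it needs bash, not a plain POSIX sh), and it only probes 127.0.0.1.

```shell
# Sketch only: port_in_use and ports_free are hypothetical helpers.
# port_in_use returns 0 if something accepts TCP connections on host:port.
port_in_use() {
    # The subshell keeps the probe descriptor from leaking into the caller;
    # a failed connect makes the redirection (and thus the subshell) fail.
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# ports_free returns 0 only if none of the given local ports is taken.
ports_free() {
    for port in "$@"; do
        if port_in_use 127.0.0.1 "$port"; then
            echo "port $port is already in use" >&2
            return 1
        fi
    done
    return 0
}

# Hypothetical usage before starting the thin instances:
# ports_free 5000 5001 || exit 1
```

A pre-start check like this would catch the leftover-process case from this report, while the pgrep comparison above would also catch instances that die shortly after starting for other reasons.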

It would be better to have some support upstream. Let's wait for them:

https://github.com/macournoyer/thin/issues/93

Setting back to NEW

Comment 11 Miroslav Suchý 2012-09-04 11:57:31 UTC
Fixed in https://bugzilla.redhat.com/show_bug.cgi?id=758651#c5

Comment 12 Mike McCune 2013-09-19 18:15:11 UTC
These bugs have been resolved in upstream projects for a period of months, so I'm mass-closing them as CLOSED:UPSTREAM. If this is a mistake, feel free to re-open.

