752863 – katello service will return "OK" even if not all thin threads start correctly.


Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Red Hat Satellite
Classification:	Red Hat
Component:	API
Sub Component:
Version:	6.0.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	Unspecified
Assignee:	Miroslav Suchý
QA Contact:	Katello QA List
Docs Contact:
URL:
Whiteboard:
Depends On:	754127 758651
Blocks:	katello-blockers

Reported:	2011-11-10 16:08 UTC by Corey Welton
Modified:	2019-09-26 13:23 UTC
CC List:	4 users
Fixed In Version:	katello-1.1.10
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-09-19 18:15:11 UTC
Target Upstream Version:
Embargoed:


Description Corey Welton 2011-11-10 16:08:28 UTC

Description of problem:

When starting katello, we can get an "OK" even if the thin process(es) do not start up correctly. For example, if something is already listening on port 5001 (such as a leftover thin process from a previous bad shutdown) and the katello service is started, we'll still go green.

Symptoms may include spurious oauth issues in the UI and when pinging the server. I am not sure of a repro case that triggers the oauth errors, but by following the steps below you should(?) be able to get a "green" response on katello service startup even if a thin process cannot be started.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start the katello service. Ensure you have at least two 'thin' threads running (probably on ports 5000, 5001, etc.)
2. Run 'service katello stop'. After the first instance (5000) has shut down cleanly, hit ^C to simulate an unclean shutdown
3. Run 'ps ax|grep thin' to confirm that a secondary thin process is still running
4. Attempt to restart katello

Actual results:

katello service restarts, apparently green - even though the supplementary thin processes obviously cannot start with the port already in use.

Expected results:
We should fail, or warn, if not everything starts up just right.

Additional info:

As background, I discovered this because I kept seeing intermittent oauth errors during sessions, on roughly every other page load or every other server ping. I would shut down all services and start them back up: same problem. I would run the oauth reset script, lather, rinse, repeat: same issue. It wasn't until I looked at the logs for one of the thin processes that I saw a traceback caused by its inability to start. Apparently, during an unclean shutdown, a thin process was left running in the background.

Again, I am not sure what might cause an unclean shutdown, and I am not sure how to make the oauth errors occur. This might have happened because this thin process kept running throughout a removal and reinstall of katello. In any case, I have seen it twice in the last few weeks.

So I guess there are two issues:
1) We're not shutting down cleanly all the time. Not sure what is causing this, but I don't think you can ever catch all edge cases either
2) After an unclean shutdown, we're catching such problems on startup. /This/ is the root goal in this bug.

Comment 1 Corey Welton 2011-11-10 16:10:37 UTC

errr...

2) After an unclean shutdown, we're NOT catching such problems on startup.

Comment 2 Lukas Zapletal 2011-11-15 10:31:42 UTC

Investigating.

Comment 3 Lukas Zapletal 2011-11-15 12:57:44 UTC

This is rather a feature of systemd. On Fedora (15/16) our script never actually gets executed, because systemd already sees the instance as running. Try this:

systemctl status katello.service

	  Loaded: loaded (/etc/rc.d/init.d/katello)
	  Active: active (running) since Tue, 15 Nov 2011 12:09:08 +0100; 3min 26s ago
	 Process: 2297 ExecStop=/etc/rc.d/init.d/katello stop (code=exited, status=0/SUCCESS)
	 Process: 2317 ExecStart=/etc/rc.d/init.d/katello start (code=exited, status=0/SUCCESS)
	Main PID: 1648 (code=exited, status=0/SUCCESS)
	  CGroup: name=systemd:/system/katello.service
		  ├ 2328 thin server (0.0.0.0:5000)                                       ...
		  └ 2336 thin server (0.0.0.0:5001)

kill 2336
systemctl status katello.service

It still reports it is running.

I guess we need to write our own systemd units to take care of that. I am really not sure whether it would help. Giving it low priority as we can live with that until V1.

http://fedoraproject.org/wiki/Packaging:Systemd

I have modified your init script and enhanced the "status" call - now it properly checks all PID files.

On top of that, I found that we are running thin with a daemon command - this is not necessary since thin already daemonizes its processes. I removed that.

I also removed the unnecessary check_permission call, because thin reports an error when the pid is not writable. This test was there because of the previous server (WEBrick).
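As an illustration of what checking every PID file in a "status" call can look like, here is a minimal sketch. It is not the code from the patch: the function name check_thin_pids, the thin.*.pid file naming, and the directory passed in are all assumptions made for this example. It also assumes it runs as root, as an init script would, so that kill -0 can probe any process.

```shell
#!/bin/bash
# Sketch only: check_thin_pids and the thin.*.pid naming are hypothetical,
# not taken from the actual katello init script.

# Succeeds only if every PID file in the given directory names a live process.
# Exit codes follow the LSB "status" convention:
# 0 = running, 1 = dead but PID file exists, 3 = not running.
check_thin_pids() {
    local pid_dir="$1" failed=0 pidfile pid
    for pidfile in "$pid_dir"/thin.*.pid; do
        # An unmatched glob stays literal, so this also catches "no PID files".
        [ -e "$pidfile" ] || { echo "no thin PID files in $pid_dir"; return 3; }
        pid=$(cat "$pidfile")
        # kill -0 sends no signal; it only tests whether the process exists.
        if kill -0 "$pid" 2>/dev/null; then
            echo "$pidfile: running (pid $pid)"
        else
            echo "$pidfile: stale (pid $pid is not running)"
            failed=1
        fi
    done
    return "$failed"
}
```

A status action could then call the function with whatever directory the init script writes its PID files to and pass the return value on as its own exit status.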

7d1da2e 752863 - katello service will return "OK" on error

@Corey - Can you please try it again with my patch on Fedora and RHEL? If they both give the same result, I suggest postponing this one. Please set it back to MODIFIED once you are done with the testing, thanks.

Comment 4 Lukas Zapletal 2011-11-15 13:53:25 UTC

I have talked to the systemd guys and it seems systemd can't handle this. Adding 754127 as a dependency for this one.

Comment 6 Og Maciel 2012-02-29 15:29:41 UTC

[root@qetello02 ~]# service katello stop
Stopping server on 0.0.0.0:5000 ... 
^CStopping server on 0.0.0.0:5001 ... 
Sending QUIT signal to process 19109 ... 
>> Exiting!
Stopping server on 0.0.0.0:5002 ... 
Sending QUIT signal to process 19115 ... 
>> Exiting!
Stopping server on 0.0.0.0:5003 ... 
Sending QUIT signal to process 19147 ... 
>> Exiting!
Stopping server on 0.0.0.0:5004 ... 
Sending QUIT signal to process 19161 ... 
>> Exiting!
[root@qetello02 ~]# ps ax | grep thin
26050 pts/2    R+     0:00 grep thin
[root@qetello02 ~]# service katello start
Starting katello: Starting server on 0.0.0.0:5000 ... 
Starting server on 0.0.0.0:5001 ... 
Starting server on 0.0.0.0:5002 ... 
Starting server on 0.0.0.0:5003 ... 
Starting server on 0.0.0.0:5004 ... 
                                                           [  OK  ]

Comment 7 Lukas Zapletal 2012-02-29 16:12:08 UTC

Okay, I was NOT able to VERIFY this one.

1) stop katello
2) occupy one port, e.g. with

nc -l 5001 &

3) start katello

It reports OK but one instance (5001) was not started.

We cannot do much about it. Thin returns OK in this case. Maybe we could add an explicit check for open ports before starting, or an explicit check after we start (something like pgrep)...

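# THIN_INSTANCES (the expected number of thin processes) is assumed to be set by the init script.
if [ "$(pgrep thin | wc -l)" -eq "$THIN_INSTANCES" ]; then
 echo OK
else
 echo FAIL
fi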
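The other idea from this comment, an explicit check for occupied ports before starting, could be sketched roughly as follows. This is only an illustration under stated assumptions: port_listed and free_ports_or_fail are hypothetical helper names, and the sketch assumes the `ss` utility from iproute is installed and that its `-ltn` listing has the local address and port in the fourth column.

```shell
#!/bin/bash
# Sketch only: port_listed and free_ports_or_fail are hypothetical helpers,
# not part of the katello init script.

# Reads `ss -ltn` output on stdin; succeeds if a local address ends in :PORT.
port_listed() {
    awk -v port="$1" '
        NR > 1 { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
        END    { exit !found }'
}

# Returns non-zero if any of the given ports already has a TCP listener.
free_ports_or_fail() {
    local listing port rc=0
    listing=$(ss -ltn) || return 2   # assumes ss is available
    for port in "$@"; do
        if printf '%s\n' "$listing" | port_listed "$port"; then
            echo "port $port is already in use" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Calling free_ports_or_fail 5000 5001 before launching thin would have flagged the `nc -l 5001` listener from the steps above. A check like this is inherently racy (a port can be taken between the check and the start), so it complements rather than replaces a check after startup.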

It would be better to have some support upstream. Let's wait for them:

https://github.com/macournoyer/thin/issues/93

Setting back to NEW

Comment 11 Miroslav Suchý 2012-09-04 11:57:31 UTC

Fixed in https://bugzilla.redhat.com/show_bug.cgi?id=758651#c5

Comment 12 Mike McCune 2013-09-19 18:15:11 UTC

These bugs have been resolved in upstream projects for a period of months so I'm mass-closing them as CLOSED:UPSTREAM.  If this is a mistake feel free to re-open.
