Bug 1375714

Summary: if qpid is slow/unresponsive, candlepin event listener will freeze in event loop, causing dynflow executor to stop responding
Product: Red Hat Satellite
Reporter: Chris Duryee <cduryee>
Component: Content Management
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED WORKSFORME
Severity: medium
Priority: high
Version: 6.2.0
CC: bbuckingham, bkearney, cadams, cduryee, jcallaha, jhutar, mmccune, oshtaier
Target Milestone: Unspecified
Keywords: Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-07-10 21:42:00 UTC
Type: Bug

Description Chris Duryee 2016-09-13 20:03:49 UTC
Description of problem:

If you suspend the qpid process (simulating high qpid load), the candlepin event listener process hangs in its event loop. As a result, the dynflow executor stops making progress.

For example:

* On the tasks page, view the candlepin event listener task. Its output should look something like:

<pre>
{"messages"=>"b190e333-7821-302a-8131-9693e66e2144",
 "last_message"=>"b190e333-7821-302a-8131-9693e66e2144 - import.created",
 "error"=>nil,
 "connection"=>"Connected"}
</pre>

* Now freeze the qpidd process with kill -19 `pidof qpidd` (signal 19 is SIGSTOP on x86 Linux). Note that the candlepin event listener still thinks it's connected.

* Run "hammer ping". It will hang due to https://pulp.plan.io/issues/2253.

* Open "foreman-rake console" and run Katello::Ping.ping(services: [:foreman_tasks]). Note that the executor fails to respond.

Once qpidd is resumed with kill -18 (SIGCONT), everything runs normally again.
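To illustrate the hang, here is a minimal Ruby sketch. It is not the qpid_messaging client or Katello's listener code. It assumes the listener blocks waiting on a socket. A process stopped with SIGSTOP keeps its TCP connection open but sends nothing, so an unbounded read waits forever while the connection still looks "Connected". An IO.pipe stands in for the broker socket. fetch_with_timeout is a hypothetical helper that shows how a bounded wait would surface the stall instead of freezing.

```ruby
# Hypothetical sketch, not Katello code: a pipe stands in for the broker socket.
# A SIGSTOP'd broker keeps the connection open but never writes, so a plain
# blocking read would wait forever. A bounded wait lets the loop notice.

# Wait up to `timeout` seconds for one line on `io`.
# Returns the line without its newline, :stalled if nothing arrived in time,
# or :closed if the other end really went away (EOF).
def fetch_with_timeout(io, timeout)
  ready = IO.select([io], nil, nil, timeout)
  return :stalled if ready.nil?

  line = io.gets
  line.nil? ? :closed : line.chomp
end

reader, writer = IO.pipe

# A responsive "broker" delivers one event.
writer.puts("import.created")
puts fetch_with_timeout(reader, 0.5) # prints import.created

# A frozen "broker": the write end stays open but nothing is sent.
puts fetch_with_timeout(reader, 0.5) # prints stalled
```

The distinction between :stalled and :closed matters here. A frozen peer never produces EOF, so a listener that only reacts to a closed connection never notices anything is wrong.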

Version-Release number of selected component (if applicable): 6.2

Comment 1 Chris Duryee 2016-09-13 20:07:08 UTC
This is a bug that jhutar originally found a few months ago. If qpid slows down, Katello's connection to qpid may eventually terminate after minutes or hours, but Katello does not notice. Qpid becomes responsive again, but the katello_event_queue then keeps filling as candlepin puts more events on it, because nothing is consuming them.
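One way to make this failure visible is to judge the listener by whether it is making progress, not by its connection status. The sketch below assumes exactly that. ListenerHealth and its threshold are hypothetical and not part of Katello. The idea is that a non-empty queue, combined with no events received for a while, points to a stuck listener even when the status still says "Connected".

```ruby
# Hypothetical sketch, not Katello code. Times are plain numbers (seconds)
# so the check is easy to drive; a real caller would pass a monotonic clock.
class ListenerHealth
  def initialize(started_at, stale_after)
    @last_progress = started_at # last time the listener received an event
    @stale_after = stale_after  # seconds without progress before we worry
  end

  # Call whenever the listener actually receives an event.
  def record_message(now)
    @last_progress = now
  end

  # Stuck means events are waiting on the queue, yet nothing has been received
  # for longer than stale_after. An empty queue with no traffic is just idle.
  def stuck?(now, backlog)
    backlog > 0 && (now - @last_progress) > @stale_after
  end
end

health = ListenerHealth.new(0, 60)
health.record_message(10)
health.stuck?(30, 5)   # => false (recent progress)
health.stuck?(300, 0)  # => false (idle, nothing queued)
health.stuck?(300, 42) # => true  (backlog waiting, no progress)
```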

I have not reproduced this, since it can take a long time to trigger, but I believe that is what happened.

The workaround is to restart foreman_tasks if the katello_event_queue appears not to be draining.
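The "not draining" check can be expressed as a simple heuristic over periodic samples of the queue depth. not_draining? is a hypothetical helper, and this sketch does not cover how the depth is read from the broker. It is only a heuristic: a healthy listener that is falling behind under heavy load can show the same pattern for a while.

```ruby
# Hypothetical helper: given queue-depth samples for katello_event_queue in
# chronological order, report whether the queue appears not to be draining.
# In other words, the depth never went down between samples and ended higher
# than it started.
def not_draining?(samples)
  return false if samples.size < 2

  never_decreased = samples.each_cons(2).all? { |a, b| b >= a }
  never_decreased && samples.last > samples.first
end

not_draining?([120, 135, 150, 180]) # => true
not_draining?([120, 90, 95, 60])    # => false
not_draining?([0, 0, 0])            # => false (idle, not stuck)
```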

Comment 2 Bryan Kearney 2016-09-13 22:18:20 UTC
Upstream bug component is Content Management

Comment 7 Chris Duryee 2017-07-10 21:42:00 UTC
I think this is fixed. I have not seen it in some time, either on my own machines or on other machines.

Marking as closed/worksforme.