Bug 1375714 - if qpid is slow/unresponsive, candlepin event listener will freeze in event loop, causing dynflow executor to stop responding
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Content Management
Version: 6.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-13 20:03 UTC by Chris Duryee
Modified: 2019-12-16 06:45 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-10 21:42:00 UTC
Target Upstream Version:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 16543 | 0 | None | None | None | 2016-09-13 20:03:48 UTC

Description Chris Duryee 2016-09-13 20:03:49 UTC
Description of problem:

If you suspend the qpid process (simulating high qpid load), the candlepin event listener process hangs in its event loop. This causes the dynflow executor to stop responding.

For example:

* On the tasks page, view the candlepin task process. It should say something like:

<pre>
{"messages"=>"b190e333-7821-302a-8131-9693e66e2144",
 "last_message"=>"b190e333-7821-302a-8131-9693e66e2144 - import.created",
 "error"=>nil,
 "connection"=>"Connected"}
</pre>

* Now freeze the qpidd process: kill -19 `pidof qpidd`. Note that the candlepin event listener still thinks it is connected.

* Run "hammer ping". It will hang due to https://pulp.plan.io/issues/2253.

* Run "foreman-rake console" and then Katello::Ping.ping(services: [:foreman_tasks]). Note that the executor fails to respond.

Once qpidd is unsuspended via kill -18, things will run normally again.
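
The suspend/probe/resume cycle above can be sketched as a small shell helper. This is only an illustration, not part of Satellite: freeze_and_probe is a hypothetical name, it assumes GNU coreutils timeout is available, and it uses the signal names STOP and CONT, which correspond to 19 and 18 on Linux x86. The time limit keeps a hanging probe such as "hammer ping" from blocking the shell.

```shell
#!/usr/bin/env bash
# Sketch: suspend a process, run a probe with a time limit, then resume it.
# freeze_and_probe is a hypothetical helper, not a Satellite command.
freeze_and_probe() {
    local pid="$1" limit="$2"
    shift 2
    kill -STOP "$pid" || return 2   # suspend the target (kill -19 on Linux x86)
    local rc=0
    timeout "$limit" "$@" || rc=$?  # run the probe, giving up after $limit seconds
    kill -CONT "$pid"               # always resume the target (kill -18)
    return "$rc"
}

# Example for this bug, assuming a single qpidd process and a root shell:
#   freeze_and_probe "$(pidof qpidd)" 30 hammer ping; echo "probe exit: $?"
```

An exit status of 124 from the helper means timeout had to stop the probe, which is the hang described above.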

Version-Release number of selected component (if applicable): 6.2

Comment 1 Chris Duryee 2016-09-13 20:07:08 UTC
This is a bug that jhutar originally found a few months ago. What happens is that if qpid slows down, the Katello connection to qpid may eventually terminate after minutes or hours, but Katello does not notice. Qpid becomes responsive again, but the katello_event_queue then keeps filling as candlepin puts more events on it.

I have not reproduced this, since it can take some time, but I believe that is what happened.

The workaround is to restart foreman_tasks if the katello_event_queue appears to not be draining.
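
One way to decide that the queue "appears to not be draining" is to compare a few depth samples taken some time apart. The sketch below is an assumption about how that check could look, not an existing tool: queue_stuck is a hypothetical helper, the sample values are made up, and collecting the depths (for example from qpid-stat -q output) is left to the caller. The service name foreman-tasks in the commented example is also an assumption about the installed system.

```shell
#!/usr/bin/env bash
# Sketch: judge from successive depth samples whether a queue looks stuck.
# queue_stuck is a hypothetical helper; it succeeds (exit 0) when the queue
# never shrank between samples and still holds messages.
queue_stuck() {
    local prev="" depth
    [ "$#" -ge 2 ] || return 1      # need at least two samples to judge
    for depth in "$@"; do
        # any decrease means something is still consuming messages
        if [ -n "$prev" ] && [ "$depth" -lt "$prev" ]; then
            return 1
        fi
        prev="$depth"
    done
    [ "$prev" -gt 0 ]               # an empty queue is not stuck
}

# Example with made-up samples taken a minute apart:
#   if queue_stuck 120 135 150; then service foreman-tasks restart; fi
```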

Comment 2 Bryan Kearney 2016-09-13 22:18:20 UTC
Upstream bug component is Content Management

Comment 7 Chris Duryee 2017-07-10 21:42:00 UTC
I think this is fixed. I have not seen it in some time, either on my own machines or on other machines.

Marking as closed/worksforme.

