Bug 1399877 - Improve ListenOnCandlepinEvents throughput
Status: CLOSED ERRATA
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Candlepin
Version: 6.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Release: 6.2.7
Target Milestone: Unused
Assigned To: Eric Helms
QA Contact: Roman Plevka
Keywords: PrioBumpField, Triaged
Depends On:
Blocks: 1405526
Reported: 2016-11-29 19:45 EST by Mike McCune
Modified: 2018-04-24 03:41 EDT
CC List: 9 users

See Also:
Fixed In Version: rubygem-katello-3.0.0.91-1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1405526
Environment:
Last Closed: 2017-01-26 05:46:27 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Patch (1.72 KB, patch)
2017-01-23 17:24 EST, Justin Sherrill
no flags


External Trackers
Tracker ID Priority Status Summary Last Updated
Foreman Issue Tracker 17498 None None None 2016-11-29 19:45 EST
Red Hat Product Errata RHBA-2017:0197 normal SHIPPED_LIVE Satellite 6.2.7 Async Bug Release 2017-01-26 10:38:38 EST

Description Mike McCune 2016-11-29 19:45:02 EST
The main loop of the task has a redundant "sleep 1"; see (current) https://github.com/Katello/katello/blob/master/app/lib/actions/candlepin/candlepin_listening_service.rb#L86 . That means the task can process at most one Candlepin event per second.

This can be a limiting factor when a larger burst of events arrives, as well as when katello_event_queue has a backlog of thousands of messages and processing it takes hours (something I see in the field relatively often: the ListenOnCandlepinEvents task is stopped or paused due to some issue, and recovering from it means several hours of processing the backlog, solely because of the "sleep 1" after each and every message).

I have successfully tested removing the sleep by putting >20k messages in the queue and restarting the foreman-tasks service. From the time the task subscribed to the queue, the 21k messages were consumed within 35 seconds, with the only impact being dynflow_executor consuming high CPU.

I therefore see no reason to keep the sleep there (which, as far as I understand, is present for historical reasons).
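To illustrate the throughput difference, here is a minimal sketch of the consume loop pattern described above, using hypothetical stand-ins rather than the actual Katello code:

```ruby
# Hypothetical stand-in for the Candlepin event queue with a backlog.
queue = Queue.new
2_000.times { |i| queue << "event-#{i}" }

processed = 0

# With a per-message "sleep 1", draining 2,000 messages would take
# roughly 2,000 seconds. Without it, the loop drains the backlog as
# fast as events can be handled.
until queue.empty?
  queue.pop       # take the next event (queue is known non-empty here)
  processed += 1  # stand-in for real event handling
  # sleep 1      # <- the redundant per-message sleep this fix removes
end

puts processed  # => 2000
```

In the real service the loop blocks waiting for the next message rather than polling an in-memory queue, but the effect of the removed sleep is the same: throughput is no longer capped at one event per second.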
Comment 1 Mike McCune 2016-11-29 19:45:07 EST
Created from redmine issue http://projects.theforeman.org/issues/17498
Comment 4 Bryan Kearney 2016-12-01 10:15:19 EST
Upstream bug component is Candlepin
Comment 6 Bryan Kearney 2016-12-02 12:15:18 EST
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/17498 has been resolved.
Comment 7 Justin Sherrill 2017-01-23 17:24 EST
Created attachment 1243776 [details]
Patch
Comment 8 Justin Sherrill 2017-01-24 10:25:48 EST
To reproduce:

1.  edit /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.*/lib/katello/engine.rb

comment out this line:

::Actions::Candlepin::ListenOnCandlepinEvents.ensure_running(world)

(around line 118), so it looks like:

#::Actions::Candlepin::ListenOnCandlepinEvents.ensure_running(world)

2.  restart foreman-tasks

3.  Register a bunch of clients, checking:

qpid-stat -q --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 | grep katello_event_queue                                                       

and you should see the queue message count going up.  When it gets up to 1-2 thousand:

4.  Uncomment the same line in engine.rb

5.  Restart foreman-tasks

6.  monitor the qpid-stat command again; you should see the count decrease fairly quickly.  1,000 messages should be processed in ~10-20 seconds.
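If you want just the message count rather than the whole qpid-stat line, it can be pulled out with awk. The sample line below is hypothetical, and column positions may differ between qpid-cpp versions, so check your actual output first:

```shell
# Extract the message-count column from a qpid-stat -q output line.
# (Sample line is illustrative; in practice pipe the real qpid-stat
# output from step 3 through the same awk filter.)
sample='  katello_event_queue  Y  Y  2790  54321  51531'
echo "$sample" | awk '/katello_event_queue/ {print $4}'
# prints: 2790
```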
Comment 9 Roman Plevka 2017-01-24 12:44:40 EST
VERIFIED
on sat6.2.7-1

The message count in the katello_event_queue queue dropped from about 2.79k to 27 in about 30 seconds.
No complications detected.
Comment 11 errata-xmlrpc 2017-01-26 05:46:27 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0197
