Bug 1277292 - Package installation via Satellite 6.1 is much slower than yum
Summary: Package installation via Satellite 6.1 is much slower than yum
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Performance
Version: 6.1.4
Hardware: Unspecified
OS: Unspecified
high vote
Target Milestone: Unspecified
Assignee: Katello Bug Bin
QA Contact: jcallaha
Depends On: 1269509 1323726
Blocks: 1317008
Reported: 2015-11-02 23:50 UTC by Mike McCune
Modified: 2020-04-15 14:18 UTC
CC List: 27 users

Fixed In Version: gofer-2.7.6-1
Doc Type: Bug Fix
Doc Text:
Extraneous progress tracking was slowing package installation via Satellite 6.1. This tracking has been removed, and performance is much improved.
Clone Of: 1269509
Last Closed: 2016-07-27 08:58:43 UTC
Target Upstream Version:


System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2016:1500 | normal | SHIPPED_LIVE | Red Hat Satellite 6.2 Base Libraries | 2016-07-27 12:24:38 UTC

Description Mike McCune 2015-11-02 23:50:00 UTC
+++ This bug was initially created as a clone of Bug #1269509 +++

Description of problem:

When you install one or more errata onto a host via the host's errata management tab,
the operation times out if it takes longer than approximately 120 seconds to complete.

We narrowed this down to an issue with Qpid:

The difference in performance between Satellite and using YUM directly to install errata (as reported in
#01519801) can be attributed to a known issue [2] with Qpid and AMQP 1.0. Further, it explains why this
performance is a regression in Satellite 6.1. In 6.0, AMQP-0-10 was used. In 6.1, AMQP-1.0 was introduced
to support Qpid Dispatch Router on Capsules. Using AMQP-0-10, the client can explicitly trigger the broker's
flush of persisted messages by sending a sync control (or setting the sync flag on the transfer). There is no
such explicit mechanism in AMQP 1.0, so at present the flush happens after a short timeout on the broker.
During the development and testing of python-gofer-proton, I noted an approximate 1 second delay sending
messages. After discussion with Ted Ross, Gordon Sim and Ken Giusti, this finding was confirmed.

For Satellite, this means that each message sent over AMQP 1.0 to a durable queue takes about 1 second when
Qpid persistence is enabled (which it is). The messaging flow during an errata install looks something like this:

Satellite           Agent
------------------------------
   |                  |
   | request -------->|
   |<------- accepted |
   |<------- started  |
   |<------- progress*|
   |<------- result   |
   |                  |


For example, 549 progress messages were sent on a test system while installing 191 packages associated
with an errata install. At 1 second per progress message, this adds 549 seconds (about 9 minutes) to the install.
These numbers matched test results: ~18 min with progress reporting enabled and ~9 min without it.
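A quick sketch of the arithmetic above, assuming (as the description does) a fixed cost of about 1 second per message sent to a durable queue and ignoring all other work. The function names are illustrative only; the reply count follows the flow diagram (accepted, started, progress*, result).

```python
# Rough cost model for the errata install described above.
# Assumption: every message sent to a durable queue costs ~1 s (the broker's
# flush timeout under AMQP 1.0); all other work is ignored.
SECONDS_PER_DURABLE_SEND = 1.0


def reply_count(progress_messages: int) -> int:
    """Agent replies for one request: accepted + started + progress* + result."""
    return 1 + 1 + progress_messages + 1


def progress_overhead_s(progress_messages: int,
                        per_send: float = SECONDS_PER_DURABLE_SEND) -> float:
    """Extra wall-clock time spent only on sending progress messages."""
    return progress_messages * per_send


observed_progress = 549  # progress messages for the 191-package errata install
overhead = progress_overhead_s(observed_progress)
print(f"{reply_count(observed_progress)} replies; progress adds {overhead:.0f} s "
      f"(~{overhead / 60:.0f} min)")
```

Under this model the progress messages alone account for the roughly 9 minute gap between the two test runs.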


Possible fixes:

1. Fix the AMQP-1.0 issue in Qpid. The Qpid team has described this as difficult, so it is unlikely to be
fixed soon, or for a fixed version to be ported to RHEL 6.

2. Disconnect progress reporting in the katello agent plugin. This solution is simple but assumes Katello
can live without progress reported for agent tasks. This progress is reported to Pulp and is included in
the Pulp task.

3. A variation on #2. If Katello needs the reported progress information, we could rate-limit the reported
progress. This mitigates the issue but is not as effective as #1 or #2.
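Option 3 could look roughly like the following hypothetical helper (not katello-agent or gofer code): interim progress is forwarded at most once per `interval` seconds, and the final result is always sent. With `interval=float("inf")` it drops every interim report, which amounts to option 2. `send` stands in for whatever actually publishes the reply message; the fake clock in the usage lines is only for illustration.

```python
import time
from typing import Callable


class RateLimitedProgress:
    """Hypothetical reporter that forwards interim progress at most once per interval.

    `send` stands in for whatever publishes a reply message. The final result
    is never rate-limited. interval=float("inf") suppresses all interim
    progress, which amounts to option 2.
    """

    def __init__(self, send: Callable[[dict], None], interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._interval = interval
        self._clock = clock
        self._last_sent = clock()  # count the task start as the last send
        self.dropped = 0

    def progress(self, report: dict) -> None:
        now = self._clock()
        if now - self._last_sent >= self._interval:
            self._send({"type": "progress", **report})
            self._last_sent = now
        else:
            self.dropped += 1  # suppressed: saves one durable send

    def result(self, report: dict) -> None:
        # The final result always goes out.
        self._send({"type": "result", **report})


# Usage with a fake clock that advances 1 s per call: 20 progress updates,
# at most one forwarded per 10 s.
sent: list = []
ticks = iter(range(100))
reporter = RateLimitedProgress(sent.append, interval=10.0,
                               clock=lambda: float(next(ticks)))
for step in range(20):
    reporter.progress({"step": step})
reporter.result({"succeeded": True})
print(len(sent), reporter.dropped)
```

With a 10 second interval, a task that reports once per second for 20 seconds forwards 2 progress messages instead of 20, plus the result. Keeping the final result outside the limiter is what lets the interval be tuned freely.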

The disconnect delays introduced by AMQP-1.0 heartbeats, which I observed and reported earlier, seem to have
no significant impact on the performance of installing errata.



For 6.1.z we opted for (2) from above as merged into katello-2.2's agent:


but we need a more permanent solution for 6.2 (or, if we decide to keep it as-is, we simply carry this change forward into 6.2).

If this bug is being looked at near the end of the 6.2 cycle and we are out of time, just get the above-mentioned PR merged into 6.2 to avoid any regressions.

Comment 1 Mike McCune 2015-11-02 23:51:23 UTC
I flagged this as a regression and a blocker as we can't ship 6.2 without at least pulling that PR above into 6.2's code.

Comment 3 Stuart Auchterlonie 2015-12-14 13:25:27 UTC
Suggestion on alternative fix #4

4) Report (interim) progress messages via a non-durable queue.

The only progress messages that we must receive are the "completed"
messages. For any message that just provides an indication of
progress, it doesn't matter if the queue is not durable, because in
the event of a crash the whole task will end up aborted, so any
interim progress reports are irrelevant.
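A minimal sketch of the routing decision behind option 4, assuming the message kinds from the flow diagram in the description and a made-up `.progress` suffix for the separate non-durable address. Only the repeated interim progress replies are marked non-durable; accepted, started and result keep the durable reply queue. Declaring the queues on the broker is out of scope here.

```python
from dataclasses import dataclass

# Message kinds from the flow diagram in the description; only the
# repeated interim "progress" replies are treated as safe to lose.
INTERIM_KINDS = frozenset({"progress"})


@dataclass(frozen=True)
class Route:
    address: str
    durable: bool


def route_reply(kind: str, reply_to: str) -> Route:
    """Choose where an agent reply goes (sketch of option 4).

    Interim progress goes to a separate, non-durable address so it avoids the
    per-message persistence delay; accepted, started and result keep using
    the durable reply queue. The ".progress" suffix is a made-up convention.
    """
    if kind in INTERIM_KINDS:
        return Route(address=f"{reply_to}.progress", durable=False)
    return Route(address=reply_to, durable=True)


for kind in ("accepted", "started", "progress", "result"):
    print(kind, route_reply(kind, "example.reply"))
```

Since most messages in an errata install are progress messages, this would remove most of the durable sends while still persisting the reply that marks completion.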

Comment 4 Jeff Ortel 2016-01-28 15:36:36 UTC
This has been fixed using option #2 and can be moved to MODIFIED, ON_QA or CLOSED, right?  See: https://github.com/Katello/katello-agent/pull/33.

Comment 7 Stephen Benjamin 2016-02-29 19:19:21 UTC

Comment 8 Stephen Benjamin 2016-02-29 19:23:01 UTC
Actually, that commit already shipped with an errata as part of BZ1269509.  Can we just close this as a dupe? I would think it should just get into 6.2's katello-agent when we branch it for 6.2, no?

Comment 9 Stuart Auchterlonie 2016-03-01 13:14:57 UTC

In my opinion this bz cannot be closed until 6.2 is shipped.
BZ1269509 tracked the 6.1.z version of this issue.


Comment 10 Stephen Benjamin 2016-03-01 14:28:10 UTC
Why? We generally only have one BZ per bug, not per Satellite version. Because it made it into a 6.1.z release, the code will generally be in 6.2 when we branch the katello-agent repo.

Comment #1 from Mike indicates he wants to keep this open though, so it doesn't much matter to me.

Comment 12 sthirugn@redhat.com 2016-04-03 01:52:02 UTC
Retested this scenario in Satellite 6.2-beta-snap-6.

Test steps:
1. Install 60+ errata on a RHEL 7.2 content host from the Content Host -> Errata tab.
2. The task did not time out after 120 seconds as described in this bug.
3. Instead, it timed out after 3600 seconds because the installation did not complete within 3600 seconds. Note the relevant settings:

Administer -> Settings -> Katello ->
content_action_accept_timeout = 20
content_action_finish_timeout = 3600
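To illustrate why the task in step 3 ran to 3600 seconds rather than failing at 120: the two settings bound different phases of a content action. The sketch below uses a hypothetical `task_outcome` helper, not Katello's implementation, and assumes both timeouts are measured from when the request is sent, which is not confirmed in this bug.

```python
from typing import Optional

# Values from Administer -> Settings -> Katello in the test above.
CONTENT_ACTION_ACCEPT_TIMEOUT = 20     # seconds for the agent to accept
CONTENT_ACTION_FINISH_TIMEOUT = 3600   # seconds for the action to finish


def task_outcome(accepted_at: Optional[float], finished_at: Optional[float]) -> str:
    """Classify a content action from reply times (seconds after the request).

    None means the reply never arrived. Hypothetical helper; assumes both
    timeouts are measured from when the request was sent.
    """
    if accepted_at is None or accepted_at > CONTENT_ACTION_ACCEPT_TIMEOUT:
        return "not accepted"
    if finished_at is None or finished_at > CONTENT_ACTION_FINISH_TIMEOUT:
        return "timed out"
    return "succeeded"


# A run like the one in step 3: accepted in time (2 s is a made-up value),
# but not finished within the hour.
print(task_outcome(accepted_at=2.0, finished_at=None))
```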

Also note that the content host got all the errata updates but took a little longer.

It looks to me like the bug is resolved, as the timeout did not occur at 120 seconds.

Comment 14 sthirugn@redhat.com 2016-04-03 12:49:00 UTC
I tested with another host applying 20+ errata, and the content host did not get the errata. I also tried with 3+ errata, with the same problem. There is definitely a problem with Satellite connecting to the katello-agent on content hosts.

Comment 17 Mike McCune 2016-04-06 05:42:45 UTC
Moved https://bugzilla.redhat.com/show_bug.cgi?id=1323726 back to ON_QA; moving this one as well.

Comment 18 sthirugn@redhat.com 2016-04-07 13:18:53 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1323726 failed again. Moving this back to ASSIGNED.

Comment 19 Mike McCune 2016-04-08 00:21:18 UTC
ping pong ... back ON_QA :)

Comment 20 sthirugn@redhat.com 2016-04-11 18:57:34 UTC
still blocked on https://bugzilla.redhat.com/show_bug.cgi?id=1323726

Comment 21 sthirugn@redhat.com 2016-04-12 15:32:49 UTC
Moving to Assigned as per Comment 20

Comment 22 Mike McCune 2016-04-14 04:57:39 UTC
POST as indicated in https://bugzilla.redhat.com/show_bug.cgi?id=1323726

Comment 23 jcallaha 2016-04-22 15:56:29 UTC
Verified in Satellite 6.2 Beta Snap 9. With the new gofer package, and the resolution of the bug above, package installation is now within a reasonable margin of yum.

Comment 26 errata-xmlrpc 2016-07-27 08:58:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

