1277292 – Package installation via Satellite 6.1 is much slower than yum

Bug 1277292 - Package installation via Satellite 6.1 is much slower than yum

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Satellite
Classification:	Red Hat
Component:	Performance
Sub Component:
Version:	6.1.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	Unspecified
Assignee:	Katello Bug Bin
QA Contact:	jcallaha
Docs Contact:
URL:
Whiteboard:
Depends On:	1269509 1323726
Blocks:	1317008

Reported:	2015-11-02 23:50 UTC by Mike McCune
Modified:	2020-04-15 14:18 UTC
CC List:	27 users
Fixed In Version:	gofer-2.7.6-1
Doc Type:	Bug Fix
Doc Text:	Extraneous progress tracking was slowing package installation via Satellite 6.1. This tracking has been removed, and performance is much improved.
Clone Of:	1269509
Environment:
Last Closed:	2016-07-27 08:58:43 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:1500	0	normal	SHIPPED_LIVE	Red Hat Satellite 6.2 Base Libraries	2016-07-27 12:24:38 UTC

Description Mike McCune 2015-11-02 23:50:00 UTC

+++ This bug was initially created as a clone of Bug #1269509 +++

Description of problem:

When you attempt to install errata onto a host via
the errata management tab of that host, the operation times out
if it takes longer than approximately 120 seconds to complete.

We narrowed this down to an issue with qpid:

"""
The difference in performance between satellite and using YUM directly to install errata (as reported in
#01519801) can be attributed to a known issue [2] with Qpid and AMQP 1.0.  Further it explains why this
performance is a regression in Satellite 6.1.  In 6.0, AMQP-0-10 was used.  In 6.1, AMQP-1.0 was introduced
to support Qpid Dispatch Router on Capsules.  Using AMQP-0-10, the client can explicitly trigger the flush
by sending a sync control (or setting the sync flag on the transfer). There is no such explicit mechanism in
AMQP 1.0, so at present the flush happens after a short timeout on the broker.  During the development and
testing of python-gofer-proton, I noted an approximate 1 second delay sending messages.  After discussion
with Ted Ross, Gordon Sim and Ken Giusti, this finding was confirmed.

So, what this means to Satellite is that it takes 1 second to send every message when using AMQP 1.0 to
durable queues when Qpid persistence is enabled (which it is).  The messaging flow during an errata install
looks something like this:

Satellite           Agent
------------------------------
   |                  |
   | request -------->|
   |<------- accepted |
   |<------- started  |
   |<------- progress*|
   |<------- result   |
   |                  |

* A TON OF THESE

For example: 549 (progress) messages were sent on a test system when installing 191 packages associated
with an Errata install.  At 1 second per progress message this adds 549 seconds (9 minutes) to the install.
These numbers matched test results.  With progress reporting enabled: ~18 min.  Without progress reporting:
~9 min.

Solutions:

1. Fix the AMQP-1.0 issue in Qpid.  This has been described by the Qpid teams as difficult, so it is unlikely
that this will get fixed soon, or that a fixed version will be ported to RHEL6.

2. Disconnect progress reporting in the katello agent plugin.  This solution is simple but assumes Katello
can live without progress being reported for agent tasks.  This progress is reported to Pulp and is included in
the Pulp task.

3. A variation on #2.  If katello needs the reported progress information we could rate-limit the reported
progress.  This mitigates the issue but is not as good as #1 or #2.


The disconnect delays introduced by AMQP-1.0 heartbeats that I observed and reported earlier seem to have
no significant impact on the performance of installing Errata.

Regards,

Jeff
"""

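The arithmetic in the quoted analysis can be checked with a short sketch. The figures (549 progress messages, roughly 1 second per message, about 9 minutes without progress reporting) come from the analysis above. The function name is illustrative only, and the per-message delay is the approximate value reported there, not a measured constant.

```python
def progress_overhead_seconds(progress_messages, delay_per_message=1):
    """Extra wall-clock seconds spent waiting on progress messages.

    Assumes each message sent to a durable queue costs a fixed delay,
    as described in the quoted analysis (about 1 second under AMQP 1.0).
    """
    return progress_messages * delay_per_message


overhead = progress_overhead_seconds(549)  # 549 progress messages for 191 packages
baseline = 9 * 60                          # ~9 min without progress reporting
total = baseline + overhead

print("overhead: %d min %d s" % divmod(overhead, 60))
print("estimated total: %d min %d s" % divmod(total, 60))
```

The estimated total of roughly 18 minutes is consistent with the ~18 min reported with progress reporting enabled.
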
For 6.1.z we opted for (2) from above as merged into katello-2.2's agent:

https://github.com/Katello/katello-agent/pull/32

but we need a more permanent solution for 6.2 (or we just need to carry this change forward in 6.2 if we decide to just keep it as-is).

If this bug is being looked at near the end of the 6.2 cycle and we are out of time, just get the above mentioned PR merged into 6.2 and avoid any regressions.
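
Option (3) from the quoted analysis, rate-limiting the reported progress, is not what was merged for 6.1.z. For reference, a minimal sketch of what such rate-limiting could look like is given here. The class name, the `send` callable and the 10-second interval are all hypothetical and are not part of gofer or katello-agent.

```python
import time


class RateLimitedProgress:
    """Forward at most one progress report per `min_interval` seconds.

    `send` is a hypothetical callable that delivers one progress message.
    `clock` is injectable so the behaviour can be tested without sleeping.
    """

    def __init__(self, send, min_interval=10.0, clock=time.monotonic):
        self._send = send
        self._min_interval = min_interval
        self._clock = clock
        self._last_sent = None  # time of the last forwarded report

    def report(self, details):
        """Forward `details` if enough time has passed; return True if sent."""
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._min_interval:
            return False  # dropped: too soon after the previous report
        self._send(details)
        self._last_sent = now
        return True


# Usage with a fake clock: three reports, the middle one is dropped.
sent = []
fake_now = [0.0]
reporter = RateLimitedProgress(sent.append, min_interval=10.0, clock=lambda: fake_now[0])
reporter.report("package 1 of 191")
fake_now[0] = 5.0
reporter.report("package 2 of 191")
fake_now[0] = 10.0
reporter.report("package 3 of 191")
print(sent)
```

With a cap like this, the number of progress messages depends on the duration of the install rather than on the number of packages, which limits the per-message delay described above without removing progress entirely.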

Comment 1 Mike McCune 2015-11-02 23:51:23 UTC

I flagged this as a regression and a blocker as we can't ship 6.2 without at least pulling that PR above into 6.2's code.

Comment 3 Stuart Auchterlonie 2015-12-14 13:25:27 UTC

Suggestion on alternative fix #4

4) Report (interim) progress messages via a non durable queue.

The only message that we must receive is the "completed"
message. For any message that just provides an indication of
progress, it doesn't matter if the queue is not durable, because in
the event of a crash, the whole task will end up aborted, so any
"interim" progress reports are irrelevant.

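A minimal sketch of the routing idea in Comment 3, assuming two reply queues exist, one durable and one not. The queue names are hypothetical and the status strings are taken from the message flow in the description. The sketch does not use any Qpid or gofer API; it only shows the decision of which queue a reply would go to.

```python
DURABLE_QUEUE = "agent.replies.durable"      # hypothetical queue name
TRANSIENT_QUEUE = "agent.replies.transient"  # hypothetical queue name

# Statuses that are only informational. Losing one in a crash is harmless,
# because the whole task would end up aborted anyway.
INTERIM_STATUSES = {"progress"}


def queue_for(status):
    """Pick the reply queue for a message with the given status."""
    if status in INTERIM_STATUSES:
        return TRANSIENT_QUEUE
    return DURABLE_QUEUE


for status in ("accepted", "started", "progress", "result"):
    print(status, "->", queue_for(status))
```

Under this scheme only the interim progress reports would avoid the per-message delay on durable queues, while the final result would still be delivered durably.
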
Comment 4 Jeff Ortel 2016-01-28 15:36:36 UTC

This has been fixed using option #2 and can be moved to MODIFIED, ON_QA or CLOSED, right?  See: https://github.com/Katello/katello-agent/pull/33.

Comment 7 Stephen Benjamin 2016-02-29 19:19:21 UTC

katello-agent|5210a603a8281fb14dd1e9a23012baa123122762

Comment 8 Stephen Benjamin 2016-02-29 19:23:01 UTC

Actually, that commit already shipped with an errata as part of BZ1269509.  Can we just close this as a dupe? I would think it should just get into 6.2's katello-agent when we branch it for 6.2, no?

Comment 9 Stuart Auchterlonie 2016-03-01 13:14:57 UTC

Stephen,

In my opinion this bz cannot be closed until 6.2 is shipped.
BZ1269509 tracked the 6.1.z version of this issue.


Regards
Stuart

Comment 10 Stephen Benjamin 2016-03-01 14:28:10 UTC

Why? We generally only have one BZ per bug, not per Satellite version. Generally, because it made it into 6.1.z, the code will be in 6.2 when we branch the katello-agent repo.

Comment #1 from Mike indicates he wants to keep this open though, so it doesn't much matter to me.

Comment 12 sthirugn@redhat.com 2016-04-03 01:52:02 UTC

Retested this scenario in Satellite 6.2-beta-snap-6.

Test steps:
1. Install 60+ errata in a rhel 7.2 content host from content host -> errata tab.
2. The task did not time out after 120 seconds as the bug describes.
3. Instead, it timed out after 3600 seconds, since the installation did not complete within that time.  Note the following settings:

Administer -> Settings -> Katello -> 
content_action_accept_timeout = 20
content_action_finish_timeout = 3600 

Also note that the content host got all the errata updates but took a little longer.

It looks to me that the bug is resolved as the timeout did not occur in 120 seconds.

Comment 14 sthirugn@redhat.com 2016-04-03 12:49:00 UTC

I tested with another host, applying 20+ errata; the content host did not get the errata.  I also tried with 3+ errata - same problem.  There is definitely a problem in satellite connecting to katello-agent of content hosts.

Comment 17 Mike McCune 2016-04-06 05:42:45 UTC

moved https://bugzilla.redhat.com/show_bug.cgi?id=1323726 back ON_QA , moving this one as well.

Comment 18 sthirugn@redhat.com 2016-04-07 13:18:53 UTC

https://bugzilla.redhat.com/show_bug.cgi?id=1323726 has failed again.  Moving this back to ASSIGNED

Comment 19 Mike McCune 2016-04-08 00:21:18 UTC

ping pong ... back ON_QA :)

Comment 20 sthirugn@redhat.com 2016-04-11 18:57:34 UTC

still blocked on https://bugzilla.redhat.com/show_bug.cgi?id=1323726

Comment 21 sthirugn@redhat.com 2016-04-12 15:32:49 UTC

Moving to Assigned as per Comment 20

Comment 22 Mike McCune 2016-04-14 04:57:39 UTC

POST as indicated in https://bugzilla.redhat.com/show_bug.cgi?id=1323726

Comment 23 jcallaha 2016-04-22 15:56:29 UTC

Verified in Satellite 6.2 Beta Snap 9. With the new gofer package, and the resolution of the bug above, package installation is now within a reasonable margin of yum.

Comment 26 errata-xmlrpc 2016-07-27 08:58:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1500
