Bug 1853076 - large capsule syncs cause slow processing of dynflow tasks/steps
Summary: large capsule syncs cause slow processing of dynflow tasks/steps
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Capsule - Content
Version: 6.7.0
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: 6.8.0
Assignee: Justin Sherrill
QA Contact: Vladimír Sedmík
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-01 21:35 UTC by Waldirio M Pinheiro
Modified: 2024-12-20 19:08 UTC
CC: 18 users

Fixed In Version: rubygem-katello-3.16.0-0.16.rc4.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned to: 1857359
Environment:
Last Closed: 2020-10-27 13:03:46 UTC
Target Upstream Version:
Embargoed:


Attachments
HOTFIX RPM for Satellite 6.7.1 (10.86 MB, application/x-rpm)
2020-07-07 13:22 UTC, wclark


Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 30286 0 Normal Closed large capsule syncs cause slow processing of dynflow tasks/steps 2021-02-15 23:06:13 UTC
Red Hat Product Errata RHSA-2020:4366 0 None None None 2020-10-27 13:04:03 UTC

Description Waldirio M Pinheiro 2020-07-01 21:35:46 UTC
Description of problem:
After upgrading to Satellite 6.7, syncs take a long time to finish and Dynflow consumes a large amount of memory.

Version-Release number of selected component (if applicable):
6.7

How reproducible:
100%

Steps to Reproduce:
1. Sync a large number of repositories.
2. Continue pushing syncs.

Actual results:
Dynflow consumes a large amount of memory and the Capsule Sync task takes a very long time to complete.

Expected results:
The sync completes quickly and without failures.

Additional info:

Comment 1 Justin Sherrill 2020-07-02 01:20:27 UTC
Created redmine issue https://projects.theforeman.org/issues/30286 from this bug

Comment 2 Francisco Peralta 2020-07-06 07:52:26 UTC
Dear Team,
 Is there a workaround available for this issue?

 My customer is also facing this issue and would like to know the ETA for a (hot)fix.

Thanks in advance,
 Cisco.

Comment 7 Bryan Kearney 2020-07-06 20:02:55 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/30286 has been resolved.

Comment 9 wclark 2020-07-07 13:22:33 UTC
Created attachment 1700153 [details]
HOTFIX RPM for Satellite 6.7.1

Comment 10 wclark 2020-07-07 13:27:45 UTC
HOTFIX is attached. Please find installation instructions below:

1. Take a backup or snapshot of Satellite server

2. Download the Hotfix RPM and copy it to Satellite server

3. # yum install tfm-rubygem-katello-3.14.0.21-6.HOTFIXRHBZ1830403RHBZ1789911RHBZ1853076.el7sat.noarch.rpm --disableplugin=foreman-protector

4. # systemctl restart httpd dynflow

By default, the Hotfix will configure a batch size of 25 for Pulp sub-tasks during Capsule sync. This reduces the amount of polling from Dynflow --> Pulp, lowering the load on both services because neither needs to track or communicate about thousands of individual sub-tasks.
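As a rough illustration of why batching helps (the numbers here are hypothetical; the real batching happens inside Katello's Dynflow actions), grouping 1000 sub-tasks into batches of 25 means Dynflow makes 40 polling round-trips instead of 1000:

```shell
# Hypothetical illustration: 1000 Pulp sub-task IDs polled in batches of 25.
# Each output line of xargs stands in for one Dynflow -> Pulp poll that
# covers a whole batch of sub-tasks.
BATCH_SIZE=25
TOTAL_TASKS=1000
BATCHES=$(seq 1 "$TOTAL_TASKS" | xargs -n "$BATCH_SIZE" | wc -l)
echo "$BATCHES"   # prints 40: one poll per batch instead of one per sub-task
```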

The batch size is also configurable, so you can tune it for your deployment. To configure it, navigate to Administer --> Settings --> Content and modify the parameter labeled "Batch size to sync repositories in."
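For scripted environments, the same setting should also be changeable with hammer. This is a sketch, assuming a configured hammer CLI session on the Satellite server; the setting name 'foreman_proxy_batch_size' is taken from the verification comment later in this bug:

```shell
# Sketch only: set the Capsule sync batch size from the CLI instead of the UI.
# Assumes hammer is configured with admin credentials on the Satellite server.
hammer settings set --name foreman_proxy_batch_size --value 25

# Verify the new value
hammer settings list --search 'name = foreman_proxy_batch_size'
```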

Comment 13 Jan Hutař 2020-09-08 06:03:58 UTC
I'm very sorry Vláďo, I was not able to work on this :-/

Comment 14 Vladimír Sedmík 2020-09-14 10:51:03 UTC
To verify this BZ I was comparing two setups:
1) Satellite + Capsule 6.7.0 snap 20
2) Satellite + Capsule 6.8.0 snap 14

In each setup 6 repos (RHEL7Server, RHEL7Server-Optional, RHSCL for RHEL7, RHEL8-BaseOS, RHEL8-AppStream, test_simple_errata) were published into 40 content views each (240 content views in total) and were synchronized (Complete Sync) from the Satellite to the Capsule. I used the default batch size setting 'foreman_proxy_batch_size'=25 in setup 2). Four hosts were registered and unregistered through the capsule.

Results:
-------------------------------------------------------------------
					6.7.0-20	6.8.0-14
-------------------------------------------------------------------
Overall sync time [hh:mm:ss]		28:44:45	26:54:33
Host registration time			13 s		11 s
Host unregistration time		2 s		2 s
Average errata enumeration time		163 s		19.5 s
Average CPU load during sync		1.88		1.89
Median CPU load during sync		0.87		0.22
REX command run time (hostnamectl)	27-44 s		4-8 s
-------------------------------------------------------------------

Conclusion: We can see a huge improvement in the errata enumeration time for newly registered hosts (the workaround from BZ#1771921 is needed), and REX times also improved significantly. Overall sync time improved slightly (by 1h50m) while the average CPU load remained almost the same. Median CPU load was lower on 6.8, as higher peaks and longer valleys occurred during the sync.

I haven't noticed any large or fast-growing log files during or after the sync. The size of /var/log/foreman was 3.5MB and whole /var/log directory occupied ~1G of space on both instances after the sync.
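The disk-usage figures above can be reproduced with a quick check like the following (the paths are the Satellite defaults mentioned in the comment):

```shell
# Summarize log disk usage after a sync; 2>/dev/null hides unreadable dirs.
du -sh /var/log/foreman 2>/dev/null
du -sh /var/log 2>/dev/null
```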

Comment 17 errata-xmlrpc 2020-10-27 13:03:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.8 release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4366

