Bug 1554282 - [Performance improvement] On CV publish/promote, call Pulp::Repository::* dynflow steps concurrently
Summary: [Performance improvement] On CV publish/promote, call Pulp::Repository::* dyn...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Content Views
Version: 6.3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: Unspecified
Assignee: Samir Jha
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-03-12 10:15 UTC by Pavel Moravec
Modified: 2018-05-25 06:03 UTC
CC: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-23 14:19:37 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
Foreman Issue Tracker 23322 - last updated 2018-04-19 01:03:06 UTC

Description Pavel Moravec 2018-03-12 10:15:19 UTC
Description of problem:
When publishing a CV or promoting it to a LE, the content of a pulp repo (or of two repos, sequentially) is copied to another one. This is achieved via Pulp::Repository::* dynflow steps, each of which copies a different type of content unit (rpm, errata, srpm, ..).

Currently, these dynflow steps are run sequentially, so the call flow is:
- trigger Pulp::Repository::CopySrpm
- wait until it completes - by periodical checking of the pulp task status
- after it completes, trigger Pulp::Repository::CopyRpm
- wait until it completes
- trigger ..
..

These dynflow steps can be executed concurrently - pulp engineering confirmed to me that it is safe to trigger these pulp tasks concurrently. Internally, pulp will perform them sequentially either way, but we simply don't require the extra "locking" at the dynflow step level, and waiting for each task's completion adds redundant delay many times over.

Preliminary tests showed an observable-to-significant improvement in CV publish time, but results will definitely vary with many factors (e.g. the number and size of repos in the CV, the number of pulp workers, other dynflow tasks, etc.).
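The gain comes from removing the per-step trigger-then-wait cycle, not from speeding up pulp itself. A minimal plain-Ruby sketch of the idea (this is not dynflow code; threads stand in for the dynflow steps waiting on pulp, and the 0.05 s sleep is an arbitrary stand-in for one task's wall time - it models only the dynflow-level waits, not pulp's internal serialization):

```ruby
POLL = 0.05 # arbitrary stand-in for one task's wall time
TASKS = 4   # e.g. CopyRpm, CopySrpm, CopyErrata, CopyPackageGroup

def run_task
  sleep POLL # "wait until the pulp task completes"
end

# Sequential: trigger a task, wait for it, then trigger the next.
t0 = Time.now
TASKS.times { run_task }
sequential = Time.now - t0

# Concurrent: trigger all tasks up front, then wait for all of them at once.
t0 = Time.now
threads = Array.new(TASKS) { Thread.new { run_task } }
threads.each(&:join)
concurrent = Time.now - t0

puts format('sequential: %.2fs, concurrent: %.2fs', sequential, concurrent)
```

With four steps, the sequential variant pays the wait four times while the concurrent one pays it roughly once, which matches the direction of the measurements below.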


Version-Release number of selected component (if applicable):
Sat 6.3.0


How reproducible:
100%


Steps to Reproduce:
1. Publish or promote a CV with yum repos.
2. Count the time it took


Actual results:
2. is slower when https://github.com/Katello/katello/blob/master/app/lib/actions/katello/repository/clone_yum_content.rb#L22 is "sequence do"


Expected results:
2. is faster if https://github.com/Katello/katello/blob/master/app/lib/actions/katello/repository/clone_yum_content.rb#L22 is "concurrence do"


Additional info:
(*) Not sure if the above change also improves CV promote (it should; I haven't tested). It doesn't improve incremental update, where my proposal isn't applicable.

(*) I will provide a script to test performance gain

Comment 2 Pavel Moravec 2018-03-12 10:29:04 UTC
Script to test performance:

1) have a file repos-to-CV.txt with lines like:

1
4
1-2-3-4
813-814-815-816

(each line contains a list of repository IDs separated by '-' that shall be added to the tested CV; it is worth getting the IDs via the "hammer repository list" command)

2) Save the following as a script (optionally customize the hammer credentials and the CV base name):


#!/bin/bash
# Benchmark CV publish times; pass a label as $1 (it is used in the CV names).
hmr="hammer -u admin -p redhat "
CVBaseName="CV_${1}"

for repoIds in $(cat repos-to-CV.txt); do
	for i in $(seq 1 10); do
		CVName=${CVBaseName}_${repoIds%-}_${i}
		echo "$(date): creating repo $CVName"
		$hmr content-view create --organization-id=1 --name=${CVName}
		for repoId in $(echo $repoIds | tr '-' '\n'); do
			$hmr content-view add-repository --organization-id=1 --name=${CVName} --repository-id=$repoId
		done
		echo "$(date): publishing repo $CVName"
		$hmr content-view publish --name=${CVName} --organization-id=1
		echo "$(date): repo $CVName publish task: $($hmr task list --search "label = Actions::Katello::ContentView::Publish" --per-page=1 | grep admin)"
		$hmr content-view remove-version --name=${CVName} --content-view=${CVName} --content-view-version="1.0" --organization-id=1
		$hmr content-view remove --name=${CVName} --organization-id=1 --environment-ids=1
		$hmr content-view delete --name=${CVName} --organization-id=1
	done
done

The script will:
- create a CV
- add the repos from one line of the repos-to-CV.txt file
- publish the CV
- print out task summary
- delete the CV
- repeat 10 times for the same CV content
- repeat for each line of repos-to-CV.txt (i.e. for various tested CV content)

In the output, grep for lines like:

Sun Mar 11 19:52:00 CET 2018: repo CV_original_1-2-3-4_10 publish task: 1c9e858e-ec01-48c1-995c-e580a88c1778 |      | admin | 2018/03/11 18:46:16 | 2018/03/11 18:51:59 | stopped | success | Publish     |    

where publishing of the CV with repo IDs 1, 2, 3 and 4 (the 10th such test) ran between 18:46:16 and 18:51:59.
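The started/finished timestamps in such a line can be turned into a duration directly. A small Ruby sketch (the line below is an abridged copy of the sample above; the field positions are assumptions based on hammer's pipe-separated task output):

```ruby
require 'time'

# Abridged hammer task line as printed by the script above.
line = '1c9e858e-ec01-48c1-995c-e580a88c1778 |  | admin | ' \
       '2018/03/11 18:46:16 | 2018/03/11 18:51:59 | stopped | success | Publish'

fields  = line.split('|').map(&:strip)
started = Time.strptime(fields[3], '%Y/%m/%d %H:%M:%S')
ended   = Time.strptime(fields[4], '%Y/%m/%d %H:%M:%S')

duration = (ended - started).to_i
puts "publish took #{duration}s"
```

For the sample line this prints a 343-second publish, i.e. the ~5m43s window quoted above.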

Comment 3 Pavel Moravec 2018-03-12 15:39:34 UTC
Stats from my testing (CV publish times in seconds, original / improved-by-concurrency):

- CV with "Sat6.3 tools" small repo: 22s --> 7s 
- CV with RHEL7 7Server RPMs repo: 330s --> 210s
- CV with 4 repos (7Server, RHSCL, Caps6.3, Sat6.3 tools): 345s --> 241s
- CV with 4 random medium-sized repos: 34s --> 22s

Comment 4 Brad Buckingham 2018-03-14 20:25:58 UTC
Hi Pavel,

Thanks for the bugzilla.

Was this tested with the upcoming performance improvements (bug 1522912)?  I am curious to see if those changes impact the current results (comment 3) and if they lead to the same or different recommendation.

Comment 5 Pavel Moravec 2018-03-15 10:58:17 UTC
(In reply to Brad Buckingham from comment #4)
> Hi Pavel,
> 
> Thanks for the bugzilla.
> 
> Was this tested with the upcoming performance improvements (bug 1522912)?  I
> am curious to see if those changes impact the current results (comment 3)
> and if they lead to the same or different recommendation.

Ah, thanks! How could I forget that CV improvement :(

With the HF applied, I still see significant improvement:

> - CV with "Sat6.3 tools" small repo: 22s --> 7s 
now improved from 13s to 8s

> - CV with RHEL7 7Server RPMs repo: 330s --> 210s
now improved from 225s to 126s

> - CV with 4 repos (7Server, RHSCL, Caps6.3, Sat6.3 tools): 345s --> 241s
now improved from 234s to 115s

> - CV with 4 random medium-sized repos: 34s --> 22s
now improved from 22s to 17s

Anyway, my patch isn't fully correct. Some steps I marked as concurrent must be run sequentially (though I can't identify exactly which ones need sequential processing) - I occasionally got an error during CV publish:

Actions::Katello::Repository::IndexContent :

 ActiveRecord::RecordNotUnique

PG::Error: ERROR: duplicate key value violates unique constraint "index_katello_repository_errata_on_erratum_id_and_repo_id" DETAIL: Key (erratum_id, repository_id)=(1, 1669) already exists. : INSERT INTO katello_repository_errata (erratum_id, repository_id, created_at, updated_at) VALUES (1, 1669, '2018-03-15 09:21:17', '2018-03-15 09:21:17') 
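For illustration, the race behind this error can be sketched in plain Ruby: two concurrent indexing steps both observe the (erratum_id, repository_id) pair as absent, then both insert, and the second insert trips the unique index. This is a hypothetical model of the check-then-insert race, not Katello code:

```ruby
require 'set'

index = Set.new # stands in for the unique DB index on (erratum_id, repository_id)
key   = [1, 1669]

# Both concurrent steps check first (interleaved), so both see the row missing...
step_a_sees_missing = !index.include?(key)
step_b_sees_missing = !index.include?(key)

# ...then both try to insert; the second insert violates uniqueness.
violations = 0
[step_a_sees_missing, step_b_sees_missing].each do |missing|
  if missing
    # Set#add? returns nil when the key is already present, like a unique-index hit.
    violations += 1 unless index.add?(key)
  end
end

puts "duplicate-key violations: #{violations}"
```

Serializing the indexing step after the concurrent copy steps, as proposed below, removes this interleaving.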

I think the proper way would be:

sequence do
  concurrence do
    plan_step(Actions::Pulp::Repository::Copy*)
  end
  plan_step(Actions::Katello::Repository::IndexContent)
  concurrence do
    plan_step(Actions::Pulp::Repository::Purge*)
  end
  plan_step(Actions::Katello::Repository::IndexErrata)
  plan_step(Actions::Katello::Repository::IndexPackageGroup)
end

(i.e. only the pulp steps can run concurrently, and every Copy* must finish before any Purge*)

Could someone from the katello team review whether the above is right (and the original idea wrong)?

Comment 6 Brad Buckingham 2018-03-15 11:07:32 UTC
Hi Pavel,

Thanks for re-running the scenario with the current cv-shredder improvements included. It is really good news that there may be some additional improvements possible.

I do think it is worthwhile to investigate whether there are opportunities to perform some of the steps in parallel vs. sequentially. I will add this to an upcoming sprint for investigation.

Comment 7 Pavel Moravec 2018-03-15 11:18:10 UTC
(Anyway, I would be curious why my patch parallelizing Copy* helps so much, given that one of the improvements in the HF was "don't copy units when using yum_clone_distributor" - shouldn't that improvement skip these steps? Yet I still see them :-o)

Comment 8 Justin Sherrill 2018-04-19 01:03:05 UTC
Created redmine issue http://projects.theforeman.org/issues/23322 from this bug

Comment 9 Samir Jha 2018-05-23 14:19:37 UTC
Hello Pavel,

We have a PR (https://github.com/Katello/katello/pull/7343) with the changes to add concurrency where possible.

We did not see enough improvement with the changes, mostly because pulp does most of the I/O work sequentially, and that is where most of the time is spent. Making some of those steps concurrent adds a layer of complexity that the performance gain does not justify.

Thanks,
Samir

Comment 10 Pavel Moravec 2018-05-25 06:03:43 UTC
Thanks for testing, and sorry my idea was not as good as it originally looked :)

