Bug 1554282
Summary: | [Performance improvement] On CV publish/promote, call Pulp::Repository::* dynflow steps concurrently | |
---|---|---|---
Product: | Red Hat Satellite | Reporter: | Pavel Moravec <pmoravec>
Component: | Content Views | Assignee: | Samir Jha <sajha>
Status: | CLOSED WONTFIX | QA Contact: |
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 6.3.0 | CC: | pmoravec
Target Milestone: | Unspecified | Keywords: | EasyFix, Triaged
Target Release: | Unused | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-05-23 14:19:37 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Pavel Moravec
2018-03-12 10:15:19 UTC
Script to test performance:

1) Have a file repos-to-CV.txt with lines like:

```
1
4
1-2-3-4
813-814-815-816
```

(each line contains a list of repository IDs, separated by '-', that shall be added to the tested CV; it is worth getting the IDs via the "hammer repository list" command)

2) Have a script (optionally customize the first two lines):

```bash
hmr="hammer -u admin -p redhat "
CVBaseName="CV_${1}"

for repoIds in $(cat repos-to-CV.txt); do
  for i in $(seq 1 10); do
    CVName=${CVBaseName}_${repoIds%-}_${i}
    echo "$(date): creating repo $CVName"
    $hmr content-view create --organization-id=1 --name=${CVName}
    for repoId in $(echo $repoIds | tr '-' '\n'); do
      $hmr content-view add-repository --organization-id=1 --name=${CVName} --repository-id=$repoId
    done
    echo "$(date): publishing repo $CVName"
    $hmr content-view publish --name=${CVName} --organization-id=1
    echo "$(date): repo $CVName publish task: $($hmr task list --search "label = Actions::Katello::ContentView::Publish" --per-page=1 | grep admin)"
    $hmr content-view remove-version --name=${CVName} --content-view=${CVName} --content-view-version="1.0" --organization-id=1
    $hmr content-view remove --name=${CVName} --organization-id=1 --environment-ids=1
    $hmr content-view delete --name=${CVName} --organization-id=1
  done
done
```

The script will:

- create a CV
- add the repos from one line of the repos-to-CV.txt file to it
- publish the CV
- print out a task summary
- delete the CV
- repeat 10 times for the same CV content
- repeat for each line of repos-to-CV.txt (i.e. for various tested CV contents)

In the output, grep for lines like:

```
Sun Mar 11 19:52:00 CET 2018: repo CV_original_1-2-3-4_10 publish task: 1c9e858e-ec01-48c1-995c-e580a88c1778 | | admin | 2018/03/11 18:46:16 | 2018/03/11 18:51:59 | stopped | success | Publish |
```

where publishing of the CV with repo IDs 1, 2, 3 and 4 (the 10th such test) ran between 18:46:16 and 18:51:59.

Stats from my testing (times of CV publish in seconds, original / improved-by-concurrency):

- CV with the "Sat6.3 tools" small repo: 22s --> 7s
- CV with the RHEL7 7Server RPMs repo: 330s --> 210s
- CV with 4 repos (7Server, RHSCL, Caps6.3, Sat6.3 tools): 345s --> 241s
- CV with 4 random medium-sized repos: 34s --> 22s

---

Brad Buckingham (comment #4)

Hi Pavel,

Thanks for the bugzilla.

Was this tested with the upcoming performance improvements (bug 1522912)? I am curious to see if those changes impact the current results (comment 3) and if they lead to the same or different recommendation.

---

Pavel Moravec

(In reply to Brad Buckingham from comment #4)
> Was this tested with the upcoming performance improvements (bug 1522912)? I
> am curious to see if those changes impact the current results (comment 3)
> and if they lead to the same or different recommendation.

Ah, thanks! How could I forget that CV improvement :( With the HF applied, I still see a significant improvement:

> - CV with "Sat6.3 tools" small repo: 22s --> 7s

now improved from 13s to 8s

> - CV with RHEL7 7Server RPMs repo: 330s --> 210s

now improved from 225s to 126s

> - CV with 4 repos (7Server, RHSCL, Caps6.3, Sat6.3 tools): 345s --> 241s

now improved from 234s to 115s

> - CV with 4 random medium-sized repos: 34s --> 22s

now improved from 22s to 17s

Anyway, my patch isn't fully correct. Some steps I marked as concurrent must be done sequentially (but I can't identify which ones need sequential processing) - I rarely got an error during CV publish:

```
Actions::Katello::Repository::IndexContent : ActiveRecord::RecordNotUnique
PG::Error: ERROR: duplicate key value violates unique constraint "index_katello_repository_errata_on_erratum_id_and_repo_id"
DETAIL: Key (erratum_id, repository_id)=(1, 1669) already exists.
: INSERT INTO katello_repository_errata (erratum_id, repository_id, created_at, updated_at) VALUES (1, 1669, '2018-03-15 09:21:17', '2018-03-15 09:21:17')
```

I think the proper way would be:

```
sequence do:
  concurrence do:
    plan_step(Actions::Pulp::Repository::Copy*)
  plan_step(Actions::Katello::Repository::IndexContent)
  concurrence do:
    plan_step(Actions::Pulp::Repository::Purge*)
  plan_step(Actions::Katello::Repository::IndexErrata)
  plan_step(Actions::Katello::Repository::IndexPackageGroup)
```

(i.e. we can run concurrently *only* the pulp steps, and Copy* before any Purge*)

Could someone from the Katello team review whether the above is right (and the original idea wrong)?
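For illustration, such a plan phase could be written roughly as in the Ruby sketch below, assuming Dynflow's sequence/concurrence planning DSL as used by Katello actions. The concrete Copy*/Purge* action names and argument shapes are placeholders standing in for whatever the real publish plan schedules, not Katello's actual code:

```ruby
# A minimal sketch, not the actual Katello publish plan. It assumes
# Dynflow's planning DSL: `sequence` runs its children one after another,
# `concurrence` lets them be scheduled in parallel, and `plan_action`
# schedules a sub-action. CopyRpm/CopyErrata/PurgeEmptyPackageGroups and
# the argument shapes are illustrative placeholders.
class ProposedPublishPlan < Dynflow::Action
  def plan(source_repo, target_repo)
    sequence do
      concurrence do
        # only the Pulp copy steps run in parallel with each other
        plan_action(Actions::Pulp::Repository::CopyRpm, source_repo, target_repo)
        plan_action(Actions::Pulp::Repository::CopyErrata, source_repo, target_repo)
      end
      # Katello indexing steps stay strictly sequential
      plan_action(Actions::Katello::Repository::IndexContent, id: target_repo.id)
      concurrence do
        # purge steps may run in parallel, but only after every copy finished
        plan_action(Actions::Pulp::Repository::PurgeEmptyPackageGroups, target_repo)
      end
      plan_action(Actions::Katello::Repository::IndexErrata, id: target_repo.id)
      plan_action(Actions::Katello::Repository::IndexPackageGroup, id: target_repo.id)
    end
  end
end
```

Dynflow only guarantees ordering between items of the enclosing `sequence`, so everything inside a `concurrence` block may overlap, while the sequence boundary enforces the "Copy* before any Purge*" constraint described above.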
---

Hi Pavel,

Thanks for re-running the scenario with the current cv-shredder improvements included. It is really good news to hear that there may be some additional improvements possible.

I do think it is worthwhile to investigate and see if there are opportunities to perform some of the steps in parallel vs. sequentially. I will add this to an upcoming sprint for investigation.

---

Pavel Moravec

(Anyway, I would be curious why my patch parallelizing Copy* helps so much, if one of the improvements in the HF was "don't copy units when using yum_clone_distributor" - that improvement should skip these steps, no? But I still see them :-o)

---

Created redmine issue http://projects.theforeman.org/issues/23322 from this bug

---

Samir Jha

Hello Pavel,

We have a PR (https://github.com/Katello/katello/pull/7343) with the changes to add concurrency where possible. We did not see enough improvement with the changes, mostly because pulp does most of the I/O work sequentially, and that is where most of the time is consumed. Making some of those steps concurrent adds a layer of complexity that the performance gain does not justify.

Thanks,
Samir

---

Pavel Moravec

Thanks for testing, and sorry my idea was not as good as it originally looked :)
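As background on the duplicate-key failure quoted earlier in this report: it is a classic check-then-insert race. When two steps index errata for the same repository concurrently, both can observe that a given (erratum_id, repository_id) row is missing before either inserts it, and the loser then trips the unique index. A minimal, hypothetical ActiveRecord sketch of the pattern follows; the model mirrors the katello_repository_errata join table named in the error, and the guard shown is a generic idempotent-insert idiom, not Katello's actual code or fix:

```ruby
require 'active_record'

# Hypothetical model mirroring the katello_repository_errata join table
# from the error trace; not Katello's real class definition. Assumes an
# ActiveRecord connection is already established.
class RepositoryErratum < ActiveRecord::Base
  self.table_name = 'katello_repository_errata'
end

# Racy check-then-insert: two concurrent dynflow steps can both see
# "no row yet" and both INSERT; the loser raises
# ActiveRecord::RecordNotUnique, as in the CV publish failure above.
def link_erratum_racy(erratum_id, repository_id)
  unless RepositoryErratum.exists?(erratum_id: erratum_id, repository_id: repository_id)
    RepositoryErratum.create!(erratum_id: erratum_id, repository_id: repository_id)
  end
end

# Generic idempotent variant: let the unique index arbitrate and treat
# "another step inserted it first" as success.
def link_erratum_idempotent(erratum_id, repository_id)
  RepositoryErratum.create!(erratum_id: erratum_id, repository_id: repository_id)
rescue ActiveRecord::RecordNotUnique
  # a concurrent step already linked this (erratum, repository) pair
end
```

Even with such a guard, keeping the Katello index steps sequential (as in the plan proposed above) sidesteps the race entirely, which is consistent with the thread's conclusion that the extra concurrency was not worth the added complexity.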