Bug 1554282
Summary: | [Performance improvement] On CV publish/promote, call Pulp::Repository::* dynflow steps concurrently | |
---|---|---|---
Product: | Red Hat Satellite | Reporter: | Pavel Moravec <pmoravec>
Component: | Content Views | Assignee: | Samir Jha <sajha>
Status: | CLOSED WONTFIX | QA Contact: |
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 6.3.0 | CC: | pmoravec
Target Milestone: | Unspecified | Keywords: | EasyFix, Triaged
Target Release: | Unused | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-05-23 14:19:37 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Pavel Moravec
2018-03-12 10:15:19 UTC
Script to test performance:

1) Have a file repos-to-CV.txt with lines like:

```
1
4
1-2-3-4
813-814-815-816
```

(each line contains a list of repository IDs, separated by '-', that shall be added to the tested CV; it is worth getting the IDs via the "hammer repository list" command)

2) Have a script (optionally customize the first two lines):

```bash
hmr="hammer -u admin -p redhat "
CVBaseName="CV_${1}"

for repoIds in $(cat repos-to-CV.txt); do
  for i in $(seq 1 10); do
    CVName=${CVBaseName}_${repoIds%-}_${i}
    echo "$(date): creating repo $CVName"
    $hmr content-view create --organization-id=1 --name=${CVName}
    for repoId in $(echo $repoIds | tr '-' '\n'); do
      $hmr content-view add-repository --organization-id=1 --name=${CVName} --repository-id=$repoId
    done
    echo "$(date): publishing repo $CVName"
    $hmr content-view publish --name=${CVName} --organization-id=1
    echo "$(date): repo $CVName publish task: $($hmr task list --search "label = Actions::Katello::ContentView::Publish" --per-page=1 | grep admin)"
    $hmr content-view remove-version --name=${CVName} --content-view=${CVName} --content-view-version="1.0" --organization-id=1
    $hmr content-view remove --name=${CVName} --organization-id=1 --environment-ids=1
    $hmr content-view delete --name=${CVName} --organization-id=1
  done
done
```

The script will:

- create a CV
- add the repos from one line of the repos-to-CV.txt file to it
- publish the CV
- print out a task summary
- delete the CV
- repeat 10 times for the same CV content
- repeat for each line of repos-to-CV.txt (i.e. for various tested CV contents)

In the output, grep for lines like:

```
Sun Mar 11 19:52:00 CET 2018: repo CV_original_1-2-3-4_10 publish task: 1c9e858e-ec01-48c1-995c-e580a88c1778 | | admin | 2018/03/11 18:46:16 | 2018/03/11 18:51:59 | stopped | success | Publish |
```

where publishing of the CV with repo IDs 1, 2, 3 and 4 (the 10th such test) ran between 18:46:16 and 18:51:59.

Stats from my testing (times of CV publish in seconds, original / improved-by-concurrency):

- CV with the "Sat6.3 tools" small repo: 22s --> 7s
- CV with the RHEL7 7Server RPMs repo: 330s --> 210s
- CV with 4 repos (7Server, RHSCL, Caps6.3, Sat6.3 tools): 345s --> 241s
- CV with 4 random medium-sized repos: 34s --> 22s

---

Brad Buckingham (comment #4)

Hi Pavel,

Thanks for the bugzilla.

Was this tested with the upcoming performance improvements (bug 1522912)? I am curious to see if those changes impact the current results (comment 3) and if they lead to the same or different recommendation.

---

Pavel Moravec

(In reply to Brad Buckingham from comment #4)
> Was this tested with the upcoming performance improvements (bug 1522912)? I
> am curious to see if those changes impact the current results (comment 3)
> and if they lead to the same or different recommendation.

Ah, thanks! How could I forget that CV improvement :( With the HF applied, I still see a significant improvement:

> - CV with "Sat6.3 tools" small repo: 22s --> 7s

now improved from 13s to 8s

> - CV with RHEL7 7Server RPMs repo: 330s --> 210s

now improved from 225s to 126s

> - CV with 4 repos (7Server, RHSCL, Caps6.3, Sat6.3 tools): 345s --> 241s

now improved from 234s to 115s

> - CV with 4 random medium-sized repos: 34s --> 22s

now improved from 22s to 17s

Anyway, my patch isn't fully correct. Some steps I marked as concurrent must be done sequentially (but I can't identify which ones need sequential processing) - I rarely got an error during CV publish:

```
Actions::Katello::Repository::IndexContent : ActiveRecord::RecordNotUnique
PG::Error: ERROR: duplicate key value violates unique constraint "index_katello_repository_errata_on_erratum_id_and_repo_id"
DETAIL: Key (erratum_id, repository_id)=(1, 1669) already exists.
: INSERT INTO katello_repository_errata (erratum_id, repository_id, created_at, updated_at) VALUES (1, 1669, '2018-03-15 09:21:17', '2018-03-15 09:21:17')
```

I think the proper way would be:

```
sequence do:
  concurrence do:
    plan_step(Actions::Pulp::Repository::Copy*)
  plan_step(Actions::Katello::Repository::IndexContent)
  concurrence do:
    plan_step(Actions::Pulp::Repository::Purge*)
  plan_step(Actions::Katello::Repository::IndexErrata)
  plan_step(Actions::Katello::Repository::IndexPackageGroup)
```

(i.e. we can run concurrently *only* the pulp steps, and Copy* before any Purge*)

Could someone from the Katello team review whether the above is right (and the original idea wrong)?
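For illustration, such a plan phase could be written roughly as in the Ruby sketch below, assuming Dynflow's sequence/concurrence planning DSL as used by Katello actions. The concrete Copy*/Purge* action names and argument shapes are placeholders standing in for whatever the real publish plan schedules, not Katello's actual code:

```ruby
# A minimal sketch, not the actual Katello publish plan. It assumes
# Dynflow's planning DSL: `sequence` runs its children one after another,
# `concurrence` lets them be scheduled in parallel, and `plan_action`
# schedules a sub-action. CopyRpm/CopyErrata/PurgeEmptyPackageGroups and
# the argument shapes are illustrative placeholders.
class ProposedPublishPlan < Dynflow::Action
  def plan(source_repo, target_repo)
    sequence do
      concurrence do
        # only the Pulp copy steps run in parallel with each other
        plan_action(Actions::Pulp::Repository::CopyRpm, source_repo, target_repo)
        plan_action(Actions::Pulp::Repository::CopyErrata, source_repo, target_repo)
      end
      # Katello indexing steps stay strictly sequential
      plan_action(Actions::Katello::Repository::IndexContent, id: target_repo.id)
      concurrence do
        # purge steps may run in parallel, but only after every copy finished
        plan_action(Actions::Pulp::Repository::PurgeEmptyPackageGroups, target_repo)
      end
      plan_action(Actions::Katello::Repository::IndexErrata, id: target_repo.id)
      plan_action(Actions::Katello::Repository::IndexPackageGroup, id: target_repo.id)
    end
  end
end
```

Dynflow only guarantees ordering between items of the enclosing `sequence`, so everything inside a `concurrence` block may overlap, while the sequence boundary enforces the "Copy* before any Purge*" constraint described above.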
---

Hi Pavel,

Thanks for re-running the scenario with the current cv-shredder improvements included. It is really good news to hear that there may be some additional improvements possible.

I do think it is worthwhile to investigate and see if there are opportunities to perform some of the steps in parallel vs. sequentially. I will add this to an upcoming sprint for investigation.

---

Pavel Moravec

(Anyway, I would be curious why my patch parallelizing Copy* helps so much, if one of the improvements in the HF was "don't copy units when using yum_clone_distributor" - that improvement should skip these steps, no? But I still see them :-o)

---

Created redmine issue http://projects.theforeman.org/issues/23322 from this bug

---

Samir Jha

Hello Pavel,

We have a PR (https://github.com/Katello/katello/pull/7343) with the changes to add concurrency where possible. We did not see enough improvement with the changes, mostly because pulp does most of the I/O work sequentially, and that is where most of the time is consumed. Making some of those steps concurrent adds a layer of complexity that the performance gain does not justify.

Thanks,
Samir

---

Pavel Moravec

Thanks for testing, and sorry my idea was not as good as it originally looked :)
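As background on the duplicate-key failure quoted earlier in this report: it is a classic check-then-insert race. When two steps index errata for the same repository concurrently, both can observe that a given (erratum_id, repository_id) row is missing before either inserts it, and the loser then trips the unique index. A minimal, hypothetical ActiveRecord sketch of the pattern follows; the model mirrors the katello_repository_errata join table named in the error, and the guard shown is a generic idempotent-insert idiom, not Katello's actual code or fix:

```ruby
require 'active_record'

# Hypothetical model mirroring the katello_repository_errata join table
# from the error trace; not Katello's real class definition. Assumes an
# ActiveRecord connection is already established.
class RepositoryErratum < ActiveRecord::Base
  self.table_name = 'katello_repository_errata'
end

# Racy check-then-insert: two concurrent dynflow steps can both see
# "no row yet" and both INSERT; the loser raises
# ActiveRecord::RecordNotUnique, as in the CV publish failure above.
def link_erratum_racy(erratum_id, repository_id)
  unless RepositoryErratum.exists?(erratum_id: erratum_id, repository_id: repository_id)
    RepositoryErratum.create!(erratum_id: erratum_id, repository_id: repository_id)
  end
end

# Generic idempotent variant: let the unique index arbitrate and treat
# "another step inserted it first" as success.
def link_erratum_idempotent(erratum_id, repository_id)
  RepositoryErratum.create!(erratum_id: erratum_id, repository_id: repository_id)
rescue ActiveRecord::RecordNotUnique
  # a concurrent step already linked this (erratum, repository) pair
end
```

Even with such a guard, keeping the Katello index steps sequential (as in the plan proposed above) sidesteps the race entirely, which is consistent with the thread's conclusion that the extra concurrency was not worth the added complexity.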