Bug 1669186
Summary: | Manifest upload task takes too much time | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Satellite | Reporter: | Roman Plevka <rplevka> | ||||
Component: | Subscription Management | Assignee: | Partha Aji <paji> | ||||
Status: | CLOSED ERRATA | QA Contact: | Roman Plevka <rplevka> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 6.5.0 | CC: | ehelms, jomitsch, jsherril, ktordeur, mjia, mmccune, nkathole, pmoravec, rplevka | ||||
Target Milestone: | 6.5.0 | Keywords: | Performance, PrioBumpQA, Regression, Triaged | ||||
Target Release: | Unused | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | tfm-rubygem-katello-3.10.0.24-1 | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 1669216 1684703 1698513 (view as bug list) | Environment: | |||||
Last Closed: | 2019-05-14 12:39:53 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1669216 | ||||||
Attachments: |
|
Comment 4
Roman Plevka
2019-01-24 15:32:46 UTC
*** Bug 1669216 has been marked as a duplicate of this bug. *** All the below are on identical hardware, importing the manifest attached to this bug: * 6.3.5 Satellite: 1m36sec * 6.4.1 Satellite with external database: 2m49s * 6.5 SNAP 6: 0m33s * 6.5 SNAP 13: 3m53s definite gradual progression getting worse. The 0m33s almost looks like an anomoly and I wonder if we were skipping something critical but we should at least get it back into the same time levels as 6.3 in the 1.5minute range if we can. We have added steps to the manifest import such as storing the product content in the database https://github.com/Katello/katello/commit/946b2990c9a054babc9ad563e1b14c4161207b3e#diff-17c58a600ab07af9fd58f9c4ccb66be4. This is to provide better searching and avoid calls to candlepin later, it's my understanding this is a trade-off of manifest import time for better searching and less backend calls later This and other changes have been around since Katello 3.6, so it may help to explain the upward trend in manifest import time, but doesn't explain the recent change in 6.5 snaps. I'm also wondering if something had been missing or skipped in the previous 6.5 snaps, that is now functioning in the latest snaps. Roman, do we know for sure the time regressed in the 6.4 snap? I see per the following comment that the times were low in 6.5 snaps: https://bugzilla.redhat.com/show_bug.cgi?id=1655981#c5 But having data on the 6.4 snaps and releases would help to tell us more, from mike's testing it seems the manifest took ~3 mins in 6.4, so there doesn't seem to be a change. Created redmine issue https://projects.theforeman.org/issues/25981 from this bug Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/25981 has been resolved. This bug was cloned and is still going to be included in the 6.4.3 release. It no longer has the sat-6.4.z+ flag and 6.4.3 Target Milestone Set which are now on the 6.4.z cloned bug. Please see the Clones field to track the progress of this bug in the 6.4.3 release. Hey Roman, There are several steps to the manifest import process, and i didn't compare the overall time (including candlepin importing), but if you look at the Actions::Candlepin::Owner::ImportProducts step within the dynflow console of the manifest import foreman task, it was taking a large portion of the overall time. That particular step should see a speed up of 70-90%. Jonathan did some over all timings and you can find them here: https://github.com/Katello/katello/pull/7963#issuecomment-461497814 manifest import prior (clean environment): 350 seconds manifest refresh prior: 90 seconds manifest refresh with change: 43 seconds manifest import with change (existing content): 36 seconds manifest import with change (no content): 50 seconds Hello Justin, so I tried to verify this with the same test that produced the earlier timings and the new timings look really great! 21.97s call manifest_upload.py::test_manifest_upload[1/10] 15.51s call manifest_upload.py::test_manifest_upload[2/10] 10.38s call manifest_upload.py::test_manifest_upload[3/10] 10.57s call manifest_upload.py::test_manifest_upload[4/10] 15.53s call manifest_upload.py::test_manifest_upload[5/10] 12.35s call manifest_upload.py::test_manifest_upload[6/10] 10.55s call manifest_upload.py::test_manifest_upload[7/10] 10.51s call manifest_upload.py::test_manifest_upload[8/10] 10.48s call manifest_upload.py::test_manifest_upload[9/10] 10.41s call manifest_upload.py::test_manifest_upload[10/10] However, the last iteration errored out with the following message: PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_katello_pools_on_cp_id" DETAIL: Key (cp_id)=(4028f9536995fe480169960bde580a6c) already exists. : INSERT INTO "katello_pools" ("cp_id", "created_at", "updated_at", "organization_id") VALUES ($1, $2, $3, $4) RETURNING "id" is it possible that this might be related to the changes you made? - i tried to run the test (in batches of 10) several times, and this errors occurs in almost every batch. The operations are carried out synchronously. Hey Roman, That error shouldn't be related to this, as this converted products and product content to use bulk importing, but the error is related to 'pools'. I'm happy to take a closer look though, do you have a traceback? Its possible that my change caused the error to occur because a step that was taking a long time now isn't, which exposed an existing race condition. The more I think about it, I would put my money on that. I think the fix should be fairly simple, if you want to handle it as part of this bz. Justin I see the bz [1] for same error. Thanks. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1686604 Created attachment 1545740 [details]
traceback
attaching the requested traceback
If anyone is open to testing the upstream pr: https://github.com/Katello/katello/pull/8034 it would be very helpful, its hard to reproduce as a race condition (In reply to Justin Sherrill from comment #27) > If anyone is open to testing the upstream pr: > https://github.com/Katello/katello/pull/8034 > > it would be very helpful, its hard to reproduce as a race condition Thanks for a quick fix, however, 3 retries might not be enough: https://github.com/Katello/katello/pull/8034#issuecomment-474775055 Justin, will you include the race condition fix mentioned in Comment #27, so I can properly test and Verify this BZ? Hi Roman, I am clearing the needinfo for Justin and placing this back to ON_QA. The fix that was referenced in comment 27 was associated with bug 1686604. That bug was delivered in snap 22 and since been verified. Thank you for confirmation. with the commit in place we are now good to put this to VERIFIED *** Bug 1698513 has been marked as a duplicate of this bug. *** Connecting redmine issue https://projects.theforeman.org/issues/26733 from this bug Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:1222 |