1669186 – Manifest upload task takes too much time

Red Hat Satellite engineering is moving the tracking of its product development work on Satellite to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "Satellite project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs will be migrated starting at the end of May. If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "Satellite project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/SAT-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1669186 - Manifest upload task takes too much time

Summary: Manifest upload task takes too much time

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Satellite
Classification:	Red Hat
Component:	Subscription Management
Sub Component:
Version:	6.5.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium with 1 vote
Target Milestone:	6.5.0
Assignee:	Partha Aji
QA Contact:	Roman Plevka
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	1669216 1698513 (view as bug list)
Depends On:
Blocks:	1669216
TreeView+	depends on / blocked

Reported:	2019-01-24 14:53 UTC by Roman Plevka
Modified:	2023-03-24 14:31 UTC (History)
CC List:	9 users (show)
Fixed In Version:	tfm-rubygem-katello-3.10.0.24-1
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1669216 1684703 1698513 (view as bug list)
Environment:
Last Closed:	2019-05-14 12:39:53 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
traceback (23.43 KB, text/plain) 2019-03-19 16:00 UTC, Roman Plevka	no flags	Details
View All

Links
System	ID	Priority	Status	Summary	Last Updated
Foreman Issue Tracker	25981	Normal	Closed	Manifest upload task takes too much time	2021-02-17 22:48:06 UTC
Foreman Issue Tracker	26733	Normal	Closed	Foreign-key violation on upgrade - AddContentConstraints	2021-02-17 22:48:06 UTC
Red Hat Product Errata	RHSA-2019:1222	None	None	None	2019-05-14 12:40:02 UTC

Comment 4 Roman Plevka 2019-01-24 15:32:46 UTC

here are some timings for 10 consecutive manifest imports:

209.78s manifest_upload.py::test_manifest_upload[1/10]
213.80s manifest_upload.py::test_manifest_upload[2/10]
224.00s manifest_upload.py::test_manifest_upload[3/10]
228.95s manifest_upload.py::test_manifest_upload[4/10]
229.30s manifest_upload.py::test_manifest_upload[5/10]
219.14s manifest_upload.py::test_manifest_upload[6/10]
226.37s manifest_upload.py::test_manifest_upload[7/10]
229.08s manifest_upload.py::test_manifest_upload[8/10]
228.91s manifest_upload.py::test_manifest_upload[9/10]
234.40s manifest_upload.py::test_manifest_upload[10/10]

Comment 6 Brad Buckingham 2019-01-25 16:13:30 UTC

*** Bug 1669216 has been marked as a duplicate of this bug. ***

Comment 11 Mike McCune 2019-01-31 17:53:25 UTC

All the below are on identical hardware, importing the manifest attached to this bug:

* 6.3.5 Satellite: 1m36sec

* 6.4.1 Satellite with external database: 2m49s

* 6.5 SNAP 6: 0m33s

* 6.5 SNAP 13: 3m53s

definite gradual progression getting worse. The 0m33s almost looks like an anomoly and I wonder if we were skipping something critical but we should at least get it back into the same time levels as 6.3 in the 1.5minute range if we can.

Comment 12 John Mitsch 2019-01-31 18:05:19 UTC

We have added steps to the manifest import such as storing the product content in the database https://github.com/Katello/katello/commit/946b2990c9a054babc9ad563e1b14c4161207b3e#diff-17c58a600ab07af9fd58f9c4ccb66be4. This is to provide better searching and avoid calls to candlepin later, it's my understanding this is a trade-off of manifest import time for better searching and less backend calls later

This and other changes have been around since Katello 3.6, so it may help to explain the upward trend in manifest import time, but doesn't explain the recent change in 6.5 snaps. I'm also wondering if something had been missing or skipped in the previous 6.5 snaps, that is now functioning in the latest snaps.

Roman, do we know for sure the time regressed in the 6.4 snap? I see per the following comment that the times were low in 6.5 snaps: https://bugzilla.redhat.com/show_bug.cgi?id=1655981#c5

But having data on the 6.4 snaps and releases would help to tell us more, from mike's testing it seems the manifest took ~3 mins in 6.4, so there doesn't seem to be a change.

Comment 16 Justin Sherrill 2019-02-05 03:03:09 UTC

Created redmine issue https://projects.theforeman.org/issues/25981 from this bug

Comment 18 Bryan Kearney 2019-02-08 05:03:49 UTC

Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/25981 has been resolved.

Comment 20 Mike McCune 2019-03-01 22:16:34 UTC

This bug was cloned and is still going to be included in the 6.4.3 release. It no longer has the sat-6.4.z+ flag and 6.4.3 Target Milestone Set which are now on the 6.4.z cloned bug. Please see the Clones field to track the progress of this bug in the 6.4.3 release.

Comment 22 Justin Sherrill 2019-03-06 15:12:40 UTC

Hey Roman, 

There are several steps to the manifest import process, and i didn't compare the overall time (including candlepin importing), but if you look at the  Actions::Candlepin::Owner::ImportProducts step within the dynflow console of the manifest import foreman task, it was taking a large portion of the overall time.  That particular step should see a speed up of 70-90%.

Jonathan did some over all timings and you can find them here:  https://github.com/Katello/katello/pull/7963#issuecomment-461497814



manifest import prior (clean environment): 350 seconds
manifest refresh prior: 90 seconds

manifest refresh with change: 43 seconds
manifest import with change (existing content): 36 seconds
manifest import with change (no content): 50 seconds

Comment 23 Roman Plevka 2019-03-19 13:56:43 UTC

Hello Justin,
so I tried to verify this with the same test that produced the earlier timings and the new timings look really great!

21.97s call     manifest_upload.py::test_manifest_upload[1/10]
15.51s call     manifest_upload.py::test_manifest_upload[2/10]
10.38s call     manifest_upload.py::test_manifest_upload[3/10]
10.57s call     manifest_upload.py::test_manifest_upload[4/10]
15.53s call     manifest_upload.py::test_manifest_upload[5/10]
12.35s call     manifest_upload.py::test_manifest_upload[6/10]
10.55s call     manifest_upload.py::test_manifest_upload[7/10]
10.51s call     manifest_upload.py::test_manifest_upload[8/10]
10.48s call     manifest_upload.py::test_manifest_upload[9/10]
10.41s call     manifest_upload.py::test_manifest_upload[10/10]

However, the last iteration errored out with the following message:

PG::UniqueViolation: ERROR:  duplicate key value violates unique constraint "index_katello_pools_on_cp_id"
DETAIL:  Key (cp_id)=(4028f9536995fe480169960bde580a6c) already exists.
: INSERT INTO "katello_pools" ("cp_id", "created_at", "updated_at", "organization_id") VALUES ($1, $2, $3, $4) RETURNING "id"

is it possible that this might be related to the changes you made?

- i tried to run the test (in batches of 10) several times, and this errors occurs in almost every batch. The operations are carried out synchronously.

Comment 24 Justin Sherrill 2019-03-19 14:05:30 UTC

Hey Roman, 

That error shouldn't be related to this, as this converted products and product content to use bulk importing, but the error is related to 'pools'.  I'm happy to take a closer look though, do you have a traceback?

Its possible that my change caused the error to occur because a step that was taking a long time now isn't, which exposed an existing race condition.  The more I think about it, I would put my money on that.  I think the fix should be fairly simple, if you want to handle it as part of this bz.

Justin

Comment 25 Nikhil Kathole 2019-03-19 14:10:52 UTC

I see the bz [1] for same error.

Thanks.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1686604

Comment 26 Roman Plevka 2019-03-19 16:00:16 UTC

Created attachment 1545740 [details]
traceback

attaching the requested traceback

Comment 27 Justin Sherrill 2019-03-19 18:02:34 UTC

If anyone is open to testing the upstream pr: https://github.com/Katello/katello/pull/8034

it would be very helpful,  its hard to reproduce as a race condition

Comment 28 Roman Plevka 2019-03-20 10:38:50 UTC

(In reply to Justin Sherrill from comment #27)
> If anyone is open to testing the upstream pr:
> https://github.com/Katello/katello/pull/8034
> 
> it would be very helpful,  its hard to reproduce as a race condition

Thanks for a quick fix, however, 3 retries might not be enough:
https://github.com/Katello/katello/pull/8034#issuecomment-474775055

Comment 29 Roman Plevka 2019-04-05 06:39:56 UTC

Justin,
will you include the race condition fix mentioned in Comment #27, so I can properly test and Verify this BZ?

Comment 31 Brad Buckingham 2019-04-05 18:00:48 UTC

Hi Roman,

I am clearing the needinfo for Justin and placing this back to ON_QA.

The fix that was referenced in comment 27 was associated with bug 1686604.  That bug was delivered in snap 22 and since been verified.

Comment 32 Roman Plevka 2019-04-08 12:07:37 UTC

Thank you for confirmation.

with the commit in place we are now good to put this to VERIFIED

Comment 33 Mike McCune 2019-04-23 15:46:03 UTC

*** Bug 1698513 has been marked as a duplicate of this bug. ***

Comment 34 Partha Aji 2019-05-02 13:58:12 UTC

Connecting redmine issue https://projects.theforeman.org/issues/26733 from this bug

Comment 37 errata-xmlrpc 2019-05-14 12:39:53 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:1222

Note You need to log in before you can comment on or make changes to this bug.