Bug 1669186

Summary: Manifest upload task takes too much time
Product: Red Hat Satellite
Component: Subscription Management
Version: 6.5.0
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Reporter: Roman Plevka <rplevka>
Assignee: Partha Aji <paji>
QA Contact: Roman Plevka <rplevka>
CC: ehelms, jomitsch, jsherril, ktordeur, mjia, mmccune, nkathole, pmoravec, rplevka
Keywords: Performance, PrioBumpQA, Regression, Triaged
Target Milestone: 6.5.0
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Fixed In Version: tfm-rubygem-katello-3.10.0.24-1
Doc Type: If docs needed, set a value
Clones: 1669216 1684703 1698513
Bug Blocks: 1669216
Type: Bug
Last Closed: 2019-05-14 12:39:53 UTC
Attachments: traceback

Comment 4 Roman Plevka 2019-01-24 15:32:46 UTC
Here are some timings for 10 consecutive manifest imports:

209.78s manifest_upload.py::test_manifest_upload[1/10]
213.80s manifest_upload.py::test_manifest_upload[2/10]
224.00s manifest_upload.py::test_manifest_upload[3/10]
228.95s manifest_upload.py::test_manifest_upload[4/10]
229.30s manifest_upload.py::test_manifest_upload[5/10]
219.14s manifest_upload.py::test_manifest_upload[6/10]
226.37s manifest_upload.py::test_manifest_upload[7/10]
229.08s manifest_upload.py::test_manifest_upload[8/10]
228.91s manifest_upload.py::test_manifest_upload[9/10]
234.40s manifest_upload.py::test_manifest_upload[10/10]

Comment 6 Brad Buckingham 2019-01-25 16:13:30 UTC
*** Bug 1669216 has been marked as a duplicate of this bug. ***

Comment 11 Mike McCune 2019-01-31 17:53:25 UTC
All the below are on identical hardware, importing the manifest attached to this bug:

* 6.3.5 Satellite: 1m36sec

* 6.4.1 Satellite with external database: 2m49s

* 6.5 SNAP 6: 0m33s

* 6.5 SNAP 13: 3m53s

There is a definite gradual progression getting worse. The 0m33s almost looks like an anomaly, and I wonder if we were skipping something critical, but we should at least get it back to the same level as 6.3, in the 1.5-minute range, if we can.

Comment 12 John Mitsch 2019-01-31 18:05:19 UTC
We have added steps to the manifest import, such as storing the product content in the database: https://github.com/Katello/katello/commit/946b2990c9a054babc9ad563e1b14c4161207b3e#diff-17c58a600ab07af9fd58f9c4ccb66be4. This provides better searching and avoids calls to Candlepin later; my understanding is that this is a trade-off of manifest import time for better searching and fewer backend calls later.
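
For illustration only, the "storing the product content in the database" step might look roughly like the sketch below; the class, model, and attribute names are hypothetical and do not reflect Katello's actual schema:

# Hypothetical sketch: persist the product content returned by Candlepin so
# that later searches can query the local database instead of calling
# Candlepin again. Class, model, and attribute names are illustrative only.
class ManifestImporter
  def store_product_content(candlepin_products)
    candlepin_products.each do |cp_product|
      product = Product.find_or_create_by(cp_id: cp_product['id']) do |p|
        p.name = cp_product['name']
      end

      cp_product.fetch('productContent', []).each do |pc|
        ProductContent.find_or_create_by(product: product,
                                         cp_content_id: pc['content']['id']) do |content|
          content.name    = pc['content']['name']
          content.enabled = pc['enabled']
        end
      end
    end
  end
end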

This and other changes have been around since Katello 3.6, so they may help to explain the upward trend in manifest import time, but they don't explain the recent change in the 6.5 snaps. I'm also wondering if something had been missing or skipped in the previous 6.5 snaps that is now functioning in the latest snaps.

Roman, do we know for sure the time regressed in the 6.4 snap? I see per the following comment that the times were low in 6.5 snaps: https://bugzilla.redhat.com/show_bug.cgi?id=1655981#c5

But having data on the 6.4 snaps and releases would help tell us more. From Mike's testing it seems the manifest took ~3 mins in 6.4, so there doesn't seem to be a change.

Comment 16 Justin Sherrill 2019-02-05 03:03:09 UTC
Created redmine issue https://projects.theforeman.org/issues/25981 from this bug

Comment 18 Bryan Kearney 2019-02-08 05:03:49 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/25981 has been resolved.

Comment 20 Mike McCune 2019-03-01 22:16:34 UTC
This bug was cloned and is still going to be included in the 6.4.3 release. It no longer has the sat-6.4.z+ flag and the 6.4.3 Target Milestone set, which are now on the 6.4.z cloned bug. Please see the Clones field to track the progress of this bug in the 6.4.3 release.

Comment 22 Justin Sherrill 2019-03-06 15:12:40 UTC
Hey Roman, 

There are several steps to the manifest import process, and I didn't compare the overall time (including the Candlepin import), but if you look at the Actions::Candlepin::Owner::ImportProducts step within the Dynflow console of the manifest import Foreman task, it was taking a large portion of the overall time. That particular step should see a speedup of 70-90%.

Jonathan did some overall timings and you can find them here: https://github.com/Katello/katello/pull/7963#issuecomment-461497814

manifest import prior (clean environment): 350 seconds
manifest refresh prior: 90 seconds

manifest refresh with change: 43 seconds
manifest import with change (existing content): 36 seconds
manifest import with change (no content): 50 seconds
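
A change of this kind typically means replacing many per-record INSERTs with one bulk INSERT. A minimal sketch, assuming the activerecord-import gem and hypothetical model and variable names (not the actual Katello patch):

# Illustrative only: replacing N individual INSERTs with a single bulk INSERT
# via the activerecord-import gem.

# Before: one INSERT (and one database round trip) per record.
product_content_rows.each do |row|
  ProductContent.create!(row)
end

# After: build the records in memory and insert them in one statement.
records = product_content_rows.map { |row| ProductContent.new(row) }
ProductContent.import(records, validate: false)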

Comment 23 Roman Plevka 2019-03-19 13:56:43 UTC
Hello Justin,
so I tried to verify this with the same test that produced the earlier timings and the new timings look really great!

21.97s call     manifest_upload.py::test_manifest_upload[1/10]
15.51s call     manifest_upload.py::test_manifest_upload[2/10]
10.38s call     manifest_upload.py::test_manifest_upload[3/10]
10.57s call     manifest_upload.py::test_manifest_upload[4/10]
15.53s call     manifest_upload.py::test_manifest_upload[5/10]
12.35s call     manifest_upload.py::test_manifest_upload[6/10]
10.55s call     manifest_upload.py::test_manifest_upload[7/10]
10.51s call     manifest_upload.py::test_manifest_upload[8/10]
10.48s call     manifest_upload.py::test_manifest_upload[9/10]
10.41s call     manifest_upload.py::test_manifest_upload[10/10]

However, the last iteration errored out with the following message:

PG::UniqueViolation: ERROR:  duplicate key value violates unique constraint "index_katello_pools_on_cp_id"
DETAIL:  Key (cp_id)=(4028f9536995fe480169960bde580a6c) already exists.
: INSERT INTO "katello_pools" ("cp_id", "created_at", "updated_at", "organization_id") VALUES ($1, $2, $3, $4) RETURNING "id"

Is it possible that this might be related to the changes you made?

- I tried to run the test (in batches of 10) several times, and this error occurs in almost every batch. The operations are carried out synchronously.

Comment 24 Justin Sherrill 2019-03-19 14:05:30 UTC
Hey Roman, 

That error shouldn't be related to this, as this change converted products and product content to use bulk importing, but the error is related to 'pools'. I'm happy to take a closer look though; do you have a traceback?

It's possible that my change caused the error to occur because a step that was taking a long time now isn't, which exposed an existing race condition. The more I think about it, I would put my money on that. I think the fix should be fairly simple, if you want to handle it as part of this BZ.

Justin

Comment 25 Nikhil Kathole 2019-03-19 14:10:52 UTC
I see bz [1] for the same error.

Thanks.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1686604

Comment 26 Roman Plevka 2019-03-19 16:00:16 UTC
Created attachment 1545740 [details]
traceback

attaching the requested traceback

Comment 27 Justin Sherrill 2019-03-19 18:02:34 UTC
If anyone is open to testing the upstream PR: https://github.com/Katello/katello/pull/8034

it would be very helpful; it's hard to reproduce since it's a race condition.

Comment 28 Roman Plevka 2019-03-20 10:38:50 UTC
(In reply to Justin Sherrill from comment #27)
> If anyone is open to testing the upstream pr:
> https://github.com/Katello/katello/pull/8034
> 
> it would be very helpful,  its hard to reproduce as a race condition

Thanks for the quick fix; however, 3 retries might not be enough:
https://github.com/Katello/katello/pull/8034#issuecomment-474775055
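
For illustration, a bounded-retry guard of the kind discussed here might look roughly like the sketch below; the model name, method name, and retry limit are assumptions, not the actual upstream fix:

# Hypothetical sketch: retry pool creation when two concurrent imports race
# on the unique index (index_katello_pools_on_cp_id). If another process
# inserts the same cp_id between our SELECT and INSERT, the database raises
# a unique-constraint violation, which Rails surfaces as RecordNotUnique;
# retrying lets find_or_create_by! return the row the other process created.
MAX_POOL_RETRIES = 3  # per the comment above, this limit may need to be higher

def find_or_create_pool(cp_id, organization)
  attempts = 0
  begin
    Pool.find_or_create_by!(cp_id: cp_id, organization_id: organization.id)
  rescue ActiveRecord::RecordNotUnique
    attempts += 1
    retry if attempts < MAX_POOL_RETRIES
    raise
  end
end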

Comment 29 Roman Plevka 2019-04-05 06:39:56 UTC
Justin,
will you include the race condition fix mentioned in Comment #27, so I can properly test and Verify this BZ?

Comment 31 Brad Buckingham 2019-04-05 18:00:48 UTC
Hi Roman,

I am clearing the needinfo for Justin and placing this back to ON_QA.

The fix that was referenced in comment 27 was associated with bug 1686604. That bug was delivered in snap 22 and has since been verified.

Comment 32 Roman Plevka 2019-04-08 12:07:37 UTC
Thank you for confirmation.

With the commit in place, we are now good to put this to VERIFIED.

Comment 33 Mike McCune 2019-04-23 15:46:03 UTC
*** Bug 1698513 has been marked as a duplicate of this bug. ***

Comment 34 Partha Aji 2019-05-02 13:58:12 UTC
Connecting redmine issue https://projects.theforeman.org/issues/26733 from this bug

Comment 37 errata-xmlrpc 2019-05-14 12:39:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:1222