Description of problem:
It looks like Candlepin polls every org and every activation key every time a manifest is uploaded.

This is the log section from uploading my manifest to the Default Organization, which was the only org on the system:
http://pastebin.com/RTZnhL85

Here I created a new organization, my_org_1, and did the same thing:
http://pastebin.com/nPZwdryD
(notice that polling of the Default Organization is done too)

I've also extracted the CI machine's candlepin log around the manifest import phase:
http://pastebin.com/1VhrEx1m
This took about 11 minutes.

I realize 1000+ is a bit extreme and not likely to be seen in production, however it causes many timeout issues for our automation, and it demonstrates nicely how the processing time keeps growing. It also looks like even a few organizations with many activation keys might cause the issue.

Version-Release number of selected component (if applicable):
6.2.0 beta (GA12.1)

Steps to Reproduce:
1. Get multiple organizations, optionally with many activation keys
2. Try to upload the manifest
3. Experiment by adding more organizations and activation keys

Expected results:
The time taken to import a manifest should be constant and shouldn't exceed 1 minute (subjective).

Additional info:
This will be fixed in a future release; see https://bugzilla.redhat.com/show_bug.cgi?id=1340229 for the bug tracking that fix.
Could you please provide more info about the manifest that you are importing, or the manifest itself if you still have it?
Created attachment 1171924 [details]
time measurements of the import task

The attached file contains import task time measurements up to 130 orgs. I'll just plot the first 30 orgs for better resolution.

gnuplot> plot 'times_all.csv' every ::0::20 using 1:3 with points title columnhead

[ASCII plot: import time vs. Org #. The time rises roughly linearly from about 5 seconds for the first org to about 50 seconds around org #25.]
Created attachment 1171926 [details] plot of the time measurements
Thanks Roman. Would you mind describing your exact reproducer steps?
Created attachment 1171953 [details]
candlepin.log - ELQpercZTmZ org sample

Attaching a part of the candlepin log taken when there were already 100+ organizations created. This log captures the creation of org ELQpercZTmZ and the import of its manifest. This single action triggers about 1054 (!) GET requests to candlepin - many of them duplicates.
(In reply to Roman Plevka from comment #8)
> Created attachment 1171953 [details]
> candlepin.log - ELQpercZTmZ org sample
>
> Attaching a part of the candlepin log taken when there were already 100+
> organizations created. This log captures the creation of org ELQpercZTmZ
> and the import of its manifest. This single action triggers about 1054 (!)
> GET requests to candlepin - many of them duplicates.

Based on this, I don't think the issue lies inside the Candlepin component. It looks like the problem is in Katello.
(In reply to Michael Stead from comment #7)
> Thanks Roman. Would you mind describing your exact reproducer steps?

I used Satellite 6.2 beta (GA17.0). On a clean installation I performed the following in a loop (a rough sketch of the loop is below):
- create an organization using the Satellite API
- start a timer
- upload a manifest using the Satellite API
- stop the timer

There are 2 datasets in the CSV I've attached:
- The first one contains the Nailgun (Python framework for the Satellite API) measurements. These have worse resolution, as Nailgun checks task status at 5-second intervals, so the times are rounded to 5 seconds.
- The second dataset contains the times (in seconds) computed from the started_at and ended_at values of `hammer task list`.

Anyway, this won't be reproducible against Candlepin on its own, as I think it's Katello that sends the many GET requests to Candlepin (see my previous comments and attached logs).
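For reference, a minimal sketch of that loop, written against the plain REST API with the requests library instead of Nailgun. The hostname, credentials, manifest path and the endpoint paths (/katello/api/organizations and .../subscriptions/upload) are assumptions here, so adjust them to your installation; for exact numbers use the task started_at/ended_at values as described above.

#!/usr/bin/env python
# Sketch of the reproducer loop. The Katello endpoint paths below are
# assumptions (based on the 6.2-era API) - adjust host, credentials and
# manifest path to your setup.
import time
import requests

SATELLITE = 'https://satellite.example.com'  # hypothetical host
MANIFEST = '/tmp/manifest.zip'               # hypothetical manifest path

session = requests.Session()
session.auth = ('admin', 'changeme')
session.verify = False  # test box with a self-signed certificate

for i in range(1, 131):
    # create a new organization
    org = session.post(
        SATELLITE + '/katello/api/organizations',
        json={'name': 'perf_org_{0}'.format(i)},
    ).json()

    # time the manifest upload; the import runs as a foreman task, so for
    # exact numbers use the task's started_at/ended_at instead of this timer
    start = time.monotonic()
    with open(MANIFEST, 'rb') as manifest:
        session.post(
            '{0}/katello/api/organizations/{1}/subscriptions/upload'.format(
                SATELLITE, org['id']),
            files={'content': manifest},
        )
    print('{0},{1:.1f}'.format(i, time.monotonic() - start))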
Looking at the manifest import code in Katello, there are indeed many places where Candlepin is being hit for multiple orgs:
https://github.com/Katello/katello/blob/master/app/models/katello/glue/candlepin/candlepin_object.rb#L37

Part of this, I believe, is because Katello needs to keep its data state absolutely in sync with Candlepin. There may be room for optimization here, but the risk versus reward of refactoring this area needs to be considered carefully. Overall, I would agree that this is a Katello issue and not a Candlepin one.
Moving 6.2 bugs out to sat-backlog.
A small comment after investigating some automation results, in the hope it is helpful: not only manifest import, but also the manifest refresh and manifest delete commands slow down as the number of organizations grows. The root cause is most likely the same.
btw, the same issue applies to creating products. I believe the root cause is the same.
Created redmine issue http://projects.theforeman.org/issues/20233 from this bug
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/20233 has been resolved.
VERIFIED on sat6.3.0-16

[ASCII plot: 'times' using 1:2, import time vs. Orgs #. The time stays roughly constant at about 10 seconds from org #1 up to org #100.]

This now seems to be constant. I've also measured the time taken to delete the manifest and got identical results.

Procedure (a sketch of this loop is below):
- create org
- if orgs# % 10 == 0:
  - start timer
  - upload manifest
  - stop timer
  - start timer
  - delete manifest
  - stop timer
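A compact sketch of that verification loop, in the same hedged style as the earlier one: the host, credentials and endpoint paths, in particular the delete_manifest action, are my assumptions, so adjust them to your Satellite version.

#!/usr/bin/env python
# Sketch of the verification loop: every 10th org, time the manifest upload
# and the manifest delete. Endpoint paths (including the delete_manifest
# action) are assumptions - adjust to your Satellite version.
import time
import requests

SATELLITE = 'https://satellite.example.com'  # hypothetical host
MANIFEST = '/tmp/manifest.zip'               # hypothetical manifest path

session = requests.Session()
session.auth = ('admin', 'changeme')
session.verify = False

def timed(call):
    start = time.monotonic()
    call()
    return time.monotonic() - start

for i in range(1, 101):
    org = session.post(SATELLITE + '/katello/api/organizations',
                       json={'name': 'verify_org_{0}'.format(i)}).json()
    if i % 10 != 0:
        continue
    base = '{0}/katello/api/organizations/{1}/subscriptions'.format(
        SATELLITE, org['id'])
    with open(MANIFEST, 'rb') as manifest:
        upload_t = timed(lambda: session.post(base + '/upload',
                                              files={'content': manifest}))
    delete_t = timed(lambda: session.post(base + '/delete_manifest'))
    print('{0},{1:.1f},{2:.1f}'.format(i, upload_t, delete_t))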
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:0336