Bug 1339696

Summary: Importing manifest gets slow with increasing number of organizations
Product: Red Hat Satellite Reporter: Roman Plevka <rplevka>
Component: Subscription ManagementAssignee: Justin Sherrill <jsherril>
Status: CLOSED ERRATA QA Contact: Roman Plevka <rplevka>
Severity: low Docs Contact:
Priority: medium    
Version: 6.2.0CC: abalakht, bbuckingham, bcourt, cdonnell, egolov, ehelms, jcallaha, jsherril, mstead, redakkan, rplevka, sghai, skallesh, tomckay, vrjain, zhunting
Target Milestone: UnspecifiedKeywords: PrioBumpGSS, PrioBumpQA, Reopened, Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tfm-rubygem-katello-3.4.4 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1340229 1340245 1655981 (view as bug list) Environment:
Last Closed: 2018-02-21 16:59:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1339766, 1340229    
Attachments:
Description Flags
time measurements of the import task
none
plot of the time measurements
none
candlepin.log - ELQpercZTmZ org sample none

Description Roman Plevka 2016-05-25 16:01:09 UTC
Description of problem:
It looks like candlepin polls every org and every activation key every time a manifest is being uploaded.

this is the section where my manifest has been uploaded to Default Organization, which was the only org on the system:
http://pastebin.com/RTZnhL85

here I created a new organization my_org_1 and did the same thing:
http://pastebin.com/nPZwdryD
(notice, there is polling to Default Organization done too).

I've also extracted the CI machine's candlepin log around the manifest import phase:
http://pastebin.com/1VhrEx1m
This took about 11 minutes

I realize 1000+ is a bit extreme and is not likely to be used in a production, however this causes many timeout issues to our automation. Also it is good to demonstrate the increasing time needed to process the manifest.

Also it looks like even a few organizations with many Activation Keys might cause the issue.

Version-Release number of selected component (if applicable):
6.2.0 beta (GA12.1)

Steps to Reproduce:
1. get multiple organizations, optionally with many activation keys
2. try to upload the manifest
3. experiment by adding more organizations and activation keys

Expected results:
The time taken to import a manifest should be constant and shouldn't exceed 1 minute (subjective)

Additional info:

Comment 2 Barnaby Court 2016-06-07 14:02:05 UTC
This will be fixed in a future release. See https://bugzilla.redhat.com/show_bug.cgi?id=1340229 for where it will be fixed.

Comment 3 Michael Stead 2016-06-21 18:52:00 UTC
Could you please provide more info about the manifest that you are importing? or the manifest itself if you still have it.

Comment 5 Roman Plevka 2016-06-24 11:23:09 UTC
Created attachment 1171924 [details]
time measurements of the import task

# attached file contains import task time measurements up to 130 orgs. I'll just plot first 30 orgs for a better resolution.
gnuplot> plot 'times_all.csv' every ::0::20 using 1:3 with points title columnhead  



  50 +-+-----------+-------------+-------------+-------------+--A--------+-+
     +             +             +             +             A             +
  45 +-+..................................................A..............+-+
     |             :             :             :     A A     :             |
  40 +-+..........................................A......................+-+
  35 +-+....................................A..A.........................+-+
     |             :             :       A     :             :             |
  30 +-+...............................A.................................+-+
     |             :             :  A          :             :             |
  25 +-+.........................A.......................................+-+
     |             :       A  A  :             :             :             |
  20 +-+.................A...............................................+-+
     |             :  A          :             :             :             |
  15 +-+........A..A.....................................................+-+
  10 +-+.................................................................+-+
     |     A A     :             :             :             :             |
   5 +-+.................................................................+-+
     +  A          +             +             +             +             +
   0 +-+-----------+-------------+-------------+-------------+-----------+-+
     0             5             10            15            20            25
                                      Org #

Comment 6 Roman Plevka 2016-06-24 11:42:22 UTC
Created attachment 1171926 [details]
plot of the time measurements

Comment 7 Michael Stead 2016-06-24 12:37:25 UTC
Thanks Roman. Would you mind describing your exact reproducer steps?

Comment 8 Roman Plevka 2016-06-24 12:38:45 UTC
Created attachment 1171953 [details]
candlepin.log - ELQpercZTmZ org sample

Attaching a part of candlepin log, where there are already 100+ organizations created.
This log captures a creation of org: ELQpercZTmZ and importing its manifest.
This action triggers about 1054 (!) GET requests to candlepin - many of them are duplicated.

Comment 9 Roman Plevka 2016-06-24 12:39:41 UTC
(In reply to Roman Plevka from comment #8)
> Created attachment 1171953 [details]
> candlepin.log - ELQpercZTmZ org sample
> 
> Attaching a part of candlepin log, where there are already 100+
> organizations created.
> This log captures a creation of org: ELQpercZTmZ and importing its manifest.
> This action triggers about 1054 (!) GET requests to candlepin - many of them
> are duplicated.

According to this - I don't think the issue lies inside Candlepin component. Looks like the problem is in Katello.

Comment 10 Roman Plevka 2016-06-24 12:44:53 UTC
(In reply to Michael Stead from comment #7)
> Thanks Roman. Would you mind describing your exact reproducer steps?

I used satellite 6.2 beta (GA17.0).
On a clean installation I performed the following in a loop:

- create an organization using satellite API
- start timer
- uploaded a manifest using satellite API
- stopped timer

there are 2 datasets in the CSV i've attached. the first one contains measurements of the nailgun (python framework for Satellite API) measurements. this is of worse resolution, as Nailgun uses 5-second intervals for checking the status of the tasks, so the measurements are rounded to 5 seconds.

the second dataset contains the times (in seconds) extracted from `hammer task list` started_at and ended_at times.
- Anyway, this won't be reproducible in Candlepin ot its own as I think it's katello which sends many GET requests to Candlepin (see my previous comments and attached logs)

Comment 11 Tom McKay 2016-06-24 13:31:37 UTC
Looking at the import manifest code in katello there are indeed many places where candlepin is being hit for multiple orgs.

https://github.com/Katello/katello/blob/master/app/models/katello/glue/candlepin/candlepin_object.rb#L37

Part of this, I believe, is because katello needs to keep its data state absolutely in sync with candlepin. There may be cases for optimization here but the risk versus reward of refactoring this area needs to be considered carefully.

Overall, I would agree that this is a katello issue and not a candlepin one.

Comment 12 Bryan Kearney 2016-07-26 18:57:42 UTC
Moving 6.2 bugs out to sat-backlog.

Comment 13 Andrii Balakhtar 2017-01-20 10:24:19 UTC
Small comment after investigating some automation results, hope this will be helpful:
not only manifest importing, but also manifest refresh/manifest delete commands have decreased performance with increased number of organizations, although the root cause is most likely the same.

Comment 17 Roman Plevka 2017-06-01 13:14:19 UTC
btw, the same issue applies to creating products. I believe the root cause is the same.

Comment 19 Justin Sherrill 2017-07-06 14:40:02 UTC
Created redmine issue http://projects.theforeman.org/issues/20233 from this bug

Comment 20 Satellite Program 2017-07-10 16:03:20 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/20233 has been resolved.

Comment 21 Satellite Program 2017-08-03 22:03:33 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/20233 has been resolved.

Comment 22 Roman Plevka 2017-09-20 16:10:49 UTC
VERIFIED
on sat6.3.0-16

  20 +-+-----------+-------------+-------------+-------------+-----------+-+   
     +             +             +             +             +             +   
     |                                           'times' using 1:2    A    |   
     |                                                                     |   
  15 +-+                                                                 +-+   
     |                                                                     |   
     |                                                                     |   
     |                                                                     |   
     |                                                                     |   
  10 +-+    A      A             A             A      A             A    +-A   
     A                    A                                  A             |   
     |                                  A                                  |   
     |                                                                     |   
   5 +-+                                                                 +-+   
     |                                                                     |   
     |                                                                     |   
     |                                                                     |   
     +             +             +             +             +             +   
   0 +-+-----------+-------------+-------------+-------------+-----------+-+   
     0             20            40            60            80           100  
                                     Orgs #                                    
   

This now seems to be constant. I've also measured a time taken for deleting the manifest, got the identical results.

- created org
- if orgs# % 10 == 0:
  - start timer
  - upload manifest
  - stop timer

  - start timer
  - delete manifest
  - stop timer

Comment 23 Bryan Kearney 2018-02-21 16:59:05 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0336