Bug 1458857 - Improved performance of Puppet and RHSM fact importers
Summary: Improved performance of Puppet and RHSM fact importers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Hosts - Content
Version: 6.2.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: Unspecified
Assignee: Shimon Shtein
QA Contact: jcallaha
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-06-05 16:33 UTC by Chris Duryee
Modified: 2020-12-14 08:48 UTC
CC List: 13 users

Fixed In Version: foreman-1.11.0.81-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned to: 1463803
Environment:
Last Closed: 2017-08-10 17:02:29 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 20024 0 Normal Closed Improve performance of rhsm fact importer 2020-05-28 04:07:23 UTC
Red Hat Bugzilla 1474293 0 medium CLOSED duplicate key value violates unique constraint "index_fact_names_on_name_and_type" on simultaneous host register 2022-03-13 14:21:45 UTC
Red Hat Product Errata RHBA-2017:2466 0 normal SHIPPED_LIVE Satellite 6.2.11 Async Release 2017-08-10 21:01:20 UTC

Internal Links: 1474293

Description Chris Duryee 2017-06-05 16:33:16 UTC
Description of problem:

The fix for https://bugzilla.redhat.com/show_bug.cgi?id=1405085 passes system facts from the 'plan' phase to the 'run' phase of the update task. This causes dynflow to take extra time to serialize and deserialize the task envelope, which can leave dynflow unable to keep up with incoming tasks on a busy system.
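
For illustration, the shape of the problem is roughly the following (a minimal sketch with hypothetical names, not the actual Katello::Host::Update code): whatever is handed to plan_self becomes part of the envelope that dynflow serializes to the database and deserializes again before run executes.

class ImportHostFacts < ::Dynflow::Action   # hypothetical name, for illustration
  def plan(host, facts)
    # everything given to plan_self is stored in the serialized plan input;
    # a full fact hash (hundreds of keys) makes that envelope large
    plan_self(host_id: host.id, facts: facts)
  end

  def run
    # dynflow must deserialize the whole envelope (facts included) before this
    # executes; do_import is a hypothetical helper standing in for the real
    # fact importer call
    output[:imported] = do_import(input[:host_id], input[:facts])
  end
end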


Version-Release number of selected component (if applicable): 6.2.9


How reproducible: difficult to repro without a busy Satellite


Steps to Reproduce:
1. create lots of host update tasks (perhaps 1 per sec; see the sketch after this list)
2. let the task list get to about 5000+ tasks; it may help to kick off some syncs to load up dynflow
3. view task list periodically
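
A rough way to generate that load, assuming a set of already-registered content hosts (a sketch, not anything shipped with the product): run a loop like the following on each host, so every upload produces a host update task on the Satellite.

# roughly an hour of uploads at ~1/sec; each upload creates a host update task
3600.times do
  system('subscription-manager', 'facts', '--update')  # re-send RHSM facts
  sleep 1
end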

Actual results: task list grows and does not clear out


Expected results: dynflow can keep up, tasks clear out


Additional info: reverting 5dca166c84d did the trick, except some existing tasks were still looking for the 'run' method on Katello::Host::Update instead of inheriting from the superclass. Note though that reverting brought back 1405085.

We may want to gzip the fact list or only pass in updated facts instead of the full list.
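
A sketch of those two ideas (hypothetical helpers, not the shipped fix; `host.facts_hash` as the accessor for the currently stored facts is an assumption):

require 'json'
require 'zlib'

# Only pass facts whose values actually changed, so the task envelope carries
# a small delta instead of the full list.
def changed_facts(host, incoming_facts)
  current = host.facts_hash   # assumed accessor for the facts already stored
  incoming_facts.reject { |name, value| current[name] == value }
end

# ...or shrink the envelope by compressing the full payload before planning.
def compressed_facts(incoming_facts)
  Zlib::Deflate.deflate(incoming_facts.to_json)
end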

Comment 3 Adam Ruzicka 2017-06-07 10:30:48 UTC
Created redmine issue http://projects.theforeman.org/issues/19951 from this bug

Comment 6 Ivan Necas 2017-06-07 11:31:22 UTC
The fact that the step took a long time to change state doesn't necessarily mean the issue is the serialization. I'm pretty sure we would not see the issue if the `run` of the facts importer were cleaned out and only the serialization/deserialization were left in place. Do we have a task export with the fact import tasks?

Comment 7 Shimon Shtein 2017-06-12 07:12:03 UTC
Preliminary analysis: from checks performed by @aruzicka it seems that the serialization is not the main time consumer.
From my own checks, I can confirm that fact updating for a single host takes around 600ms. I am investigating how to reduce this time.
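
For reference, the per-host timing can be reproduced from the Rails console (foreman-rake console) along these lines; the `host.import_facts` call is an assumption about the Foreman API of this era, not a quote from the investigation:

require 'benchmark'
require 'json'

host  = ::Host::Managed.find_by(name: 'client.example.com')  # placeholder host
facts = JSON.parse(File.read('/tmp/rhsm_facts.json'))         # captured fact payload

# time a single fact import for this host
elapsed = Benchmark.realtime { host.import_facts(facts) }
puts format('fact import took %.0f ms', elapsed * 1000)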

Comment 8 Shimon Shtein 2017-06-15 05:37:16 UTC
Connecting redmine issue http://projects.theforeman.org/issues/20024 from this bug

Comment 9 Satellite Program 2017-06-19 10:06:57 UTC
Upstream bug assigned to sshtein

Comment 10 Satellite Program 2017-06-19 10:07:01 UTC
Upstream bug assigned to sshtein

Comment 11 Satellite Program 2017-06-19 20:07:16 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/20024 has been resolved.

Comment 16 Lukas Zapletal 2017-07-17 07:33:08 UTC
QA notes: Please also test with existing systems: import some systems with structured facts without the patch, upgrade (apply the errata), then compare how it works and how the facts are presented on the Facts pages.
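
For context, "structured facts" here means nested values such as Facter's os/networking facts, which Foreman flattens into '::'-separated fact names for storage and for the Facts pages. An illustrative example (not taken from any particular host):

# nested (structured) fact as reported by the client
structured = {
  'os' => {
    'name'    => 'RedHat',
    'release' => { 'major' => '7', 'minor' => '3' }
  }
}

# flattened fact names/values as stored and displayed by Foreman:
#   os::name           => "RedHat"
#   os::release::major => "7"
#   os::release::minor => "3"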

Comment 18 Lukas Zapletal 2017-07-18 08:13:04 UTC
QA notes:

Stress test RHSM fact uploads and Puppet fact uploads. Make sure there are multiple passenger processes serving the requests.

Comment 19 jcallaha 2017-08-03 21:01:57 UTC
Verified in Satellite 6.2.11 Snap 3

I registered 100, then an additional 100, docker-based content hosts to my Satellite. Each one looped through a subscription-manager facts upload followed by a 1-second sleep, for an hour. With 100 hosts, the maximum number of pending tasks was 115, with an average of 80. When bumped up to 200 hosts, the maximum was 328, with an average of 211 pending tasks. Below is the passenger status with 200 hosts running facts upload loops.

-bash-4.2# passenger-status 
Version : 4.0.18
Date    : 2017-08-02 12:42:00 -0400
Instance: 21529
----------- General information -----------
Max pool size : 12
Processes     : 12
Requests in top-level queue : 0

----------- Application groups -----------
/usr/share/foreman#default:
  App root: /usr/share/foreman
  Requests in queue: 18
  * PID: 12040   Sessions: 1       Processed: 7128    Uptime: 20h 6m 17s
    CPU: 0%      Memory  : 582M    Last used: 0s ago
  * PID: 1052    Sessions: 1       Processed: 6734    Uptime: 1h 28m 16s
    CPU: 12%     Memory  : 641M    Last used: 0s ago
  * PID: 1075    Sessions: 1       Processed: 6805    Uptime: 1h 28m 15s
    CPU: 12%     Memory  : 306M    Last used: 0s ago
  * PID: 1101    Sessions: 1       Processed: 7220    Uptime: 1h 28m 14s
    CPU: 13%     Memory  : 309M    Last used: 0s ago
  * PID: 1124    Sessions: 1       Processed: 6869    Uptime: 1h 28m 13s
    CPU: 12%     Memory  : 629M    Last used: 0s ago
  * PID: 1159    Sessions: 1       Processed: 7382    Uptime: 1h 28m 12s
    CPU: 13%     Memory  : 669M    Last used: 0s ago
  * PID: 1199    Sessions: 1       Processed: 6003    Uptime: 1h 28m 11s
    CPU: 11%     Memory  : 300M    Last used: 0s ago
  * PID: 1227    Sessions: 1       Processed: 7072    Uptime: 1h 28m 10s
    CPU: 13%     Memory  : 304M    Last used: 0s ago
  * PID: 1255    Sessions: 1       Processed: 6897    Uptime: 1h 28m 9s
    CPU: 13%     Memory  : 299M    Last used: 0s ago
  * PID: 1288    Sessions: 1       Processed: 6724    Uptime: 1h 28m 7s
    CPU: 12%     Memory  : 299M    Last used: 0s ago
  * PID: 1318    Sessions: 1       Processed: 6829    Uptime: 1h 28m 6s
    CPU: 12%     Memory  : 298M    Last used: 0s ago

/etc/puppet/rack#default:
  App root: /etc/puppet/rack
  Requests in queue: 0
  * PID: 23904   Sessions: 0       Processed: 270     Uptime: 26h 30m 36s
    CPU: 0%      Memory  : 50M     Last used: 7s ago


After killing the hosts, all the tasks were completed within 2 minutes, and the passenger status was as shown below.

-bash-4.2# passenger-status 
Version : 4.0.18
Date    : 2017-08-02 12:43:35 -0400
Instance: 21529
----------- General information -----------
Max pool size : 12
Processes     : 12
Requests in top-level queue : 0

----------- Application groups -----------
/usr/share/foreman#default:
  App root: /usr/share/foreman
  Requests in queue: 0
  * PID: 12040   Sessions: 0       Processed: 7163    Uptime: 20h 7m 52s
    CPU: 1%      Memory  : 582M    Last used: 1m 7s ago
  * PID: 1052    Sessions: 0       Processed: 6766    Uptime: 1h 29m 51s
    CPU: 12%     Memory  : 641M    Last used: 1m 15s ago
  * PID: 1075    Sessions: 0       Processed: 6844    Uptime: 1h 29m 50s
    CPU: 12%     Memory  : 306M    Last used: 1m 12s ago
  * PID: 1101    Sessions: 0       Processed: 7262    Uptime: 1h 29m 49s
    CPU: 13%     Memory  : 309M    Last used: 1m 11s ago
  * PID: 1124    Sessions: 0       Processed: 6900    Uptime: 1h 29m 48s
    CPU: 12%     Memory  : 630M    Last used: 1m 10s ago
  * PID: 1159    Sessions: 0       Processed: 7421    Uptime: 1h 29m 47s
    CPU: 13%     Memory  : 669M    Last used: 1m 13s ago
  * PID: 1199    Sessions: 0       Processed: 6039    Uptime: 1h 29m 46s
    CPU: 11%     Memory  : 301M    Last used: 1m 12s ago
  * PID: 1227    Sessions: 0       Processed: 7117    Uptime: 1h 29m 45s
    CPU: 12%     Memory  : 304M    Last used: 1m 7s ago
  * PID: 1255    Sessions: 0       Processed: 6947    Uptime: 1h 29m 44s
    CPU: 12%     Memory  : 300M    Last used: 10s ago
  * PID: 1288    Sessions: 0       Processed: 6763    Uptime: 1h 29m 42s
    CPU: 12%     Memory  : 299M    Last used: 1m 14s ago
  * PID: 1318    Sessions: 0       Processed: 6862    Uptime: 1h 29m 41s
    CPU: 12%     Memory  : 298M    Last used: 1m 15s ago

/etc/puppet/rack#default:
  App root: /etc/puppet/rack
  Requests in queue: 0
  * PID: 23904   Sessions: 0       Processed: 270     Uptime: 26h 32m 11s
    CPU: 0%      Memory  : 50M     Last used: 1m 42s ago


--------------------------------------------------------------------------
For Puppet facts, I created a new container image that uploaded Puppet facts in a loop with only a one-second interval. I spun up 25 container hosts simultaneously and monitored the passenger status.
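
A loop of that shape could look like the following (a sketch, not the tester's actual image; the URL, credentials, and facts file are placeholders, and it assumes Foreman's /api/v2/hosts/facts endpoint):

require 'json'
require 'net/http'
require 'openssl'
require 'uri'

uri = URI('https://satellite.example.com/api/v2/hosts/facts')
payload = {
  'name'  => 'puppet-client-01.example.com',
  'facts' => JSON.parse(File.read('/facts/puppet_facts.json'))
}

loop do
  Net::HTTP.start(uri.host, uri.port, use_ssl: true,
                  verify_mode: OpenSSL::SSL::VERIFY_NONE) do |http|
    req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
    req.basic_auth('admin', 'changeme')   # placeholder credentials
    req.body = payload.to_json
    http.request(req)                     # each POST triggers a fact import
  end
  sleep 1                                 # one-second interval between uploads
end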

The initial status showed very few Puppet workers.

-bash-4.2# passenger-status
Version : 4.0.18
Date    : 2017-08-03 16:43:48 -0400
Instance: 11062
----------- General information -----------
Max pool size : 12
Processes     : 12
Requests in top-level queue : 0

----------- Application groups -----------
/usr/share/foreman#default:
  App root: /usr/share/foreman
  Requests in queue: 0
  * PID: 11539   Sessions: 0       Processed: 127     Uptime: 16m 21s
    CPU: 2%      Memory  : 590M    Last used: 8s ago
  * PID: 12299   Sessions: 0       Processed: 89      Uptime: 5m 29s
    CPU: 2%      Memory  : 277M    Last used: 5s ago
  * PID: 13319   Sessions: 0       Processed: 45      Uptime: 1m 35s
    CPU: 6%      Memory  : 275M    Last used: 8s ago
  * PID: 13340   Sessions: 0       Processed: 62      Uptime: 1m 34s
    CPU: 8%      Memory  : 276M    Last used: 8s ago
  * PID: 13358   Sessions: 0       Processed: 73      Uptime: 1m 33s
    CPU: 9%      Memory  : 283M    Last used: 5s ago
  * PID: 13379   Sessions: 1       Processed: 3       Uptime: 1m 32s
    CPU: 3%      Memory  : 263M    Last used: 35s ago
  * PID: 13406   Sessions: 1       Processed: 3       Uptime: 1m 31s
    CPU: 3%      Memory  : 262M    Last used: 35s ago
  * PID: 13435   Sessions: 0       Processed: 43      Uptime: 1m 29s
    CPU: 5%      Memory  : 267M    Last used: 6s ago
  * PID: 13468   Sessions: 0       Processed: 53      Uptime: 1m 27s
    CPU: 7%      Memory  : 266M    Last used: 8s ago
  * PID: 13510   Sessions: 0       Processed: 15      Uptime: 1m 26s
    CPU: 4%      Memory  : 242M    Last used: 8s ago

/etc/puppet/rack#default:
  App root: /etc/puppet/rack
  Requests in queue: 0
  * PID: 11880   Sessions: 0       Processed: 179     Uptime: 11m 28s
    CPU: 0%      Memory  : 49M     Last used: 2s ago
  * PID: 13038   Sessions: 0       Processed: 0       Uptime: 2m 19s
    CPU: 0%      Memory  : 8M      Last used: 2m 19s ago


After running for a bit, the number of workers increased. At no point did the queue grow beyond 0.

-bash-4.2# passenger-status
Version : 4.0.18
Date    : 2017-08-03 16:55:22 -0400
Instance: 11062
----------- General information -----------
Max pool size : 12
Processes     : 12
Requests in top-level queue : 0

----------- Application groups -----------
/usr/share/foreman#default:
  App root: /usr/share/foreman
  Requests in queue: 0
  * PID: 11539   Sessions: 0       Processed: 216     Uptime: 27m 55s
    CPU: 1%      Memory  : 653M    Last used: 1m 43s ago
  * PID: 12299   Sessions: 0       Processed: 149     Uptime: 17m 3s
    CPU: 2%      Memory  : 572M    Last used: 2m 56s ago
  * PID: 13319   Sessions: 0       Processed: 133     Uptime: 13m 9s
    CPU: 3%      Memory  : 633M    Last used: 1m 44s ago
  * PID: 13340   Sessions: 0       Processed: 123     Uptime: 13m 8s
    CPU: 3%      Memory  : 583M    Last used: 1m 7s ago
  * PID: 13379   Sessions: 0       Processed: 97      Uptime: 13m 6s
    CPU: 2%      Memory  : 579M    Last used: 2m 46s ago
  * PID: 13406   Sessions: 0       Processed: 70      Uptime: 13m 5s
    CPU: 2%      Memory  : 546M    Last used: 1m 44s ago
  * PID: 13468   Sessions: 0       Processed: 132     Uptime: 13m 1s
    CPU: 2%      Memory  : 518M    Last used: 1m 44s ago

/etc/puppet/rack#default:
  App root: /etc/puppet/rack
  Requests in queue: 0
  * PID: 11880   Sessions: 0       Processed: 998     Uptime: 23m 2s
    CPU: 2%      Memory  : 52M     Last used: 1s ago
  * PID: 16573   Sessions: 0       Processed: 617     Uptime: 3m 54s
    CPU: 8%      Memory  : 41M     Last used: 1s ago
  * PID: 16579   Sessions: 0       Processed: 670     Uptime: 3m 54s
    CPU: 9%      Memory  : 40M     Last used: 10s ago
  * PID: 16587   Sessions: 0       Processed: 610     Uptime: 3m 54s
    CPU: 9%      Memory  : 38M     Last used: 10s ago
  * PID: 16594   Sessions: 0       Processed: 791     Uptime: 3m 54s
    CPU: 10%     Memory  : 40M     Last used: 10s ago

Comment 22 errata-xmlrpc 2017-08-10 17:02:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2466

