Bug 1440825

Summary: Fact import code consumes lot of memory
Product: Red Hat Satellite Reporter: Chris Roberts <chrobert>
Component: PuppetAssignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA QA Contact: Katello QA List <katello-qa-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.2.8CC: inecas, jcallaha, lzap, oshtaier
Target Milestone: UnspecifiedKeywords: Triaged
Target Release: Unused   
Hardware: x86_64   
OS: Linux   
URL: http://projects.theforeman.org/issues/9016
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-02-21 17:03:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Chris Roberts 2017-04-10 14:30:39 UTC
This was initially reported on discovery process, but I located performance bottleneck in fact importing.

When importing a large set of hosts, the avg response time is high:

Completed 201 Created in 8403.7ms (Views: 3.6ms | ActiveRecord: 0.0ms)
Completed 201 Created in 3546.0ms (Views: 27.6ms | ActiveRecord: 0.0ms)
Completed 201 Created in 3260.2ms (Views: 3.4ms | ActiveRecord: 0.0ms)
Completed 201 Created in 5449.5ms (Views: 3.1ms | ActiveRecord: 0.0ms)
Completed 201 Created in 5356.3ms (Views: 3.5ms | ActiveRecord: 0.1ms)
Completed 201 Created in 8333.2ms (Views: 3.7ms | ActiveRecord: 0.1ms)
Completed 201 Created in 5271.3ms (Views: 3.2ms | ActiveRecord: 0.1ms)
Completed 201 Created in 8462.4ms (Views: 3.2ms | ActiveRecord: 0.1ms)
Completed 201 Created in 4223.7ms (Views: 3.2ms | ActiveRecord: 0.2ms)
Completed 201 Created in 8827.6ms (Views: 3.9ms | ActiveRecord: 0.1ms)
Completed 201 Created in 4510.4ms (Views: 3.1ms | ActiveRecord: 0.1ms)
Completed 201 Created in 4038.9ms (Views: 2.9ms | ActiveRecord: 0.1ms)
Completed 201 Created in 4075.4ms (Views: 3.2ms | ActiveRecord: 0.1ms)
Completed 201 Created in 4007.8ms (Views: 4.5ms | ActiveRecord: 0.2ms)
Completed 201 Created in 3886.1ms (Views: 3.6ms | ActiveRecord: 0.1ms)
Completed 201 Created in 4845.1ms (Views: 3.5ms | ActiveRecord: 0.2ms)
Completed 201 Created in 4626.0ms (Views: 3.6ms | ActiveRecord: 0.1ms)
Completed 201 Created in 4751.8ms (Views: 3.7ms | ActiveRecord: 0.1ms)
Observation, the O(n) depends on amount of facts reported. These are performance tests on production installation of Satellite 6.2 RC1 (roughly Foreman 1.11) with five concurrent CURL processes in loop registering discovery hosts (it calls facts import basically):

With 10 facts:

2016-07-21 13:06:20 [app] [I] Completed 201 Created in 386ms (Views: 2.0ms | ActiveRecord: 66.4ms)
2016-07-21 13:06:20 [app] [I] Completed 201 Created in 353ms (Views: 1.9ms | ActiveRecord: 68.8ms)
2016-07-21 13:06:21 [app] [I] Completed 201 Created in 360ms (Views: 2.1ms | ActiveRecord: 72.6ms)
2016-07-21 13:06:21 [app] [I] Completed 201 Created in 316ms (Views: 1.7ms | ActiveRecord: 58.7ms)
2016-07-21 13:06:21 [app] [I] Completed 201 Created in 366ms (Views: 2.0ms | ActiveRecord: 68.5ms)
2016-07-21 13:06:21 [app] [I] Completed 201 Created in 331ms (Views: 1.7ms | ActiveRecord: 59.6ms)
2016-07-21 13:06:22 [app] [I] Completed 201 Created in 322ms (Views: 1.7ms | ActiveRecord: 58.8ms)
2016-07-21 13:06:22 [app] [I] Completed 201 Created in 380ms (Views: 1.9ms | ActiveRecord: 85.0ms)
2016-07-21 13:06:22 [app] [I] Completed 201 Created in 337ms (Views: 2.2ms | ActiveRecord: 61.8ms)
2016-07-21 13:06:22 [app] [I] Completed 201 Created in 385ms (Views: 2.4ms | ActiveRecord: 77.8ms)
2016-07-21 13:06:22 [app] [I] Completed 201 Created in 442ms (Views: 1.7ms | ActiveRecord: 70.3ms)
2016-07-21 13:06:22 [app] [I] Completed 201 Created in 388ms (Views: 1.9ms | ActiveRecord: 86.2ms)
2016-07-21 13:06:23 [app] [I] Completed 201 Created in 349ms (Views: 2.1ms | ActiveRecord: 66.0ms)
2016-07-21 13:06:23 [app] [I] Completed 201 Created in 346ms (Views: 1.7ms | ActiveRecord: 74.3ms)
2016-07-21 13:06:23 [app] [I] Completed 201 Created in 364ms (Views: 1.8ms | ActiveRecord: 61.6ms)
2016-07-21 13:06:23 [app] [I] Completed 201 Created in 488ms (Views: 2.4ms | ActiveRecord: 80.6ms)
2016-07-21 13:06:24 [app] [I] Completed 201 Created in 485ms (Views: 2.7ms | ActiveRecord: 74.3ms)
2016-07-21 13:06:24 [app] [I] Completed 201 Created in 577ms (Views: 2.8ms | ActiveRecord: 76.6ms)
With 500 facts:

2016-07-21 13:09:29 [app] [I] Completed 201 Created in 1666ms (Views: 1.8ms | ActiveRecord: 376.8ms)
2016-07-21 13:09:30 [app] [I] Completed 201 Created in 1688ms (Views: 1.8ms | ActiveRecord: 380.2ms)
2016-07-21 13:09:31 [app] [I] Completed 201 Created in 1775ms (Views: 1.8ms | ActiveRecord: 387.7ms)
2016-07-21 13:09:32 [app] [I] Completed 201 Created in 1732ms (Views: 1.8ms | ActiveRecord: 436.5ms)
2016-07-21 13:09:33 [app] [I] Completed 201 Created in 1767ms (Views: 1.8ms | ActiveRecord: 410.1ms)
2016-07-21 13:09:34 [app] [I] Completed 201 Created in 1749ms (Views: 1.9ms | ActiveRecord: 404.3ms)
2016-07-21 13:09:34 [app] [I] Completed 201 Created in 1777ms (Views: 1.9ms | ActiveRecord: 416.1ms)
2016-07-21 13:09:35 [app] [I] Completed 201 Created in 1702ms (Views: 1.7ms | ActiveRecord: 398.9ms)
2016-07-21 13:09:36 [app] [I] Completed 201 Created in 1650ms (Views: 1.8ms | ActiveRecord: 358.3ms)
2016-07-21 13:09:37 [app] [I] Completed 201 Created in 1647ms (Views: 1.8ms | ActiveRecord: 330.7ms)
2016-07-21 13:09:38 [app] [I] Completed 201 Created in 1761ms (Views: 1.8ms | ActiveRecord: 404.6ms)
2016-07-21 13:09:39 [app] [I] Completed 201 Created in 1629ms (Views: 1.8ms | ActiveRecord: 347.6ms)
2016-07-21 13:09:40 [app] [I] Completed 201 Created in 1735ms (Views: 1.8ms | ActiveRecord: 420.6ms)
2016-07-21 13:09:40 [app] [I] Completed 201 Created in 1616ms (Views: 2.1ms | ActiveRecord: 329.6ms)
2016-07-21 13:09:41 [app] [I] Completed 201 Created in 1792ms (Views: 2.0ms | ActiveRecord: 418.4ms)
2016-07-21 13:09:42 [app] [I] Completed 201 Created in 1794ms (Views: 1.8ms | ActiveRecord: 454.0ms)
2016-07-21 13:09:43 [app] [I] Completed 201 Created in 1779ms (Views: 2.5ms | ActiveRecord: 424.4ms)
2016-07-21 13:09:44 [app] [I] Completed 201 Created in 1735ms (Views: 1.8ms | ActiveRecord: 411.7ms)
2016-07-21 13:09:45 [app] [I] Completed 201 Created in 1912ms (Views: 1.9ms | ActiveRecord: 461.2ms)
2016-07-21 13:09:46 [app] [I] Completed 201 Created in 1781ms (Views: 1.9ms | ActiveRecord: 463.5ms)
2016-07-21 13:09:47 [app] [I] Completed 201 Created in 1682ms (Views: 1.7ms | ActiveRecord: 377.2ms)
2016-07-21 13:09:47 [app] [I] Completed 201 Created in 1645ms (Views: 2.2ms | ActiveRecord: 336.1ms)
2016-07-21 13:09:49 [app] [I] Completed 201 Created in 1843ms (Views: 2.0ms | ActiveRecord: 435.9ms)
2016-07-21 13:09:49 [app] [I] Completed 201 Created in 1900ms (Views: 2.3ms | ActiveRecord: 470.4ms)
2016-07-21 13:09:51 [app] [I] Completed 201 Created in 1841ms (Views: 1.7ms | ActiveRecord: 406.0ms)
2016-07-21 13:09:51 [app] [I] Completed 201 Created in 1949ms (Views: 2.5ms | ActiveRecord: 443.3ms)
2016-07-21 13:09:52 [app] [I] Completed 201 Created in 1773ms (Views: 1.8ms | ActiveRecord: 453.9ms)
And with 1000 facts reported in each call:

2016-07-21 13:11:12 [app] [I] Completed 201 Created in 3104ms (Views: 1.8ms | ActiveRecord: 764.2ms)
2016-07-21 13:11:14 [app] [I] Completed 201 Created in 3133ms (Views: 1.9ms | ActiveRecord: 667.6ms)
2016-07-21 13:11:15 [app] [I] Completed 201 Created in 3113ms (Views: 2.0ms | ActiveRecord: 732.8ms)
2016-07-21 13:11:17 [app] [I] Completed 201 Created in 3139ms (Views: 1.8ms | ActiveRecord: 752.2ms)
2016-07-21 13:11:18 [app] [I] Completed 201 Created in 3015ms (Views: 2.2ms | ActiveRecord: 730.2ms)
2016-07-21 13:11:20 [app] [I] Completed 201 Created in 3139ms (Views: 1.9ms | ActiveRecord: 646.0ms)
2016-07-21 13:11:21 [app] [I] Completed 201 Created in 2877ms (Views: 1.8ms | ActiveRecord: 620.0ms)
2016-07-21 13:11:23 [app] [I] Completed 201 Created in 3011ms (Views: 2.4ms | ActiveRecord: 611.1ms)
2016-07-21 13:11:24 [app] [I] Completed 201 Created in 3007ms (Views: 1.8ms | ActiveRecord: 614.2ms)
2016-07-21 13:11:26 [app] [I] Completed 201 Created in 3163ms (Views: 1.9ms | ActiveRecord: 661.9ms)
2016-07-21 13:11:27 [app] [I] Completed 201 Created in 2923ms (Views: 1.8ms | ActiveRecord: 641.7ms)
2016-07-21 13:11:29 [app] [I] Completed 201 Created in 2982ms (Views: 2.1ms | ActiveRecord: 657.7ms)
2016-07-21 13:11:30 [app] [I] Completed 201 Created in 3018ms (Views: 1.8ms | ActiveRecord: 706.9ms)
2016-07-21 13:11:32 [app] [I] Completed 201 Created in 3212ms (Views: 1.8ms | ActiveRecord: 726.9ms)
2016-07-21 13:11:33 [app] [I] Completed 201 Created in 2931ms (Views: 2.1ms | ActiveRecord: 602.9ms)
2016-07-21 13:11:35 [app] [I] Completed 201 Created in 2991ms (Views: 2.5ms | ActiveRecord: 638.4ms)
2016-07-21 13:11:36 [app] [I] Completed 201 Created in 3016ms (Views: 1.8ms | ActiveRecord: 728.8ms)
2016-07-21 13:11:39 [app] [I] Completed 201 Created in 3170ms (Views: 1.7ms | ActiveRecord: 744.1ms)
2016-07-21 13:11:40 [app] [I] Completed 201 Created in 3312ms (Views: 1.9ms | ActiveRecord: 823.8ms)
2016-07-21 13:11:42 [app] [I] Completed 201 Created in 3198ms (Views: 1.7ms | ActiveRecord: 820.9ms)
Width of fact name or value does not matter, what's the bottleneck is our code that does fact importing. It does this in a loop over all facts merging them with database. We need to optimize that.

Comment 2 Lukas Zapletal 2017-04-10 14:37:49 UTC
Thanks, fixed upstream, we need to backport. I can do the work, I made the upstream patch.

Comment 4 Lukas Zapletal 2017-04-11 07:34:32 UTC
QA NOTES: To reproduce slow fact loading, create million (or more) fact names. Use foreman-rake console to do this:

FactName.transaction do
  (0..1000000).each do |x|
    FactName.connection.execute 
      "INSERT INTO fact_names (name) values ('rand_fact_name_#{x}')"
  end
end

Then upload facts using both discovery (discover node) or using puppet (register puppet client and perform puppet run). In the production.log the processing time should dropped from several (or tens) seconds to under a second. Process will also consume much less memory over time.

Typical fact import session is around a second:

2017-04-11 03:28:58 [app] [I] Started POST "/api/hosts/facts" for 10.8.106.1 at 
2017-04-11 03:28:58 [audit] [I] [cruicble.toledo.satellite.lab.eng.rdu2.redhat.com] deleted 0 (38.3ms)
2017-04-11 03:28:58 [audit] [I] [cruicble.toledo.satellite.lab.eng.rdu2.redhat.com] updated 9 (251.6ms)
2017-04-11 03:28:58 [audit] [I] [cruicble.toledo.satellite.lab.eng.rdu2.redhat.com] added 0 (4.2ms)
2017-04-11 03:28:58 [app] [I] Import facts for 'cruicble.toledo.satellite.lab.eng.rdu2.redhat.com' completed. Added: 0, Updated: 9, Deleted 0 facts
2017-04-11 03:28:59 [app] [I] Completed 201 Created in 712ms (Views: 14.1ms | ActiveRecord: 135.8ms)

The way this was designed requires quite complex transaction, do not expect this to drop down to fractions of second. We'd need to redesign how facts are stored.

Comment 6 Lukas Zapletal 2017-04-11 07:44:11 UTC
QA NOTES 2: To test how fact import performs do this command:

# grep -A1 'Import facts for' /var/log/foreman/production.log | grep Completed

The command does not show all hits but at least some. When testing this rember to fill database with fake fact names as in comment 4.

This is how it looks before the patch:

2017-04-10 03:42:36 [app] [I] Completed 201 Created in 264ms (Views: 4.1ms | ActiveRecord: 93.6ms)
2017-04-10 03:50:37 [app] [I] Completed 201 Created in 2645ms (Views: 19.2ms | ActiveRecord: 377.8ms)
2017-04-10 03:54:15 [app] [I] Completed 201 Created in 5722ms (Views: 22.0ms | ActiveRecord: 920.4ms)
2017-04-10 03:56:46 [app] [I] Completed 201 Created in 1734ms (Views: 18.2ms | ActiveRecord: 110.8ms)
2017-04-10 03:59:18 [app] [I] Completed 201 Created in 2070ms (Views: 19.9ms | ActiveRecord: 156.3ms)
2017-04-10 04:08:15 [app] [I] Completed 200 OK in 7040ms (Views: 0.9ms | ActiveRecord: 1.0ms)
2017-04-10 04:26:13 [app] [I] Completed 201 Created in 21071ms (Views: 97.7ms | ActiveRecord: 3828.8ms)
2017-04-10 04:29:44 [app] [I] Completed 201 Created in 3627ms (Views: 24.3ms | ActiveRecord: 3008.5ms)
2017-04-10 04:35:01 [app] [I] Completed 201 Created in 373ms (Views: 16.0ms | ActiveRecord: 99.9ms)
2017-04-10 04:35:08 [app] [I] Completed 201 Created in 420ms (Views: 3.2ms | ActiveRecord: 262.5ms)

After all requests should be less than a second *on average*:

2017-04-11 03:19:19 [app] [I] Completed 201 Created in 1132ms (Views: 18.4ms | ActiveRecord: 385.8ms)
2017-04-11 03:22:09 [app] [I] Completed 201 Created in 575ms (Views: 6.2ms | ActiveRecord: 127.4ms)
2017-04-11 03:28:59 [app] [I] Completed 201 Created in 712ms (Views: 14.1ms | ActiveRecord: 135.8ms)
2017-04-11 03:34:55 [app] [I] Completed 201 Created in 599ms (Views: 6.9ms | ActiveRecord: 111.3ms)
2017-04-11 03:35:02 [app] [I] Completed 201 Created in 439ms (Views: 7.2ms | ActiveRecord: 76.7ms)

Comment 10 Lukas Zapletal 2017-04-11 11:32:55 UTC
Chris, I deleted all interfaces on your libvirt host, it had thousands of them. This fixed the problem:

Host.find(27).interfaces.destroy_all

I suggest you to use "Ignore interfaces with matching identifier" to prevent this again. The system has currently 78 interfaces, I assume this will grow as you will be adding more VMs or dockers there (what kind of hypervisor is this?):

Host.find(27).interfaces.map(&:identifier)
=> ["cfeature_el6", "dcapsule_el6", "dcapsule_el7", "downstream_el6", "downstream_el7", "em1.205", "em1.206", "em1.207", "em1.208", "em1.209", "em1.210", "em1.211", "em1.212", "em1.213", "em1.214", "em1.222", "em1.223", "em1.224", "em1.225", "em1.226", "em1.227", "em1.228", "em1.229", "em1.250", "em1.251", "em1.252", "em1.253", "em1.254", "em1.255", "em2", "em3", "em4", "internal_el6", "internal_el7", "katello_latest", "katello_nightly", "libvirt1.205", "libvirt1.206", "libvirt1.207", "libvirt1.208", "libvirt1.209", "libvirt1.210", "libvirt1.211", "libvirt1.212", "libvirt1.213", "libvirt1.214", "libvirt1.222", "libvirt1.223", "libvirt1.224", "libvirt1.225", "libvirt1.226", "libvirt1.227", "libvirt1.228", "libvirt1.229", "libvirt1.250", "libvirt1.251", "libvirt1.252", "libvirt1.253", "libvirt1.254", "libvirt1.255", "nightly_el6", "nightly_el7", "released_el6", "released_el7", "rhci", "sat6major_el6", "sat6major_el7", "sfeature_el6", "testingsat6_el6", "testingsat6_el7", "upgrade_el6", "upgrade_el7", "virbr0", "virbr0_nic", "virbr2", "virbr2_nic", "zstream_el6", "zstream_el7"]

And I have reported this upstream:

http://projects.theforeman.org/issues/19243

and

http://projects.theforeman.org/issues/19244

These are trackers to fix this in more sane way.

The patch for this BZ speeds things up and prevents memory from growing in other case that you experienced, not sure if we should backport. I guess not.

Comment 12 Lukas Zapletal 2017-07-31 13:54:05 UTC
Also part of the 6.3 there will be this issue:

http://projects.theforeman.org/issues/20024

which optimized RHSM importer code to be more efficient on both memory and CPU.

QA NOTES:

MINIMUM TEST SCOPE:

Verify there are no regression in Puppet and RHSM scope.

BEST TEST SCOPE:

Verify import of puppet and RHSM both works under heavy load.

Comment 13 Bryan Kearney 2018-02-21 17:03:49 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0336