Bug 1607550 - Ansible Tower inventory integration is slow
Summary: Ansible Tower inventory integration is slow
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: API
Version: 6.3.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: 6.7.0
Assignee: satellite6-bugs
QA Contact: Vladimír Sedmík
URL:
Whiteboard:
Depends On: 1782807
Blocks:
 
Reported: 2018-07-23 18:04 UTC by TJ Renna
Modified: 2020-04-14 13:23 UTC
CC: 19 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
This release includes a new reports API to improve the performance of Ansible Tower inventory integration.
Clone Of:
Environment:
Last Closed: 2020-04-14 13:23:24 UTC
Target Upstream Version:


Attachments


Links
System ID Priority Status Summary Last Updated
Foreman Issue Tracker 27937 High Closed Ansible Tower inventory integration is slow 2020-10-05 16:15:18 UTC
Red Hat Product Errata RHSA-2020:1454 None None None 2020-04-14 13:23:37 UTC

Internal Links: 1726386

Description TJ Renna 2018-07-23 18:04:10 UTC
Description of problem:
This is a new bug for an old issue that was closed via errata. After Bug 1437197 was closed with positive testing, we upgraded our Satellite instance to 6.3.2. Our Tower installation is still taking over an hour to refresh its inventory from our Satellite 6 server, which has 988 hosts.

Version-Release number of selected component (if applicable):
6.3.2

How reproducible:
Every time

Steps to Reproduce:
1. Perform an inventory sync from Ansible Tower


Actual results:
Before upgrading from Satellite 6.3 to Satellite 6.3.2, the inventory sync took 3886 seconds. After upgrading to 6.3.2, it took 3216 seconds.

Expected results:
Expected the inventory sync to take less than 10 minutes, as other users were reporting.

Additional info:

Comment 6 Justin Sherrill 2018-10-26 01:25:38 UTC
Hi,

Both cases mentioned here seem to indicate about 5 seconds per host! Internally, on my Satellites, I am seeing closer to 300-400 ms per host. The difference could be resources on the physical systems or the type of data associated with these hosts.

Would it be possible to get a database backup of one of these systems?

Thanks,
Justin Sherrill

Comment 10 Justin Sherrill 2018-10-30 01:49:20 UTC
TJ provided me a backup (THANK YOU!), however I have some bad news. I was able to curl every host (via https://hostname/api/v2/hosts/ID/) in about 240 seconds, about 244 ms per host.
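The per-host averages quoted in this thread are simply total wall time divided by host count; as a back-of-the-envelope sketch against the reporter's 988-host inventory (not part of any Satellite tooling):

```shell
# Rough per-host latency from the timings in this thread, using the
# reporter's 988-host inventory. 3886 s and 3216 s are the sync times
# from the original report; 240 s is the curl sweep above.
hosts=988
for total in 3886 3216 240; do
  awk -v t="$total" -v h="$hosts" \
    'BEGIN { printf "%ds total -> %.0f ms/host\n", t, t / h * 1000 }'
done
```

The 240 s sweep works out to roughly 243 ms per host, in line with the ~244 ms figure above; the reporter's sync times work out to well over 3 seconds per host.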

There are a couple of possibilities I can think of:

a) Hitting the server while other things are going on (repos syncing, systems checking in) is causing this to be much slower than on my fresh system

b) The server is resource constrained


It does tell me that the raw data is not what is causing the issue, but not where it's slowing down on your individual machines. I'll dig through the cases for foreman-debugs, but any information you can provide regarding CPU usage, I/O wait, load averages, etc. during an Ansible run would be very helpful. The /var/log/foreman/production.log from immediately after an Ansible run would be helpful as well.

Comment 11 Justin Sherrill 2018-10-30 02:32:05 UTC
Also, it may be helpful if you could enable slow query logging in the PostgreSQL configuration:

1. Set log_min_duration_statement to '200' in /var/lib/pgsql/data/postgresql.conf and restart/reload PostgreSQL

2. Trigger the Ansible inventory sync

3. Capture /var/lib/pgsql/data/pg_log/ and upload it
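Spelled out as a config fragment, step 1 amounts to the following (assuming the default data directory given above; the value is in milliseconds):

```
# /var/lib/pgsql/data/postgresql.conf
# Log every statement that runs longer than 200 ms:
log_min_duration_statement = 200
```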

Comment 12 TJ Renna 2018-11-05 12:15:32 UTC
This has been uploaded to the support portal. Thanks!

Comment 13 Justin Sherrill 2018-11-05 16:34:16 UTC
Thanks TJ,

I am able to see the slowness in your production.log, which to me indicates that there is likely some slowness in the database layer. It seems Tower is requesting both the full host info (/api/v2/hosts/ID/) and the facts (/api/v2/hosts/ID/facts), and both of these are quite a bit slower than on my system.

TJ, one thing I am confused about: you mentioned that the previous fix, shipped in 6.3.1, showed positive testing. Was this done on the same hardware that you are on now?


Would it be possible to do two things:

Run a disk speed test on the disk device holding /var/lib/pgsql/.  See https://access.redhat.com/solutions/3397771 for more information on how to do that.

Can you repeat the steps in comment #11 (https://bugzilla.redhat.com/show_bug.cgi?id=1607550#c11), but bump the threshold down to '80' instead of '200'?
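For the disk test, one minimal sketch (my own, not from the linked KB article, which covers more thorough methods) uses dd with synchronous writes; the file name and sizes here are arbitrary:

```shell
# Quick synchronous-write check; run from a directory on the device that
# holds /var/lib/pgsql/. oflag=dsync flushes every block to disk, which
# roughly mimics a database's write pattern; dd prints the throughput.
dd if=/dev/zero of=./ddtest.img bs=8k count=100 oflag=dsync
rm -f ./ddtest.img   # clean up the scratch file
```

Compare the reported throughput against what you would expect from the underlying storage; an unusually low number would point to the I/O bottleneck suspected above.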

Comment 26 Pavel Moravec 2019-01-30 16:20:55 UTC
What user is specified in foreman.ini? Does it have admin permissions in Satellite?

If not, welcome to: [Bug 1667647] getting host details takes several times more for non-admin user

Comment 32 Marek Hulan 2019-09-25 12:29:12 UTC
Created redmine issue https://projects.theforeman.org/issues/27937 from this bug

Comment 33 Ryan 2019-10-11 03:19:11 UTC
I have previously seen that the performance is 6-8x worse when querying the hosts API as a non-admin user. It is a bad workaround, but changing the Satellite user account to an admin would potentially resolve the performance issue, if it is not one already. More details at https://bugzilla.redhat.com/show_bug.cgi?id=1667647

Comment 34 Bryan Kearney 2019-10-21 14:02:13 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/27937 has been resolved.

Comment 36 Vladimír Sedmík 2020-02-06 10:45:08 UTC
Tested with a ~20k-host DB on Ansible Tower 3.6.2 + patch, with these results:
1) Using admin user credentials, the sync time improved 24x without facts and 3.8x with facts
2) Using non-admin user credentials, the sync time improved 10x without facts and 3.5x with facts
3) There is a remaining time penalty for non-admin users due to authorization checks, which cannot be omitted. This impact grows significantly with host count and DB complexity. It is recommended to use an admin user for large DBs (thousands of hosts).

To see these improvements you need Ansible Tower 3.6.3 (or later) and Satellite 6.7.

Comment 39 errata-xmlrpc 2020-04-14 13:23:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1454

