1607550 – Ansible Tower inventory integration is slow

Summary: Ansible Tower inventory integration is slow

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Satellite
Classification:	Red Hat
Component:	API
Sub Component:
Version:	6.3.2
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	6.7.0
Assignee:	satellite6-bugs
QA Contact:	Vladimír Sedmík
Docs Contact:
URL:
Whiteboard:
Depends On:	1782807
Blocks:

Reported:	2018-07-23 18:04 UTC by TJ Renna
Modified:	2023-10-06 17:51 UTC
CC List:	19 users
Fixed In Version:
Doc Type:	Enhancement
Doc Text:	This release includes a new reports API to improve the performance of Ansible Tower inventory integration.
Clone Of:
Environment:
Last Closed:	2020-04-14 13:23:24 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Foreman Issue Tracker	27937	0	High	Closed	Ansible Tower inventory integration is slow	2021-02-19 00:43:50 UTC
Red Hat Product Errata	RHSA-2020:1454	0	None	None	None	2020-04-14 13:23:37 UTC

Internal Links: 1726386

Description TJ Renna 2018-07-23 18:04:10 UTC

Description of problem:
This is a new bug for an old issue that was closed via errata. After Bug 1437197 was closed and showed positive testing, we upgraded our Satellite instance to 6.3.2. Our Tower installation still takes over an hour to refresh inventory from our Satellite 6 server with 988 hosts.

Version-Release number of selected component (if applicable):
6.3.2

How reproducible:
Every time

Steps to Reproduce:
1. Perform an inventory sync from Ansible Tower.


Actual results:
Before upgrading from Satellite 6.3 to Satellite 6.3.2, inventory sync took 3886 seconds. After upgrading to 6.3.2, inventory sync took 3216 seconds.

Expected results:
Expected inventory sync to take less than 10 minutes, as other users were reporting.
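
For scale, the timings reported above can be turned into a per-host cost. The following is only an illustrative calculation from the numbers in this report (988 hosts, 3886 s, 3216 s, and the 10-minute expectation); the function name is hypothetical and it assumes the sync time is spread evenly across hosts, which the report does not confirm.

```python
def seconds_per_host(total_seconds: float, host_count: int) -> float:
    """Average wall-clock cost per host, assuming time is spread evenly."""
    if host_count <= 0:
        raise ValueError("host_count must be positive")
    return total_seconds / host_count


HOSTS = 988  # host count from the report

before = seconds_per_host(3886, HOSTS)  # sync time before the 6.3.2 upgrade
after = seconds_per_host(3216, HOSTS)   # sync time after the 6.3.2 upgrade
target = seconds_per_host(600, HOSTS)   # the expected 10-minute sync

print(f"before upgrade: {before:.2f} s/host")
print(f"after upgrade:  {after:.2f} s/host")
print(f"10-minute goal: {target:.2f} s/host")
```

On these numbers the upgrade moved the sync from roughly 3.9 to roughly 3.3 seconds per host, while a 10-minute sync would need about 0.6 seconds per host.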

Additional info:

Comment 6 Justin Sherrill 2018-10-26 01:25:38 UTC

Hi,

Both cases mentioned here seem to indicate about 5 seconds per host! Internally, on my Satellites, I am seeing closer to 300-400 ms per host. The difference could be resources on the physical systems or the type of data associated with these hosts.

Would it be possible to get a database backup of one of these systems?

Thanks,
Justin Sherrill

Comment 10 Justin Sherrill 2018-10-30 01:49:20 UTC

TJ provided me a backup (THANK YOU!), however I have some bad news. I was able to curl every host (via https://hostname/api/v2/hosts/ID/) in about 240 seconds, about 244 ms per host.

There are a couple of possibilities I can think of:

a) Hitting the server while other stuff is going on (repos syncing, systems checking in) is causing this to be much slower than on my fresh system

b) the server is resource constrained


It does tell me that the raw data is not what is causing the issue, but not where it's slowing down on your individual machines. I'll dig through the cases for foreman-debugs, but any information you can provide regarding CPU usage, I/O wait, load averages, etc. during an Ansible run would be very helpful. The /var/log/foreman/production.log captured immediately after an Ansible run would be helpful as well.
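
The per-host measurement described in this comment could be reproduced with something like the sketch below. This is not the script used in the bug; the function names, the injectable `fetch` callable, and the use of HTTP basic authentication are assumptions made for illustration. Only the /api/v2/hosts/ID path comes from the comment.

```python
import base64
import time
import urllib.request
from typing import Callable, Dict, Iterable


def time_host_requests(
    host_ids: Iterable[int],
    fetch: Callable[[int], None],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[int, float]:
    """Call fetch(host_id) for each host and record elapsed seconds."""
    timings = {}
    for host_id in host_ids:
        start = clock()
        fetch(host_id)
        timings[host_id] = clock() - start
    return timings


def summarize(timings: Dict[int, float]) -> Dict[str, float]:
    """Total seconds and average milliseconds per host."""
    total = sum(timings.values())
    count = len(timings)
    avg_ms = 1000.0 * total / count if count else 0.0
    return {"hosts": count, "total_s": total, "avg_ms": avg_ms}


def make_fetch(base_url: str, user: str, password: str) -> Callable[[int], None]:
    """Build a fetch function for /api/v2/hosts/ID (basic auth is assumed)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def fetch(host_id: int) -> None:
        request = urllib.request.Request(
            f"{base_url}/api/v2/hosts/{host_id}",
            headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            response.read()  # read the body so transfer time is included

    return fetch
```

A run would look like `summarize(time_host_requests(ids, make_fetch("https://hostname", user, password)))`, where `ids` is a list of host IDs obtained separately. Comparing `avg_ms` on an idle system and during a repository sync would help separate possibility (a) from (b).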

Comment 11 Justin Sherrill 2018-10-30 02:32:05 UTC

Also, it may be helpful if you could enable slow query logging in the postgresql configuration:

1. Set log_min_duration_statement to '200' in /var/lib/pgsql/data/postgresql.conf and restart or reload postgresql.

2. Trigger the Ansible inventory sync.

3. Capture /var/lib/pgsql/data/pg_log/ and upload it.
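
Once such a log has been captured, the slow statements can be ranked with a short script. The sketch below is an assumption about how one might read the log, not a tool referenced in this bug: it expects PostgreSQL's usual "duration: N ms  statement: ..." entries, its function name is hypothetical, and it only looks at the first line of each entry, so multi-line SQL is truncated.

```python
import re
from collections import defaultdict
from typing import Iterable, List, Tuple

# Matches entries written because of log_min_duration_statement, e.g.
#   LOG:  duration: 250.500 ms  statement: SELECT ...
#   LOG:  duration: 300.000 ms  execute <unnamed>: SELECT ...
DURATION_RE = re.compile(
    r"duration: (?P<ms>\d+(?:\.\d+)?) ms\s+(?:statement|execute [^:]*): (?P<sql>.*)"
)


def slowest_statements(lines: Iterable[str], top: int = 5) -> List[Tuple[str, int, float]]:
    """Return (sql, count, total_ms) tuples, largest total time first."""
    totals = defaultdict(lambda: [0, 0.0])  # sql -> [count, total milliseconds]
    for line in lines:
        match = DURATION_RE.search(line)
        if match is None:
            continue  # not a duration entry (checkpoints, continuation lines, ...)
        sql = re.sub(r"\s+", " ", match.group("sql")).strip()
        totals[sql][0] += 1
        totals[sql][1] += float(match.group("ms"))
    ranked = sorted(totals.items(), key=lambda item: item[1][1], reverse=True)
    return [(sql, count, total_ms) for sql, (count, total_ms) in ranked[:top]]
```

For example, `slowest_statements(open(path))` on one of the files under /var/lib/pgsql/data/pg_log/ lists the statements that account for the most logged time during the inventory run.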

Comment 12 TJ Renna 2018-11-05 12:15:32 UTC

This has been uploaded to the support portal. Thanks!

Comment 13 Justin Sherrill 2018-11-05 16:34:16 UTC

Thanks TJ,

I am able to see the slowness in your production.log, which to me indicates that there is likely some slowness in the database layer. It seems it's requesting both the full host info (/api/v2/hosts/ID/) and the facts (/api/v2/hosts/ID/facts), and both of these are quite a bit slower than on my system.

TJ, one thing that I am confused about is that you mentioned that the previous fix, shipped in 6.3.1, showed positive testing. Was this done on the same hardware that you are on now?


Would it be possible to do two things:

Run a disk speed test on the disk device holding /var/lib/pgsql/.  See https://access.redhat.com/solutions/3397771 for more information on how to do that.

Can you repeat the steps in comment #11 (https://bugzilla.redhat.com/show_bug.cgi?id=1607550#c11), but lower the threshold to '80' instead of '200'?

Comment 26 Pavel Moravec 2019-01-30 16:20:55 UTC

What user is specified in foreman.ini? Does it have admin permissions in Satellite?

If not, welcome to [Bug 1667647] getting host details takes several times more for non-admin user

Comment 32 Marek Hulan 2019-09-25 12:29:12 UTC

Created redmine issue https://projects.theforeman.org/issues/27937 from this bug

Comment 33 Ryan 2019-10-11 03:19:11 UTC

I have previously seen that the performance is 6-8x worse when querying the hosts API as a non-admin user. It is a bad workaround, but changing the Satellite user account to an admin, if it is not one already, would potentially resolve the performance issue. More details at https://bugzilla.redhat.com/show_bug.cgi?id=1667647

Comment 34 Bryan Kearney 2019-10-21 14:02:13 UTC

Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/27937 has been resolved.

Comment 36 Vladimír Sedmík 2020-02-06 10:45:08 UTC

Tested with a ~20k-host DB on Ansible Tower 3.6.2 +patch, with these results:
1) Using admin user credentials, the sync is 24x faster without facts and 3.8x faster with facts.
2) Using non-admin user credentials, the sync is 10x faster without facts and 3.5x faster with facts.
3) There is a remaining time penalty for non-admin users due to authorization checks, which cannot be omitted. This impact grows significantly with host count and DB complexity. It is recommended to use an admin user for large DBs (thousands of hosts).

To see these improvements you need Ansible Tower version 3.6.3 (or later) and Satellite 6.7.

Comment 39 errata-xmlrpc 2020-04-14 13:23:24 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1454
