Description of problem:
This is a new bug for an old issue that was closed via errata. After Bug 1437197 was closed with positive testing results, we upgraded our Satellite instance to 6.3.2. Our Tower installation still takes over an hour to refresh inventory from our Satellite 6 server with 988 hosts.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Perform an inventory sync from Ansible Tower.
Before upgrading from Satellite 6.3 to Satellite 6.3.2, the inventory sync took 3886 seconds. After upgrading to 6.3.2, the inventory sync took 3216 seconds.
Expected the inventory sync to take less than 10 minutes, as other users were reporting.
Both cases mentioned here seem to indicate about 5 seconds per host! Internally, on my Satellites, I am seeing closer to 300-400ms per host. The difference could be the resources on the physical systems or the type of data associated with these hosts.
Would it be possible to get a database backup of one of these systems?
TJ provided me a backup (THANK YOU!); however, I have some bad news. I was able to curl every host (via https://hostname/api/v2/hosts/ID/) in about 240 seconds, about 244ms per host.
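For reference, the per-host figures in this thread are simple division of total time by host count, and a curl-style measurement can be approximated with a small timing loop. The sketch below is illustrative only: `per_host_ms` and `time_hosts` are hypothetical helper names, and `fetch` is a placeholder for whatever actually requests https://hostname/api/v2/hosts/ID/ (authentication and TLS handling are left to the caller).

```python
import time
from typing import Callable, Iterable, List


def per_host_ms(total_seconds: float, host_count: int) -> float:
    """Average wall-clock time per host, in milliseconds."""
    return total_seconds / host_count * 1000.0


def time_hosts(fetch: Callable[[int], object], host_ids: Iterable[int]) -> List[float]:
    """Call fetch(host_id) for each ID and return per-request durations in ms.

    `fetch` is supplied by the caller (hypothetical); it should perform one
    GET against /api/v2/hosts/<id>/ and return when the response is read.
    """
    durations = []
    for host_id in host_ids:
        start = time.perf_counter()
        fetch(host_id)
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


if __name__ == "__main__":
    # Figures quoted in the description: 988 hosts, sync times in seconds.
    for label, seconds in [("sync on 6.3", 3886), ("sync on 6.3.2", 3216)]:
        print(f"{label}: {per_host_ms(seconds, 988):.0f} ms per host")
```

Using only the numbers from the description, this works out to roughly 3.9 and 3.3 seconds per host, which is still an order of magnitude slower than the curl loop against the restored backup.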
There are a couple of possibilities I can think of:
a) Hitting the server while other stuff is going on (repos syncing, systems checking in) is causing this to be much slower than on my fresh system
b) The server is resource constrained
It does tell me that the raw data is not what is causing the issue, but not where it's slowing down on your individual machines. I'll dig through the cases for foreman-debugs, but any information you can provide regarding CPU usage, I/O wait, load averages, etc. during an Ansible run would be very helpful. Also, the /var/log/foreman/production.log captured immediately after an Ansible run would be helpful as well.
Also, it may be helpful if you could enable slow query logging in the PostgreSQL configuration:
1. Set log_min_duration_statement to '200' in /var/lib/pgsql/data/postgresql.conf and restart or reload postgresql
2. Trigger the Ansible inventory sync
3. Capture /var/lib/pgsql/data/pg_log/ and upload it
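With log_min_duration_statement set (the value is in milliseconds), PostgreSQL writes a `duration: <N> ms` entry for each statement that runs at least that long. A minimal sketch for ranking those entries in a captured log follows; `slowest_statements` is a hypothetical helper, and it assumes each entry sits on one line, so multi-line statements would only show their first line.

```python
import re
from typing import Iterable, List, Tuple

# Matches e.g. "LOG:  duration: 250.123 ms  statement: SELECT ..."
# and "LOG:  duration: 912.004 ms  execute a1: SELECT ..."
DURATION_RE = re.compile(r"duration: (\d+(?:\.\d+)?) ms\s+(.*)")


def slowest_statements(lines: Iterable[str], top: int = 10) -> List[Tuple[float, str]]:
    """Return up to `top` (duration_ms, text) pairs, slowest first."""
    hits = []
    for line in lines:
        match = DURATION_RE.search(line)
        if match:
            hits.append((float(match.group(1)), match.group(2).strip()))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:top]
```

Feeding it the lines of a file from /var/lib/pgsql/data/pg_log/ gives a quick view of which queries dominate during an inventory sync.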
This has been uploaded to the support portal. Thanks!
I am able to see the slowness in your production.log, which indicates to me that there is likely some slowness in the database layer. It seems it's requesting both the full host info (/api/v2/hosts/ID/) and the facts (/api/v2/hosts/ID/facts), and both of these are quite a bit slower than on my system.
TJ, one thing that I am confused about is that you mentioned that the previous fix, shipped in 6.3.1, showed positive testing. Was this done on the same hardware that you are on now?
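One way to compare the two request types is to pull the Rails "Started GET" and "Completed ... in Nms" lines out of production.log and group the durations by endpoint. The sketch below is a rough illustration with hypothetical helper names. It assumes requests are not interleaved, so each "Completed" line belongs to the most recent "Started" line; on a busy server the pairing would need to key on the request ID in the line prefix, whose layout varies between versions.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

STARTED_RE = re.compile(r'Started GET "(/api/v2/hosts/[^"]*)"')
COMPLETED_RE = re.compile(r"Completed \d+ .*? in (\d+)ms")


def endpoint_kind(path: str) -> str:
    """Classify a hosts API path as the facts call or the host details call."""
    return "facts" if path.rstrip("/").endswith("/facts") else "host"


def durations_by_endpoint(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Group request durations (ms) by endpoint kind.

    Assumes non-interleaved requests: a Completed line is attributed to
    the most recent matching Started line.
    """
    result = defaultdict(list)
    current: Optional[str] = None
    for line in lines:
        started = STARTED_RE.search(line)
        if started:
            current = endpoint_kind(started.group(1).split("?")[0])
            continue
        completed = COMPLETED_RE.search(line)
        if completed and current is not None:
            result[current].append(int(completed.group(1)))
            current = None
    return dict(result)
```

Averaging each list gives a per-endpoint figure that can be compared directly against the per-host timings quoted earlier in this bug.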
Would it be possible to do two things:
1. Run a disk speed test on the disk device holding /var/lib/pgsql/. See https://access.redhat.com/solutions/3397771 for more information on how to do that.
2. Repeat the steps in comment #11 (https://bugzilla.redhat.com/show_bug.cgi?id=1607550#c11), but bump the threshold down to '80' instead of '200'.
What user is specified in foreman.ini? Does it have admin permissions in Satellite?
If not, welcome to: [Bug 1667647] getting host details takes several times more for non-admin user
Created redmine issue https://projects.theforeman.org/issues/27937 from this bug
I have previously seen that performance is 6-8x worse when querying the hosts API as a non-admin user. It is a bad workaround, but if the Satellite user account is not already an admin, making it one could potentially resolve the performance issue. More details at https://bugzilla.redhat.com/show_bug.cgi?id=1667647
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/27937 has been resolved.
Tested with a ~20k-host DB on Ansible Tower 3.6.2 + patch, with these results:
1) Using admin user credentials, the sync is 24x faster without facts and 3.8x faster with facts.
2) Using non-admin user credentials, the sync is 10x faster without facts and 3.5x faster with facts.
3) There is a remaining time penalty for non-admin users due to authorization checks, which cannot be omitted. This impact grows significantly with host count and DB complexity. It is recommended to use an admin user for large DBs (thousands of hosts).
To see these improvements, you need Ansible Tower version 3.6.3 (or later) and Satellite 6.7.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.