Bug 1419667 - host list API call takes a long time
Summary: host list API call takes a long time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Hosts
Version: 6.2.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: Unspecified
Assignee: Shimon Shtein
QA Contact: Roman Plevka
URL:
Whiteboard:
Duplicates: 1271274 1297683 (view as bug list)
Depends On:
Blocks: 1296845 1426378
 
Reported: 2017-02-06 17:13 UTC by Chris Duryee
Modified: 2021-03-11 14:56 UTC (History)
CC List: 24 users

Fixed In Version: tfm-rubygem-katello-3.0.0.117-1,foreman-1.11.0.73-1,rubygem-katello-3.0.0.117-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-01 13:59:10 UTC
Target Upstream Version:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 18136 0 Normal Closed Reports type column should have an index 2020-10-05 20:42:44 UTC
Foreman Issue Tracker 18841 0 Normal Closed Avoid N+1 query in foreman hosts index by declaring proper scope 2020-10-05 20:42:44 UTC
Foreman Issue Tracker 18843 0 Normal Closed Change eager_load to includes statement in hosts api controller's index action 2020-10-05 20:42:44 UTC
Red Hat Bugzilla 1447958 0 urgent CLOSED [regression] host search by organization never finishes but causes mem.leak in foreman process 2021-03-11 15:11:49 UTC
Red Hat Knowledge Base (Solution) 1274723 0 None None None 2018-12-06 20:49:59 UTC
Red Hat Product Errata RHBA-2017:1191 0 normal SHIPPED_LIVE Satellite 6.2.9 Async Bug Release 2017-05-01 17:49:42 UTC

Internal Links: 1447958

Description Chris Duryee 2017-02-06 17:13:00 UTC
Description of problem:

The following call can take a very long time to complete:

curl -k -u admin:password -H "Accept:application/json,version=2" 'https://localhost/api/hosts?per_page=4500'

log result:

2017-02-06 11:46:58 [app] [I]   Rendered api/v2/hosts/index.json.rabl within api/v2/layouts/index_layout (267610.0ms)
2017-02-06 11:46:58 [app] [I] Completed 200 OK in 268842ms (Views: 133061.4ms | ActiveRecord: 134686.3ms)

It looks like a lot of the time is related to looking up errata info. It's not any one particular call; rather, the SQL statements run for each host really add up.

If there were an additional flag to provide only basic details without the full errata info, that would work too. Specifically, the following fields would be sufficient:

 * Hostname
 * Subscription status
 * Subnet name

Version-Release number of selected component (if applicable): 6.2.7

Comment 6 Daniel Lobato Garcia 2017-02-13 09:32:50 UTC
Chris - using per_page=4500 is definitely not a good idea. The whole reason the API is paginated is to avoid exactly these kinds of waits.

What kind of script uses this? Why can it not use the paginated responses?

Comment 9 Chris Duryee 2017-02-13 13:57:30 UTC
re #6, the tricky part is that for scripting purposes, some users may want to grab everything at once instead of paginating. This is a common issue for users who want to integrate their Satellite's host data with other systems and have a script that runs via a cron job to sync things over.

Usually for these purposes, it's OK to have a much smaller subset of the data. For example, even just a few fields (hostname and one or two others) may be needed. Also, it's OK if it takes a bit of time for the API to return (maybe 10-30 seconds).

Ideally, users would be able to hit the API for this data rather than running SQL directly against the database. This bug may be more of an RFE to use the API in this way, not sure.
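For the cron-style sync scripts described above, the paging loop itself is simple to get right. Below is a minimal sketch: the `paginate` helper and the `fetch_page` callable are illustrative names, not part of the Foreman API; in a real script `fetch_page` would wrap an HTTP GET of `/api/hosts?page=N&per_page=M` and return the page's `results` array.

```python
# Sketch of paging through a hosts API until the server runs out of results.
# paginate() and fetch_page() are hypothetical names for illustration.

def paginate(fetch_page, per_page=100):
    """Yield records one page at a time until an empty page comes back."""
    page = 1
    while True:
        results = fetch_page(page, per_page)
        if not results:
            break
        yield from results
        page += 1

# Stand-in for the real API call: 7 hosts served in pages.
def fake_fetch(page, per_page):
    hosts = [{"name": f"host{i}.example.com"} for i in range(7)]
    start = (page - 1) * per_page
    return hosts[start:start + per_page]

# Keep only the small subset of fields the report actually needs.
names = [h["name"] for h in paginate(fake_fetch, per_page=3)]
```

Separating the paging logic from the HTTP call also makes it easy to cap the per-page size and to retry a single failed page instead of the whole export.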

Comment 11 Dan Heath 2017-02-13 16:53:47 UTC
Hi - I should point out that even using pagination and a fairly modest per_page size results in longer than expected response times:

$ time curl -s "https://satellite/api/v2/hosts?page=1&per_page=20" -o /dev/null

real    0m5.258s
user    0m0.085s
sys     0m0.105s

$ time curl -s "https://satellite/api/v2/hosts?page=1&per_page=50" -o /dev/null

real    0m9.011s
user    0m0.099s
sys     0m0.099s

$ time curl -s "https://satellite/api/v2/hosts?page=1&per_page=100" -o /dev/null

real    0m22.034s
user    0m0.081s
sys     0m0.102s

Are those the sort of numbers to be expected? The net result is the same: a full host list can take 15-20 minutes to fetch.

Thanks - Dan.

Comment 13 Shimon Shtein 2017-02-19 19:00:22 UTC
The feature is tracked upstream: http://projects.theforeman.org/issues/3019. Unfortunately it hasn't gained enough traction so far. I will see if we can improve this sort of query.

Take memory usage into consideration too - ~4500 records will result in a pretty big memory footprint on the server side.

I have to admit that on our lab setup, the timings were not that bad - around 1 min for 1000 hosts, or 12 sec for 100 hosts. It's an environment with ~5K hosts total.

From the attached log I can see that each 10-host request took around 3 seconds to complete.
I can also see that it was running in parallel with GenerateApplicability tasks, which load both memory and the database on the server.
It is worth trying the same script in a quieter environment. I would also recommend fetching 100 hosts at a time, since that requires fewer "handshakes" to get the results.
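The round-trip argument above is easy to put numbers on: with a fixed per-request overhead, the total handshake cost falls as per_page grows. A back-of-the-envelope sketch, where the 200 ms per-request figure is an assumed illustrative cost, not a measurement:

```python
import math

def round_trips(total_hosts, per_page, per_request_ms=200):
    """Number of API requests for a full export, and their fixed overhead.

    per_request_ms is an assumed per-request cost (TLS handshake, auth,
    routing), independent of how many rows the response carries.
    """
    requests = math.ceil(total_hosts / per_page)
    return requests, requests * per_request_ms

small_pages = round_trips(5000, 10)   # many tiny pages
large_pages = round_trips(5000, 100)  # fewer, larger pages
```

Going from 10-host to 100-host pages cuts the request count tenfold, which is exactly the "fewer handshakes" effect; the trade-off is the larger per-response memory footprint noted above.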

Comment 14 Satellite Program 2017-02-21 15:40:04 UTC
Connecting redmine issue http://projects.theforeman.org/issues/3019 from this bug

Comment 15 Shimon Shtein 2017-02-21 19:53:40 UTC
After investigation, I can see the reason the query is so slow: it's a huge N+1 problem (around 100 SQL queries for a 10-host page).

The good news is that it's already (almost) solved upstream:
https://github.com/theforeman/foreman/pull/3887 in Foreman core
https://github.com/Katello/katello/pull/6236 in Katello to fix the UI
and I am working on a similar patch to fix the API part.
The bad news is that we will have to backport it in order to get it into 6.2.z.
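The N+1 shape described in this comment can be reproduced in miniature. The sketch below uses an invented two-table SQLite schema (the real Foreman/Katello schema has many more per-host relations, which is how a 10-host page reaches ~100 queries): the naive loop issues one query for the page plus one per host, while the fixed version fetches all related rows for the page in a single IN query.

```python
import sqlite3

# Illustrative schema only - not the real Foreman/Katello tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hosts  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE errata (id INTEGER PRIMARY KEY, host_id INTEGER, title TEXT);
""")
conn.executemany("INSERT INTO hosts VALUES (?, ?)",
                 [(i, f"host{i}") for i in range(10)])
conn.executemany("INSERT INTO errata (host_id, title) VALUES (?, ?)",
                 [(i, f"RHSA-{i}") for i in range(10)])

# N+1 pattern: one query for the page, then one more per host.
queries = 0
hosts = conn.execute("SELECT id, name FROM hosts").fetchall()
queries += 1
for host_id, _name in hosts:
    conn.execute("SELECT title FROM errata WHERE host_id = ?",
                 (host_id,)).fetchall()
    queries += 1
n_plus_one = queries  # 1 + 10 queries for a 10-host page

# Fixed pattern: fetch the whole page's errata in one IN query.
ids = [h[0] for h in hosts]
placeholders = ",".join("?" * len(ids))
conn.execute(
    f"SELECT host_id, title FROM errata WHERE host_id IN ({placeholders})",
    ids).fetchall()
batched = 2  # one query for hosts, one for all their errata
```

This batching is what Rails' `includes`/`preload` does under the hood, which is why the upstream pull requests above eliminate the per-host queries.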

Comment 20 Peter Vreman 2017-02-24 15:49:28 UTC
I confirm that it is sometimes better for performance to use one big query to retrieve all results at once than to use multiple queries, because each new query adds latency.

An example from a reporting tool I created that works across multiple Satellite instances:
I modified the code to grab all facts of all hosts in a single query with per_page=999999. This takes 30 seconds to finish.
In the old situation I queried each of my 300 hosts individually. Each host query took 0.75 seconds on average, around 250 seconds in total; the single big query takes only 30 seconds.
That is about 4 minutes faster, makes the reporting feel much snappier, and makes it useful to run more frequently to see progress during large updates to the landscape.

Comment 21 Shimon Shtein 2017-02-26 07:52:17 UTC
Last part was merged in upstream: https://github.com/Katello/katello/pull/6619

Comment 24 Shimon Shtein 2017-02-28 18:54:44 UTC
For completeness' sake, we should also add an index to the reports table as described in https://github.com/theforeman/foreman/pull/4204
This one can be cherry-picked automatically.

After applying all four patches, I see a significant improvement in response times.
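The effect of the reports-table index can be seen directly in a query planner. The sketch below uses SQLite for convenience (the Foreman database is PostgreSQL, where `EXPLAIN` plays the same role); the table and index names are modeled on the upstream change but simplified here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reports (id INTEGER PRIMARY KEY, type TEXT, host_id INTEGER)")
conn.executemany("INSERT INTO reports (type, host_id) VALUES (?, ?)",
                 [("ConfigReport", i) for i in range(1000)])

# Without an index, filtering on "type" scans the whole table.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM reports WHERE type = 'ConfigReport'"
).fetchall()

conn.execute("CREATE INDEX index_reports_on_type ON reports (type)")

# With the index, the planner seeks directly to matching rows.
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM reports WHERE type = 'ConfigReport'"
).fetchall()

scan_before = any("SCAN" in row[-1] for row in plan_before)
uses_index = any("index_reports_on_type" in row[-1] for row in plan_after)
```

On a table with millions of report rows, turning that full scan into an index seek is what makes the per-host report lookups cheap.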

Comment 25 Shimon Shtein 2017-03-05 09:25:09 UTC
*** Bug 1297683 has been marked as a duplicate of this bug. ***

Comment 29 Shimon Shtein 2017-03-14 18:20:50 UTC
After a test round on a big database, I found two issues that should also be addressed in order to get normal performance:
1. In core, change eager_load to preload (already done as part of MR 218).
2. Add OpenSCAP's policies table to fix its N+1 query too (tracked in redmine #18841; hopefully the patch can be applied downstream automatically). Pull request: https://github.com/theforeman/foreman_openscap/pull/247

Comment 32 Shimon Shtein 2017-03-21 13:18:44 UTC
@chris, yes please, send me the backups. I will try to spin up a clean machine and test the changes there.

Comment 33 Shimon Shtein 2017-03-29 17:18:19 UTC
@chris, after applying everything from this BZ, we are left with only one N+1 query: foreman_tasks status. There is already an open upstream issue tracking it.
I can't say it's lightning fast now, but it looks pretty much OK to me.

Comment 39 Satellite Program 2017-04-24 18:14:17 UTC
Upstream bug assigned to sshtein@redhat.com

Comment 41 Shimon Shtein 2017-04-26 08:03:23 UTC
Summary:

The cause of the slowness was identified as an N+1 query produced by outputting fields from relations added by plugins such as Katello and OpenSCAP. To address this, a change was made to Foreman core [1] to allow plugins to declare the relations they need before the query is executed. Once the framework was in place, the relevant plugins could declare the needed fields:
Katello [2]
OpenSCAP [3]


[1] http://projects.theforeman.org/issues/16646 and http://projects.theforeman.org/issues/18843
[2] http://projects.theforeman.org/issues/16010 and http://projects.theforeman.org/issues/18586
[3] http://projects.theforeman.org/issues/18841

Another issue that affected query performance was the lack of an index on the reports table (http://projects.theforeman.org/issues/18136).

There is still one N+1 query in the code; it comes from the remote execution status field. Since the fix is not trivial, it will be tracked in its own BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1445623

Comment 44 errata-xmlrpc 2017-05-01 13:59:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1191

Comment 45 Shimon Shtein 2017-11-26 11:03:19 UTC
*** Bug 1271274 has been marked as a duplicate of this bug. ***

