Bug 1334924
| Summary: | Slow response times due to Foreman locking query | ||
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Chris Adams <cadams> |
| Component: | Tasks Plugin | Assignee: | Ivan Necas <inecas> |
| Status: | CLOSED ERRATA | QA Contact: | jcallaha |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 6.1.7 | CC: | bbuckingham, bkearney, cwelton, erinn.looneytriggs, inecas, jcallaha, jentrena, ktordeur, mmccune, oshtaier, rdixon, sabnave, smalik, xdmoon |
| Target Milestone: | Unspecified | Keywords: | Triaged |
| Target Release: | Unused | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | rubygem-foreman-tasks-0.7.14.7-1 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-10-04 06:43:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Chris Adams
2016-05-10 19:48:56 UTC
One other improvement we made to the base config was adding an Apache conf file in the foreman .d directories which capped the maximum number of processes for Foreman below the overall limit. That reduced the number of 100% CPU Postgres processes but also, more importantly, ensured that non-Foreman Apache requests weren't getting blocked by those Foreman workers running for hours so the other apps didn't stop responding. I don't think it's addressed in 6.2, this should be an easy fix with adding the indexes as described in https://bugzilla.redhat.com/show_bug.cgi?id=1334924#c0 Created redmine issue http://projects.theforeman.org/issues/15283 from this bug Upstream bug component is Remote Execution Upstream bug component is Tasks Plugin Chris, Would it be possible for you to include the config you cite here: https://bugzilla.redhat.com/show_bug.cgi?id=1334924#c1 I'd like to get something like this in place, -Erinn Erinn: the current version of our config is actually quite simple (“PassengerMaxPoolSize 10” dropped into both /etc/httpd/conf.d/05-foreman.d/site-tuning.conf and /etc/httpd/conf.d/05-foreman-ssl.d/site-tuning.conf). That's actually the second pass at that config – one of the earlier troubleshooting steps silently removed the earlier config file – and the values are probably quite conservative. I'm not the owner for our Satellite server & those values were an emergency fix to end the downtime since we have a full rebuild in progress. After bumping up Postgres's shared_buffers and work_mem values and creating those indexes I'd expect the numbers could be increased considerably. Upstream bug assigned to inecas Upstream bug assigned to inecas Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/15283 has been resolved. *** Bug 1378830 has been marked as a duplicate of this bug. *** I don't understand a lot about postgres but I can tell you even after putting those indexes in place I am running into query performance issues, these are consuming all CPU resources on the system:
foreman=# explain analyze SELECT COUNT(*) FROM "foreman_tasks_locks" INNER JOIN "foreman_tasks_tasks" ON "foreman_tasks_tasks"."id" = "foreman_tasks_locks"."task_id" WHERE (foreman_tasks_tasks.state != 'stopped') AND ("foreman_tasks_locks"."task_id" NOT IN ('8abb9ca3-815f-49ea-b97b-6aa505f2a29a')) AND "foreman_tasks_locks"."name" = 'task_owner' AND "foreman_tasks_locks"."resource_id" = 1 AND "foreman_tasks_locks"."resource_type" = 'User' AND "foreman_tasks_locks"."exclusive" = 't';
QUERY PLAN
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
----------------------------------
Aggregate (cost=99686.75..99686.76 rows=1 width=0) (actual time=117418.632..11
7418.633 rows=1 loops=1)
-> Nested Loop (cost=10291.17..99686.75 rows=1 width=0) (actual time=117418
.610..117418.610 rows=0 loops=1)
Join Filter: ((foreman_tasks_locks.task_id)::text = (foreman_tasks_task
s.id)::text)
-> Seq Scan on foreman_tasks_tasks (cost=0.00..28213.80 rows=1 width=
37) (actual time=0.101..857.333 rows=100 loops=1)
Filter: ((state)::text <> 'stopped'::text)
Rows Removed by Filter: 590769
-> Bitmap Heap Scan on foreman_tasks_locks (cost=10291.17..70187.01 r
ows=102875 width=37) (actual time=1165.321..1165.321 rows=0 loops=100)
Recheck Cond: (((name)::text = 'task_owner'::text) AND ((resource
_type)::text = 'User'::text) AND (resource_id = 1))
Rows Removed by Index Recheck: 1197069
Filter: (exclusive AND ((task_id)::text <> '8abb9ca3-815f-49ea-b9
7b-6aa505f2a29a'::text))
Rows Removed by Filter: 1647091
-> Bitmap Index Scan on index_foreman_tasks_locks_name_resource_
type_resource_id (cost=0.00..10265.45 rows=291542 width=0) (actual time=346.569
..346.569 rows=1647091 loops=100)
Index Cond: (((name)::text = 'task_owner'::text) AND ((reso
urce_type)::text = 'User'::text) AND (resource_id = 1))
Total runtime: 117418.692 ms
(14 rows)
As mentioned the indexes are in place, but I really don't understand the above or the internals of postgresql so something is probably missing.
foreman=# select t.relname as table_name,
i.relname as index_name,
array_to_string(array_agg(a.attname), ', ') as column_names
from
pg_class t,
pg_class i,
pg_index ix,
pg_attribute a
where
t.oid = ix.indrelid
and i.oid = ix.indexrelid
and a.attrelid = t.oid
and a.attnum = ANY(ix.indkey)
and t.relkind = 'r'
and t.relname like 'foreman_tasks_%'
group by
t.relname,
i.relname
order by
t.relname,
i.relname;
table_name | index_name
| column_names
----------------------------------+---------------------------------------------
---------------+----------------------------------
foreman_tasks_locks | foreman_tasks_locks_pkey
| id
foreman_tasks_locks | index_foreman_tasks_locks_name_resource_type
_resource_id | resource_id, name, resource_type
foreman_tasks_locks | index_foreman_tasks_locks_on_resource_type_a
nd_resource_id | resource_type, resource_id
foreman_tasks_recurring_logics | foreman_tasks_recurring_logics_pkey
| id
foreman_tasks_task_group_members | foreman_tasks_task_group_members_index
| task_id, task_group_id
foreman_tasks_task_group_members | foreman_tasks_task_group_members_pkey
| id
foreman_tasks_task_groups | foreman_tasks_task_groups_pkey
| id
foreman_tasks_tasks | index_foreman_tasks_id_state
| state, id
foreman_tasks_tasks | index_foreman_tasks_tasks_on_type_and_label
| label, type
foreman_tasks_triggerings | foreman_tasks_triggerings_pkey
| id
(10 rows)
And it looks a lot like I have fixed it (this is 6.2.2 by the way):
foreman=# explain analyze SELECT COUNT(*) FROM "foreman_tasks_locks" INNER JOIN "foreman_tasks_tasks" ON "foreman_tasks_tasks"."id" = "foreman_tasks_locks"."task_id" WHERE (foreman_tasks_tasks.state != 'stopped') AND ("foreman_tasks_locks"."task_id" NOT IN ('8abb9ca3-815f-49ea-b97b-6aa505f2a29a')) AND "foreman_tasks_locks"."name" = 'task_owner' AND "foreman_tasks_locks"."resource_id" = 1 AND "foreman_tasks_locks"."resource_type" = 'User' AND "foreman_tasks_locks"."exclusive" = 't';
QUERY PLAN
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
----------------------------
Aggregate (cost=89414.34..89414.35 rows=1 width=0) (actual time=0.072..0.072 r
ows=1 loops=1)
-> Hash Join (cost=32232.62..89414.31 rows=12 width=0) (actual time=0.069..
0.069 rows=0 loops=1)
Hash Cond: ((foreman_tasks_locks.task_id)::text = (foreman_tasks_tasks.
id)::text)
-> Bitmap Heap Scan on foreman_tasks_locks (cost=4014.51..60810.19 ro
ws=102904 width=37) (actual time=0.069..0.069 rows=0 loops=1)
Recheck Cond: (((name)::text = 'task_owner'::text) AND ((resource
_type)::text = 'User'::text) AND (resource_id = 1))
Filter: (exclusive AND ((task_id)::text <> '8abb9ca3-815f-49ea-b9
7b-6aa505f2a29a'::text))
-> Bitmap Index Scan on index_foreman_tasks_locks_name_resource_
type_resource_id_exclus (cost=0.00..3988.78 rows=102904 width=0) (actual time=0
.067..0.067 rows=0 loops=1)
Index Cond: (((name)::text = 'task_owner'::text) AND ((reso
urce_type)::text = 'User'::text) AND (resource_id = 1) AND (exclusive = true))
-> Hash (cost=28217.12..28217.12 rows=79 width=37) (never executed)
-> Seq Scan on foreman_tasks_tasks (cost=0.00..28217.12 rows=79
width=37) (never executed)
Filter: ((state)::text <> 'stopped'::text)
Total runtime: 0.130 ms
(12 rows)
By creating this index:
foreman=# CREATE INDEX index_foreman_tasks_locks_name_resource_type_resource_id_exclusive ON foreman_tasks_locks (name, resource_type, resource_id, exclusive);NOTICE: identifier "index_foreman_tasks_locks_name_resource_type_resource_id_exclusive" will be truncated to "index_foreman_tasks_locks_name_resource_type_resource_id_exclus"
CREATE INDEX
Obviously the name can't be that, but it might be worth adding this to the pull request that is currently going through QA.
Satellite 6.2.2 Snap 1 results
[root@rhsm-qe-1 ~]# sudo su - postgres -s /bin/bash -c "psql -d foreman -c '\d foreman_tasks_tasks'"
Table "public.foreman_tasks_tasks"
Column | Type | Modifiers
----------------+-----------------------------+-----------
id | character varying(255) | not null
type | character varying(255) | not null
label | character varying(255) |
started_at | timestamp without time zone |
ended_at | timestamp without time zone |
state | character varying(255) | not null
result | character varying(255) | not null
external_id | character varying(255) |
parent_task_id | character varying(255) |
start_at | timestamp without time zone |
start_before | timestamp without time zone |
Indexes:
"index_foreman_tasks_id_state" btree (id, state)
"index_foreman_tasks_tasks_on_type_and_label" btree (type, label)
[root@rhsm-qe-1 ~]# sudo su - postgres -s /bin/bash -c "psql -d foreman -c '\d foreman_tasks_locks'"
Table "public.foreman_tasks_locks"
Column | Type | Modifiers
---------------+------------------------+------------------------------------------------------------------
id | integer | not null default nextval('foreman_tasks_locks_id_seq'::regclass)
task_id | character varying(255) | not null
name | character varying(255) | not null
resource_type | character varying(255) |
resource_id | integer |
exclusive | boolean |
Indexes:
"foreman_tasks_locks_pkey" PRIMARY KEY, btree (id)
"index_foreman_tasks_locks_name_resource_type_resource_id" btree (name, resource_type, resource_id)
"index_foreman_tasks_locks_on_resource_type_and_resource_id" btree (resource_type, resource_id)
Ivan, can you weight in on #21 and #22?
Thanks @Erinn for giving this a try. I've opened additional PR to include this params combination into index as well. https://github.com/theforeman/foreman-tasks/pull/205 Verified in Satellite 6.2.2 Async Snap 2 Since Ivan's original patch is in place and has already shown some significant performance boosts. I will verify this one and vote we push the next patch to 6.2.3. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1993 |