Bug 1334924
Summary: | Slow response times due to Foreman locking query | ||
---|---|---|---|
Product: | Red Hat Satellite | Reporter: | Chris Adams <cadams> |
Component: | Tasks Plugin | Assignee: | Ivan Necas <inecas> |
Status: | CLOSED ERRATA | QA Contact: | jcallaha |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 6.1.7 | CC: | bbuckingham, bkearney, cwelton, erinn.looneytriggs, inecas, jcallaha, jentrena, ktordeur, mmccune, oshtaier, rdixon, sabnave, smalik, xdmoon |
Target Milestone: | Unspecified | Keywords: | Triaged |
Target Release: | Unused | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | rubygem-foreman-tasks-0.7.14.7-1 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-10-04 06:43:38 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Chris Adams
2016-05-10 19:48:56 UTC
One other improvement we made to the base config was adding an Apache conf file in the foreman .d directories which capped the maximum number of processes for Foreman below the overall limit. That reduced the number of 100% CPU Postgres processes but also, more importantly, ensured that non-Foreman Apache requests weren't getting blocked by those Foreman workers running for hours so the other apps didn't stop responding. I don't think it's addressed in 6.2, this should be an easy fix with adding the indexes as described in https://bugzilla.redhat.com/show_bug.cgi?id=1334924#c0 Created redmine issue http://projects.theforeman.org/issues/15283 from this bug Upstream bug component is Remote Execution Upstream bug component is Tasks Plugin Chris, Would it be possible for you to include the config you cite here: https://bugzilla.redhat.com/show_bug.cgi?id=1334924#c1 I'd like to get something like this in place, -Erinn Erinn: the current version of our config is actually quite simple (“PassengerMaxPoolSize 10” dropped into both /etc/httpd/conf.d/05-foreman.d/site-tuning.conf and /etc/httpd/conf.d/05-foreman-ssl.d/site-tuning.conf). That's actually the second pass at that config – one of the earlier troubleshooting steps silently removed the earlier config file – and the values are probably quite conservative. I'm not the owner for our Satellite server & those values were an emergency fix to end the downtime since we have a full rebuild in progress. After bumping up Postgres's shared_buffers and work_mem values and creating those indexes I'd expect the numbers could be increased considerably. Upstream bug assigned to inecas Upstream bug assigned to inecas Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/15283 has been resolved. *** Bug 1378830 has been marked as a duplicate of this bug. *** I don't understand a lot about postgres but I can tell you even after putting those indexes in place I am running into query performance issues, these are consuming all CPU resources on the system: foreman=# explain analyze SELECT COUNT(*) FROM "foreman_tasks_locks" INNER JOIN "foreman_tasks_tasks" ON "foreman_tasks_tasks"."id" = "foreman_tasks_locks"."task_id" WHERE (foreman_tasks_tasks.state != 'stopped') AND ("foreman_tasks_locks"."task_id" NOT IN ('8abb9ca3-815f-49ea-b97b-6aa505f2a29a')) AND "foreman_tasks_locks"."name" = 'task_owner' AND "foreman_tasks_locks"."resource_id" = 1 AND "foreman_tasks_locks"."resource_type" = 'User' AND "foreman_tasks_locks"."exclusive" = 't'; QUERY PLAN -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- ---------------------------------- Aggregate (cost=99686.75..99686.76 rows=1 width=0) (actual time=117418.632..11 7418.633 rows=1 loops=1) -> Nested Loop (cost=10291.17..99686.75 rows=1 width=0) (actual time=117418 .610..117418.610 rows=0 loops=1) Join Filter: ((foreman_tasks_locks.task_id)::text = (foreman_tasks_task s.id)::text) -> Seq Scan on foreman_tasks_tasks (cost=0.00..28213.80 rows=1 width= 37) (actual time=0.101..857.333 rows=100 loops=1) Filter: ((state)::text <> 'stopped'::text) Rows Removed by Filter: 590769 -> Bitmap Heap Scan on foreman_tasks_locks (cost=10291.17..70187.01 r ows=102875 width=37) (actual time=1165.321..1165.321 rows=0 loops=100) Recheck Cond: (((name)::text = 'task_owner'::text) AND ((resource _type)::text = 'User'::text) AND (resource_id = 1)) Rows Removed by Index Recheck: 1197069 Filter: (exclusive AND ((task_id)::text <> '8abb9ca3-815f-49ea-b9 7b-6aa505f2a29a'::text)) Rows Removed by Filter: 1647091 -> Bitmap Index Scan on index_foreman_tasks_locks_name_resource_ type_resource_id (cost=0.00..10265.45 rows=291542 width=0) (actual time=346.569 ..346.569 rows=1647091 loops=100) Index Cond: (((name)::text = 'task_owner'::text) AND ((reso urce_type)::text = 'User'::text) AND (resource_id = 1)) Total runtime: 117418.692 ms (14 rows) As mentioned the indexes are in place, but I really don't understand the above or the internals of postgresql so something is probably missing. foreman=# select t.relname as table_name, i.relname as index_name, array_to_string(array_agg(a.attname), ', ') as column_names from pg_class t, pg_class i, pg_index ix, pg_attribute a where t.oid = ix.indrelid and i.oid = ix.indexrelid and a.attrelid = t.oid and a.attnum = ANY(ix.indkey) and t.relkind = 'r' and t.relname like 'foreman_tasks_%' group by t.relname, i.relname order by t.relname, i.relname; table_name | index_name | column_names ----------------------------------+--------------------------------------------- ---------------+---------------------------------- foreman_tasks_locks | foreman_tasks_locks_pkey | id foreman_tasks_locks | index_foreman_tasks_locks_name_resource_type _resource_id | resource_id, name, resource_type foreman_tasks_locks | index_foreman_tasks_locks_on_resource_type_a nd_resource_id | resource_type, resource_id foreman_tasks_recurring_logics | foreman_tasks_recurring_logics_pkey | id foreman_tasks_task_group_members | foreman_tasks_task_group_members_index | task_id, task_group_id foreman_tasks_task_group_members | foreman_tasks_task_group_members_pkey | id foreman_tasks_task_groups | foreman_tasks_task_groups_pkey | id foreman_tasks_tasks | index_foreman_tasks_id_state | state, id foreman_tasks_tasks | index_foreman_tasks_tasks_on_type_and_label | label, type foreman_tasks_triggerings | foreman_tasks_triggerings_pkey | id (10 rows) And it looks a lot like I have fixed it (this is 6.2.2 by the way): foreman=# explain analyze SELECT COUNT(*) FROM "foreman_tasks_locks" INNER JOIN "foreman_tasks_tasks" ON "foreman_tasks_tasks"."id" = "foreman_tasks_locks"."task_id" WHERE (foreman_tasks_tasks.state != 'stopped') AND ("foreman_tasks_locks"."task_id" NOT IN ('8abb9ca3-815f-49ea-b97b-6aa505f2a29a')) AND "foreman_tasks_locks"."name" = 'task_owner' AND "foreman_tasks_locks"."resource_id" = 1 AND "foreman_tasks_locks"."resource_type" = 'User' AND "foreman_tasks_locks"."exclusive" = 't'; QUERY PLAN -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- ---------------------------- Aggregate (cost=89414.34..89414.35 rows=1 width=0) (actual time=0.072..0.072 r ows=1 loops=1) -> Hash Join (cost=32232.62..89414.31 rows=12 width=0) (actual time=0.069.. 0.069 rows=0 loops=1) Hash Cond: ((foreman_tasks_locks.task_id)::text = (foreman_tasks_tasks. id)::text) -> Bitmap Heap Scan on foreman_tasks_locks (cost=4014.51..60810.19 ro ws=102904 width=37) (actual time=0.069..0.069 rows=0 loops=1) Recheck Cond: (((name)::text = 'task_owner'::text) AND ((resource _type)::text = 'User'::text) AND (resource_id = 1)) Filter: (exclusive AND ((task_id)::text <> '8abb9ca3-815f-49ea-b9 7b-6aa505f2a29a'::text)) -> Bitmap Index Scan on index_foreman_tasks_locks_name_resource_ type_resource_id_exclus (cost=0.00..3988.78 rows=102904 width=0) (actual time=0 .067..0.067 rows=0 loops=1) Index Cond: (((name)::text = 'task_owner'::text) AND ((reso urce_type)::text = 'User'::text) AND (resource_id = 1) AND (exclusive = true)) -> Hash (cost=28217.12..28217.12 rows=79 width=37) (never executed) -> Seq Scan on foreman_tasks_tasks (cost=0.00..28217.12 rows=79 width=37) (never executed) Filter: ((state)::text <> 'stopped'::text) Total runtime: 0.130 ms (12 rows) By creating this index: foreman=# CREATE INDEX index_foreman_tasks_locks_name_resource_type_resource_id_exclusive ON foreman_tasks_locks (name, resource_type, resource_id, exclusive);NOTICE: identifier "index_foreman_tasks_locks_name_resource_type_resource_id_exclusive" will be truncated to "index_foreman_tasks_locks_name_resource_type_resource_id_exclus" CREATE INDEX Obviously the name can't be that, but it might be worth adding this to the pull request that is currently going through QA. Satellite 6.2.2 Snap 1 results [root@rhsm-qe-1 ~]# sudo su - postgres -s /bin/bash -c "psql -d foreman -c '\d foreman_tasks_tasks'" Table "public.foreman_tasks_tasks" Column | Type | Modifiers ----------------+-----------------------------+----------- id | character varying(255) | not null type | character varying(255) | not null label | character varying(255) | started_at | timestamp without time zone | ended_at | timestamp without time zone | state | character varying(255) | not null result | character varying(255) | not null external_id | character varying(255) | parent_task_id | character varying(255) | start_at | timestamp without time zone | start_before | timestamp without time zone | Indexes: "index_foreman_tasks_id_state" btree (id, state) "index_foreman_tasks_tasks_on_type_and_label" btree (type, label) [root@rhsm-qe-1 ~]# sudo su - postgres -s /bin/bash -c "psql -d foreman -c '\d foreman_tasks_locks'" Table "public.foreman_tasks_locks" Column | Type | Modifiers ---------------+------------------------+------------------------------------------------------------------ id | integer | not null default nextval('foreman_tasks_locks_id_seq'::regclass) task_id | character varying(255) | not null name | character varying(255) | not null resource_type | character varying(255) | resource_id | integer | exclusive | boolean | Indexes: "foreman_tasks_locks_pkey" PRIMARY KEY, btree (id) "index_foreman_tasks_locks_name_resource_type_resource_id" btree (name, resource_type, resource_id) "index_foreman_tasks_locks_on_resource_type_and_resource_id" btree (resource_type, resource_id) Ivan, can you weight in on #21 and #22? Thanks @Erinn for giving this a try. I've opened additional PR to include this params combination into index as well. https://github.com/theforeman/foreman-tasks/pull/205 Verified in Satellite 6.2.2 Async Snap 2 Since Ivan's original patch is in place and has already shown some significant performance boosts. I will verify this one and vote we push the next patch to 6.2.3. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1993 |