Description of problem:
Red Hat Satellite 6.7.4 (possibly also 6.7.[0-3]) logs large numbers of KeyError messages to /var/log/foreman/dynflow_executor.output during task-busy periods, mostly during Capsule Syncs and multiple concurrent Content View publish/promote tasks.
So far I have only observed this on Satellites where /etc/sysconfig/dynflowd has EXECUTOR_COUNT=3.
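For reference, the configured executor count and the running executor processes can be checked with a quick sketch like this (paths taken from this report; on a host without the file, the first line just notes that the setting is absent):

```shell
# Read EXECUTOR_COUNT from the dynflowd sysconfig (path from this report);
# fall back to a note when the file or the setting is missing.
cfg=$(grep -E '^EXECUTOR_COUNT' /etc/sysconfig/dynflowd 2>/dev/null \
      || echo "EXECUTOR_COUNT not set")
echo "$cfg"
# List any running dynflow executor processes (none on a non-Satellite host).
pgrep -af dynflow_executor || true
```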
/var/log/foreman/dynflow_executor.output shows these errors:
E, [2020-10-14T11:19:54.427792 #96826] ERROR -- /parallel-executor-core/pool default/worker-2: Could not save state (RuntimeError)
E, [2020-10-14T11:19:54.431332 #96826] ERROR -- /parallel-executor-core/pool default/worker-3: Could not save state (RuntimeError)
E, [2020-10-14T11:19:54.434585 #96826] ERROR -- /parallel-executor-core/pool default/worker-2: searching: action by: {:execution_plan_uuid=>"bd04a648-e11b-47bd-9049-c87abc7ec003", :id=>1638} (KeyError)
E, [2020-10-14T11:19:54.442404 #96826] ERROR -- /parallel-executor-core/pool default/worker-2: searching: action by: {:execution_plan_uuid=>"bd04a648-e11b-47bd-9049-c87abc7ec003", :id=>1634} (KeyError)
E, [2020-10-14T11:20:00.826733 #96826] ERROR -- /parallel-executor-core/pool default/worker-2: searching: action by: {:execution_plan_uuid=>"bd04a648-e11b-47bd-9049-c87abc7ec003", :id=>1648} (KeyError)
E, [2020-10-14T11:20:00.833610 #96826] ERROR -- /parallel-executor-core/pool default/worker-2: searching: action by: {:execution_plan_uuid=>"bd04a648-e11b-47bd-9049-c87abc7ec003", :id=>1648} (KeyError)
E, [2020-10-14T11:20:00.843802 #96826] ERROR -- /parallel-executor-core/pool default/worker-2: searching: action by: {:execution_plan_uuid=>"bd04a648-e11b-47bd-9049-c87abc7ec003", :id=>1648} (KeyError)
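The KeyError lines above can be triaged by pulling out the affected execution-plan UUIDs so the failing plans can be looked up in foreman-tasks. A minimal sketch (the sample line below is copied from this report; on a live system, point it at /var/log/foreman/dynflow_executor.output instead of the temp file):

```shell
# Extract execution-plan UUIDs from KeyError lines in the executor log.
# Sample line copied from this report; replace the temp file with
# /var/log/foreman/dynflow_executor.output on a real Satellite.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
E, [2020-10-14T11:19:54.434585 #96826] ERROR -- /parallel-executor-core/pool default/worker-2: searching: action by: {:execution_plan_uuid=>"bd04a648-e11b-47bd-9049-c87abc7ec003", :id=>1638} (KeyError)
EOF
uuids=$(grep 'KeyError' "$LOG" \
        | grep -oE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' \
        | sort -u)
echo "$uuids"
rm -f "$LOG"
```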
postgres logs at the same time show these types of errors:
2020-10-14 11:19:54 *TZ* LOG: duration: 700.013 ms statement: UPDATE "dynflow_steps" SET "execution_plan_uuid" = 'bd04a648-e11b-47bd-9049-c87abc7ec003', "id" = 2455, "action_id" = 1636, "data" = NULL, "state" = 'running', "started_at" = '2020-10-14 16:50:43.303331+0000', "ended_at" = '2020-10-14 18:19:37.717563+0000', "real_time" = 5334.414232315, "execution_time" = 114.37665876999998, "progress_done" = 0.1, "progress_weight" = 100, "class" = 'Dynflow::ExecutionPlan::Steps::RunStep', "error" = NULL, "action_class" = 'Actions::Pulp::Consumer::SyncCapsule', "children" = NULL, "queue" = 'default' WHERE (("state" IN ('pending', 'error', 'skipping', 'suspended')) AND ("execution_plan_uuid" = 'bd04a648-e11b-47bd-9049-c87abc7ec003') AND ("id" = 2455))
2020-10-14 11:24:57 *TZ* ERROR: insert or update on table "dynflow_actions" violates foreign key constraint "dynflow_actions_execution_plan_uuid_fkey"
2020-10-14 11:24:57 *TZ* DETAIL: Key (execution_plan_uuid)=(e077c8ff-4ed2-4e19-8e3f-79e89e20c387) is not present in table "dynflow_execution_plans".
2020-10-14 11:24:57 *TZ* STATEMENT: INSERT INTO "dynflow_actions" ("execution_plan_uuid", "id", "data", "input", "output", "caller_execution_plan_id", "caller_action_id", "class", "plan_step_id", "run_step_id", "finalize_step_id") VALUES ('e077c8ff-4ed2-4e19-8e3f-79e89e20c387', 1302, NULL, '{"smart_proxy_id":3,"options":{"force":true,"source_repository":{"id":4210,"name":"Red_Hat_Software_Collections_RPMs_for_Red_Hat_Enterprise_Linux_7_Server_x86_64_7Server","url":"***redacted***"},"matching_content":false,"dependency":null,"repository_creation":false},"dependency":null,"repository_id":4262,"remote_user":"admin","remote_cp_user":"admin","current_request_id":null,"current_timezone":"***redacted***)","current_user_id":6,"current_organization_id":1,"current_location_id":null}', '{"pulp_tasks":[{"exception":null,"task_type":"pulp.server.managers.repo.publish.publish","_href":"/pulp/api/v2/tasks/17577454-2d20-4177-b9bd-491b4487479d/","task_id":"17577454-2d20-4177-b9bd-491b4487479d","tags":["pulp:repository:***redacted***","pulp:action:publish"],"finish_time":null,"_ns":"task_status","start_time":null,"traceback":null,"spawned_tasks":[],"progress_report":{},"queue":"","state":"waiting","worker_name":null,"result":null,"error":null,"_id":{"$oid":"5f8742794e822e97f73a45f9"},"id":"5f8742794e822e97f73a45f9"}],"poll_attempts":{"total":19,"failed":1},"cancelled":true}', NULL, 1301, 'Actions::Pulp::Repository::DistributorPublish', 1953, 1954, NULL) RETURNING "execution_plan_uuid"
2020-10-14 11:24:57 *TZ* ERROR: insert or update on table "dynflow_actions" violates foreign key constraint "dynflow_actions_execution_plan_uuid_fkey"
2020-10-14 11:24:57 *TZ* DETAIL: Key (execution_plan_uuid)=(e077c8ff-4ed2-4e19-8e3f-79e89e20c387) is not present in table "dynflow_execution_plans".
2020-10-14 11:24:57 *TZ* STATEMENT: INSERT INTO "dynflow_actions" ("execution_plan_uuid", "id", "data", "input", "output", "caller_execution_plan_id", "caller_action_id", "class", "plan_step_id", "run_step_id", "finalize_step_id") VALUES ('e077c8ff-4ed2-4e19-8e3f-79e89e20c387', 1290, NULL, '{"smart_proxy_id":3,"options":{"force":true,"source_repository":{"id":2009,"name":"***redacted***","url":"***redacted***"},"matching_content":false,"dependency":null,"repository_creation":false},"dependency":null,"repository_id":1888,"remote_user":"admin","remote_cp_user":"admin","current_request_id":null,"current_timezone":"***redacted***","current_user_id":6,"current_organization_id":1,"current_location_id":null}', '{"pulp_tasks":[{"exception":null,"task_type":"pulp.server.managers.repo.publish.publish","_href":"/pulp/api/v2/tasks/3a58aa55-e4c3-4440-af36-f9a6abf9d3f6/","task_id":"3a58aa55-e4c3-4440-af36-f9a6abf9d3f6","tags":["pulp:repository:***redacted***","pulp:action:publish"],"finish_time":null,"_ns":"task_status","start_time":null,"traceback":null,"spawned_tasks":[],"progress_report":{},"queue":"","state":"waiting","worker_name":null,"result":null,"error":null,"_id":{"$oid":"5f8742794e822e97f73a4609"},"id":"5f8742794e822e97f73a4609"}],"poll_attempts":{"total":19,"failed":1},"cancelled":true}', NULL, 1289, 'Actions::Pulp::Repository::DistributorPublish', 1935, 1936, NULL) RETURNING "execution_plan_uuid"
2020-10-14 11:24:57 *TZ* ERROR: insert or update on table "dynflow_actions" violates foreign key constraint "dynflow_actions_execution_plan_uuid_fkey"
2020-10-14 11:24:57 *TZ* DETAIL: Key (execution_plan_uuid)=(e077c8ff-4ed2-4e19-8e3f-79e89e20c387) is not present in table "dynflow_execution_plans".
2020-10-14 11:24:57 *TZ* STATEMENT: INSERT INTO "dynflow_actions" ("execution_plan_uuid", "id", "data", "input", "output", "caller_execution_plan_id", "caller_action_id", "class", "plan_step_id", "run_step_id", "finalize_step_id") VALUES ('e077c8ff-4ed2-4e19-8e3f-79e89e20c387', 1294, NULL, '{"smart_proxy_id":3,"options":{"force":true,"source_repository":{"id":2009,"name":"***redacted***","url":"***redacted***"},"matching_content":false,"dependency":null,"repository_creation":false},"dependency":null,"repository_id":1849,"remote_user":"admin","remote_cp_user":"admin","current_request_id":null,"current_timezone":"***redacted***","current_user_id":6,"current_organization_id":1,"current_location_id":null}', '{"pulp_tasks":[{"exception":null,"task_type":"pulp.server.managers.repo.publish.publish","_href":"/pulp/api/v2/tasks/f0e74643-b5f0-465b-a77e-1857d496f5cb/","task_id":"f0e74643-b5f0-465b-a77e-1857d496f5cb","tags":["pulp:repository:***redacted***","pulp:action:publish"],"finish_time":null,"_ns":"task_status","start_time":null,"traceback":null,"spawned_tasks":[],"progress_report":{},"queue":"","state":"waiting","worker_name":null,"result":null,"error":null,"_id":{"$oid":"5f8742794e822e97f73a45fe"},"id":"5f8742794e822e97f73a45fe"}],"poll_attempts":{"total":19,"failed":1},"cancelled":true}', NULL, 1293, 'Actions::Pulp::Repository::DistributorPublish', 1941, 1942, NULL) RETURNING "execution_plan_uuid"
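The FK violations above claim the execution plan no longer exists when the actions are inserted. A hedged way to confirm that (this assumes the default "foreman" database of a Satellite install and local postgres access; the column names follow the DETAIL lines above):

```shell
# Check whether the plan UUID from the FK violation still exists in
# dynflow_execution_plans; a count of 0 is consistent with the DETAIL
# lines in the postgres log. Database name "foreman" is assumed.
uuid=e077c8ff-4ed2-4e19-8e3f-79e89e20c387
if command -v psql >/dev/null 2>&1; then
  sudo -u postgres psql foreman -tAc \
    "SELECT count(*) FROM dynflow_execution_plans WHERE uuid = '$uuid';"
else
  echo "psql not available; run this on the Satellite database host"
fi
```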
Both affected Satellites are also fairly large machines using the 64 GB tuning settings.
Version-Release number of selected component (if applicable):
rpm -qa '*dynflow*' '*foreman-tasks*'
tfm-rubygem-dynflow-1.4.1-2.el7sat.noarch
rubygem-smart_proxy_dynflow-0.2.4-1.el7sat.noarch
tfm-rubygem-foreman-tasks-core-0.3.3-1.el7sat.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.2.4-1.el7sat.noarch
tfm-rubygem-foreman-tasks-0.17.5.7-1.el7sat.noarch
How reproducible:
On these Satellites, every time dynflow is busy with "lots" of tasks.
Steps to Reproduce:
1a. Trigger multiple (like 5+) large capsule syncs at the same time
...or...
1b. Trigger 3+ concurrent Content View publish tasks.
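Step 1b can be scripted along these lines; this is only a sketch, assuming the hammer CLI is configured and using placeholder content view IDs 1-3:

```shell
# Reproduction sketch for step 1b (hypothetical content view IDs 1..3):
# start three Content View publishes concurrently via hammer's --async flag.
launched=0
if command -v hammer >/dev/null 2>&1; then
  for id in 1 2 3; do
    hammer content-view publish --id "$id" --async
    launched=$((launched + 1))
  done
else
  echo "hammer not available; run this on the Satellite itself"
fi
echo "publishes launched: $launched"
```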
Actual results:
Tasks move very slowly, the dynflow_executors show very high CPU usage, and /var/log/foreman/dynflow_executor.output and the postgres logs show error messages like the ones above.
Expected results:
Tasks move at the same speed as when dynflow is not too busy, tasks complete successfully, no database-related errors.
Additional info:
One of the systems where this was observed had a single dynflow_executor (out of 3) pegged at 100% CPU while the remaining two executors sat at 1% or 2% during these busy periods. I'm not sure this is related to the issue, but I'm hoping it's useful information.
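The skewed-load observation above can be captured with a quick per-process CPU snapshot (the `[d]` in the pattern keeps grep from matching its own command line; on a host without dynflow it just prints a placeholder):

```shell
# Snapshot per-process CPU for the dynflow executors to see whether one
# executor is busy while the others idle, as described above.
snap=$(ps -eo pid,pcpu,args | grep '[d]ynflow_executor' \
       || echo "no dynflow executors running")
echo "$snap"
```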
Closing this bug out since we do not believe it will be fixed in the next several releases. It was opened some years ago; since then we have switched to a new deployment model, and I don't recall anyone reporting this kind of issue on currently supported versions. If this has a large impact on you, please feel free to re-open.