Bug 2093884
Summary: | Every CV Publish+Promote action followed by an automated Capsule sync task generates a huge traceback "(ActiveRecord::RecordNotFound): Couldn't find ForemanTasks::Task::DynflowTask" in Satellite 6.11 | ||
---|---|---|---|
Product: | Red Hat Satellite | Reporter: | Sayan Das <saydas> |
Component: | Dynflow | Assignee: | Samir Jha <sajha> |
Status: | CLOSED ERRATA | QA Contact: | Lukáš Hellebrandt <lhellebr> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 6.11.0 | CC: | aruzicka, sajha, zhunting |
Target Milestone: | 6.12.0 | Keywords: | Regression, Triaged |
Target Release: | Unused | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2022-11-16 13:33:51 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Sayan Das
2022-06-06 09:50:27 UTC
Attaching my preliminary analysis: 1) You kick off Actions::Katello::ContentView::Publish 2) CV::Publish runs its #run 3) CV::Publish starts its #finalize - this starts a transaction - the transaction is kept open until all #finalize methods of all actions in CV::Publish's execution plan are executed 4) In #finalize, CV::Publish kicks off ContentView::CapsuleSync (CV::Publish L124) 5) CV::CapsuleSync goes through planning and gets executed 6) CV::CapsuleSync runs and starts spawning its sub tasks 7) Once a sub task goes into planning, a hook kicks in to create a Task object for it - as part of this, we try to link the newly created task against its parent task - this is where it fails However, the parent task is created inside the transaction started in 4), but that transaction is not yet committed when 7 is being done. Dynflow uses its own set of database connections, so the sub tasks in 6 and 7 do not see the parent yet. Possible solutions from cleanest to worst: 1) Do not kick off new tasks from #finalize 2) If we have to kick off new tasks from #finalize, do it over a new DB connection so it lives outside the transaction 3) Modify foreman-tasks to expect this to happen and handle it gracefully No matter which solution we pick, dynflow is not the right component. If we go with one of the better ones, then probably repositories, if we go with 3 then foreman tasks would be the right one. @Samir any opinions? @aruzicka Thanks for the analysis and suggestions. We can try implementing #2 to begin with based on your suggestions of using hooks and explore using event queue. @Sayan, Could you confirm that you don't see this on the normal promotion in 6.9, 6.10 etc. This is not new code so I would be interested in making sure this is not a 6.11 regression. I confirm that I have never seen this happening in earlier versions. I just now tested on 6.10 and I can't see the traceback there. -- Sayan Hey Sayan, Can you confirm if the capsule sync finishes correctly despite the log error messages? It definitely does i.e. the Sync task will be finished successfully. From logs, It's visible that : ** The task was created but without any label: 2022-06-04T17:43:18 [I|bac|c3da4261] Task {label: , id: 64303b6d-b56e-49fd-bde4-c79a2f70d61f, execution_plan_id: 0157df0c-2200-4340-9e55-48e86246d729} state changed: pending Sudden trigger of the Backtrace: 2022-06-04T17:43:18 [W|dyn|c3da4261] Error on on_execution_plan_save event 2022-06-04T17:43:18 [I|dyn|c3da4261] Backtrace for 'Error on on_execution_plan_save event' error (ActiveRecord::RecordNotFound): Couldn't find ForemanTasks::Task::DynflowTask with [WHERE "foreman_tasks_tasks"."type" = $1 AND "foreman_tasks_tasks"."external_id" = $2] c3da4261 | /usr/share/gems/gems/activerecord-6.0.4.7/lib/active_record/relation/finder_methods.rb:336:in `raise_record_not_found_exception!' c3da4261 | /usr/share/gems/gems/activerecord-6.0.4.7/lib/active_record/relation/finder_methods.rb:127:in `first!' c3da4261 | /usr/share/gems/gems/foreman-tasks-5.2.3/app/models/foreman_tasks/task/dynflow_task.rb:19:in `update_from_dynflow' c3da4261 | /usr/share/gems/gems/foreman-tasks-5.2.3/lib/foreman_tasks/dynflow/persistence.rb:36:in `on_execution_plan_save' c3da4261 | /usr/share/gems/gems/foreman-tasks-5.2.3/lib/foreman_tasks/dynflow/persistence.rb:11:in `block in save_execution_plan' c3da4261 | /usr/share/gems/gems/foreman-tasks-5.2.3/lib/foreman_tasks/dynflow/persistence.rb:10:in `tap' c3da4261 | /usr/share/gems/gems/foreman-tasks-5.2.3/lib/foreman_tasks/dynflow/persistence.rb:10:in `save_execution_plan' As soon as the backtrace ends, I get to see the task in a healthy state and running : .. .. c3da4261 | /usr/share/gems/gems/sidekiq-5.2.10/lib/sidekiq/util.rb:25:in `block in safe_thread' c3da4261 | /usr/share/gems/gems/logging-2.3.0/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context' 2022-06-04T17:43:18 [I|bac|c3da4261] Task {label: Actions::Katello::CapsuleContent::Sync, id: 64303b6d-b56e-49fd-bde4-c79a2f70d61f, execution_plan_id: 0157df0c-2200-4340-9e55-48e86246d729} state changed: planning And finally completed: # hammer task info --id 64303b6d-b56e-49fd-bde4-c79a2f70d61f ID: 64303b6d-b56e-49fd-bde4-c79a2f70d61f Action: Synchronize capsule 'breakfix0-cap.xx.xx.xx.com' State: stopped Result: success Started at: 2022/06/04 17:43:18 Ended at: 2022/06/04 17:55:44 Duration: 00:12:26.17 Owner: admin Task errors: So I don't see any issues from a Capsule sync point of view but That traceback is huge and generated with every attempt of automated Capsule sync after CV Publish+Promote. It could easily fill up production.log if people have daily repo sync configured and capsule has Library env added to it. Sayan, Thanks for the information. I can see the behavior on master and the task completes correctly after the log messages. Added it our board and we'll work on it. Thank you for catching and raising this! :) Created redmine issue https://projects.theforeman.org/issues/35025 from this bug Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/35025 has been resolved. Verified with Sat 6.12 snap 12.0. Created a Sat and Capsule for that sat. Imported manifest, refreshed manifest, synced some RH repos. Created a LE, created a CV. Created an AK with that LE and CV. Synced RH repos. Added those repos to the CV. Published the CV in Library. Created a host, registered it to the Capsule with the AK. Can't register the host to the repos since they are not in the same CV and LE. Published the CV, checked "Promote" and selected the LE. Enabled the repos on the host. Successfully installed the package from the repo. => The capsule has indeed been synced. Took a look at production.log. Didn't notice any traceback and the whole log doesn't contain "RecordNotFound". => QED Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Satellite 6.12 Release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:8506 |