Red Hat Satellite engineering is moving the tracking of its product development work on Satellite to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "Satellite project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs will be migrated starting at the end of May. If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "Satellite project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/SAT-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 2093884 - Every CV Publish+Promote action followed by an automated Capsule sync task generates a huge traceback "(ActiveRecord::RecordNotFound): Couldn't find ForemanTasks::Task::DynflowTask" in Satellite 6.11
Summary: Every CV Publish+Promote action followed by an automated Capsule sync task ge...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Dynflow
Version: 6.11.0
Hardware: All
OS: All
high
high
Target Milestone: 6.12.0
Assignee: Samir Jha
QA Contact: Lukáš Hellebrandt
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-06-06 09:50 UTC by Sayan Das
Modified: 2022-11-16 13:34 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-16 13:33:51 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 35025 0 Normal New Every CV Publish+Promote action followed by an automated Capsule sync task generates a huge traceback "(ActiveRecord::Re... 2022-06-06 16:28:33 UTC
Red Hat Issue Tracker SAT-10811 0 None None None 2022-09-23 18:33:04 UTC
Red Hat Product Errata RHSA-2022:8506 0 None None None 2022-11-16 13:34:04 UTC

Description Sayan Das 2022-06-06 09:50:27 UTC
Description of problem:

In Satellite 6.11, We get a very nice feature to Publish a CV and then promote it as the same time at once in just one click. 

While doing so, Even if the task gets completed without any issues and then automatically initiates the related Actions::Katello::ContentView::CapsuleSync tasks, At some point dynflow raises a huge traceback which could easily flood the production.log


Version-Release number of selected component (if applicable):

satellite-6.11.0-1.el8sat.noarch
rubygem-dynflow-1.6.4-1.el8sat.noarch
katello-4.3.0-2.el8sat.noarch

How reproducible:

Always in latest snap of 6.11

Steps to Reproduce:
1. INstall a Satellite and Capsule 6.11 on RHEL 8
2. Import a manifest, Enable and sync some repos.
3. Create a Lifecycle called QA, after Library and add both of the Environments to the external capsule for content syncing. 
4. Add those two repos in a CV --> Publish a new version and also select the promote option to promote the CV version to QA at the same time.
5. Monitor /var/log/foreman/production.log file while the task executes and then initiates the related capsule sync tasks. 

Actual results:

2022-06-04T17:43:18 [I|bac|c3da4261] Task {label: Actions::Katello::ContentView::CapsuleSync, id: 0c79aac5-b798-4457-b848-0fd258dbc6fb, execution_plan_id: fd8833cf-1298-4b8d-b8de-98c37d4628fe} state changed: planning
2022-06-04T17:43:18 [I|bac|c3da4261] Task {label: Actions::Katello::ContentView::CapsuleSync, id: 0c79aac5-b798-4457-b848-0fd258dbc6fb, execution_plan_id: fd8833cf-1298-4b8d-b8de-98c37d4628fe} state changed: planned
2022-06-04T17:43:18 [I|bac|c3da4261] Task {label: Actions::Katello::ContentView::CapsuleSync, execution_plan_id: fd8833cf-1298-4b8d-b8de-98c37d4628fe} state changed: running
2022-06-04T17:43:18 [I|bac|c3da4261] Task {label: , id: 64303b6d-b56e-49fd-bde4-c79a2f70d61f, execution_plan_id: 0157df0c-2200-4340-9e55-48e86246d729} state changed: pending
2022-06-04T17:43:18 [W|dyn|c3da4261] Error on on_execution_plan_save event
2022-06-04T17:43:18 [I|dyn|c3da4261] Backtrace for 'Error on on_execution_plan_save event' error (ActiveRecord::RecordNotFound): Couldn't find ForemanTasks::Task::DynflowTask with [WHERE "foreman_tasks_tasks"."type" = $1 AND "foreman_tasks_tasks"."external_id" = $2]
 c3da4261 | /usr/share/gems/gems/activerecord-6.0.4.7/lib/active_record/relation/finder_methods.rb:336:in `raise_record_not_found_exception!'
 c3da4261 | /usr/share/gems/gems/activerecord-6.0.4.7/lib/active_record/relation/finder_methods.rb:127:in `first!'
 c3da4261 | /usr/share/gems/gems/foreman-tasks-5.2.3/app/models/foreman_tasks/task/dynflow_task.rb:19:in `update_from_dynflow'
 c3da4261 | /usr/share/gems/gems/foreman-tasks-5.2.3/lib/foreman_tasks/dynflow/persistence.rb:36:in `on_execution_plan_save'
 c3da4261 | /usr/share/gems/gems/foreman-tasks-5.2.3/lib/foreman_tasks/dynflow/persistence.rb:11:in `block in save_execution_plan'
 c3da4261 | /usr/share/gems/gems/foreman-tasks-5.2.3/lib/foreman_tasks/dynflow/persistence.rb:10:in `tap'
 c3da4261 | /usr/share/gems/gems/foreman-tasks-5.2.3/lib/foreman_tasks/dynflow/persistence.rb:10:in `save_execution_plan'
 c3da4261 | /usr/share/gems/gems/dynflow-1.6.4/lib/dynflow/persistence.rb:84:in `save_execution_plan'
 c3da4261 | /usr/share/gems/gems/dynflow-1.6.4/lib/dynflow/execution_plan.rb:450:in `save'
 c3da4261 | /usr/share/gems/gems/dynflow-1.6.4/lib/dynflow/execution_plan.rb:145:in `update_state'
 c3da4261 | /usr/share/gems/gems/dynflow-1.6.4/lib/dynflow/execution_plan.rb:286:in `plan'
..
..
 c3da4261 | /usr/share/gems/gems/sidekiq-5.2.10/lib/sidekiq/util.rb:25:in `block in safe_thread'
 c3da4261 | /usr/share/gems/gems/logging-2.3.0/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
2022-06-04T17:43:18 [I|bac|c3da4261] Task {label: Actions::Katello::CapsuleContent::Sync, id: 64303b6d-b56e-49fd-bde4-c79a2f70d61f, execution_plan_id: 0157df0c-2200-4340-9e55-48e86246d729} state changed: planning 


Expected results:

Even if the task actually gets processed, the traceback should not be there. Dynflow should be able to handle the traceback or the root cause behind it carefully. 


Additional info:

Related logs will be attached to the BZ soon.

I am setting the severity to high as if this continues to happen, It can easily flood the production.log file.

Comment 3 Adam Ruzicka 2022-06-06 10:17:00 UTC
Attaching my preliminary analysis:

1) You kick off Actions::Katello::ContentView::Publish
2) CV::Publish runs its #run
3) CV::Publish starts its #finalize
  - this starts a transaction
  - the transaction is kept open until all #finalize methods of all actions in CV::Publish's execution plan are executed
4) In #finalize, CV::Publish kicks off ContentView::CapsuleSync (CV::Publish L124)
5) CV::CapsuleSync goes through planning and gets executed
6) CV::CapsuleSync runs and starts spawning its sub tasks
7) Once a sub task goes into planning, a hook kicks in to create a Task object for it
  - as part of this, we try to link the newly created task against its parent task
  - this is where it fails

However, the parent task is created inside the transaction started in 4), but
that transaction is not yet committed when 7 is being done. Dynflow uses its own
set of database connections, so the sub tasks in 6 and 7 do not see the parent
yet.

Possible solutions from cleanest to worst:
1) Do not kick off new tasks from #finalize
2) If we have to kick off new tasks from #finalize, do it over a new DB connection so it lives outside the transaction
3) Modify foreman-tasks to expect this to happen and handle it gracefully

No matter which solution we pick, dynflow is not the right component. If we go with one of the better ones, then probably repositories, if we go with 3 then foreman tasks would be the right one.

@Samir any opinions?

Comment 4 Samir Jha 2022-06-06 13:38:26 UTC
@aruzicka Thanks for the analysis and suggestions. We can try implementing #2 to begin with based on your suggestions of using hooks and explore using event queue.

@Sayan, Could you confirm that you don't see this on the normal promotion in 6.9, 6.10 etc. This is not new code so I would be interested in making sure this is not a 6.11 regression.

Comment 5 Sayan Das 2022-06-06 13:49:55 UTC
I confirm that I have never seen this happening in earlier versions. I just now tested on 6.10 and I can't see the traceback there.


-- Sayan

Comment 6 Samir Jha 2022-06-06 15:00:21 UTC
Hey Sayan,

Can you confirm if the capsule sync finishes correctly despite the log error messages?

Comment 7 Sayan Das 2022-06-06 15:08:57 UTC
It definitely does i.e. the Sync task will be finished successfully.

From logs, It's visible that :


** The task was created but without any label:


2022-06-04T17:43:18 [I|bac|c3da4261] Task {label: , id: 64303b6d-b56e-49fd-bde4-c79a2f70d61f, execution_plan_id: 0157df0c-2200-4340-9e55-48e86246d729} state changed: pending

Sudden trigger of the Backtrace:


2022-06-04T17:43:18 [W|dyn|c3da4261] Error on on_execution_plan_save event
2022-06-04T17:43:18 [I|dyn|c3da4261] Backtrace for 'Error on on_execution_plan_save event' error (ActiveRecord::RecordNotFound): Couldn't find ForemanTasks::Task::DynflowTask with [WHERE "foreman_tasks_tasks"."type" = $1 AND "foreman_tasks_tasks"."external_id" = $2]
 c3da4261 | /usr/share/gems/gems/activerecord-6.0.4.7/lib/active_record/relation/finder_methods.rb:336:in `raise_record_not_found_exception!'
 c3da4261 | /usr/share/gems/gems/activerecord-6.0.4.7/lib/active_record/relation/finder_methods.rb:127:in `first!'
 c3da4261 | /usr/share/gems/gems/foreman-tasks-5.2.3/app/models/foreman_tasks/task/dynflow_task.rb:19:in `update_from_dynflow'
 c3da4261 | /usr/share/gems/gems/foreman-tasks-5.2.3/lib/foreman_tasks/dynflow/persistence.rb:36:in `on_execution_plan_save'
 c3da4261 | /usr/share/gems/gems/foreman-tasks-5.2.3/lib/foreman_tasks/dynflow/persistence.rb:11:in `block in save_execution_plan'
 c3da4261 | /usr/share/gems/gems/foreman-tasks-5.2.3/lib/foreman_tasks/dynflow/persistence.rb:10:in `tap'
 c3da4261 | /usr/share/gems/gems/foreman-tasks-5.2.3/lib/foreman_tasks/dynflow/persistence.rb:10:in `save_execution_plan'


As soon as the backtrace ends, I get to see the task in a healthy state and running :

..
..
 c3da4261 | /usr/share/gems/gems/sidekiq-5.2.10/lib/sidekiq/util.rb:25:in `block in safe_thread'
 c3da4261 | /usr/share/gems/gems/logging-2.3.0/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'

2022-06-04T17:43:18 [I|bac|c3da4261] Task {label: Actions::Katello::CapsuleContent::Sync, id: 64303b6d-b56e-49fd-bde4-c79a2f70d61f, execution_plan_id: 0157df0c-2200-4340-9e55-48e86246d729} state changed: planning 


And finally completed:


# hammer task info --id 64303b6d-b56e-49fd-bde4-c79a2f70d61f
ID:          64303b6d-b56e-49fd-bde4-c79a2f70d61f
Action:      Synchronize capsule 'breakfix0-cap.xx.xx.xx.com'
State:       stopped
Result:      success
Started at:  2022/06/04 17:43:18
Ended at:    2022/06/04 17:55:44
Duration:    00:12:26.17
Owner:       admin
Task errors:



So I don't see any issues from a Capsule sync point of view but That traceback is huge and generated with every attempt of automated Capsule sync after CV Publish+Promote. It could easily fill up production.log if people have daily repo sync configured and capsule has Library env added to it.

Comment 8 Samir Jha 2022-06-06 15:42:51 UTC
Sayan,

Thanks for the information. I can see the behavior on master and the task completes correctly after the log messages. Added it our board and we'll work on it. Thank you for catching and raising this! :)

Comment 9 Samir Jha 2022-06-06 16:28:32 UTC
Created redmine issue https://projects.theforeman.org/issues/35025 from this bug

Comment 10 Bryan Kearney 2022-06-13 20:04:34 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/35025 has been resolved.

Comment 11 Lukáš Hellebrandt 2022-09-23 18:18:43 UTC
Verified with Sat 6.12 snap 12.0.

Created a Sat and Capsule for that sat. Imported manifest, refreshed manifest, synced some RH repos. Created a LE, created a CV. Created an AK with that LE and CV. Synced RH repos. Added those repos to the CV. Published the CV in Library.
Created a host, registered it to the Capsule with the AK. Can't register the host to the repos since they are not in the same CV and LE.
Published the CV, checked "Promote" and selected the LE.
Enabled the repos on the host. Successfully installed the package from the repo.
=> The capsule has indeed been synced.
Took a look at production.log. Didn't notice any traceback and the whole log doesn't contain "RecordNotFound".
=> QED

Comment 15 errata-xmlrpc 2022-11-16 13:33:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.12 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8506


Note You need to log in before you can comment on or make changes to this bug.