Bug 1390931

Summary: World invalidation can fail, when execution plans are missing
Product: Red Hat Satellite Reporter: Ivan Necas <inecas>
Component: Tasks Plugin Assignee: Ivan Necas <inecas>
Status: CLOSED ERRATA QA Contact: Renzo Nuccitelli <rnuccite>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.2.0 CC: anerurka, bbuckingham, bkearney, egolov, ehelms, hmore, inecas, jberry86, jbhatia, jcallaha, mtenheuv, mverma, oshtaier, pierre-yves.goubet, pmoravec, rnuccite, vgunasek, will_darton
Target Milestone: Unspecified Keywords: Reopened, Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: rubygem-dynflow-0.8.13.4-1,rubygem-dynflow-0.8.13.5-1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1426398 Environment:
Last Closed: 2017-08-08 16:05:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1426398    

Description Ivan Necas 2016-11-02 09:29:48 UTC
Description of problem:
Under some circumstances (such as manually deleting data from dynflow_execution_plans), world invalidation can fail because the execution plans it refers to are missing.

Version-Release number of selected component (if applicable):


How reproducible:
under special circumstances

Steps to Reproduce:
1. trigger a task 
2. while the task is running, delete data from dynflow manually (CAUTION: THIS IS BY NO MEANS A RECOMMENDED WAY OF DEALING WITH TASKS - FOR REPRODUCER PURPOSES ONLY):
psql foreman
delete from foreman_tasks_tasks;
delete from foreman_tasks_locks;
delete from dynflow_steps;
delete from dynflow_actions;
delete from dynflow_execution_plans;
\q
3. force kill the dynflow executor process
4. restart the foreman-tasks service

Actual results:
In the logs, there is an "invalid worlds found" message where, for the terminated world's UUID, there is "searching: 'execution_plan by: {:uuid=>\"'...".
The /foreman_tasks/dynflow/worlds page still shows the world in the list.

Expected results:
Dynflow is able to handle this situation by skipping the deleted plans.
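
A minimal sketch of the expected behavior, in plain Ruby with no Dynflow dependency. The names FakePersistence, PlanMissing and invalidate_locks are hypothetical and only illustrate the idea of skipping locks whose execution plan no longer exists; this is not the actual Dynflow implementation.

```ruby
# Hypothetical stand-ins; none of these names come from Dynflow itself.
class PlanMissing < StandardError; end

class FakePersistence
  def initialize(plans)
    @plans = plans # { uuid => plan data }
  end

  # Raises when the plan row was deleted, mimicking a failed lookup.
  def load_execution_plan(uuid)
    @plans.fetch(uuid) { raise PlanMissing, "execution_plan #{uuid} not found" }
  end
end

# Walks the plan ids locked by a terminated world. Plans that can still be
# loaded are collected for recovery; ids of missing plans are recorded as
# skipped instead of aborting the whole invalidation.
def invalidate_locks(persistence, locked_plan_ids)
  recovered = []
  skipped = []
  locked_plan_ids.each do |uuid|
    begin
      recovered << persistence.load_execution_plan(uuid)
    rescue PlanMissing
      skipped << uuid # the plan is gone, so move on to the next lock
    end
  end
  { recovered: recovered, skipped: skipped }
end

persistence = FakePersistence.new('plan-1' => { state: :running })
result = invalidate_locks(persistence, %w[plan-1 plan-2])
puts result.inspect
```

The point of the sketch is that one missing plan ends up in the skipped list while the remaining locks are still processed.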

Comment 1 Ivan Necas 2016-11-02 09:30:35 UTC
Created redmine issue http://projects.theforeman.org/issues/17177 from this bug

Comment 6 Bryan Kearney 2016-11-07 13:02:06 UTC
Upstream bug assigned to inecas

Comment 7 Bryan Kearney 2016-11-07 13:02:08 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/17177 has been resolved.

Comment 11 Ivan Necas 2017-02-01 11:16:16 UTC
*** Bug 1416177 has been marked as a duplicate of this bug. ***

Comment 14 Satellite Program 2017-02-23 21:10:06 UTC
Please add verification steps for this bug to help QE verify.

Comment 15 Ivan Necas 2017-03-03 09:40:49 UTC
The verification steps are in https://bugzilla.redhat.com/show_bug.cgi?id=1390931#c0

Comment 16 Ivan Necas 2017-03-03 10:26:12 UTC
If this situation happens before it's possible to upgrade to a version that has the fix, one can run:

cat <<EOF | foreman-rake console
w = ForemanTasks.dynflow.world
w.coordinator.find_locks(class: Dynflow::Coordinator::ExecutionLock.name).each do |l|
  exists = w.persistence.load_execution_plan(l.execution_plan_id) rescue nil
  unless exists
    puts "#{l.execution_plan_id} doesn't exist: deleting the lock"
    w.coordinator.delete_record(l)
  end
end; puts "finished"
EOF



After doing so, the invalid locks should be removed from the system, and the next time the foreman-tasks service
is started, this issue should be resolved.

Comment 20 Renzo Nuccitelli 2017-04-03 16:53:15 UTC
I followed the steps and found error messages in the logs:

grep worlds production.log

2017-04-03 12:42:02  [foreman-tasks/dynflow] [E] invalid worlds found {"a97cd2c2-a86b-4309-aa0c-edd7ed1c6c9f"=>"Value (NilClass) '' is not any of: Dynflow::ExecutionPlan::Steps::Abstract.", "22cf8a95-f734-4145-9af9-f2dd0baf93e7"=>:valid}
 | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.4/lib/dynflow/world.rb:328:in `block in worlds_validity_check'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.4/lib/dynflow/world.rb:322:in `worlds_validity_check'
2017-04-03 12:44:37  [foreman-tasks/dynflow] [E] invalid worlds found {"a97cd2c2-a86b-4309-aa0c-edd7ed1c6c9f"=>"Value (NilClass) '' is not any of: Dynflow::ExecutionPlan::Steps::Abstract.", "22cf8a95-f734-4145-9af9-f2dd0baf93e7"=>:valid}
 | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.4/lib/dynflow/world.rb:328:in `block in worlds_validity_check'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.4/lib/dynflow/world.rb:322:in `worlds_validity_check'
2017-04-03 12:45:15  [foreman-tasks/dynflow] [E] invalid worlds found {"a97cd2c2-a86b-4309-aa0c-edd7ed1c6c9f"=>"Value (NilClass) '' is not any of: Dynflow::ExecutionPlan::Steps::Abstract.", "22cf8a95-f734-4145-9af9-f2dd0baf93e7"=>:valid}

I tested this on Satellite 6.2.9 snap 2. Maybe the fix was not cherry-picked, so I am moving this back to ASSIGNED.
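
To pull the failing world IDs out of log lines like the ones above, a small Ruby sketch along these lines might help. It assumes the hash in the message keeps the "uuid"=>value format shown in the log; the helper name invalid_worlds and the ENTRY pattern are hypothetical.

```ruby
# Hypothetical helper: extracts the worlds not marked :valid from an
# "invalid worlds found" log line. Assumes the Hash#inspect format seen above.
ENTRY = /"(\h{8}-\h{4}-\h{4}-\h{4}-\h{12})"=>(:valid|"(?:[^"\\]|\\.)*")/

def invalid_worlds(line)
  line.scan(ENTRY).reject { |_uuid, status| status == ':valid' }.map(&:first)
end

line = '2017-04-03 12:42:02  [foreman-tasks/dynflow] [E] invalid worlds found ' \
       '{"a97cd2c2-a86b-4309-aa0c-edd7ed1c6c9f"=>"Value (NilClass) \'\' is not any of: ' \
       'Dynflow::ExecutionPlan::Steps::Abstract.", ' \
       '"22cf8a95-f734-4145-9af9-f2dd0baf93e7"=>:valid}'
puts invalid_worlds(line)
```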

Comment 21 Renzo Nuccitelli 2017-04-03 17:07:29 UTC
Providing more info on https://bugzilla.redhat.com/show_bug.cgi?id=1390931#c20. I started a repo sync to trigger the task. After the test steps, I tried to start the synchronization again, with no success. The same error in the logs seems to prevent it:

2017-04-03 13:03:51  [foreman-tasks/dynflow] [E] invalid worlds found {"a97cd2c2-a86b-4309-aa0c-edd7ed1c6c9f"=>"Value (NilClass) '' is not any of: Dynflow::ExecutionPlan::Steps::Abstract.", "8b55723d-6b00-4b41-acdf-f7f90bfe0b48"=>:valid, "f9fac667-d073-4ace-a576-face1d515626"=>:valid, "e5dbdb21-8133-41a6-9941-70af16fd95ed"=>:valid}
 | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.4/lib/dynflow/world.rb:328:in `block in worlds_validity_check'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.4/lib/dynflow/world.rb:322:in `worlds_validity_check'
2017-04-03 13:04:28  [foreman-tasks/dynflow] [E] invalid worlds found {"a97cd2c2-a86b-4309-aa0c-edd7ed1c6c9f"=>"Value (NilClass) '' is not any of: Dynflow::ExecutionPlan::Steps::Abstract.", "8b55723d-6b00-4b41-acdf-f7f90bfe0b48"=>:valid, "f9fac667-d073-4ace-a576-face1d515626"=>:valid, "e5dbdb21-8133-41a6-9941-70af16fd95ed"=>:valid}

Comment 23 Ivan Necas 2017-04-03 21:22:46 UTC
I was able to reproduce this (see http://projects.theforeman.org/issues/19146) when I deleted data from dynflow_steps, but not from dynflow_execution_plans. Although I believe this is a slightly different situation than before, I'm OK with keeping it as part of this BZ, while tracking it as a different issue upstream.

The proposed patch is available in https://github.com/Dynflow/dynflow/pull/227
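
A rough model of the two failure modes, in plain Ruby. The data layout, the Step struct and the helpers load_plan and loadable? are assumptions made for illustration and do not reflect Dynflow internals; the sketch only shows how a missing plan row and a missing step row can both make a plan unloadable, with the second case producing a message shaped like the one in the logs.

```ruby
# Hypothetical model of plan loading; the data layout is an assumption.
Step = Struct.new(:id)

def load_plan(plans, steps, uuid)
  row = plans.fetch(uuid) # raises KeyError when the plan row is gone
  row[:step_ids].map do |id|
    step = steps[id]
    # Mimics a type check like the one behind the "is not any of" log message.
    raise TypeError, "Value (#{step.class}) '#{step}' is not any of: Step." unless step.is_a?(Step)
    step
  end
end

# Treats both failure modes the same way: the plan cannot be recovered.
def loadable?(plans, steps, uuid)
  load_plan(plans, steps, uuid)
  true
rescue KeyError, TypeError
  false
end

plans = { 'plan-a' => { step_ids: [1, 2] } }
steps = { 1 => Step.new(1) } # step 2 was deleted, the plan row was not

puts loadable?(plans, steps, 'plan-a') # plan row present, a step is missing
puts loadable?(plans, steps, 'plan-b') # plan row itself is missing
```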

Comment 29 errata-xmlrpc 2017-05-01 13:56:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1191

Comment 31 Renzo Nuccitelli 2017-05-30 18:37:16 UTC
No more error messages found after running the steps from comment #21, so I am moving this bug to VERIFIED.

Comment 32 Bryan Kearney 2017-06-20 17:53:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1553

Comment 34 Ivan Necas 2017-08-07 07:43:27 UTC
Connecting redmine issue http://projects.theforeman.org/issues/20002 from this bug