Bug 1655245

Summary: Locks for tasks are created after planning phase, allowing 2 concurrent tasks on a "locked" object
Product: Red Hat Satellite
Component: Tasks Plugin
Version: 6.4.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Status: CLOSED ERRATA
Reporter: Pavel Moravec <pmoravec>
Assignee: Adam Ruzicka <aruzicka>
QA Contact: Peter Ondrejka <pondrejk>
CC: aruzicka, ehelms, inecas, ktordeur
Keywords: Triaged
Target Milestone: 6.10.0
Target Release: Unused
Fixed In Version: tfm-rubygem-foreman-tasks-4.0.0
Doc Type: If docs needed, set a value
Last Closed: 2021-11-16 14:08:27 UTC
Type: Bug

Description Pavel Moravec 2018-12-01 20:35:29 UTC
Description of problem:
There is a use case where a manifest refresh task (and probably also other manifest-related tasks) creates the lock that should prevent concurrent manifest tasks, but too late. This allows multiple manifest tasks to run concurrently.

The use case requires enabling or disabling some Red Hat repos, so that the task creates more Actions::Pulp::Repository::[UpdateImporter|RefreshDistributor] dynflow steps. This is a must for the reproducer.


Version-Release number of selected component (if applicable):
Sat 6.4.0


How reproducible:
100%


Steps to Reproduce:
1. Enable some random Red Hat repositories (at least 5)
2. Invoke two manifest refreshes almost concurrently (see the console sketch after this list)
3. Try the same once again, but with a concurrent manifest refresh AND a manifest import (this might have dangerous consequences, I guess).
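
A minimal rails console sketch of step 2 (not part of the original description). It assumes Katello's Actions::Katello::Organization::ManifestRefresh action class and that it takes the organization as its only argument; both the class name and its signature may differ between Katello versions:

org = Organization.find(1)
threads = 2.times.map do
  Thread.new do
    # each trigger plans in its own thread, so the two planning phases overlap
    ForemanTasks.async_task(::Actions::Katello::Organization::ManifestRefresh, org)
  end
end
threads.each(&:join)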


Actual results:
Both 2. and 3. allow multiple manifest tasks to run concurrently (and to take much more time than usual).


Expected results:
Neither 2. nor 3. allows concurrent manifest tasks to run.


Additional info:
Hint for engineering: when you don't enable repos before the concurrent manifest refresh, the later-triggered task fails with a "Required lock: .." error, as expected. So I think locking is done *after* the generation of dynflow steps, while it must be done *before* them..? (But if I am right, concurrent CV tasks would also be possible the same way..?)

Comment 3 Pavel Moravec 2018-12-01 21:46:01 UTC
OK, this is a generic problem and it can be REALLY dangerous.

I just "successfully" run a CV publish _and_ another promote (not to Library) concurrently. Just by firing the 2nd task while the first one was still in Planning phase. It was enough to have 10+ repos in a CV and fire publish + promote (almost) concurrently.


So _any_ type of task suffers from the same problem. If a task requiring a lock on an object (CV, manifest, host, ...) is fired during the Planning phase of another task that requires the same locks, _both_ tasks will "acquire" even a write lock (or at least the tasking system tells me so as an observer).

So one can run multiple tasks on the same object concurrently, with potentially fatal consequences (deleting a CV version while promoting it, updating a host concurrently, etc.).

Comment 4 Pavel Moravec 2018-12-01 21:50:05 UTC
For QE:
a much better reproducer (though it is worth trying both the one below and the original one):

- have "dev" Lifecycle Environment
- publish a CV cv_many_repos with many repos (30 "zoo" pulp repos used in my case)
- once the publish finishes, run concurrently:

hammer content-view publish --name cv_many_repos --organization-id 1 &

hammer content-view version promote --content-view cv_many_repos --version 1.0 --organization-id 1 --to-lifecycle-environment dev &

- check whether both tasks end up running concurrently (see the check below)
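
A quick way to do that check from a rails console (not part of the original comment). ForemanTasks::Task and its state/label/started_at columns come from foreman-tasks itself, but the label prefix used to filter the publish/promote actions is an assumption and may differ between Katello versions:

# with the bug present, both the publish and the promote show up here at once
ForemanTasks::Task.where(:state => 'running')
                  .where("label LIKE 'Actions::Katello::ContentView%'")
                  .order(:started_at)
                  .pluck(:id, :label, :started_at)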

Comment 5 Pavel Moravec 2018-12-02 11:16:00 UTC
Another reproducer, with which I already got corrupted pulp+katello repositories for a CV:

concurrently delete a CV version && promote the same version to an LE.

Both tasks were running concurrently, and both failed with an error.

Comment 6 Adam Ruzicka 2018-12-03 11:45:42 UTC
Created redmine issue http://projects.theforeman.org/issues/25608 from this bug

Comment 7 Adam Ruzicka 2018-12-03 12:03:45 UTC
We're checking whether a lock is available in an ActiveRecord validation. Since this is not enforced by the database, we might run into race conditions. What might happen is that:

1) Task A creates a lock L1 for R1
2) Task B creates a lock L2 for R1
3) Task A checks if L1 is available (it is)
4) Task B checks if L2 is available (it is)
5) Task A acquires L1
6) Task B acquires L2
7) Both tasks hold exclusive lock on R1
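
A minimal sketch (not the actual foreman-tasks fix) of one way to close such a check-then-act window: take a row-level lock on the resource row itself before the availability check, so the second transaction blocks until the first one commits and then sees the freshly created lock. ForemanTasks::Lock.exclusive! is taken from the reproducer in comment #8; the wrapper name is hypothetical.

def exclusive_lock_with_row_lock(subject, task_id)
  ActiveRecord::Base.transaction do
    # SELECT ... FOR UPDATE on the resource row: a concurrent caller blocks
    # here until this transaction commits, so its availability check will
    # see our lock row and fail as intended
    subject.lock!
    ForemanTasks::Lock.exclusive!(subject, task_id)
  end
end

An alternative would be to enforce exclusivity with a (partial) unique index on the locks table, so the database itself rejects the second conflicting row.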

Comment 8 Adam Ruzicka 2018-12-03 13:15:11 UTC
A reproducer for use in the rails console. Just make sure you have at least two tasks and one host.

Expected results:
Only one task holds the locks

Actual results:
Both tasks hold the locks

vvvvv SCRIPT vvvvv

# pick two arbitrary existing tasks and a host to lock
task1 = ForemanTasks::Task.first
task2 = ForemanTasks::Task.last
subject = Host.first

# clean up any leftover locks so the test starts from a known state
ForemanTasks::Lock.where(:task_id => task1.id).destroy_all
ForemanTasks::Lock.where(:task_id => task2.id).destroy_all
ForemanTasks::Lock.where(:resource_id => subject.id, :resource_type => subject.class.to_s).destroy_all

# acquire an exclusive lock on the host for task1 in a second thread; the
# sleep keeps its transaction open while the main thread races it
thread = Thread.new do
  ActiveRecord::Base.transaction do
    ForemanTasks::Lock.exclusive!(subject, task1.id)
    sleep 2
  end
end

# meanwhile acquire an exclusive lock on the same host for task2
ActiveRecord::Base.transaction do
  ForemanTasks::Lock.exclusive!(subject, task2.id)
  sleep 2
end
thread.join

# inspect the result: with the bug present, two exclusive locks exist
ForemanTasks::Lock.where(:resource_id => subject.id)

^^^^^ SCRIPT ^^^^^

Comment 9 Adam Ruzicka 2018-12-03 13:23:31 UTC
Forgot one thing: to make it easier to reproduce using the script described in #8, edit the active scope in app/models/foreman_tasks/lock.rb to

scope :active, -> { joins(:task) }

Comment 13 Bryan Kearney 2019-01-07 21:12:22 UTC
Upstream bug assigned to aruzicka

Comment 15 Bryan Kearney 2021-01-11 21:50:37 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/25608 has been resolved.

Comment 16 Peter Ondrejka 2021-07-22 14:16:32 UTC
Verified on Satellite 6.10 sn 10; I couldn't reproduce the concurrent lock issue with any of the aforementioned reproducers.

Comment 19 errata-xmlrpc 2021-11-16 14:08:27 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.10 Release), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4702