Bug 1655245

Summary: Locks for tasks are created after planning phase, allowing 2 concurrent tasks on a "locked" object
Product: Red Hat Satellite
Component: Tasks Plugin
Version: 6.4.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Status: CLOSED ERRATA
Reporter: Pavel Moravec <pmoravec>
Assignee: Adam Ruzicka <aruzicka>
QA Contact: Peter Ondrejka <pondrejk>
CC: aruzicka, ehelms, inecas, ktordeur
Keywords: Triaged
Target Milestone: 6.10.0
Target Release: Unused
Fixed In Version: tfm-rubygem-foreman-tasks-4.0.0
Doc Type: If docs needed, set a value
Last Closed: 2021-11-16 14:08:27 UTC
Type: Bug

Description Pavel Moravec 2018-12-01 20:35:29 UTC
Description of problem:
There is a use case where a manifest refresh task (and probably also other manifest-related tasks) creates the lock that should prevent concurrent manifest tasks, but too late. This allows multiple manifest tasks to run concurrently.

The use case requires enabling or disabling some Red Hat repos, so that the task creates more Actions::Pulp::Repository::[UpdateImporter|RefreshDistributor] dynflow steps. This is a must for the reproducer.


Version-Release number of selected component (if applicable):
Sat 6.4.0


How reproducible:
100%


Steps to Reproduce:
1. Enable some random Red Hat repositories (at least 5)
2. Invoke two manifest refreshes almost concurrently (see the console sketch after this list)
3. Try the same once again, but with a concurrent manifest refresh AND a manifest import (this might have dangerous consequences, I guess).
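
A minimal rails console sketch of step 2 (not part of the original description). It assumes Katello's Actions::Katello::Organization::ManifestRefresh action class and that it takes the organization as its only argument; both the class name and its signature may differ between Katello versions:

org = Organization.find(1)
threads = 2.times.map do
  Thread.new do
    # each trigger plans in its own thread, so the two planning phases overlap
    ForemanTasks.async_task(::Actions::Katello::Organization::ManifestRefresh, org)
  end
end
threads.each(&:join)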


Actual results:
Both 2. and 3. allow multiple manifest tasks to run concurrently (and to take much more time than usual).


Expected results:
Neither 2. nor 3. allows concurrent manifest tasks to run.


Additional info:
Hint for engineering: when you don't enable repos before the concurrent manifest refresh, the later-triggered task fails with a "Required lock: .." error, as expected. So I think locking is done *after* the generation of dynflow steps, while it must be done *before* them..? (But if I am right, concurrent CV tasks would also be possible the same way..?)

Comment 3 Pavel Moravec 2018-12-01 21:46:01 UTC
OK, this is a generic problem and it can be REALLY dangerous.

I just "successfully" run a CV publish _and_ another promote (not to Library) concurrently. Just by firing the 2nd task while the first one was still in Planning phase. It was enough to have 10+ repos in a CV and fire publish + promote (almost) concurrently.


So _any_ type of task suffers from the same problem. If a task requiring a lock on an object (CV, manifest, host, ...) is fired during the Planning phase of another task that requires the same locks, _both_ tasks will "acquire" even a write lock (or at least the tasking system tells me so as an observer).

So one can run multiple tasks on the same object concurrently, with potentially fatal consequences (deleting a CV version while promoting it, updating a host concurrently, etc.).

Comment 4 Pavel Moravec 2018-12-01 21:50:05 UTC
For QE:
a much better reproducer (though it is worth trying both the one below and the original one):

- have "dev" Lifecycle Environment
- publish a CV cv_many_repos with many repos (30 "zoo" pulp repos used in my case)
- once the publish finishes, run concurrently:

hammer content-view publish --name cv_many_repos --organization-id 1 &

hammer content-view version promote --content-view cv_many_repos --version 1.0 --organization-id 1 --to-lifecycle-environment dev &

- check whether both tasks end up running concurrently (see the check below)
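
A quick way to do that check from a rails console (not part of the original comment). ForemanTasks::Task and its state/label/started_at columns come from foreman-tasks itself, but the label prefix used to filter the publish/promote actions is an assumption and may differ between Katello versions:

# with the bug present, both the publish and the promote show up here at once
ForemanTasks::Task.where(:state => 'running')
                  .where("label LIKE 'Actions::Katello::ContentView%'")
                  .order(:started_at)
                  .pluck(:id, :label, :started_at)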

Comment 5 Pavel Moravec 2018-12-02 11:16:00 UTC
Another reproducer, with which I already got corrupted pulp+katello repositories for a CV:

concurrently delete a CV version && promote the same version to an LE.

Both tasks were running concurrently, and both failed with an error.

Comment 6 Adam Ruzicka 2018-12-03 11:45:42 UTC
Created redmine issue http://projects.theforeman.org/issues/25608 from this bug

Comment 7 Adam Ruzicka 2018-12-03 12:03:45 UTC
We're checking whether a lock is available in an ActiveRecord validation. Since this is not enforced by the database, we might run into race conditions. What might happen is that:

1) Task A creates a lock L1 for R1
2) Task B creates a lock L2 for R1
3) Task A checks if L1 is available (it is)
4) Task B checks if L2 is available (it is)
5) Task A acquires L1
6) Task B acquires L2
7) Both tasks hold exclusive lock on R1
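
A minimal sketch (not the actual foreman-tasks fix) of one way to close such a check-then-act window: take a row-level lock on the resource row itself before the availability check, so the second transaction blocks until the first one commits and then sees the freshly created lock. ForemanTasks::Lock.exclusive! is taken from the reproducer in comment #8; the wrapper name is hypothetical.

def exclusive_lock_with_row_lock(subject, task_id)
  ActiveRecord::Base.transaction do
    # SELECT ... FOR UPDATE on the resource row: a concurrent caller blocks
    # here until this transaction commits, so its availability check will
    # see our lock row and fail as intended
    subject.lock!
    ForemanTasks::Lock.exclusive!(subject, task_id)
  end
end

An alternative would be to enforce exclusivity with a (partial) unique index on the locks table, so the database itself rejects the second conflicting row.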

Comment 8 Adam Ruzicka 2018-12-03 13:15:11 UTC
A reproducer for use in the rails console. Just make sure you have at least two tasks and one host.

Expected results:
Only one task holds the locks

Actual results:
Both tasks hold the locks

vvvvv SCRIPT vvvvv

# pick two arbitrary existing tasks and a host to lock
task1 = ForemanTasks::Task.first
task2 = ForemanTasks::Task.last
subject = Host.first

# clean up any leftover locks so the test starts from a known state
ForemanTasks::Lock.where(:task_id => task1.id).destroy_all
ForemanTasks::Lock.where(:task_id => task2.id).destroy_all
ForemanTasks::Lock.where(:resource_id => subject.id, :resource_type => subject.class.to_s).destroy_all

# acquire an exclusive lock on the host for task1 in a second thread; the
# sleep keeps its transaction open while the main thread races it
thread = Thread.new do
  ActiveRecord::Base.transaction do
    ForemanTasks::Lock.exclusive!(subject, task1.id)
    sleep 2
  end
end

# meanwhile acquire an exclusive lock on the same host for task2
ActiveRecord::Base.transaction do
  ForemanTasks::Lock.exclusive!(subject, task2.id)
  sleep 2
end
thread.join

# inspect the result: with the bug present, two exclusive locks exist
ForemanTasks::Lock.where(:resource_id => subject.id)

^^^^^ SCRIPT ^^^^^

Comment 9 Adam Ruzicka 2018-12-03 13:23:31 UTC
Forgot one thing: to make it easier to reproduce using the script described in #8, edit the active scope in app/models/foreman_tasks/lock.rb to

scope :active, -> { joins(:task) }

Comment 13 Bryan Kearney 2019-01-07 21:12:22 UTC
Upstream bug assigned to aruzicka

Comment 15 Bryan Kearney 2021-01-11 21:50:37 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/25608 has been resolved.

Comment 16 Peter Ondrejka 2021-07-22 14:16:32 UTC
Verified on Satellite 6.10 sn 10; I couldn't reproduce the concurrent lock issue with any of the aforementioned reproducers.

Comment 19 errata-xmlrpc 2021-11-16 14:08:27 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.10 Release), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4702