Bug 1684051

Summary:	`foreman-rake foreman_tasks:cleanup ...` goes on and on for orphaned task locks when time window is before than the earliest task.
Product:	Red Hat Satellite	Reporter:	Jan Hutař <jhutar>
Component:	Tasks Plugin	Assignee:	Ivan Necas <inecas>
Status:	CLOSED ERRATA	QA Contact:	Jan Hutař <jhutar>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	Unspecified	CC:	aruzicka, inecas, mmccune, zhunting
Target Milestone:	Released	Keywords:	Regression, Triaged
Target Release:	Unused
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	tfm-rubygem-foreman-tasks-0.14.4.6-1	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-05-14 19:57:52 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Jan Hutař 2019-02-28 10:23:58 UTC

Description of problem:
`foreman-rake foreman_tasks:cleanup ...` goes on and on for orphaned task locks


Version-Release number of selected component (if applicable):
tfm-rubygem-foreman_remote_execution-1.6.7-8.el7sat.noarch


How reproducible:
always


Steps to Reproduce:
1. # foreman-rake foreman_tasks:cleanup TASK_SEARCH='label ~ *' AFTER='1000h' VERBOSE=true


Actual results:
It says it is going to delete "About to remove 3517 orphaned task locks", but after 4 batches of 1000, it keeps doing something.

Also when I kill it with "Ctrl+C", it still continues on the background (see below).


Expected results:
It should do only what it needs to do and Ctrl+C should be able to kill it completely


Additional info:
[root@satellite ~]# foreman-rake foreman_tasks:cleanup TASK_SEARCH='label ~ *' AFTER='1000h' VERBOSE=true
/usr/share/foreman/lib/foreman.rb:8: warning: already initialized constant Foreman::UUID_REGEXP
/usr/share/foreman/lib/foreman.rb:8: warning: previous definition of UUID_REGEXP was here
API controllers newer than Apipie cache! Run apipie:cache rake task to regenerate cache.
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.10.0.21/app/models/katello/concerns/content_facet_host_extensions.rb:7: warning: already initialized constant Katello::Concerns::ContentFacetHostExtensions::ERRATA_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.10.0.21/app/models/katello/concerns/content_facet_host_extensions.rb:7: warning: previous definition of ERRATA_STATUS_MAP was here
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.10.0.21/app/models/katello/concerns/content_facet_host_extensions.rb:14: warning: already initialized constant Katello::Concerns::ContentFacetHostExtensions::TRACE_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.10.0.21/app/models/katello/concerns/content_facet_host_extensions.rb:14: warning: previous definition of TRACE_STATUS_MAP was here
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.10.0.21/app/models/katello/concerns/subscription_facet_host_extensions.rb:13: warning: already initialized constant Katello::Concerns::SubscriptionFacetHostExtensions::SUBSCRIPTION_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.10.0.21/app/models/katello/concerns/subscription_facet_host_extensions.rb:13: warning: previous definition of SUBSCRIPTION_STATUS_MAP was here
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.10.0.21/app/models/katello/concerns/subscription_facet_host_extensions.rb:21: warning: already initialized constant Katello::Concerns::SubscriptionFacetHostExtensions::SLA_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.10.0.21/app/models/katello/concerns/subscription_facet_host_extensions.rb:21: warning: previous definition of SLA_STATUS_MAP was here
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.10.0.21/app/models/katello/concerns/subscription_facet_host_extensions.rb:26: warning: already initialized constant Katello::Concerns::SubscriptionFacetHostExtensions::USAGE_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.10.0.21/app/models/katello/concerns/subscription_facet_host_extensions.rb:26: warning: previous definition of USAGE_STATUS_MAP was here
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.10.0.21/app/models/katello/concerns/subscription_facet_host_extensions.rb:31: warning: already initialized constant Katello::Concerns::SubscriptionFacetHostExtensions::ROLE_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.10.0.21/app/models/katello/concerns/subscription_facet_host_extensions.rb:31: warning: previous definition of ROLE_STATUS_MAP was here
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.10.0.21/app/models/katello/concerns/subscription_facet_host_extensions.rb:36: warning: already initialized constant Katello::Concerns::SubscriptionFacetHostExtensions::ADDONS_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.10.0.21/app/models/katello/concerns/subscription_facet_host_extensions.rb:36: warning: previous definition of ADDONS_STATUS_MAP was here
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.10.0.21/app/models/katello/concerns/subscription_facet_host_extensions.rb:41: warning: already initialized constant Katello::Concerns::SubscriptionFacetHostExtensions::PURPOSE_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.10.0.21/app/models/katello/concerns/subscription_facet_host_extensions.rb:41: warning: previous definition of PURPOSE_STATUS_MAP was here
No tasks matching filter found, skipping.
About to remove 3517 orphaned task locks
0/3517
1000/3517
2000/3517
3000/3517
4000/3517
5000/3517
6000/3517
7000/3517
8000/3517
9000/3517
10000/3517
11000/3517
12000/3517
13000/3517
14000/3517
15000/3517
16000/3517
17000/3517
18000/3517
19000/3517
20000/3517
^C
Session terminated, killing shell... ...killed.
[root@satellite ~]# 21000/3517
22000/3517
23000/3517
24000/3517
25000/3517
26000/3517
27000/3517
...

Comment 3 Mike McCune 2019-03-15 20:18:02 UTC

I've never seen anyone use the '1000h' method for calculating time window for cleaning up tasks. Everyone just uses days.

I ran it on your Satellite with the 780,000 tasks with a 7 day window and got:

# foreman-rake foreman_tasks:cleanup TASK_SEARCH='label ~ *' AFTER='7d'  VERBOSE=true

[root@satellite ~]# foreman-rake foreman_tasks:cleanup TASK_SEARCH='label ~ *' AFTER='7d'  VERBOSE=true
About to remove 738041 tasks matching filter
0/738041
1000/738041
2000/738041
3000/738041
4000/738041
5000/738041
6000/738041
7000/738041
8000/738041
....738000/738041
738041/738041
Deleted 738041 tasks matching filter
About to remove 2526939 orphaned task locks
0/2526939
1000/2526939
2000/2526939
3000/2526939
4000/2526939
5000/2526939
6000/2526939
....

2525000/2526939
2526000/2526939
2526939/2526939
Deleted 2526939 orphaned task locks

About to remove 1002 orphaned execution plans
0/1002
1000/1002
1002/1002
Deleted 1002 orphaned execution plans
No oprhaned job invocations found, skipping.
[root@satellite ~]# 

worked just fine for me.

Comment 4 Mike McCune 2019-03-15 20:22:45 UTC

Oh, I see what you did, you put the time window *past* any tasks that existed.

What you are telling the cleanup to do is only delete tasks that are *OLDER* than 1000h (~41 days), not all tasks up to 1000h. 

Since your Satellite was new, there were no tasks older than 41days but it tried to delete orphan task locks and got confused.

This is still a bug, but not as severe or should block 6.5 IMO because the use case isn't common.

Comment 5 Mike McCune 2019-03-15 20:25:05 UTC

Actually, I'm going to revise my comment above. This should still block 6.5 because if anyone sets up the automatic cron based cleanup configuration on a freshly installed 6.5 server, the rake task will run forever because it will be deleting tasks older than 30 days even when none exist.

Flagging as a regression as this does not occur on 6.4 or 6.3

Comment 6 Jan Hutař 2019-03-16 21:29:24 UTC

Thank you for your investigation!

comment #3: Yes, I know I can use days, but used hours, not sure why :)
comment #4 and #5: I was preparing automated job to do so and first wanted to try running it so it wont delete anything without that dry run mode.

Comment 7 Jan Hutař 2019-03-17 07:20:04 UTC

I have tried again (sat updated to snap 20) and it is taking more than 2 hours without any deleting which sounds wrong:

[root@satellite ~]# time foreman-rake foreman_tasks:cleanup TASK_SEARCH='label ~ *' AFTER='10d' VERBOSE=true
No tasks matching filter found, skipping.
No orphaned task locks found, skipping.
No orphaned execution plans found, skipping.
No oprhaned job invocations found, skipping.

real	137m42.629s
user	0m29.073s
sys	0m3.902s

Comment 8 Jan Hutař 2019-03-17 22:31:58 UTC

So I have ran:

    # katello-service stop
    # systemctl start postgresql
    # su - postgres
    $ vacuumdb --all --verbose --echo

and again, foreman_tasks:cleanup took almost 3 hours. I'm now trying with:

    $ vacuumdb --all --full --analyze

Note to myself - is this optimal way to get count?

2019-03-17 18:28:25 EDT LOG:  duration: 1039.402 ms  statement: SELECT COUNT(*) FROM (SELECT  1 AS one FROM "foreman_tasks_locks" LEFT OUTER JOIN "foreman_tasks_tasks" ON "foreman_tasks_tasks"."id" = "foreman_tasks_locks"."task_id" WHERE "foreman_tasks_tasks"."id" IS NULL LIMIT 1000) subquery_for_count;

Comment 9 Jan Hutař 2019-03-18 06:32:31 UTC

Done full vacuum (DB dir size changed from 16 GB to 4.7 GB) and reran tasks cleanup and again, it seems quite slow. After 30min, there is this and it is still running:

[root@satellite ~]# time foreman-rake foreman_tasks:cleanup TASK_SEARCH='label ~ *' AFTER='10d' VERBOSE=true
About to remove 6460 tasks matching filter
0/6460
1000/6460
2000/6460
3000/6460
4000/6460
5000/6460
6000/6460
6460/6460
Deleted 6460 tasks matching filter
About to remove 11298 orphaned task locks
0/11298
1000/11298
2000/11298
3000/11298
4000/11298
5000/11298
6000/11298
7000/11298
8000/11298
9000/11298
10000/11298
11000/11298
11298/11298
Deleted 11298 orphaned task locks

Comment 10 Ivan Necas 2019-04-01 13:10:45 UTC

The performance issue seems to be related to this BZ https://bugzilla.redhat.com/show_bug.cgi?id=155706 and particularly related to the fact that there is too many orphaned execution plans in the reproducing system.

I was not able to reproduce the issue https://bugzilla.redhat.com/show_bug.cgi?id=1684051#c5, mike were you able to reproduce on your setup?

Comment 11 Ivan Necas 2019-04-02 13:15:59 UTC

I've found the reason why the cleanup job was running slowly: seems like a regression of https://bugzilla.redhat.com/show_bug.cgi?id=1557067 when there is large amount of tasks, the query for the orhpaned execution plans goes forever. Will send a patch for optimized query.

I was not able to reproduce the issue with cleanup older than the satellite. Jan: 10.16.31.67 is patched to fix the perf issue: could you try reporducing the other stuff?

Comment 12 Ivan Necas 2019-04-02 13:20:58 UTC

Created redmine issue https://projects.theforeman.org/issues/26511 from this bug

Comment 13 Jan Hutař 2019-04-03 07:04:13 UTC

I have tried on different satellite (which is installed about 2 weeks back - i.e. 336 hours ago) and currently is snap #22 and was not able to reproduce:

[root@satellite ~]# foreman-rake foreman_tasks:cleanup TASK_SEARCH='label ~ *' AFTER='1000h' VERBOSE=true
No tasks matching filter found, skipping.
No orphaned task locks found, skipping.
No orphaned execution plans found, skipping.
No oprhaned job invocations found, skipping.

So I think we can stop tracking that issue.

Comment 14 Bryan Kearney 2019-04-03 08:07:24 UTC

Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/26511 has been resolved.

Comment 17 Bryan Kearney 2019-05-14 19:57:52 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:1222