Bug 1684051
Summary: | `foreman-rake foreman_tasks:cleanup ...` goes on and on for orphaned task locks when time window is before than the earliest task. | ||
---|---|---|---|
Product: | Red Hat Satellite | Reporter: | Jan Hutař <jhutar> |
Component: | Tasks Plugin | Assignee: | Ivan Necas <inecas> |
Status: | CLOSED ERRATA | QA Contact: | Jan Hutař <jhutar> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | Unspecified | CC: | aruzicka, inecas, mmccune, zhunting |
Target Milestone: | Released | Keywords: | Regression, Triaged |
Target Release: | Unused | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | tfm-rubygem-foreman-tasks-0.14.4.6-1 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-05-14 19:57:52 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Jan Hutař
2019-02-28 10:23:58 UTC
I've never seen anyone use the '1000h' method for calculating time window for cleaning up tasks. Everyone just uses days. I ran it on your Satellite with the 780,000 tasks with a 7 day window and got: # foreman-rake foreman_tasks:cleanup TASK_SEARCH='label ~ *' AFTER='7d' VERBOSE=true [root@satellite ~]# foreman-rake foreman_tasks:cleanup TASK_SEARCH='label ~ *' AFTER='7d' VERBOSE=true About to remove 738041 tasks matching filter 0/738041 1000/738041 2000/738041 3000/738041 4000/738041 5000/738041 6000/738041 7000/738041 8000/738041 ....738000/738041 738041/738041 Deleted 738041 tasks matching filter About to remove 2526939 orphaned task locks 0/2526939 1000/2526939 2000/2526939 3000/2526939 4000/2526939 5000/2526939 6000/2526939 .... 2525000/2526939 2526000/2526939 2526939/2526939 Deleted 2526939 orphaned task locks About to remove 1002 orphaned execution plans 0/1002 1000/1002 1002/1002 Deleted 1002 orphaned execution plans No oprhaned job invocations found, skipping. [root@satellite ~]# worked just fine for me. Oh, I see what you did, you put the time window *past* any tasks that existed. What you are telling the cleanup to do is only delete tasks that are *OLDER* than 1000h (~41 days), not all tasks up to 1000h. Since your Satellite was new, there were no tasks older than 41days but it tried to delete orphan task locks and got confused. This is still a bug, but not as severe or should block 6.5 IMO because the use case isn't common. Actually, I'm going to revise my comment above. This should still block 6.5 because if anyone sets up the automatic cron based cleanup configuration on a freshly installed 6.5 server, the rake task will run forever because it will be deleting tasks older than 30 days even when none exist. Flagging as a regression as this does not occur on 6.4 or 6.3 Thank you for your investigation! comment #3: Yes, I know I can use days, but used hours, not sure why :) comment #4 and #5: I was preparing automated job to do so and first wanted to try running it so it wont delete anything without that dry run mode. I have tried again (sat updated to snap 20) and it is taking more than 2 hours without any deleting which sounds wrong: [root@satellite ~]# time foreman-rake foreman_tasks:cleanup TASK_SEARCH='label ~ *' AFTER='10d' VERBOSE=true No tasks matching filter found, skipping. No orphaned task locks found, skipping. No orphaned execution plans found, skipping. No oprhaned job invocations found, skipping. real 137m42.629s user 0m29.073s sys 0m3.902s So I have ran: # katello-service stop # systemctl start postgresql # su - postgres $ vacuumdb --all --verbose --echo and again, foreman_tasks:cleanup took almost 3 hours. I'm now trying with: $ vacuumdb --all --full --analyze Note to myself - is this optimal way to get count? 2019-03-17 18:28:25 EDT LOG: duration: 1039.402 ms statement: SELECT COUNT(*) FROM (SELECT 1 AS one FROM "foreman_tasks_locks" LEFT OUTER JOIN "foreman_tasks_tasks" ON "foreman_tasks_tasks"."id" = "foreman_tasks_locks"."task_id" WHERE "foreman_tasks_tasks"."id" IS NULL LIMIT 1000) subquery_for_count; Done full vacuum (DB dir size changed from 16 GB to 4.7 GB) and reran tasks cleanup and again, it seems quite slow. After 30min, there is this and it is still running: [root@satellite ~]# time foreman-rake foreman_tasks:cleanup TASK_SEARCH='label ~ *' AFTER='10d' VERBOSE=true About to remove 6460 tasks matching filter 0/6460 1000/6460 2000/6460 3000/6460 4000/6460 5000/6460 6000/6460 6460/6460 Deleted 6460 tasks matching filter About to remove 11298 orphaned task locks 0/11298 1000/11298 2000/11298 3000/11298 4000/11298 5000/11298 6000/11298 7000/11298 8000/11298 9000/11298 10000/11298 11000/11298 11298/11298 Deleted 11298 orphaned task locks The performance issue seems to be related to this BZ https://bugzilla.redhat.com/show_bug.cgi?id=155706 and particularly related to the fact that there is too many orphaned execution plans in the reproducing system. I was not able to reproduce the issue https://bugzilla.redhat.com/show_bug.cgi?id=1684051#c5, mike were you able to reproduce on your setup? I've found the reason why the cleanup job was running slowly: seems like a regression of https://bugzilla.redhat.com/show_bug.cgi?id=1557067 when there is large amount of tasks, the query for the orhpaned execution plans goes forever. Will send a patch for optimized query. I was not able to reproduce the issue with cleanup older than the satellite. Jan: 10.16.31.67 is patched to fix the perf issue: could you try reporducing the other stuff? Created redmine issue https://projects.theforeman.org/issues/26511 from this bug I have tried on different satellite (which is installed about 2 weeks back - i.e. 336 hours ago) and currently is snap #22 and was not able to reproduce: [root@satellite ~]# foreman-rake foreman_tasks:cleanup TASK_SEARCH='label ~ *' AFTER='1000h' VERBOSE=true No tasks matching filter found, skipping. No orphaned task locks found, skipping. No orphaned execution plans found, skipping. No oprhaned job invocations found, skipping. So I think we can stop tracking that issue. Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/26511 has been resolved. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:1222 |