Bug 2180875

Summary: Recurring tasks not rescheduled to future during upgrade.
Product: Red Hat Satellite
Reporter: Lukáš Hellebrandt <lhellebr>
Component: Tasks Plugin
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED WONTFIX
QA Contact: Satellite QE Team <sat-qe-bz-list>
Severity: medium
Priority: unspecified
Version: 6.13.0
CC: ahumbe, amiagarw, aruzicka, ehelms, mvanderw, peter.vreman, rkhadkik, rlavi
Target Milestone: Unspecified
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Last Closed: 2024-04-03 11:49:36 UTC
Type: Bug

Description Lukáš Hellebrandt 2023-03-22 13:43:18 UTC
Description of problem:
Recurring tasks are not rescheduled to the future during an upgrade. As a result, they may remain scheduled in the past once the upgrade finishes, so they never run. Once disabled, they cannot be enabled again; the attempt fails with:

ERF28-1357 [ForemanTasks::RecurringLogicCancelledException]: Cannot update a cancelled Recurring Logic.

Version-Release number of selected component (if applicable):
Sat 6.13.0 snap 15.0 and upgrade path 6.12.3 snap 2.0 -> 6.13.0 snap 15.0. Not a regression.

How reproducible:
Deterministic

Steps to Reproduce:
1) Create a recurring task (All hosts -> <host> -> Schedule a job) with cron "*/2 * * * *"
2) Create a sync plan (Content -> Sync plans -> Create sync plan) on some repo with the same cron
3) Run the upgrade to 6.13
4) Go to Recurring logics and disable both
5) Enable them again

Actual results:
The sync plan created in 2) was rescheduled to the future during the upgrade.
The recurring task created in 1) was NOT rescheduled to the future during the upgrade. After it is disabled manually, it cannot be enabled again due to the above error. It does not run at the specified time.

Expected results:
The recurring task created in 1) is also rescheduled to the future during the upgrade.
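To make "rescheduled to the future" concrete for the "*/2 * * * *" schedule used in the reproducer, here is a minimal Python sketch of the expected behavior. It is hypothetical: the function names are illustrative, it is not how foreman-tasks or Dynflow actually compute occurrences, and it only handles a cron minute field of the form "*/n" with every other field set to "*".

```python
from datetime import datetime, timedelta


def next_every_n_minutes(after: datetime, n: int = 2) -> datetime:
    """Hypothetical helper: return the first whole minute strictly after `after`
    that matches the cron minute field "*/n" (minute value divisible by n),
    assuming all other cron fields are "*"."""
    # Truncate to the current minute, then step forward one minute at a time.
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while candidate.minute % n != 0:
        candidate += timedelta(minutes=1)
    return candidate


def reschedule_if_stale(next_occurrence: datetime, now: datetime, n: int = 2) -> datetime:
    """Hypothetical helper: keep a next occurrence that is still in the future;
    if it is in the past (e.g. after services were down during an upgrade),
    move it to the first matching slot after `now` instead of leaving it stale."""
    if next_occurrence > now:
        return next_occurrence
    return next_every_n_minutes(now, n)
```

For example, with now = 14:05:30 and a stale next occurrence of 13:40, `reschedule_if_stale` returns 14:06, while a next occurrence of 14:10 is left unchanged. The bug is that the recurring task keeps the stale 13:40-style value instead.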

Comment 1 Adam Ruzicka 2023-03-22 14:32:22 UTC
This is interesting, there should be no difference between recurring tasks and sync plans as sync plans use recurring tasks as the backend. Do you still have the machine?

Comment 2 Lukáš Hellebrandt 2023-03-28 10:21:59 UTC
Answer provided on Slack

Comment 3 Adam Ruzicka 2023-06-05 09:07:18 UTC
I have a hunch about what's going on, but it is a bit of a stretch. I believe what I wrote in https://bugzilla.redhat.com/show_bug.cgi?id=2131839#c15 still holds; however, sync plans are handled in a special way during the upgrade. Before the upgrade starts, all sync plans are disabled, and they are re-enabled once the upgrade is done. We don't do anything like this for other features that use recurring logics. The upgrade itself isn't exactly gentle, and services get restarted quite a lot.

Right now, the root cause seems to be that services are restarted before the fix for bug 2131839 is deployed, which triggers https://bugzilla.redhat.com/show_bug.cgi?id=2131839#c10 . Because things break before the upgrade, deploying a fixed version during the upgrade doesn't really help. Sync plans seem to be immune to this because they are disabled at this stage.

There are still some gaps I cannot really explain right now, but confirming or refuting this theory should be quite straightforward, albeit a bit time consuming.

Comment 4 Lukáš Hellebrandt 2023-06-07 16:47:56 UTC
I tried with 6.13.1 -> 6.14 and wasn't able to reproduce.

I used the reproducer from the original report on one machine. On another machine, before the upgrade, I stopped the Satellite services over the period when the tasks were supposed to run. After starting the services again, everything was properly rescheduled to the future. Even after upgrading this second instance, everything still works.

Comment 5 Adam Ruzicka 2023-06-08 12:48:56 UTC
That somewhat confirms I was on to something in comment 3, and that the fix for the original BZ still helps here.

The easiest way out seems to be delivering an updated dynflow (or just cherry-picking the fix for the original BZ) into all currently supported Satellite versions. However, one would then have to either update dynflow by hand before continuing with the rest of the upgrade, or disable (probably) all recurring logics prior to the upgrade, so that the bug isn't triggered while the fixed version is being deployed. Alternatively, foreman-maintain could be made to do that.

Or, considering no one has really complained about the issue (specifically about REX), we could treat this as resolved in the current release, although disabling all recurring logics during the upgrade might not be a bad idea.

Comment 7 Peter Vreman 2023-07-12 12:13:09 UTC
My case is attached to this BZ; it has a simple reproducer on 6.12 for the rh_cloud jobs:

I ran my reproducer, with the results below, in February 2023 on Satellite 6.12.
Overnight test (from one day to the next):
- Feb 01 14:11 - Stopped Satellite with 'satellite-maintain service stop'
<kept it stopped overnight>
- Feb 02 09:29 - Started Satellite with 'satellite-maintain service start'

Result in the recurring logic (see also the attached screenshot):
- Last occurrence 'Feb 02 09:30'
- Next occurrence 'Feb 01 xx:yy'


Day 1: during the day, stop Satellite …

Comment 10 Adam Ruzicka 2024-04-03 11:49:36 UTC
It seems we missed the boat on this a little. This should be fixed in the dynflow version shipped with 6.13, so all upgrades from 6.13 onward should be safe. The only case where this can manifest is an upgrade from 6.12 to 6.13, and at this point we can't really fix it there. With that said, I'll go ahead and close this.

Comment 11 Red Hat Bugzilla 2024-08-02 04:25:04 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days