Bug 2180875
| Summary: | Recurring tasks not rescheduled to future during upgrade. | ||
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Lukáš Hellebrandt <lhellebr> |
| Component: | Tasks Plugin | Assignee: | satellite6-bugs <satellite6-bugs> |
| Status: | CLOSED WONTFIX | QA Contact: | Satellite QE Team <sat-qe-bz-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.13.0 | CC: | ahumbe, amiagarw, aruzicka, ehelms, mvanderw, peter.vreman, rkhadkik, rlavi |
| Target Milestone: | Unspecified | ||
| Target Release: | Unused | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-04-03 11:49:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Lukáš Hellebrandt
2023-03-22 13:43:18 UTC
This is interesting: there should be no difference between recurring tasks and sync plans, as sync plans use recurring tasks as the backend. Do you still have the machine?

Answer provided on Slack.

I might have a hunch what's going on, but it is a bit of a stretch. I believe what I wrote in https://bugzilla.redhat.com/show_bug.cgi?id=2131839#c15 still holds; however, sync plans are handled in a special way during the upgrade. Before the upgrade starts, all sync plans are disabled, and they are re-enabled once the upgrade is done. We don't do anything like this for other things using recurring logics. The upgrade itself isn't exactly gentle and services get restarted quite a lot. Right now the root cause seems to be that services are restarted before the fix for 2131839 is deployed, which triggers https://bugzilla.redhat.com/show_bug.cgi?id=2131839#c10 . And because things get broken before the upgrade, deploying a fixed version during the upgrade doesn't really help. Sync plans seem to be immune to this because they are disabled at this stage. There are still some gaps I cannot really explain right now, but confirming or refuting this theory should be quite straightforward, albeit a bit time-consuming.

I tried with 6.13.1 -> 6.14 and wasn't able to reproduce. I used the reproducer from the OP on one machine. On another machine, before the upgrade, I stopped Satellite services over the time when the tasks were supposed to run. After starting the services again, everything got rescheduled to the future properly. Even after the upgrade of this second instance, everything still works.

That somewhat confirms I was on to something in #3 and that the fix for the original BZ still helps here.
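To make "rescheduled to the future properly" concrete, here is a minimal Python sketch of the invariant being discussed: after services have been down past a planned run, the next occurrence should land strictly after the current time. This is not the Dynflow or foreman-tasks implementation; the function names, the fixed daily interval, and the 15:00 planned time are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def next_occurrence(last_planned: datetime, interval: timedelta, now: datetime) -> datetime:
    """Return the first slot on the schedule that is strictly after `now`.

    Hypothetical helper: models a fixed-interval schedule, not a cron line.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if last_planned > now:
        # Nothing was missed; the planned time is still ahead of us.
        return last_planned
    # Skip every slot that fell at or before `now` (e.g. during downtime).
    missed = (now - last_planned) // interval + 1
    return last_planned + missed * interval


def is_stale(next_planned: datetime, now: datetime) -> bool:
    """True if a 'next occurrence' is not in the future (the symptom in this bug)."""
    return next_planned <= now


# Hypothetical daily task planned for 15:00; services were down across that time.
planned = datetime(2023, 2, 1, 15, 0)
restarted_at = datetime(2023, 2, 2, 9, 29)
rescheduled = next_occurrence(planned, timedelta(days=1), restarted_at)
```

With these assumed values, `is_stale(planned, restarted_at)` flags the missed slot, while `rescheduled` is a time later than `restarted_at`, which is the behavior expected once the fix is in place.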
The easiest way out seems to be delivering an updated dynflow (or just picking the fix for the original BZ) into all currently supported Satellite versions. However, one would have to either update dynflow by hand before continuing with the rest of the upgrade or disable (probably) all recurring logics prior to the upgrade, so the bug wouldn't get triggered when trying to deploy the fixed version. Alternatively, foreman-maintain could be made to do that. Or, considering no one really complained about the issue (specifically about REX), we could treat this as resolved in the current release, although disabling all recurring logics during upgrade might not be a bad idea.

My case is attached to this BZ; it has a simple reproducer on 6.12 for the rh_cloud jobs. My reproducer and the results were from Feb 2023 on Sat 6.12.

Test from yesterday to today, overnight:
- Feb 01 14:11 - Stop Satellite 'satellite-maintain service stop' <kept it stopped overnight>
- Feb 02 09:29 - Started Satellite 'satellite-maintain service start'

Result in recurring logic (see also attached screenshot):
- Last occurrence 'Feb 02 09:30'
- Next occurrence 'Feb 01 xx:yy'

Day1 During the day stop Satellite '

It seems we missed the boat with this a little bit. This should be fixed in the dynflow that is shipped with 6.13, so all upgrades from 6.13+ should be safe. The only place where this can manifest is when updating from 6.12 to 6.13, but at this point, we can't really fix it there. With that being said, I'll go ahead and close this.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days