Bug 2131839
Summary: | re-enabling sync plans [FAIL] Could not update the sync plan: ERF28-1357 [ForemanTasks::RecurringLogicCancelledException]: Cannot update a cancelled Recurring Logic. | |||
---|---|---|---|---|
Product: | Red Hat Satellite | Reporter: | William Staten <william_staten> | |
Component: | Dynflow | Assignee: | Adam Ruzicka <aruzicka> | |
Status: | CLOSED ERRATA | QA Contact: | Lukáš Hellebrandt <lhellebr> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 6.11.3 | CC: | ahumbe, aruzicka, dhjoshi, egolov, ehelms, jbhatia, jbreitwe, momran, mvanderw, pcreech, peter.vreman, pmoravec, saydas, sdturne | |
Target Milestone: | 6.13.0 | Keywords: | PrioBumpGSS, Triaged, Upgrades | |
Target Release: | Unused | |||
Hardware: | Unspecified | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | rubygem-dynflow-1.6.10 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2184125 (view as bug list) | Environment: | ||
Last Closed: | 2023-05-03 13:22:11 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: |
Description
William Staten
2022-10-03 21:26:19 UTC
Hi William,

Thanks for raising the bugzilla with your finding. Do you already have a case open on this with support? If not, I'd recommend that we begin there.

I have already opened a case with support.

I also experienced this problem, for which I opened the following case record: https://access.redhat.com/support/cases/#/case/03373216

I suspect it was caused by the upgrade of our Satellite from 6.10 to 6.11. I worked around the problem by deleting the offending sync plan and re-creating an identical plan (which obviously has a new and unique recurring logic ID).

I hit it on my own Satellite as well. I *think* the sufficient reproducer is:
- have Sat 6.11.[0-3]
- have a sync plan
- disable it
- run upgrade to 6.11.4

Hello,

I have Hilti AG reporting a similar issue with the upgrade to 6.12.1 and with the Recurring Logic for the Inventory sync action.

So I tested with sync plans (on 6.12.0 and 6.12.1):
* Created a sync plan with a custom cron line set to execute just 2 minutes later
* Stopped all services
* Brought the services back up after 3 minutes
* Checked the sync plan and RL: the next sync date\time is still the OLD one
* Disabled the RL and tried to re-enable it, which fails with "ERF28-1357 [ForemanTasks::RecurringLogicCancelledException]: Cannot update a cancelled Recurring Logic."

The same was reproducible for the Inventory Scheduled Sync Recurring Logic as well.

For a sync plan this is easy to fix from the UI, but for the Inventory Scheduled Sync action it is not, since the cronline of the Recurring Logic is not editable (yet). So we have to clear that canceled logic and create a new one from the rake console.

Satellite cannot gracefully handle the RL execution and status if the "Next Run" falls into a downtime (when services are down), which makes it nearly impossible for users to identify the source of the issue and then fix it.

Let me know if a new BZ is needed for this.
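The downtime scenario above comes down to a "Next Run" timestamp that is already in the past when the services come back. As a rough illustration only, the Python sketch below shows the catch-up behaviour a user would hope for with a "*/N" cron line: notice the missed run, run once, and move the next run into the future. The function names and the restriction to step values that divide 60 are assumptions made for this example; none of this is Satellite or Dynflow code.

```python
from datetime import datetime, timedelta


def next_slot(now: datetime, step_minutes: int) -> datetime:
    """Next minute strictly after `now` that matches a "*/step_minutes" cron line.

    Assumption for this sketch: step_minutes divides 60, so the matching
    minutes repeat identically every hour.
    """
    if step_minutes <= 0 or 60 % step_minutes != 0:
        raise ValueError("sketch only handles steps that divide 60")
    base = now.replace(second=0, microsecond=0)
    return base + timedelta(minutes=step_minutes - base.minute % step_minutes)


def catch_up(next_run: datetime, now: datetime, step_minutes: int):
    """Return (run_now, new_next_run) for a recurring logic after a restart."""
    if next_run > now:
        # Nothing was missed: keep the existing schedule.
        return False, next_run
    # The scheduled time fell into the downtime: run once, then reschedule.
    return True, next_slot(now, step_minutes)


# The next run was due at 10:02, but services only came back at 10:05:30.
stale = datetime(2023, 1, 1, 10, 2)
restart = datetime(2023, 1, 1, 10, 5, 30)
print(catch_up(stale, restart, step_minutes=2))
```

In the failing case reported here, the stored next run stayed at the old time instead of moving forward as in the second branch.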
Dynflow has a built-in subcomponent which runs inside the orchestrator and periodically dispatches delayed execution plans scheduled for the future. Once the delayed plan is properly planned, the delay record (the thing saying "an execution plan $X should be executed at time $T") is destroyed. There are some safeguards in place to ensure a single delayed plan does not get planned multiple times.

Issue 1: In sidekiq-based deployments, the delayed plan dispatching subcomponent is started too early, while the rest of the orchestrator is still doing world validity checks. This can lead to a situation where the subcomponent dispatches a single delayed plan multiple times.

Issue 2: When delayed plans get dispatched multiple times, the safeguards do not handle it properly. The safeguards essentially act as an early return in case the plan in question is already being planned. However, as soon as the early return happens, the delay record is removed. This breaks planning of the next repetition, which relies on data from that record.

Verified with Sat 6.13.0 snap 15.0 and upgrade path 6.12.3 snap 2.0 -> 6.13.0 snap 15.0.

1) Create a recurring task (All hosts -> <host> -> Schedule a job) with cron "*/2 * * * *"
2) Create a sync plan (Content -> Sync plans -> Create sync plan) on some repo with the same cron
3) Run the upgrade to 6.13
4) Go to Recurring logics and disable both
5) Enable them again

A sync plan created in 2) was rescheduled to future during upgrade.

A recurring task created in 1) was NOT rescheduled to future during upgrade. After disabling it manually, it can't be enabled again. It doesn't run at the specified time.

This BZ specifically mentions sync plans so I'm verifying it and filing a followup BZ for general recurring tasks, like running hosts jobs.

(In reply to Lukáš Hellebrandt from comment #12)
> Verified with Sat 6.13.0 snap 15.0 and upgrade path 6.12.3 snap 2.0 ->
> 6.13.0 snap 15.0.
>
> 1) Create a recurring task (All hosts -> <host> -> Schedule a job) with cron
> "*/2 * * * *"
> 2) Create a sync plan (Content -> Sync plans -> Create sync plan) on some
> repo with the same cron
> 3) Run the upgrade to 6.13
> 4) Go to Recurring logics and disable both
> 5) Enable them again
>
> A sync plan created in 2) was rescheduled to future during upgrade.
>
> A recurring task created in 1) was NOT rescheduled to future during upgrade.
> After disabling it manually, it can't be enabled again. It doesn't run at
> specified time.
>
> This BZ specifically mentions sync plans so I'm verifying it and filing a
> followup BZ for general recurring tasks, like running hosts jobs.

Hi Adam,

Do you think this will affect the InventorySync Recurring Logic as well, in the same way it affected a normal recurring REX task?

@Lukas, would it be possible to do a second test?

* Ensure that "Inventory Scheduled Sync" is scheduled to execute tonight and note down the time\date
* Leave all services of Satellite stopped overnight --> 'satellite-maintain service stop'
* Start the services again the next day, or a few minutes after the RL was supposed to be executed --> 'satellite-maintain service start'
* Verify whether "Inventory Scheduled Sync" continues to work or the RL is just not working any more.

-- Sayan

> Do you think this will affect The InventorySync Recurring logic as well in the same way it affected a normal recurring REX task ?
I'm afraid right now I can't give you a better answer than maybe. All three (sync plans, rex jobs and inventory sync) use the same code paths under the hood and as of now I don't see how it could work for one but not for others.
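Since all three features share this code path, it may help to see the failure from the earlier Dynflow analysis (Issue 1 and Issue 2) in miniature. The Python sketch below is a toy model, not Dynflow's Ruby code: `DelayRecord`, `Dispatcher`, and the `destroy_on_early_return` switch are hypothetical names invented for illustration, and repetitions reuse the same plan id for simplicity. With the switch off it shows only one plausible corrected behaviour, not the actual patch shipped in rubygem-dynflow-1.6.10.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelayRecord:
    plan_id: str
    start_at: int   # scheduled time, in arbitrary ticks
    interval: int   # data the next repetition is planned from


class Dispatcher:
    """Toy model of a delayed-plan dispatcher (all names hypothetical)."""

    def __init__(self, destroy_on_early_return: bool):
        self.destroy_on_early_return = destroy_on_early_return
        self.delay_records = {}   # plan_id -> DelayRecord
        self.in_progress = set()  # plan ids currently being planned

    def add(self, record: DelayRecord) -> None:
        self.delay_records[record.plan_id] = record

    def begin_planning(self, plan_id: str) -> bool:
        """Return True if this call claimed the plan, False on early return."""
        if plan_id in self.in_progress:
            # Safeguard: somebody else is already planning this plan.
            if self.destroy_on_early_return:
                # Modelled bug: the delay record is removed anyway.
                self.delay_records.pop(plan_id, None)
            return False
        self.in_progress.add(plan_id)
        return True

    def finish_planning(self, plan_id: str) -> Optional[DelayRecord]:
        """Plan the next repetition; return None if the record was lost."""
        self.in_progress.discard(plan_id)
        record = self.delay_records.pop(plan_id, None)
        if record is None:
            return None  # nothing left to derive the next repetition from
        nxt = DelayRecord(plan_id, record.start_at + record.interval, record.interval)
        self.delay_records[plan_id] = nxt
        return nxt


def simulate(destroy_on_early_return: bool) -> Optional[DelayRecord]:
    d = Dispatcher(destroy_on_early_return)
    d.add(DelayRecord("sync-plan", start_at=100, interval=120))
    d.begin_planning("sync-plan")  # first dispatch claims the plan
    d.begin_planning("sync-plan")  # duplicate dispatch hits the safeguard
    return d.finish_planning("sync-plan")


print(simulate(destroy_on_early_return=True))   # next repetition is lost
print(simulate(destroy_on_early_return=False))  # next repetition is planned
```

The model keeps Issue 1 (the duplicate dispatch) in both runs and changes only what the safeguard does to the record, which is the part Issue 2 identifies as breaking the next repetition.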
This is the followup BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2180875

@Sayan done. After starting the services, the sync plans ran once and were scheduled for the next regular occurrence. Sat 6.13 snap 18.0.

(In reply to Lukáš Hellebrandt from comment #17)
> @Sayan done. After starting the services, the sync plans ran once and were
> scheduled for the next regular occurrence. Sat 6.13 snap 18.0.

Thanks Lukas. But was it the Sync Plan or "Inventory Scheduled Sync" that you tested this time?

-- Sayan

I supposed you meant Sync Plan, i.e. repository sync. Did you really mean Insights inventory sync? Because that is only mildly related to this BZ which is about product sync plans.

(In reply to Lukáš Hellebrandt from comment #19)
> I supposed you meant Sync Plan, i.e. repository sync. Did you really mean
> Insights inventory sync? Because that is only mildly related to this BZ
> which is about product sync plans.

Well, you had already confirmed the repo sync plans in Comment 12, but as you can see above in Comment 7, we also discussed the same issue for "Inventory Scheduled Sync", and the BZ fix is technically expected to cover that task\action.

The way you found that REX-related recurring logics were affected, can you perhaps repeat a similar test for "Inventory Scheduled Sync" (InventorySync::Async::InventoryScheduledSync) as well, to confirm whether it's working or whether it also needs to be fixed via the followup BZ https://bugzilla.redhat.com/show_bug.cgi?id=2180875?

-- Sayan

In 6.13, I can see task "Wait and Inventory scheduled sync". When I used your reproducer, that task ran and was correctly rescheduled. Is that what you need?

Yup yup.. As long as "Inventory scheduled sync" continues to reschedule, I guess the fix is working.. Do you think this info needs to be posted in the follow up BZ as well?

-- Sayan

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Important: Satellite 6.13 Release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2097