Bug 1476796 - LOCE task can stick around after restart
Summary: LOCE task can stick around after restart
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Tasks Plugin
Version: 6.2.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: 6.4.0
Assignee: Adam Ruzicka
QA Contact: Ales Dujicek
URL:
Whiteboard:
Duplicates: 1605025
Depends On:
Blocks: 1642369
Reported: 2017-07-31 13:31 UTC by Chris Duryee
Modified: 2022-03-13 14:22 UTC (History)

Fixed In Version: dynflow-0.8.31
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1642369
Environment:
Last Closed: 2018-10-16 19:01:53 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 21207 | 0 | Normal | Closed | Long running tasks like LOCE can stick around after restart | 2020-11-30 13:38:51 UTC
Foreman Issue Tracker | 21261 | 0 | Normal | Closed | Long running tasks should use Dynflow::Action::Singleton | 2020-11-30 13:39:18 UTC

Description Chris Duryee 2017-07-31 13:31:25 UTC
Description of problem:

(note: this bug is related to the way a couple of tasks are managed, not the tasks plugin itself. I wasn't sure of the best category)

If the Satellite server has issues that result in an unclean shutdown, it's possible for the long-running tasks to not get cleaned up at termination. This prevents the long-running tasks from restarting upon server restart. Users have to shut down foreman-tasks again, clean the long-running tasks, and then restart.

The long-running tasks are "Listen On Candlepin Events" (LOCE) and "Monitor Event Queue". I haven't seen issues with "Insights Email Notifications", but it may be in the same boat.

Version-Release number of selected component (if applicable): 6.2.10

Comment 5 Adam Ruzicka 2017-08-31 07:37:53 UTC
Steps to reproduce:
1) Take a look at LOCE, it should be in running-pending.
2) systemctl kill -s 9 foreman-tasks
3) Take a look at LOCE again; it is still in running-pending even though the executor is now dead (this simulates an unclean shutdown). Note the uuid of the LOCE task.
4) systemctl restart foreman-tasks
5) Wait for a while (to let foreman-tasks fully initialize)
6) Refresh tasks list

Actual result:
LOCE task sticks around and is still kept in running-pending.

Expected results:
LOCE task is switched to stopped-$whatever, another LOCE task is spawned and is in running-pending.
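
The expected result above can be written down as a check over the task list after the restart. This is only a sketch: the `Task` fields and the `restart_outcome_ok` helper are hypothetical, and it assumes that "running-pending" means state `running` with result `pending`. It does not call any foreman-tasks API.

```python
from dataclasses import dataclass

# Task name taken from the bug description.
LOCE_LABEL = "Listen On Candlepin Events"


@dataclass
class Task:
    # Hypothetical, simplified view of one row in the tasks list.
    uuid: str
    label: str
    state: str   # e.g. "running", "paused", "stopped"
    result: str  # e.g. "pending", "success", "error"


def restart_outcome_ok(tasks, old_uuid):
    """Return True if the task list matches the expected result after restart."""
    loce = [t for t in tasks if t.label == LOCE_LABEL]
    old = [t for t in loce if t.uuid == old_uuid]
    # The pre-restart task must still be listed, but in the stopped state
    # (any result, i.e. "stopped-$whatever").
    old_stopped = len(old) == 1 and old[0].state == "stopped"
    # Exactly one other LOCE task must be in running-pending.
    fresh = [
        t for t in loce
        if t.uuid != old_uuid and (t.state, t.result) == ("running", "pending")
    ]
    return old_stopped and len(fresh) == 1
```

With the "actual result" from this comment (the old task still in running-pending and no replacement), the check returns False.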

Comment 7 Adam Ruzicka 2017-10-05 10:49:54 UTC
Created redmine issue http://projects.theforeman.org/issues/21207 from this bug

Comment 8 Satellite Program 2017-10-10 08:07:26 UTC
Upstream bug assigned to aruzicka

Comment 9 Satellite Program 2017-10-10 08:07:29 UTC
Upstream bug assigned to aruzicka

Comment 10 Adam Ruzicka 2017-10-10 08:16:03 UTC
Please disregard the previous "steps to reproduce" in comment #5

I've failed to reproduce this issue, but I've seen it happen several times. Basically, there are two variants of this bug.

After an unclean shutdown, either:
1) there is no LOCE running, or
2) there are a couple of LOCEs running.

The new way of handling long running tasks in Dynflow should take care of both of those and make sure there is always exactly one instance running.
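
As an illustration of the "exactly one instance" invariant, here is a minimal sketch of a reconciliation pass that covers both variants, plus the case of a task left behind by a dead executor. It is not how Dynflow implements this: the dict keys (`uuid`, `state`, `executor`) and the `reconcile` function are assumptions made for the example.

```python
import uuid


def reconcile(tasks, live_executors, current_executor):
    """Return a new task list with exactly one running instance.

    tasks: list of dicts with hypothetical keys "uuid", "state", "executor".
    live_executors: set of executor ids known to be alive.
    current_executor: id of the executor that would own a newly spawned task.
    """
    reconciled = []
    keeper_found = False
    for task in tasks:
        task = dict(task)  # work on a copy, leave the input untouched
        if task["state"] == "running":
            orphaned = task["executor"] not in live_executors
            if orphaned or keeper_found:
                # Stale task from a dead executor, or a surplus duplicate.
                task["state"] = "stopped"
            else:
                keeper_found = True
        reconciled.append(task)
    if not keeper_found:
        # No live instance left: spawn a replacement.
        reconciled.append(
            {"uuid": str(uuid.uuid4()), "state": "running", "executor": current_executor}
        )
    return reconciled
```

Running the pass on an empty list spawns one task, and running it on two live duplicates stops one of them, so both variants end with a single running instance.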

Comment 12 Satellite Program 2018-02-21 16:54:37 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0336

Comment 13 Pavel Moravec 2018-04-19 13:22:44 UTC
Reopening, since the katello patch isn't present in 6.3.1. As a result, both a customer system and an internal system ended up with no LOCE task after a foreman-tasks restart.

Since:
- having no LOCE task can have serious consequences (e.g. candlepin in maintenance mode, so no new systems register)
- backport seems easy

I am asking for z-stream.

(sadly, there is no reproducer available ATM)

Comment 16 Ben 2018-07-09 11:09:11 UTC
I'm running into this on 6.2.15.  How do I get LOCE into running/pending state again, please?  I did a "shutdown -h now" and assumed this would run a "katello-service stop" or similar.  In any event I appear to have had an unclean shutdown and I'm stuck with a LOCE in paused/pending state after the reboot.

What steps are required?

Comment 17 Adam Ruzicka 2018-07-23 12:44:46 UTC
*** Bug 1605025 has been marked as a duplicate of this bug. ***

Comment 20 Bryan Kearney 2018-10-16 19:01:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2927

