Bug 1476796

Summary: LOCE task can stick around after restart
Product: Red Hat Satellite Reporter: Chris Duryee <cduryee>
Component: Tasks Plugin Assignee: Adam Ruzicka <aruzicka>
Status: CLOSED ERRATA QA Contact: Ales Dujicek <adujicek>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.2.10 CC: adujicek, aruzicka, bbuckingham, ben.argyle, bkearney, egolov, hyu, inecas, jcallaha, pmoravec, sokeeffe
Target Milestone: 6.4.0 Keywords: FieldEngineering, PrioBumpField, Reopened, Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: dynflow-0.8.31 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1642369 (view as bug list) Environment:
Last Closed: 2018-10-16 19:01:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1642369    

Description Chris Duryee 2017-07-31 13:31:25 UTC
Description of problem:

(note: this bug is related to the way a couple of tasks are managed, not the tasks plugin itself. I wasn't sure of the best category)

If the Satellite server has issues that result in an unclean shutdown, it's possible for the long-running tasks to not get cleaned up at termination. This prevents the long-running tasks from restarting upon server restart. Users have to shut down foreman-tasks again, clean the long-running tasks, and then restart.

The long-running tasks are "Listen On Candlepin Events" and "Monitor Event Queue". I haven't seen issues with "Insights Email Notifications" but it may be in the same boat.

Version-Release number of selected component (if applicable): 6.2.10

Comment 5 Adam Ruzicka 2017-08-31 07:37:53 UTC
Steps to reproduce:
1) Take a look at LOCE, it should be in running-pending.
2) systemctl kill -s 9 foreman-tasks
3) Take a look at LOCE, it is still in running-pending even though the executor is dead now (simulates unclean shutdown), note uuid of the LOCE task
4) systemctl restart foreman-tasks
5) Wait for a while (to let foreman-tasks fully initialize)
6) Refresh tasks list

Actual result:
LOCE task sticks around and is still kept in running-pending.

Expected results:
LOCE task is switched to stopped-$whatever, another LOCE task is spawned and is in running-pending.
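
The expected result above can be written down as a small check. This is only an illustrative sketch: the function name and the (uuid, state, result) tuples are hypothetical, not an actual foreman-tasks API, and the task list would have to be collected by hand from the tasks page before and after the restart.

```python
from typing import List, Tuple


def restart_recovered(old_uuid: str, tasks_after: List[Tuple[str, str, str]]) -> bool:
    """Check the expected result for one long-running task label.

    tasks_after holds hypothetical (uuid, state, result) tuples for all
    LOCE tasks seen after foreman-tasks was restarted.
    """
    # The task that was running before the kill must now be stopped.
    old_stopped = any(
        uuid == old_uuid and state == "stopped"
        for uuid, state, _result in tasks_after
    )
    # Exactly one replacement task must be in running-pending.
    new_running = [
        uuid
        for uuid, state, result in tasks_after
        if uuid != old_uuid and state == "running" and result == "pending"
    ]
    return old_stopped and len(new_running) == 1
```

With the actual result described above (the old task still in running-pending and no new one), the check returns False.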

Comment 7 Adam Ruzicka 2017-10-05 10:49:54 UTC
Created redmine issue http://projects.theforeman.org/issues/21207 from this bug

Comment 8 Satellite Program 2017-10-10 08:07:26 UTC
Upstream bug assigned to aruzicka

Comment 9 Satellite Program 2017-10-10 08:07:29 UTC
Upstream bug assigned to aruzicka

Comment 10 Adam Ruzicka 2017-10-10 08:16:03 UTC
Please disregard the previous "steps to reproduce" in comment #5

I've failed to reproduce this issue, but I've seen it happen several times. Basically there are two variants of this bug

After an unclean shutdown
1) there is no LOCE running.
2) there are a couple of LOCEs running.

The new way of handling long running tasks in Dynflow should take care of both of those and make sure there is always exactly one instance running.
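
As a rough illustration of the "exactly one instance" goal, the sketch below reconciles the tasks for one label after a restart. It is not Dynflow's actual implementation: the Task record, its fields, and the reconcile function are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    uuid: str
    label: str
    state: str            # "running" or "stopped" (simplified for the sketch)
    executor_alive: bool  # False if the owning executor died uncleanly


def reconcile(tasks: List[Task], label: str) -> Tuple[List[str], bool]:
    """Return (uuids to stop, whether a new task must be spawned)."""
    running = [t for t in tasks if t.label == label and t.state == "running"]
    # Tasks whose executor is gone can never make progress again.
    orphaned = [t for t in running if not t.executor_alive]
    healthy = [t for t in running if t.executor_alive]
    # Keep the first healthy instance; any further ones are duplicates.
    to_stop = [t.uuid for t in orphaned] + [t.uuid for t in healthy[1:]]
    # Spawn a replacement only when no healthy instance remains.
    spawn = not healthy
    return to_stop, spawn
```

For variant 1 (no LOCE running) this yields nothing to stop and a request to spawn one task; for variant 2 (several LOCEs running on a live executor) it stops all but one and spawns nothing.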

Comment 12 Satellite Program 2018-02-21 16:54:37 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0336

Comment 13 Pavel Moravec 2018-04-19 13:22:44 UTC
Reopening, since the katello patch isn't present in 6.3.1. As a result, a customer and also an internal system ended up with no LOCE task after a foreman-tasks restart.

Since:
- no LOCE task can have serious consequences (e.g. candlepin in maintenance mode, so no new systems register)
- backport seems easy

I am asking for z-stream.

(sadly, there is no reproducer available ATM)

Comment 16 Ben 2018-07-09 11:09:11 UTC
I'm running into this on 6.2.15.  How do I get LOCE into running/pending state again, please?  I did a "shutdown -h now" and assumed this would run a "katello-service stop" or similar.  In any event I appear to have had an unclean shutdown and I'm stuck with a LOCE in paused/pending state after the reboot.

What steps are required?

Comment 17 Adam Ruzicka 2018-07-23 12:44:46 UTC
*** Bug 1605025 has been marked as a duplicate of this bug. ***

Comment 20 Bryan Kearney 2018-10-16 19:01:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2927