Bug 1628638

Summary: The termination procedure after memory threshold exceeded can get stuck, waiting infinitely for some events to occur
Product: Red Hat Satellite
Reporter: Ivan Necas <inecas>
Component: Tasks Plugin
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA
QA Contact: Jan Hutař <jhutar>
Severity: high
Docs Contact:
Priority: unspecified
Version: Unspecified
CC: andrew.schofield, aruzicka, gapatil, inecas, jhutar, kabbott, ktordeur, mmccune, pcreech, pdragun, pmoravec, sthirugn
Target Milestone: 6.5.0
Keywords: Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tfm-rubygem-dynflow-1.1.1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1661291 1684687
Environment:
Last Closed: 2019-05-14 12:38:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1654217, 1665461    
Bug Blocks:    

Description Ivan Necas 2018-09-13 15:41:00 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:
Occasionally

Steps to Reproduce:
1. Set a memory threshold on the dynflow executor (EXECUTOR_MEMORY_LIMIT=2gb in /etc/sysconfig/foreman-tasks in Satellite 6.3, /etc/sysconfig/dynflowd in 6.4).
2. Run some substantial load on the tasking system; continuous resyncing and publishing of CVs might be a good start.
3. Wait for the memory usage of the dynflow process to cross the limit; watch production.log for the 'Memory level exceeded' message.
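
For step 3, a small script can scan the log for the message instead of watching it by hand. This is only a minimal sketch: the function names `find_memory_exceeded` and `scan_log` are made up for the example, and the log path used in the usage note is an assumption about where production.log lives on the server. Only the 'Memory level exceeded' string comes from the steps above.

```python
# Message text taken from the reproduction steps above.
MARKER = "Memory level exceeded"


def find_memory_exceeded(lines):
    """Return (line_number, line) pairs for every line containing MARKER."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if MARKER in line
    ]


def scan_log(path):
    """Open the log at `path` and return the matching lines.

    errors="replace" keeps the scan going if the log contains bytes
    that are not valid UTF-8.
    """
    with open(path, encoding="utf-8", errors="replace") as log:
        return find_memory_exceeded(log)
```

Usage would look like `scan_log("/var/log/foreman/production.log")` (assumed path, adjust to the actual location); an empty result means the limit has not been crossed yet.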

Actual results:
The process keeps running and the termination never finishes; the tasks are not proceeding anymore.

Expected results:
The process exits in a timely manner and a new one gets started.

Additional info:
I've written the reproducer based on customer observations and haven't reproduced it locally in production just yet. We do, however, know about places where we wait in the termination phase without any timeout.
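
To illustrate the kind of problem meant here: a termination routine that blocks on an event with no timeout hangs forever if the event never arrives, while a bounded wait lets the process give up and exit so a new one can be started. The following is a generic Python sketch of that pattern, not dynflow's code and not the upstream fix; `terminate`, `work_finished` and the timeout values are all made up for the example.

```python
import threading


def terminate(work_finished, timeout):
    """Wait for in-flight work to finish, but never longer than `timeout` seconds.

    Returns True on a clean shutdown, False if the wait timed out and the
    caller should force the exit.
    """
    # Event.wait(timeout) returns False when the timeout expires first.
    # Calling wait() with no timeout here would block forever if the
    # event is never set, which is the failure mode described above.
    return work_finished.wait(timeout)


# Case 1: the event is never set, so the bounded wait gives up.
stuck = threading.Event()
forced = not terminate(stuck, timeout=0.1)

# Case 2: a worker signals completion shortly after, so shutdown is clean.
done = threading.Event()
threading.Timer(0.01, done.set).start()
clean = terminate(done, timeout=1.0)

print("forced exit needed:", forced)
print("clean shutdown:", clean)
```

The point of returning a boolean rather than raising is that the caller can decide what "forced exit" means (log, kill remaining workers, exit non-zero) in one place.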

Comment 2 sthirugn@redhat.com 2018-09-14 15:31:56 UTC
Increasing the severity.

Comment 4 Ivan Necas 2018-09-24 12:23:47 UTC
Created redmine issue https://projects.theforeman.org/issues/25021 from this bug

Comment 5 Ivan Necas 2018-10-11 16:22:43 UTC
Fixed upstream in https://github.com/Dynflow/dynflow/pull/297

Comment 28 Mike McCune 2019-03-01 22:16:01 UTC
This bug was cloned and is still going to be included in the 6.4.3 release. It no longer has the sat-6.4.z+ flag and 6.4.3 Target Milestone Set which are now on the 6.4.z cloned bug. Please see the Clones field to track the progress of this bug in the 6.4.3 release.

Comment 31 errata-xmlrpc 2019-05-14 12:38:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:1222