Bug 1877917

Summary: dynflow_executor.output grows extremely large in short period of time.
Product: Red Hat Satellite
Reporter: Dylan Gross <dgross>
Component: Dynflow
Assignee: Adam Ruzicka <aruzicka>
Status: CLOSED ERRATA
QA Contact: Lukáš Hellebrandt <lhellebr>
Severity: high
Docs Contact:
Priority: high
Version: 6.7.0
CC: aganbat, ahumbe, arahaman, aruzicka, bsawyers, dhjoshi, dmule, dsynk, egolov, gopinath.perumal, gscarbor, jbhatia, jjeffers, kkohli, ktordeur, mmccune, sadas, saydas, sboyron
Target Milestone: 6.9.3
Keywords: PrioBumpGSS, Triaged
Target Release: Unused   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: tfm-rubygem-dynflow-1.4.8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1962840
Environment:
Last Closed: 2021-07-01 14:56:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Dylan Gross 2020-09-10 18:37:03 UTC
Description of problem:

   The /var/log/foreman/dynflow_executor.output file grows extremely large in a short period of time (600+ GB in under 24 hours).

Version-Release number of selected component (if applicable):

   Red Hat Satellite 6.7.2

How reproducible:   Unknown - Not reproducible at will


Actual results:

   The dynflow_executor.output file grew to over half a TB in a day.

Expected results:

   The dynflow_executor.output logs relevant info and remains at a reasonable size.

Additional info:

  There was obviously some underlying condition with the dynflow_executor that caused this massive amount of atypical logging.   

  Impact to the system came from two different scenarios. First, while this particular system kept its logs in an alternate location that could accommodate a few hundred GB, a file this size would likely fill /var on most systems. Second, executing foreman-debug runs xzcat against the (admittedly impressively) compressed file and expands it in the foreman-debug location (default=/var/tmp). This second scenario can be worked around by specifying an alternate location, if you're aware of the situation ahead of time.
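
  For the second scenario, one way to learn ahead of time whether the expanded file would fit is sketched below. This is only an illustration and not part of foreman-debug: the function names, the chunk size, and the paths in the usage note are assumptions. It stream-decompresses the .xz file and counts the bytes without writing anything to disk, then compares that count with the free space in the directory where the file would be expanded.

```python
import lzma
import shutil


def uncompressed_size(xz_path, chunk_size=1024 * 1024):
    """Stream-decompress an .xz file and count its bytes; nothing is written to disk."""
    total = 0
    with lzma.open(xz_path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def fits_in(target_dir, needed_bytes):
    """Return True if the filesystem holding target_dir has at least needed_bytes free."""
    return shutil.disk_usage(target_dir).free >= needed_bytes
```

  A call such as fits_in("/var/tmp", uncompressed_size(path_to_rotated_log)) would then answer the question, where path_to_rotated_log is a placeholder for the compressed file. Decompressing hundreds of GB this way takes a long time; xz --list reports the uncompressed size from the file's index much faster, when the xz tool is available.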

Comment 13 asml_gperumal 2021-04-08 08:07:35 UTC
Hi,
Is this bug applicable to Satellite 6.7.5?

Comment 17 Lukáš Hellebrandt 2021-06-10 10:10:02 UTC
Verified with Sat 6.9.3 snap 1.0.

Used reproducer (don't forget to change the TASKS_ROOT to current foreman-tasks version): https://gist.github.com/adamruzicka/3a4681f488e5978c7bd49e544f1ce124 (thanks Adam)

In 6.8, upon invoking the reproducer, there were 5816 lines added to the production.log (basically an absurdly long traceback). In 6.9.3, it's only 59 lines (a shorter and perhaps more useful traceback). No regression was found (manually, automation results not available) => VERIFIED.
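
A line-count comparison like the one above can be taken by counting the log's lines before and after the reproducer runs. The following is a minimal sketch of that idea, not the procedure actually used for verification; the helper names are hypothetical, and it assumes the log is not rotated and nothing else writes to it in the meantime.

```python
def count_lines(path):
    """Count the lines in a file without loading it all into memory."""
    count = 0
    with open(path, "rb") as handle:
        for _ in handle:
            count += 1
    return count


def lines_added(path, action):
    """Run action() and return how many lines the file at path gained while it ran."""
    before = count_lines(path)
    action()
    return count_lines(path) - before
```

Here action would be any callable that triggers the reproducer, for example a function wrapping a subprocess call.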

Comment 18 Mike McCune 2021-06-10 22:25:14 UTC
Looking at 6.8.6 and 6.9.2, they each have the same Dynflow version:


# rpm -q satellite tfm-rubygem-dynflow
satellite-6.8.6-1.el7sat.noarch
tfm-rubygem-dynflow-1.4.7-1.fm2_1.el7sat.noarch


# rpm -q satellite tfm-rubygem-dynflow
satellite-6.9.2-1.el7sat.noarch
tfm-rubygem-dynflow-1.4.7-1.fm2_1.el7sat.noarch

So whatever change resolved this issue, it isn't in Dynflow proper.

Adam, any ideas on what else may have resolved this?
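
For reference, the installed Dynflow version in the rpm -q output above can be compared against the Fixed In Version field (tfm-rubygem-dynflow-1.4.8) with a short sketch like this. The helper names are hypothetical, and the comparison only handles purely numeric dotted versions; it is a simplification, not RPM's own version-comparison algorithm.

```python
def parse_nvra(package):
    """Split an rpm -q style 'name-version-release.arch' string into its four parts."""
    rest, arch = package.rsplit(".", 1)
    name, version, release = rest.rsplit("-", 2)
    return name, version, release, arch


def version_tuple(version):
    """Turn a purely numeric dotted version such as '1.4.7' into (1, 4, 7)."""
    return tuple(int(part) for part in version.split("."))


name, version, release, arch = parse_nvra(
    "tfm-rubygem-dynflow-1.4.7-1.fm2_1.el7sat.noarch"
)
# True when the installed version is older than the 1.4.8 named in Fixed In Version
older_than_fix = version_tuple(version) < version_tuple("1.4.8")
```

With the package string shown for both 6.8.6 and 6.9.2, older_than_fix is True, so neither system carries the Dynflow build named in Fixed In Version.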

Comment 19 Adam Ruzicka 2021-06-11 06:34:31 UTC
@Mike Did you try reproducing it, or is this just a response to what Lukas wrote in #17?

Sidekiq-based deployments (6.8 and up) seem to be immune to this due to the split of workers into separate processes. To verify this on 6.9.3 we had to jump through a few hoops to trigger the bug at all.

Comment 24 errata-xmlrpc 2021-07-01 14:56:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Satellite 6.9.3 Async Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2636

Comment 26 Red Hat Bugzilla 2023-09-15 00:47:54 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days