Bug 1434069 - [RFE] max_memory_per_executor support
Summary: [RFE] max_memory_per_executor support
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Tasks Plugin
Version: 6.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: Unspecified
Assignee: Ivan Necas
QA Contact: Nikhil Kathole
URL:
Whiteboard:
Duplicates: 1416241 1492768 (view as bug list)
Depends On:
Blocks: 1353215
 
Reported: 2017-03-20 16:41 UTC by Mike McCune
Modified: 2021-03-11 15:04 UTC (History)

Fixed In Version: dynflow-0.8.30
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-21 12:38:27 UTC
Target Upstream Version:


Attachments (Terms of Use)
screenrecord of real memory size(RSS) for dynflow_executor (314.71 KB, application/octet-stream)
2018-02-07 14:25 UTC, Nikhil Kathole
no flags Details
screenrecord for EXECUTORS_COUNT=3 (432.60 KB, application/octet-stream)
2018-02-07 14:26 UTC, Nikhil Kathole
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 17175 0 Normal Closed max_memory_per_executor support 2021-01-22 20:40:57 UTC
Foreman Issue Tracker 20875 0 Normal Closed max_memory_per_executor can lead to stuck executor, waiting for an event that would not arrive 2021-01-22 20:40:57 UTC
Red Hat Product Errata RHSA-2018:0336 0 normal SHIPPED_LIVE Important: Satellite 6.3 security, bug fix, and enhancement update 2018-02-21 22:43:42 UTC

Description Mike McCune 2017-03-20 16:41:17 UTC
Given that once Ruby allocates memory it doesn't give it back to the
operating system, a bigger set of larger actions can lead to quite high
memory consumption that persists and can accumulate over time. This
makes it hard to keep memory consumption fully under control,
especially in an environment shared with other systems (passenger,
pulp, candlepin, qpid). Since the executors can terminate gracefully
without affecting the tasks themselves, it should be pretty easy to
extend them to watch their memory consumption.

The idea:

1. config options:
  max_memory_per_executor - the memory-size threshold per executor
  min_executors_count - the minimal number of executors to keep running (default 1)
  minimal_executor_age - the minimal age an executor must reach before it can be restarted (default 1h)

2. the executor will periodically check its memory usage
(http://stackoverflow.com/a/24423978/457560 seems to be a sane
approach for us)

3. if memory usage exceeds `max_memory_per_executor`, the executor is
older than `minimal_executor_age` (to prevent a situation where the
memory grows past `max_memory_per_executor` so fast that we would do
nothing but restart the executors without getting any work done), and
the number of current executors would not drop under
`min_executors_count`, politely terminate the executor

4. the polite termination should hand over all the tasks to the other
executors; once everything is finalized on the executor, it would just exit

5. the daemon monitor would notice the executor getting closed and start a new executor

It would be configurable and turned off by default (for development), but we would configure
it in production, where we can rely on the monitor being present.
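The gating logic of steps 2-3 can be sketched in Ruby. This is a hypothetical illustration of the proposal, not dynflow's actual implementation; the class and parameter names are invented, and the memory reader is injected as a callable (in the shipped fix, the get_process_mem gem provides the RSS reading):

```ruby
# Illustrative sketch of the proposed check (invented names, not
# dynflow's real API). The memory reader and clock are injected so the
# gating logic is self-contained and easy to exercise.
class ExecutorMemoryWatcher
  def initialize(max_memory_mb:, min_age: 3600, mem_mb:, clock: -> { Time.now })
    @max_memory_mb = max_memory_mb  # max_memory_per_executor
    @min_age       = min_age        # minimal_executor_age, in seconds
    @mem_mb        = mem_mb         # callable returning current RSS in MB
    @clock         = clock
    @started_at    = clock.call
  end

  # Terminate only when the limit is exceeded AND the executor is old
  # enough; otherwise a fast-growing executor would be restarted in a
  # tight loop without getting any work done.
  def should_terminate?
    age = @clock.call - @started_at
    @mem_mb.call > @max_memory_mb && age > @min_age
  end
end
```

Keeping the executor count above min_executors_count is left to the daemon monitor; the executor itself only needs this predicate in its periodic check loop.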

Comment 1 Mike McCune 2017-03-20 16:41:22 UTC
Created from redmine issue http://projects.theforeman.org/issues/17175

Comment 2 Mike McCune 2017-03-20 16:41:27 UTC
Upstream bug assigned to sshtein@redhat.com

Comment 4 pm-sat@redhat.com 2017-05-22 16:18:55 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/17175 has been resolved.

Comment 5 Ivan Necas 2017-08-16 12:15:28 UTC
Version Tested: Satellite-6.3 Snap 11

I've set this in /etc/sysconfig/foreman-tasks:

  EXECUTOR_MEMORY_LIMIT=400MB
  EXECUTOR_MEMORY_MONITOR_DELAY=60

Then I followed https://bugzilla.redhat.com/show_bug.cgi?id=1406489#c19, while
watching

  watch 'ps aux | grep "\bdynflow_executor\b"'

The executor got through the 400MB threshold and it paused the task, but it didn't finish terminating the dynflow process. This is because the termination waits for some actions to finish without defining any timeout, so it can hang forever.
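The indefinite wait described here was tracked as Foreman issue 20875 (linked above). The general repair pattern, waiting with a deadline instead of forever, can be sketched as follows; the method and parameter names are illustrative, not dynflow's API:

```ruby
require 'timeout'

# Illustrative sketch (not dynflow's code): polite termination should
# wait for in-flight actions, but only up to a deadline, so the process
# cannot hang forever when an expected event never arrives.
def wait_for_actions(done, limit: 60, poll: 0.1)
  Timeout.timeout(limit) { sleep(poll) until done.call }
  :finished      # all actions wrapped up within the deadline
rescue Timeout::Error
  :forced_exit   # stop waiting politely and exit anyway
end
```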

Comment 6 pm-sat@redhat.com 2017-09-07 16:19:06 UTC
Upstream bug assigned to inecas@redhat.com

Comment 7 pm-sat@redhat.com 2017-09-07 16:19:10 UTC
Upstream bug assigned to inecas@redhat.com

Comment 8 Ivan Necas 2017-09-19 16:25:55 UTC
*** Bug 1492768 has been marked as a duplicate of this bug. ***

Comment 10 Justin Sherrill 2017-12-18 15:37:30 UTC
*** Bug 1416241 has been marked as a duplicate of this bug. ***

Comment 11 Nikhil Kathole 2018-02-07 14:23:47 UTC
VERIFIED

Version tested:
Satellite 6.3 snap 35

# rpm -qa | grep get_process_mem
tfm-rubygem-get_process_mem-0.2.1-1.el7sat.noarch

# rpm -q tfm-rubygem-foreman-tasks
tfm-rubygem-foreman-tasks-0.9.6.4-1.fm1_15.el7sat.noarch

# rpm -q tfm-rubygem-dynflow
tfm-rubygem-dynflow-0.8.34-1.fm1_15.el7sat.noarch

Steps:
1. Configured /etc/sysconfig/foreman-tasks:
  EXECUTOR_MEMORY_LIMIT=400MB
  EXECUTOR_MEMORY_MONITOR_DELAY=10

2. Run Remote Execution job on 350 hosts
3. watch 'ps aux | grep "\bdynflow_executor\b"'

Found that termination of the dynflow process occurred once memory usage reached 400 MB (see attachment).

Also tried configuring:
   EXECUTOR_MEMORY_MONITOR_INTERVAL=15
   EXECUTORS_COUNT=3

Once memory usage exceeded the limit for an executor, another executor started running (see attachment 2 [details]).

Comment 12 Nikhil Kathole 2018-02-07 14:25:34 UTC
Created attachment 1392687 [details]
screenrecord of real memory size(RSS) for dynflow_executor

Comment 13 Nikhil Kathole 2018-02-07 14:26:15 UTC
Created attachment 1392688 [details]
screenrecord for EXECUTORS_COUNT=3

Comment 16 errata-xmlrpc 2018-02-21 12:38:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0336

