Red Hat Bugzilla – Bug 1388631
Enable Process Recycling for Pulp Worker Processes
Last modified: 2018-09-19 11:13:01 EDT
Upstream Pulp added a new feature which should reduce the memory used by Pulp workers. It does this using process recycling; see the upstream bug for more details.

To include this downstream you should:
1. cherry-pick the 3 commits attached to the upstream bug
2. enable the feature (see the upstream docs on how to do this)

I recommend a value of < 10 for process recycling; a value of 2 would probably be good. Katello should likely enable this as well when they rebase onto upstream 2.11+.
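For reference, a minimal sketch of enabling it by hand on a plain Pulp 2 box (assuming the stock /etc/default/pulp_workers file with the commented-out PULP_MAX_TASKS_PER_CHILD line and the pulp_workers systemd unit; on Satellite the installer option shown later in this bug is the supported path):

# Set the recycling limit: replace each worker child after 2 tasks
sed -i 's|^#\s*PULP_MAX_TASKS_PER_CHILD=.*|PULP_MAX_TASKS_PER_CHILD=2|' /etc/default/pulp_workers
# Restart the workers so celery is relaunched with --maxtasksperchild
systemctl restart pulp_workers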
The Pulp upstream bug status is at POST. Updating the external tracker on this bug.
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.
The Pulp upstream bug status is at ON_QA. Updating the external tracker on this bug.
Created attachment 1215814 [details]
Sample memory usage of the workers at a customer's Satellite 6.2
How to test:

The maximum tasks per worker should be unlimited by default, to keep the default behaviour from previous Sat versions.

[root@dell-pe1950-06 ~]# satellite-installer
Installing             Done                                               [100%] [...........................................................................................................................................................................................]
  Success!
  * Satellite is running at https://hostname
  * To install additional capsule on separate machine continue by running:
      capsule-certs-generate --capsule-fqdn "$CAPSULE" --certs-tar "~/$CAPSULE-certs.tar"
  The full log is at /var/log/foreman-installer/satellite.log

[root@dell-pe1950-06 ~]# vim /etc/default/pulp_workers
# Configuration file for Pulp's Celery workers

# Define the number of worker nodes you wish to have here. This defaults to the number of processors
# that are detected on the system if left commented here.
PULP_CONCURRENCY=4

# Configure Python's encoding for writing all logs, stdout and stderr
PYTHONIOENCODING="UTF-8"

# To avoid memory leaks, Pulp can terminate and replace a worker after processing X tasks. If
# left commented, process recycling is disabled. PULP_MAX_TASKS_PER_CHILD must be > 0.
# PULP_MAX_TASKS_PER_CHILD=2

Check that the config was extended and the entry is commented out. Also check that the workers run without --maxtasksperchild:

[root@dell-pe1950-06 ~]# ps -fax|grep worker
 5839 ?  Ssl  0:02 /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
 5990 ?  S    0:00  \_ /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
 5876 ?  Ssl  0:03 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
 5996 ?  Sl   0:01  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
 5879 ?  Ssl  0:03 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
 5994 ?  Sl   0:01  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
 5882 ?  Ssl  0:03 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30
 5995 ?  Sl   0:01  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30
 5888 ?  Ssl  0:03 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30
 5992 ?  Sl   0:01  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30

When the maximum is set, the config should be modified accordingly and the workers should be started with the proper --maxtasksperchild parameter:

[root@dell-pe1950-06 ~]# satellite-installer --katello-max-tasks-per-pulp-worker 2
Installing             Done                                               [100%] [...........................................................................................................................................................................................]
  Success!
  * Satellite is running at https://hostname
  * To install additional capsule on separate machine continue by running:
      capsule-certs-generate --capsule-fqdn "$CAPSULE" --certs-tar "~/$CAPSULE-certs.tar"
  The full log is at /var/log/foreman-installer/satellite.log

[root@dell-pe1950-06 ~]# ps -fax|grep worker
 9610 ?  Ssl  0:02 /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
 9783 ?  S    0:00  \_ /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
 9671 ?  Ssl  0:01 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30 --maxtasksperchild=2
 9785 ?  S    0:00  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30 --maxtasksperchild=2
 9674 ?  Ssl  0:02 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30 --maxtasksperchild=2
 9787 ?  S    0:00  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30 --maxtasksperchild=2
 9677 ?  Ssl  0:01 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30 --maxtasksperchild=2
 9789 ?  S    0:00  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30 --maxtasksperchild=2
 9680 ?  Ssl  0:02 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30 --maxtasksperchild=2
 9791 ?  S    0:00  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30 --maxtasksperchild=2

[root@dell-pe1950-06 ~]# cat /etc/default/pulp_workers
# Configuration file for Pulp's Celery workers

# Define the number of worker nodes you wish to have here. This defaults to the number of processors
# that are detected on the system if left commented here.
PULP_CONCURRENCY=4

# Configure Python's encoding for writing all logs, stdout and stderr
PYTHONIOENCODING="UTF-8"

# To avoid memory leaks, Pulp can terminate and replace a worker after processing X tasks. If
# left commented, process recycling is disabled. PULP_MAX_TASKS_PER_CHILD must be > 0.
PULP_MAX_TASKS_PER_CHILD=2
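Beyond checking the command line, a quick way to confirm recycling is actually happening is to watch the worker child PIDs while tasks run; with --maxtasksperchild=2 each child should be replaced (new PID) after two tasks. A rough sketch:

# Refresh every 5s: the child rows (ppid = a celery master above) should show
# new PIDs as workers are recycled after hitting the task limit.
watch -n 5 'ps -eo pid,ppid,cmd | grep "[c]elery worker" | grep maxtasksperchild'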
Created redmine issue http://projects.theforeman.org/issues/17298 from this bug
The Pulp upstream bug status is at VERIFIED. Updating the external tracker on this bug.
FYI, quite probable reproducer for pulp celery worker memory leak: sync and publish a bigger repo, repeatedly.

In Sat world, create content view, add RHEL7 base repo (feel free to use a bigger one), publish the CV, and delete it. Do it in a cycle. Particular commands:

hmr="hammer -u admin -p redhat"

while true; do
  echo "$(date): creating&publishing&deleting a content view with RHEL7 repo"
  $hmr content-view create --name cv_rhel7_test --organization-id=1
  $hmr content-view add-repository --organization-id=1 --repository="Red Hat Enterprise Linux 7 Server RPMs x86_64 7Server" --name=cv_rhel7_test --product="Red Hat Enterprise Linux Server"
  $hmr content-view publish --name=cv_rhel7_test --organization-id=1
  $hmr content-view remove-from-environment --name=cv_rhel7_test --organization-id=1 --lifecycle-environment=Library
  $hmr content-view delete --name=cv_rhel7_test --organization-id=1
  echo "$(date): sleeping"
  sleep 10
done
(In reply to Pavel Moravec from comment #16)
> FYI, quite probable reproducer for pulp celery worker memory leak: sync and
> publish a bigger repo, repeatedly.
>
> In Sat world, create content view, add RHEL7 base repo (feel free to use a
> bigger one), publish the CV, and delete it. Do it in a cycle. Particular
> commands:
>
> hmr="hammer -u admin -p redhat"
>
> while true; do
>   echo "$(date): creating&publishing&deleting a content view with RHEL7 repo"
>   $hmr content-view create --name cv_rhel7_test --organization-id=1
>   $hmr content-view add-repository --organization-id=1 --repository="Red Hat Enterprise Linux 7 Server RPMs x86_64 7Server" --name=cv_rhel7_test --product="Red Hat Enterprise Linux Server"
>   $hmr content-view publish --name=cv_rhel7_test --organization-id=1
>   $hmr content-view remove-from-environment --name=cv_rhel7_test --organization-id=1 --lifecycle-environment=Library
>   $hmr content-view delete --name=cv_rhel7_test --organization-id=1
>   echo "$(date): sleeping"
>   sleep 10
> done

I apologize, that increases memory to some extent but it stabilizes after a while - no leak.
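For anyone re-running the reproducer from comment #16, per-worker memory over time can be logged with something simple like this (a sketch; the output path is just an example):

# Snapshot the RSS (KiB) of every celery process once a minute, largest first
while true; do
  echo "== $(date)"
  ps -C celery -o pid,rss,cmd --sort=-rss
  sleep 60
done >> /tmp/pulp_worker_rss.log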
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.
Small docs update to account for the change to set the value to '2' by default.
VERIFIED.

@Sat6.2.7-Snap2
pulp-server-2.8.7.4-1.el7sat.noarch

# grep ^PULP_MAX_TASKS_PER_CHILD /etc/default/pulp_workers
PULP_MAX_TASKS_PER_CHILD=2

# ps -efH|grep -c maxtasksperchild=[2]
16

>>> All workers-{0..7} are using the new option; repetitive syncs of big RHEL repositories work fine and the celery workers consume ~1G of memory.

# satellite-installer --katello-max-tasks-per-pulp-worker 3
Installing             Done                                               [100%] [..........................................................................................................]
  Success!
  * Satellite is running at https://<SATFQDN>
  * To install additional capsule on separate machine continue by running:
      capsule-certs-generate --capsule-fqdn "$CAPSULE" --certs-tar "~/$CAPSULE-certs.tar"
  The full log is at /var/log/foreman-installer/satellite.log

# grep ^PULP_MAX_TASKS_PER_CHILD /etc/default/pulp_workers
PULP_MAX_TASKS_PER_CHILD=3

# ps -efH|grep -c maxtasksperchild=[3]
16

>>> Using the installer option one is able to customize the PULP_MAX_TASKS_PER_CHILD value.

# satellite-installer --katello-max-tasks-per-pulp-worker undef
Installing             Done                                               [100%] [..........................................................................................................]
  Success!
  * Satellite is running at https://<SATFQDN>
  * To install additional capsule on separate machine continue by running:
      capsule-certs-generate --capsule-fqdn "$CAPSULE" --certs-tar "~/$CAPSULE-certs.tar"
  The full log is at /var/log/foreman-installer/satellite.log

# grep PULP_MAX_TASKS_PER_CHILD /etc/default/pulp_workers
# left commented, process recycling is disabled. PULP_MAX_TASKS_PER_CHILD must be > 0.
# PULP_MAX_TASKS_PER_CHILD=2

# ps -efH|grep -c [m]axtasksperchild
0

>>> Using the installer option one is able to change the behavior back to how it was before the fix.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0197