Bug 1388631
| Summary: | Enable Process Recycling for Pulp Worker Processes |
|---|---|
| Product: | Red Hat Satellite |
| Component: | Pulp |
| Reporter: | Brian Bouterse <bmbouter> |
| Assignee: | Martin Bacovsky <mbacovsk> |
| QA Contact: | Lukas Pramuk <lpramuk> |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | high |
| Version: | Unspecified |
| CC: | aperotti, bbuckingham, bkearney, bmbouter, brubisch, cduryee, chrobert, cwelton, daniele, daviddavis, dkliban, egolov, ggainey, ipanova, jentrena, lpramuk, mbacovsk, mhrivnak, mmccune, mtenheuv, oshtaier, pcreech, pmoravec, pmorey, ramsingh, rchan, ttereshc, zhunting |
| Target Milestone: | Unspecified |
| Keywords: | Triaged |
| Target Release: | Unused |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | |
| Fixed In Version: | pulp-2.8.7.4-1 |
| Doc Type: | Enhancement |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2017-01-26 10:43:27 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Bug Depends On: | |
| Bug Blocks: | 1393409, 1405513 (view as bug list) |
| Attachments: | |

Doc Text:

Feature: Satellite allows configuring the maximum number of tasks a Pulp worker processes before it is recycled, releasing its allocated memory back to the system. The default is 2. To disable recycling entirely, set `--katello-max-tasks-per-pulp-worker` to `undef`.

Reason: Python does not return allocated memory to the operating system even after it has been freed internally, so Pulp workers can keep holding large amounts of memory after processing certain memory-hungry tasks.

Result: With `satellite-installer --katello-max-tasks-per-pulp-worker 2`, each Pulp worker is restarted after every second task it processes, and its allocated memory is returned to the system.
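The "Reason" in the Doc Text can be illustrated with a small sketch. This is not Pulp code; it only shows that a burst of small-object allocations raises a process's resident-set high-water mark, and that freeing the objects does not lower the peak an operator has already observed:

```python
# Illustrative sketch (not Pulp code): ru_maxrss records the peak resident
# set size of this process, which never goes back down, even after the
# allocated objects are logically freed.
import resource

def peak_rss():
    # Peak resident set size: kilobytes on Linux, bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def allocate_and_free(n=2_000_000):
    before = peak_rss()
    objs = list(range(n))     # roughly tens of MB of small int objects
    during = peak_rss()
    del objs                  # logically freed, but the peak stands
    after = peak_rss()
    return before, during, after

if __name__ == "__main__":
    before, during, after = allocate_and_free()
    print(f"peak RSS before={before} during={during} after={after}")
```

Recycling the worker process, which is what this bug implements, is the reliable way to hand such memory back to the operating system.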
Description (Brian Bouterse, 2016-10-25 19:25:00 UTC)
The Pulp upstream bug status is at POST. Updating the external tracker on this bug.
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.
The Pulp upstream bug status is at ON_QA. Updating the external tracker on this bug.

Created attachment 1215814 [details]:
Sample memory usage of the workers at a customer's Satellite 6.2
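Celery's `--maxtasksperchild` maps onto the recycling knob of Python's `multiprocessing` pool. A minimal sketch of the mechanism (the trivial task function is made up for illustration):

```python
# Sketch of process recycling with multiprocessing.Pool: with
# maxtasksperchild=2, a worker process exits after two tasks and is
# replaced by a fresh one, returning its memory to the OS.
import os
from multiprocessing import Pool

def which_worker(_):
    return os.getpid()              # report which worker ran the task

def worker_pids(tasks, max_tasks):
    with Pool(processes=1, maxtasksperchild=max_tasks) as pool:
        # chunksize=1 so each task counts individually toward the limit
        return pool.map(which_worker, range(tasks), chunksize=1)

if __name__ == "__main__":
    recycled = worker_pids(6, 2)    # 6 tasks, recycle every 2 -> 3 workers
    forever = worker_pids(6, None)  # no recycling -> a single worker
    print(len(set(recycled)), len(set(forever)))
```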
How to test: The maximum tasks per worker should be unlimited by default, to keep the default behaviour of previous Satellite versions.

```
[root@dell-pe1950-06 ~]# satellite-installer
Installing             Done                           [100%] [........................................]
  Success!
  * Satellite is running at https://hostname
  * To install additional capsule on separate machine continue by running:
      capsule-certs-generate --capsule-fqdn "$CAPSULE" --certs-tar "~/$CAPSULE-certs.tar"
  The full log is at /var/log/foreman-installer/satellite.log
[root@dell-pe1950-06 ~]# vim /etc/default/pulp_workers
# Configuration file for Pulp's Celery workers

# Define the number of worker nodes you wish to have here. This defaults to the number of processors
# that are detected on the system if left commented here.
PULP_CONCURRENCY=4

# Configure Python's encoding for writing all logs, stdout and stderr
PYTHONIOENCODING="UTF-8"

# To avoid memory leaks, Pulp can terminate and replace a worker after processing X tasks. If
# left commented, process recycling is disabled. PULP_MAX_TASKS_PER_CHILD must be > 0.
# PULP_MAX_TASKS_PER_CHILD=2
```

Check that the config was extended and the new entry is commented out. Also check that the workers run without `--maxtasksperchild`:

```
[root@dell-pe1950-06 ~]# ps -fax | grep worker
5839 ?  Ssl  0:02 /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
5990 ?  S    0:00  \_ /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
5876 ?  Ssl  0:03 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
5996 ?  Sl   0:01  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
5879 ?  Ssl  0:03 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
5994 ?  Sl   0:01  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
5882 ?  Ssl  0:03 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30
5995 ?  Sl   0:01  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30
5888 ?  Ssl  0:03 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30
5992 ?  Sl   0:01  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30
```

When the maximum is set, the config should be modified accordingly and the workers should be started with the proper `--maxtasksperchild` parameter:

```
[root@dell-pe1950-06 ~]# satellite-installer --katello-max-tasks-per-pulp-worker 2
Installing             Done                           [100%] [........................................]
  Success!
  * Satellite is running at https://hostname
  * To install additional capsule on separate machine continue by running:
      capsule-certs-generate --capsule-fqdn "$CAPSULE" --certs-tar "~/$CAPSULE-certs.tar"
  The full log is at /var/log/foreman-installer/satellite.log
[root@dell-pe1950-06 ~]# ps -fax | grep worker
9610 ?  Ssl  0:02 /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
9783 ?  S    0:00  \_ /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@%h -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
9671 ?  Ssl  0:01 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30 --maxtasksperchild=2
9785 ?  S    0:00  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30 --maxtasksperchild=2
9674 ?  Ssl  0:02 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30 --maxtasksperchild=2
9787 ?  S    0:00  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30 --maxtasksperchild=2
9677 ?  Ssl  0:01 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30 --maxtasksperchild=2
9789 ?  S    0:00  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --heartbeat-interval=30 --maxtasksperchild=2
9680 ?  Ssl  0:02 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30 --maxtasksperchild=2
9791 ?  S    0:00  \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-3.pid --heartbeat-interval=30 --maxtasksperchild=2
[root@dell-pe1950-06 ~]# cat /etc/default/pulp_workers
# Configuration file for Pulp's Celery workers

# Define the number of worker nodes you wish to have here. This defaults to the number of processors
# that are detected on the system if left commented here.
PULP_CONCURRENCY=4

# Configure Python's encoding for writing all logs, stdout and stderr
PYTHONIOENCODING="UTF-8"

# To avoid memory leaks, Pulp can terminate and replace a worker after processing X tasks. If
# left commented, process recycling is disabled. PULP_MAX_TASKS_PER_CHILD must be > 0.
PULP_MAX_TASKS_PER_CHILD=2
```

Created redmine issue http://projects.theforeman.org/issues/17298 from this bug.

The Pulp upstream bug status is at VERIFIED. Updating the external tracker on this bug.

FYI, a quite probable reproducer for the Pulp celery worker memory leak: sync and publish a bigger repo, repeatedly. In the Satellite world: create a content view, add the RHEL 7 base repo (feel free to use a bigger one), publish the CV, and delete it. Do it in a cycle. Particular commands:

```
hmr="hammer -u admin -p redhat"

while true; do
  echo "$(date): creating&publishing&deleting a content view with RHEL7 repo"
  $hmr content-view create --name cv_rhel7_test --organization-id=1
  $hmr content-view add-repository --organization-id=1 --repository="Red Hat Enterprise Linux 7 Server RPMs x86_64 7Server" --name=cv_rhel7_test --product="Red Hat Enterprise Linux Server"
  $hmr content-view publish --name=cv_rhel7_test --organization-id=1
  $hmr content-view remove-from-environment --name=cv_rhel7_test --organization-id=1 --lifecycle-environment=Library
  $hmr content-view delete --name=cv_rhel7_test --organization-id=1
  echo "$(date): sleeping"
  sleep 10
done
```

(In reply to Pavel Moravec from comment #16) I apologize, that increases memory to some extent but stabilizes after a while: no leak.

The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

Small docs update to account for the change to set to '2' by default.

VERIFIED. @Sat6.2.7-Snap2
pulp-server-2.8.7.4-1.el7sat.noarch

```
# grep ^PULP_MAX_TASKS_PER_CHILD /etc/default/pulp_workers
PULP_MAX_TASKS_PER_CHILD=2
# ps -efH | grep -c maxtasksperchild=[2]
16
```

>>> All workers-{0..7} are using the new option, repetitive syncs of big RHEL repositories work fine, and the celery workers consume ~1 GB of memory.

```
# satellite-installer --katello-max-tasks-per-pulp-worker 3
Installing             Done                           [100%] [........................................]
  Success!
  * Satellite is running at https://<SATFQDN>
  * To install additional capsule on separate machine continue by running:
      capsule-certs-generate --capsule-fqdn "$CAPSULE" --certs-tar "~/$CAPSULE-certs.tar"
  The full log is at /var/log/foreman-installer/satellite.log
# grep ^PULP_MAX_TASKS_PER_CHILD /etc/default/pulp_workers
PULP_MAX_TASKS_PER_CHILD=3
# ps -efH | grep -c maxtasksperchild=[3]
16
```

>>> Using the installer option, one is able to customize the PULP_MAX_TASKS_PER_CHILD value.

```
# satellite-installer --katello-max-tasks-per-pulp-worker undef
Installing             Done                           [100%] [........................................]
  Success!
  * Satellite is running at https://<SATFQDN>
  * To install additional capsule on separate machine continue by running:
      capsule-certs-generate --capsule-fqdn "$CAPSULE" --certs-tar "~/$CAPSULE-certs.tar"
  The full log is at /var/log/foreman-installer/satellite.log
# grep PULP_MAX_TASKS_PER_CHILD /etc/default/pulp_workers
# left commented, process recycling is disabled. PULP_MAX_TASKS_PER_CHILD must be > 0.
# PULP_MAX_TASKS_PER_CHILD=2
# ps -efH | grep -c [m]axtasksperchild
0
```

>>> Using the installer option, one is able to change the behavior back to how it was before the fix.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0197
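Checks like the "~1 GB" observation above amount to summing the resident set of all matching worker processes. A small Linux-only helper of that kind (the "celery worker" pattern is just an example; adjust it to your process names):

```python
# Sum the resident set size (kB) of all processes whose command line
# contains a given substring. Linux-only: reads the /proc filesystem.
import os

def rss_total_kb(pattern):
    total = 0
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
            if pattern not in cmdline:
                continue
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        total += int(line.split()[1])   # VmRSS is in kB
                        break
        except OSError:
            continue        # the process exited while we were inspecting it
    return total

if __name__ == "__main__":
    print(rss_total_kb("celery worker"), "kB held by celery workers")
```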