Description of problem:

The celery worker initially consumes about 60MB of RAM. After regenerating applicability for a consumer bound to 9 repositories, it grows to 350MB+ and the RAM is never freed.

I think the following is the reason for the high memory consumption: Pulp fetches the pkglist from all the repositories that a particular erratum is associated with. This is expensive, and the results may contain many duplicate pkglists. For example, Pulp makes this query:

db.erratum_pkglists.find({"errata_id": "RHBA-2016:1886"}).count()
3

instead of the following:

db.erratum_pkglists.find({"errata_id": "RHBA-2016:1886", "repo_id" : "my_org-Red_Hat_Enterprise_Linux_Server-Red_Hat_Satellite_Tools_6_2_for_RHEL_7_Server_RPMs_x86_64"}).count()
1

After amending the "erratum_pkglists" query to filter the errata by repository, memory consumption and runtime both dropped by about 80%.

I think I understand why Pulp doesn't filter the pkglist by repository when regenerating applicability: a single entry may not contain the complete pkglist, since an erratum can be copied across repositories.

I made the following change to retrieve only the "nevra" fields of the errata pkglist when regenerating applicability for a consumer. This patch reduces memory consumption by ~50% (350MB to 150MB) for a consumer with 9 repositories.

https://github.com/hao-yu/pulp_rpm/commit/9f5a52823afee80b31c1e3aa14f4f65fc85f9be9
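To illustrate the duplicate-pkglist effect described above, here is a minimal, self-contained Python sketch. The documents, the `find` helper, and the short repo_id are hypothetical stand-ins for the erratum_pkglists collection and `db.erratum_pkglists.find(...)`; in a real deployment these would be actual mongo queries.

```python
# Hypothetical stand-ins for documents in the erratum_pkglists collection: the
# same erratum copied into three repositories yields three pkglist documents.
docs = [
    {"errata_id": "RHBA-2016:1886", "repo_id": "rhel7_server_rpms"},
    {"errata_id": "RHBA-2016:1886", "repo_id": "rhel7_optional_rpms"},
    {"errata_id": "RHBA-2016:1886", "repo_id": "sat_tools_rhel7_x86_64"},
]

def find(query):
    # Toy stand-in for db.erratum_pkglists.find(query): simple equality match.
    return [d for d in docs if all(d.get(k) == v for k, v in query.items())]

# Unfiltered query: one result per repository the erratum was copied into.
print(len(find({"errata_id": "RHBA-2016:1886"})))  # 3
# Filtered by repo_id: only the copy relevant to the bound repository.
print(len(find({"errata_id": "RHBA-2016:1886",
                "repo_id": "sat_tools_rhel7_x86_64"})))  # 1
```

This is why filtering by repo_id shrinks both the result set and the amount of document processing per erratum.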
The Pulp upstream bug status is at NEW. Updating the external tracker on this bug.
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.
It is likely that repository applicability is affected as well. Cheers,
Another possible side-effect symptom: high CPU usage (accompanied with the high memory requirements).
Do users seeing this bug have worker recycling disabled? Celery workers should recycle after every 2 completed tasks if the value of this configuration setting is > 0:

# grep PULP_MAX_TASKS_PER_CHILD /etc/default/pulp_workers
# Left commented, process recycling is disabled. PULP_MAX_TASKS_PER_CHILD must be > 0.
PULP_MAX_TASKS_PER_CHILD=2

If this is commented out, I would expect worker processes to consume large amounts of RAM over a period of days and end up being killed by the OOM killer.
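For reference, a sketch of what the relevant stanza of /etc/default/pulp_workers looks like with recycling enabled (comment text paraphrased; check the file shipped with your Pulp version):

```shell
# /etc/default/pulp_workers (sketch): with this line uncommented, each celery
# worker process is replaced after completing 2 tasks, so any memory it has
# accumulated is returned to the OS instead of growing without bound.
PULP_MAX_TASKS_PER_CHILD=2
```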
No. The recycling is enabled and currently set to 2 (default).
(In reply to Pavel Moravec from comment #11)
> Another possible side-effect symptom: high CPU usage (accompanied with the
> high memory requirements).

The patch proposed by the reporter helps memory consumption a lot, but it speeds up the task only to a smaller extent. E.g. I have a reproducer where the original regenerate-applicability task for these repos:

{
    "content_unit_counts" : {
        "package_group" : 202,
        "package_category" : 10,
        "rpm" : 20029,
        "yum_repo_metadata_file" : 1,
        "erratum" : 4151
    },
    "repo_id" : "ORG-Linux-Red_Hat_Enterprise_Linux_Server-Red_Hat_Enterprise_Linux_6_Server_RPMs_x86_64_6Server"
}
{
    "content_unit_counts" : {
        "package_group" : 2,
        "package_category" : 1,
        "rpm" : 11612,
        "yum_repo_metadata_file" : 1,
        "erratum" : 2233
    },
    "repo_id" : "ORG-Linux-Red_Hat_Enterprise_Linux_Server-Red_Hat_Enterprise_Linux_6_Server_-_Optional_RPMs_x86_64_6Server"
}

takes 8 minutes without the patch and 6 minutes with the patch, on an idle Satellite. That's still too much for a quite normal deployment with just 2 repos bound. (For each of the repos, it took approx. 3 minutes to calculate its applicability for the consumer, per some debugging.)

Most of the time, the patched pulp was caught executing these 3 lines:

+ for pkglist in pkglists:
+     for collection in pkglist['collections']:
+         for package in collection.get('packages', []):
Very commonly seen backtrace with the patch applied:

#1  PyEval_EvalFrameEx (f=f@entry=Frame 0x1afd820, for file /usr/lib/python2.7/site-packages/mongoengine/base/datastructures.py, line 113, in __getitem__ (self=<BaseList(_name=u'collections.0.packages.9.sum'
#8  0x00007f31d2219d4c in PyEval_EvalFrameEx (f=f@entry=Frame 0x1b0cc80, for file /usr/lib/python2.7/site-packages/mongoengine/base/datastructures.py, line 130, in __iter__ (self=<BaseList(_name=u'collections.0.packages.9.sum'
#11 0x00007f31d2219c41 in PyEval_EvalFrameEx (f=f@entry=Frame 0x1b893f0, for file /usr/lib/python2.7/site-packages/mongoengine/dereference.py, line 99, in _find_references (self=<DeReference(object_map={}
#15 PyEval_EvalFrameEx (f=f@entry=Frame 0x1b32c50, for file /usr/lib/python2.7/site-packages/mongoengine/dereference.py, line 119, in _find_references (self=<DeReference(object_map={}
#19 PyEval_EvalFrameEx (f=f@entry=Frame 0x1b329b0, for file /usr/lib/python2.7/site-packages/mongoengine/dereference.py, line 119, in _find_references (self=<DeReference(object_map={}
#23 PyEval_EvalFrameEx (f=f@entry=Frame 0x1b33630, for file /usr/lib/python2.7/site-packages/mongoengine/dereference.py, line 119, in _find_references (self=<DeReference(object_map={}
#27 PyEval_EvalFrameEx (f=f@entry=Frame 0x1b54740, for file /usr/lib/python2.7/site-packages/mongoengine/dereference.py, line 119, in _find_references (self=<DeReference(object_map={}
#31 PyEval_EvalFrameEx (f=f@entry=Frame 0x1b06ce0, for file /usr/lib/python2.7/site-packages/mongoengine/dereference.py, line 76, in __call__ (self=<DeReference(object_map={}
#41 PyEval_EvalFrameEx (f=f@entry=Frame 0x1aff2c0, for file /usr/lib/python2.7/site-packages/mongoengine/base/fields.py, line 278, in __get__ (self=<ListField(db_field='collections'
#49 PyEval_EvalFrameEx (f=f@entry=Frame 0x7f31af778d38, for file /usr/lib/python2.7/site-packages/mongoengine/base/document.py, line 224, in __getitem__ (self=<ErratumPkglist(_cls='ErratumPkglist') at remote 0x7f31c4b044d0>
#56 0x00007f31d2219d4c in PyEval_EvalFrameEx (f=f@entry=Frame 0x1b46ba0, for file /usr/lib/python2.7/site-packages/pulp_rpm/plugins/db/models.py, line 718, in rpms_generator (cls=<TopLevelDocumentMetaclass(errata_from=<StringField(regex=None

or a similar one, where the code is simply:

pkglists = ErratumPkglist.objects(errata_id=errata_id).only('collections')
found = set()
fields = ('name', 'epoch', 'version', 'release', 'arch')
for pkglist in pkglists:
    for collection in pkglist['collections']:
        for package in collection.get('packages', []):

So (my guess) the pkglist is being slowly fetched by many mongo queries in a row, and these queries, and namely the processing of their results, take so much time? (Can't this be optimised further somehow? mongo already has a unique index with key 'errata_id'..)
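The backtrace above suggests most of the time goes into mongoengine's dereferencing machinery while iterating the pkglists. As a minimal, self-contained sketch of the dedupe-by-NEVRA idea behind the reporter's patch (the pkglist documents here are hypothetical in-memory stand-ins; in Pulp they would come from `ErratumPkglist.objects(errata_id=...).only('collections')`):

```python
# Hypothetical in-memory stand-ins for ErratumPkglist documents. The same
# erratum copied into two repositories yields two pkglist documents whose
# package entries largely duplicate each other.
pkglists = [
    {"collections": [{"packages": [
        {"name": "bash", "epoch": "0", "version": "4.2.46",
         "release": "31.el7", "arch": "x86_64", "sum": "sha256:aaaa"},
        {"name": "glibc", "epoch": "0", "version": "2.17",
         "release": "260.el7", "arch": "x86_64", "sum": "sha256:bbbb"},
    ]}]},
    {"collections": [{"packages": [  # duplicate copy from a second repository
        {"name": "bash", "epoch": "0", "version": "4.2.46",
         "release": "31.el7", "arch": "x86_64", "sum": "sha256:aaaa"},
    ]}]},
]

# Keep only the NEVRA tuples, deduplicating across repository copies, so each
# duplicate pkglist entry costs a set lookup rather than holding full package
# documents (checksums, descriptions, ...) in memory.
fields = ("name", "epoch", "version", "release", "arch")
found = set()
for pkglist in pkglists:
    for collection in pkglist["collections"]:
        for package in collection.get("packages", []):
            found.add(tuple(package[f] for f in fields))

print(sorted(found))  # two unique NEVRAs despite three package entries
```

This shows why the patch shrinks the working set: duplicates collapse into the set, but the loops still touch every document, which is consistent with the memory win being larger than the speed win.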
(In reply to Hao Chang Yu from comment #0)
> Pulp is fetching the pkglist from all the repositories that a particular
> Erratum is associated to. This is expensive and the results may contain a
> lot of duplicate pkglist.

Hao Chang, would these duplicates explain errors like the error reported in bug 1503027?

2017-10-17 04:22:34 EDT ERROR: duplicate key value violates unique constraint "index_katello_rpms_on_uuid"
2017-10-17 04:22:34 EDT DETAIL: Key (uuid)=(7fe43d69-c8fc-44ec-99ba-709da6b77bbe) already exists.
2017-10-17 04:22:34 EDT STATEMENT: INSERT INTO "katello_rpms" ("uuid", "created_at", "updated_at") VALUES ($1, $2, $3) RETURNING "id"
(In reply to Julio Entrena Perez from comment #23)
> Hao Chang, would these duplicates explain errors like the error reported in
> bug 1503027?

Hi Julio. No, this is a different issue from bug 1503027. This bug is about how errata pkglists are stored in Pulp, causing poor performance and high memory usage.
The Pulp upstream bug status is at POST. Updating the external tracker on this bug.
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.
Verified in Satellite 6.5.0 Snap 15.

Steps:
1. Attached 12 repositories to a host.
2. Performed an applicability regen on the host, while monitoring the celery workers:
   ps aux | grep celery

Result: There was no significant increase in memory consumption from the celery workers. See attached video for verification.
Created attachment 1536025 [details] verification video
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:1222