Bug 1523433 - Celery worker consumes a large amount of memory when regenerating applicability for a consumer that binds to many repositories with many errata.
Summary: Celery worker consumes a large amount of memory when regenerating applicability...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.2.12
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: 6.5.0
Assignee: satellite6-bugs
QA Contact: jcallaha
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-12-08 00:26 UTC by Hao Chang Yu
Modified: 2021-12-10 15:28 UTC (History)
27 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-14 12:36:46 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
verification video (330.33 KB, video/webm)
2019-02-18 16:10 UTC, jcallaha
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Pulp Redmine 3172 0 Normal CLOSED - CURRENTRELEASE Celery worker consumes large number of memory when regenerating applicability for a consumer that binds to many reposito... 2018-07-09 15:07:00 UTC
Pulp Redmine 3795 0 Normal CLOSED - CURRENTRELEASE Errata are not shown as applicable if epoch info is absent in pkglist 2018-07-09 15:05:24 UTC
Red Hat Bugzilla 1573892 0 high CLOSED regenerate applicability of a consumer takes many minutes 2023-03-24 14:04:23 UTC
Red Hat Product Errata RHSA-2019:1222 0 None None None 2019-05-14 12:36:54 UTC

Internal Links: 1573892

Description Hao Chang Yu 2017-12-08 00:26:40 UTC
Description of problem:

The celery worker consumes about 60MB of RAM initially. After regenerating applicability for a consumer that is bound to 9 repositories, its memory usage increases to about 350MB+ and that RAM is never freed.

I think the following is the reason for the high memory consumption.

Pulp fetches the pkglist from all the repositories that a particular erratum is associated with. This is expensive and the results may contain a lot of duplicate pkglists.

For example, Pulp makes this query:

    db.erratum_pkglists.find({"errata_id": "RHBA-2016:1886"}).count()

3

Instead of doing the following:

    db.erratum_pkglists.find({"errata_id": "RHBA-2016:1886", "repo_id" : "my_org-Red_Hat_Enterprise_Linux_Server-Red_Hat_Satellite_Tools_6_2_for_RHEL_7_Server_RPMs_x86_64"}).count()

1

After amending the "erratum_pkglists" query to filter the errata by repository, memory consumption and run time are reduced by about 80%.
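
For illustration only (the helper name is hypothetical and this is just a sketch of the idea, not the exact change), the repo-filtered lookup looks roughly like this at the mongoengine level:

    from pulp_rpm.plugins.db.models import ErratumPkglist

    def pkglist_for_repo(errata_id, repo_id):
        # Filtering on repo_id (as in the second mongo query above) avoids
        # fetching the same pkglist once per repository that the erratum
        # was copied into.
        return ErratumPkglist.objects(errata_id=errata_id,
                                      repo_id=repo_id).only('collections')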

I think I understand why Pulp doesn't filter the pkglist by repository when regenerating applicability: a single entry may not contain the full pkglist, since an erratum can be copied across repositories.

I made the following change to retrieve only the "nevra" fields of the errata pkglist when regenerating applicability for a consumer. This patch reduces memory consumption by ~50% (350MB to 150MB) for a consumer bound to 9 repositories.

https://github.com/hao-yu/pulp_rpm/commit/9f5a52823afee80b31c1e3aa14f4f65fc85f9be9
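
The commit above is the actual change; purely to illustrate the idea, a NEVRA-only fetch of the pkglist can be sketched as below (the helper name is made up; _get_collection() is mongoengine's accessor for the underlying pymongo collection):

    from pulp_rpm.plugins.db.models import ErratumPkglist

    NEVRA_FIELDS = ('name', 'epoch', 'version', 'release', 'arch')

    def iter_pkglist_nevras(errata_id):
        # Hypothetical sketch: project the query down to the five NEVRA
        # fields so checksums, filenames etc. never enter worker memory,
        # and de-duplicate packages repeated across pkglist entries.
        collection = ErratumPkglist._get_collection()
        projection = dict(('collections.packages.%s' % f, 1) for f in NEVRA_FIELDS)
        seen = set()
        for pkglist in collection.find({'errata_id': errata_id}, projection):
            for coll in pkglist.get('collections', []):
                for pkg in coll.get('packages', []):
                    nevra = tuple(pkg.get(f) for f in NEVRA_FIELDS)
                    if nevra not in seen:
                        seen.add(nevra)
                        yield nevra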

Comment 2 pulp-infra@redhat.com 2017-12-08 14:32:13 UTC
The Pulp upstream bug status is at NEW. Updating the external tracker on this bug.

Comment 3 pulp-infra@redhat.com 2017-12-08 14:32:15 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

Comment 9 pulp-infra@redhat.com 2018-01-03 19:02:27 UTC
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.

Comment 10 Neal Kim 2018-01-04 23:50:52 UTC
It appears likely that repository applicability is affected as well.

Cheers,

Comment 11 Pavel Moravec 2018-02-05 20:36:34 UTC
Another possible side-effect symptom: high CPU usage (accompanying the high memory consumption).

Comment 12 Mike McCune 2018-03-23 16:17:21 UTC
Do users seeing this bug have worker recycling disabled?

Celery workers should recycle after every 2 completed tasks if the value of this configuration setting is > 0:

# grep PULP_MAX_TASKS_PER_CHILD /etc/default/pulp_workers
# left commented, process recycling is disabled. PULP_MAX_TASKS_PER_CHILD must be > 0.
PULP_MAX_TASKS_PER_CHILD=2

If this is commented out, I would expect worker processes to consume large amounts of RAM over a period of days and end up being killed by the OOM killer.

Comment 13 Hao Chang Yu 2018-03-28 11:13:01 UTC
No. The recycling is enabled and currently set to 2 (default).

Comment 17 Pavel Moravec 2018-04-24 12:10:53 UTC
(In reply to Pavel Moravec from comment #11)
> Another possible side-effect symptom: high CPU usage (accompanying the
> high memory consumption).

The patch proposed by the reporter helps memory consumption a lot, but it speeds up the task only to a smaller extent.

For example, I have a reproducer where the original regenerate-applicability task for these repos:

{ "content_unit_counts" : { "package_group" : 202, "package_category" : 10, "rpm" : 20029, "yum_repo_metadata_file" : 1, "erratum" : 4151 }, "repo_id" : "ORG-Linux-Red_Hat_Enterprise_Linux_Server-Red_Hat_Enterprise_Linux_6_Server_RPMs_x86_64_6Server" }

{ "content_unit_counts" : { "package_group" : 2, "package_category" : 1, "rpm" : 11612, "yum_repo_metadata_file" : 1, "erratum" : 2233 }, "repo_id" : "ORG-Linux-Red_Hat_Enterprise_Linux_Server-Red_Hat_Enterprise_Linux_6_Server_-_Optional_RPMs_x86_64_6Server" }


takes 8 minutes without the patch and 6 minutes with the patch, on an idle Satellite. That's still too much for a quite normal deployment with just 2 repos bound. (For each of the repos it took approx. 3 minutes to calculate its applicability for the consumer, per some debug output.)

Most of the time, the patched pulp was caught executing these 3 lines:

+        for pkglist in pkglists:
+            for collection in pkglist['collections']:
+                for package in collection.get('packages', []):

Comment 18 Pavel Moravec 2018-04-24 13:25:45 UTC
Very commonly seen backtrace with the patch applied:

#1  PyEval_EvalFrameEx (f=f@entry=Frame 0x1afd820, for file /usr/lib/python2.7/site-packages/mongoengine/base/datastructures.py, line 113, in __getitem__ (self=<BaseList(_name=u'collections.0.packages.9.sum'
#8  0x00007f31d2219d4c in PyEval_EvalFrameEx (f=f@entry=Frame 0x1b0cc80, for file /usr/lib/python2.7/site-packages/mongoengine/base/datastructures.py, line 130, in __iter__ (self=<BaseList(_name=u'collections.0.packages.9.sum'
#11 0x00007f31d2219c41 in PyEval_EvalFrameEx (f=f@entry=Frame 0x1b893f0, for file /usr/lib/python2.7/site-packages/mongoengine/dereference.py, line 99, in _find_references (self=<DeReference(object_map={}
#15 PyEval_EvalFrameEx (f=f@entry=Frame 0x1b32c50, for file /usr/lib/python2.7/site-packages/mongoengine/dereference.py, line 119, in _find_references (self=<DeReference(object_map={}
#19 PyEval_EvalFrameEx (f=f@entry=Frame 0x1b329b0, for file /usr/lib/python2.7/site-packages/mongoengine/dereference.py, line 119, in _find_references (self=<DeReference(object_map={}
#23 PyEval_EvalFrameEx (f=f@entry=Frame 0x1b33630, for file /usr/lib/python2.7/site-packages/mongoengine/dereference.py, line 119, in _find_references (self=<DeReference(object_map={}
#27 PyEval_EvalFrameEx (f=f@entry=Frame 0x1b54740, for file /usr/lib/python2.7/site-packages/mongoengine/dereference.py, line 119, in _find_references (self=<DeReference(object_map={}
#31 PyEval_EvalFrameEx (f=f@entry=Frame 0x1b06ce0, for file /usr/lib/python2.7/site-packages/mongoengine/dereference.py, line 76, in __call__ (self=<DeReference(object_map={}
#41 PyEval_EvalFrameEx (f=f@entry=Frame 0x1aff2c0, for file /usr/lib/python2.7/site-packages/mongoengine/base/fields.py, line 278, in __get__ (self=<ListField(db_field='collections'
#49 PyEval_EvalFrameEx (f=f@entry=Frame 0x7f31af778d38, for file /usr/lib/python2.7/site-packages/mongoengine/base/document.py, line 224, in __getitem__ (self=<ErratumPkglist(_cls='ErratumPkglist') at remote 0x7f31c4b044d0>
#56 0x00007f31d2219d4c in PyEval_EvalFrameEx (f=f@entry=Frame 0x1b46ba0, for file /usr/lib/python2.7/site-packages/pulp_rpm/plugins/db/models.py, line 718, in rpms_generator (cls=<TopLevelDocumentMetaclass(errata_from=<StringField(regex=None

or a similar one, coming simply from:

        pkglists = ErratumPkglist.objects(errata_id=errata_id).only('collections')
        found = set()
        fields = ('name', 'epoch', 'version', 'release', 'arch')
        for pkglist in pkglists:
            for collection in pkglist['collections']:
                for package in collection.get('packages', []):

So (my guess) the pkglist is being slowly fetched by many mongo queries in a row, and these queries, and especially the processing of their results, take so much time? (Can't this be optimised further somehow? mongo already has a unique index on 'errata_id'..)
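
One untested idea, just a sketch (nevra_set is a made-up name): mongoengine's as_pymongo() returns plain dicts straight from the driver, skipping the BaseList / DeReference machinery that dominates the backtrace above, while keeping the same single query per erratum:

    from pulp_rpm.plugins.db.models import ErratumPkglist

    def nevra_set(errata_id):
        # Same triple loop as the quoted patch, but iterating over raw
        # pymongo dicts instead of mongoengine documents.
        fields = ('name', 'epoch', 'version', 'release', 'arch')
        found = set()
        pkglists = (ErratumPkglist.objects(errata_id=errata_id)
                    .only('collections').as_pymongo())
        for pkglist in pkglists:
            for collection in pkglist.get('collections', []):
                for package in collection.get('packages', []):
                    found.add(tuple(package.get(f) for f in fields))
        return found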

Comment 23 Julio Entrena Perez 2018-05-01 14:41:04 UTC
(In reply to Hao Chang Yu from comment #0)

> Pulp fetches the pkglist from all the repositories that a particular
> erratum is associated with. This is expensive and the results may contain
> a lot of duplicate pkglists.

Hao Chang, would these duplicates explain errors like the one reported in bug 1503027?

2017-10-17 04:22:34 EDT ERROR:  duplicate key value violates unique constraint "index_katello_rpms_on_uuid"
2017-10-17 04:22:34 EDT DETAIL:  Key (uuid)=(7fe43d69-c8fc-44ec-99ba-709da6b77bbe) already exists.
2017-10-17 04:22:34 EDT STATEMENT:  INSERT INTO "katello_rpms" ("uuid", "created_at", "updated_at") VALUES ($1, $2, $3) RETURNING "id"

Comment 24 Hao Chang Yu 2018-05-02 06:38:58 UTC
(In reply to Julio Entrena Perez from comment #23)
> (In reply to Hao Chang Yu from comment #0)
> 
> > Pulp fetches the pkglist from all the repositories that a particular
> > erratum is associated with. This is expensive and the results may
> > contain a lot of duplicate pkglists.
> 
> Hao Chang, would these duplicates explain errors like the one reported in
> bug 1503027?
> 
> 2017-10-17 04:22:34 EDT ERROR:  duplicate key value violates unique
> constraint "index_katello_rpms_on_uuid"
> 2017-10-17 04:22:34 EDT DETAIL:  Key
> (uuid)=(7fe43d69-c8fc-44ec-99ba-709da6b77bbe) already exists.
> 2017-10-17 04:22:34 EDT STATEMENT:  INSERT INTO "katello_rpms" ("uuid",
> "created_at", "updated_at") VALUES ($1, $2, $3) RETURNING "id"

Hi Julio. No, this is a different issue from bug 1503027. This bug is about how errata pkglists are stored in Pulp, causing performance problems and high memory usage.

Comment 27 pulp-infra@redhat.com 2018-05-11 12:32:34 UTC
The Pulp upstream bug status is at POST. Updating the external tracker on this bug.

Comment 29 pulp-infra@redhat.com 2018-05-18 08:03:10 UTC
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

Comment 30 pulp-infra@redhat.com 2018-05-18 08:32:56 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

Comment 32 pulp-infra@redhat.com 2018-06-28 13:14:11 UTC
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

Comment 33 pulp-infra@redhat.com 2018-06-28 13:14:19 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

Comment 34 pulp-infra@redhat.com 2018-07-09 15:05:25 UTC
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

Comment 35 pulp-infra@redhat.com 2018-07-09 15:07:01 UTC
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

Comment 40 jcallaha 2019-02-18 16:08:58 UTC
Verified in Satellite 6.5.0 Snap 15

Steps:
Attached 12 repositories to a host.
Performed an applicability regen on the host while monitoring the celery workers:

ps -aux | grep celery

Result:
There was no significant increase in memory consumption from the celery workers.
See attached video for verification.

Comment 41 jcallaha 2019-02-18 16:10:28 UTC
Created attachment 1536025 [details]
verification video

Comment 43 errata-xmlrpc 2019-05-14 12:36:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:1222

