Bug 2088620 - pulp workers are idle despite there is one pending task
Summary: pulp workers are idle despite there is one pending task
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.10.2
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: 6.10.6
Assignee: satellite6-bugs
QA Contact: Lai
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-05-19 20:51 UTC by Brad Buckingham
Modified: 2022-05-31 15:19 UTC (History)
8 users (show)

Fixed In Version: pulpcore-3.14.16-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2050440
Environment:
Last Closed: 2022-05-31 15:19:38 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github pulp pulpcore issues 2506 0 None closed All workers idle while a task is pending 2022-05-19 20:52:00 UTC
Red Hat Product Errata RHBA-2022:4850 0 None None None 2022-05-31 15:19:49 UTC

Comment 3 Lai 2022-05-25 23:03:26 UTC
Steps to retest:

1) Apply a patch to add auxiliary logging and force worker to sleep at the bad place:

--- /usr/lib/python3.6/site-packages/pulpcore/tasking/pulpcore_worker.py.orig	2022-04-11 12:54:11.374669136 +0200
+++ /usr/lib/python3.6/site-packages/pulpcore/tasking/pulpcore_worker.py	2022-04-13 11:39:11.590025573 +0200
@@ -17,6 +17,7 @@ from multiprocessing import Process
 from tempfile import TemporaryDirectory
 
 import django
+import time
 
 os.environ.setdefault("DJANGO_SETTINGS_MODULE", "pulpcore.app.settings")
 django.setup()
@@ -189,6 +190,9 @@ class NewPulpWorker:
             r, w, x = select.select(
                 [self.sentinel, connection.connection], [], [], self.heartbeat_period
             )
+            _logger.info(_("bz2050440: Worker %s artificial sleep before heartbeat"), self.name)
+            time.sleep(2)
+            _logger.info(_("bz2050440: Worker %s artificial slept before heartbeat"), self.name)
             self.beat()
             if connection.connection in r:
                 connection.connection.poll()

(you can make the code changes directly to the "pulpcore_worker.py" file and then run "cat /usr/lib/python3.6/site-packages/pulpcore/tasking/pulpcore_worker.py | patch -p1" to apply the patch)

2) Let just 1 worker running (just to ease the reproducer):

systemctl stop pulpcore-worker@*.service
systemctl start pulpcore-worker

3) "tail -f /var/log/messages" to monitor the "Worker * artificial sle.." logs on another terminal

4) Have prepared a curl command to e.g. create a publication or distribution or delete either of them.

certs="--cacert /etc/pki/katello/certs/katello-server-ca.crt --cert /etc/pki/katello/certs/pulp-client.crt --key /etc/pki/katello/private/pulp-client.key"
hname=$(hostname -f)

# create repo
reponame="repo_bz2050440"
repohref=$(curl -s -X POST -H "Content-Type: application/json" -d "{\"name\":\"$reponame\"}" $certs https://${hname}/pulp/api/v3/repositories/rpm/rpm/ | json_reformat | grep pulp_href | cut -d\" -f4)


5) Once the *first* log "artificial sleep" is printed, immediately fire the API request that will invoke a task.
# prepare below command to fire at step 4
curl -s -H "Accept:application/json" -H "Content-Type: application/json" -X POST -d "{\"repository\": \"${repohref}\"}" $certs https://${hname}/pulp/api/v3/publications/rpm/rpm/ | json_reformat

6) Ensure the task is created while the worker is sleeping, like:

(example)
Apr 13 12:01:42 pmoravec-sat610-bz2050440 pulpcore-worker-1: pulp [None]: pulpcore.tasking.pulpcore_worker:INFO: bz2050440: Worker bz2050440.satotest.gsslab.brq2.redhat.com">28305.gsslab.brq2.redhat.com artificial sleep before heartbeat
Apr 13 12:01:43 pmoravec-sat610-bz2050440 pulpcore-api: pulp [6d4ba28511c34d1db0cc8f7c06c2f69f]:  - - [13/Apr/2022:10:01:43 +0000] "POST /pulp/api/v3/publications/rpm/rpm/ HTTP/1.1" 202 67 "-" "curl/7.29.0"
Apr 13 12:01:44 pmoravec-sat610-bz2050440 pulpcore-worker-1: pulp [None]: pulpcore.tasking.pulpcore_worker:INFO: bz2050440: Worker bz2050440.satotest.gsslab.brq2.redhat.com">28305.gsslab.brq2.redhat.com artificial slept before heartbeat

7) Wait for Godot to have the task executed.

8) check that the task is suddenly executed:

(example)
..
Apr 13 12:03:20 pmoravec-sat610-bz2050440 pulpcore-worker-1: pulp [None]: pulpcore.tasking.pulpcore_worker:INFO: bz2050440: Worker bz2050440.satotest.gsslab.brq2.redhat.com">28305.gsslab.brq2.redhat.com artificial slept before heartbeat
Apr 13 12:03:25 pmoravec-sat610-bz2050440 pulpcore-worker-1: pulp [None]: pulpcore.tasking.pulpcore_worker:INFO: bz2050440: Worker bz2050440.satotest.gsslab.brq2.redhat.com">28305.gsslab.brq2.redhat.com artificial sleep before heartbeat
Apr 13 12:03:27 pmoravec-sat610-bz2050440 pulpcore-worker-1: pulp [None]: pulpcore.tasking.pulpcore_worker:INFO: bz2050440: Worker bz2050440.satotest.gsslab.brq2.redhat.com">28305.gsslab.brq2.redhat.com artificial slept before heartbeat
Apr 13 12:03:27 pmoravec-sat610-bz2050440 pulpcore-worker-1: pulp [6d4ba28511c34d1db0cc8f7c06c2f69f]: pulpcore.tasking.pulpcore_worker:INFO: Starting task b7110104-a52c-49f3-af78-58d47a662634
Apr 13 12:03:27 pmoravec-sat610-bz2050440 pulpcore-worker-1: pulp [6d4ba28511c34d1db0cc8f7c06c2f69f]: pulp_rpm.app.tasks.publishing:INFO: Publishing: repository=repo_bz2050440_ts1649842649_737576601, version=0
Apr 13 12:03:27 pmoravec-sat610-bz2050440 pulpcore-worker-1: pulp [6d4ba28511c34d1db0cc8f7c06c2f69f]: pulp_rpm.app.tasks.publishing:INFO: Publication: 0a35af25-a374-4aa8-912c-bb2d3a4e0e99 created
Apr 13 12:03:27 pmoravec-sat610-bz2050440 pulpcore-worker-1: pulp [6d4ba28511c34d1db0cc8f7c06c2f69f]: pulpcore.tasking.pulpcore_worker:INFO: Task completed b7110104-a52c-49f3-af78-58d47a662634

Optionally:
9) check that the task is executed or not with the following, replacing the "tasks/60aac..." with the pulpcore task that is displayed for you
curl $certs https://${hname}/pulp/api/v3/tasks/60aac23b-29fd-4ced-9c7d-bdce6d4b880c/ | json_reformat


Expected:
The task should executed and completed successfully and should not be idle

Actual:
The task executed and completed successfully and isn't idle.
{
    "pulp_href": "/pulp/api/v3/tasks/60aac23b-29fd-4ced-9c7d-bdce6d4b880c/",
    "pulp_created": "2022-05-25T22:48:49.752292Z",
    "state": "completed",
    "name": "pulp_rpm.app.tasks.publishing.publish",
    "logging_cid": "8c9c54f73fe24061bdea34373363a82b",
    "started_at": "2022-05-25T22:48:50.703106Z",
    "finished_at": "2022-05-25T22:48:50.967578Z",
    "error": null,
    "worker": "/pulp/api/v3/workers/b7008577-4fc2-4213-9653-64d3a11aed8e/",
    "parent_task": null,
    "child_tasks": [

    ],
    "task_group": null,
    "progress_reports": [
        {
            "message": "Generating repository metadata",
            "code": "publish.generating_metadata",
            "state": "completed",
            "total": 1,
            "done": 1,
            "suffix": null
        }
    ],
    "created_resources": [
        "/pulp/api/v3/publications/rpm/rpm/b587447a-4467-4498-9ab5-67dd00b980e4/"
    ],
    "reserved_resources_record": [
        "/pulp/api/v3/repositories/rpm/rpm/e4623177-8cfe-40de-a340-66ed73630f84/"
    ]
}

Verified on 6.10.6 snap 3 with python3-pulpcore-3.14.16-1.el7pc.noarch

This is quite difficult to test.  It worked out fine and I did also check that the codebase is in the build.

Comment 8 errata-xmlrpc 2022-05-31 15:19:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Satellite 6.10.6 Async Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4850


Note You need to log in before you can comment on or make changes to this bug.