Bug 1590906 - [deadlock] pulp workers appear idle even though many pulp tasks are in 'waiting' status
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.3.1
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: Sanket Jagtap
URL:
Whiteboard:
Depends On: 1491032
Blocks: 1122832
 
Reported: 2018-06-13 15:44 UTC by Pavel Moravec
Modified: 2021-04-06 17:51 UTC
CC List: 35 users

Fixed In Version: pulp-2.13.4.10-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1491032
Environment:
Last Closed: 2018-06-20 17:09:07 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Pulp Redmine 2979 0 Normal CLOSED - CURRENTRELEASE Celery workers may deadlock when PULP_MAX_TASKS_PER_CHILD is used 2018-06-13 20:39:10 UTC
Red Hat Knowledge Base (Solution) 3201502 0 None None None 2018-06-13 15:44:14 UTC
Red Hat Product Errata RHBA-2018:1956 0 None None None 2018-06-20 17:09:17 UTC

Comment 2 Pavel Moravec 2018-06-13 15:46:42 UTC
BZ 1491032 was fixed only for 6.2.z, not for 6.3, so Satellite 6.3.[0-1] is still affected by https://pulp.plan.io/issues/2979 .

Please backport it in a z-stream.
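For context, the deadlock only manifests when worker recycling is enabled. A minimal sketch of the relevant setting in /etc/default/pulp_workers (the value 2 is illustrative, not a recommendation; any positive value enables per-child task limits and thus worker recycling):

```shell
# /etc/default/pulp_workers (sketch; value is illustrative)
# Recycle each Celery worker process after it has executed this many
# tasks. Enabling this is what exposes the deadlock fixed upstream in
# https://pulp.plan.io/issues/2979 (pulp-2.13.4.10-1).
PULP_MAX_TASKS_PER_CHILD=2
```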

Comment 3 pulp-infra@redhat.com 2018-06-13 20:39:11 UTC
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

Comment 4 pulp-infra@redhat.com 2018-06-13 20:39:19 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

Comment 5 pulp-infra@redhat.com 2018-06-13 20:47:53 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

Comment 14 Sanket Jagtap 2018-06-20 09:30:41 UTC
Build: Satellite 6.3.2.1 snap1


Verified using steps from https://bugzilla.redhat.com/show_bug.cgi?id=1491032#c27

1) PULP_MAX_TASKS_PER_CHILD is uncommented in /etc/default/pulp_workers

2) diff /usr/lib64/python2.7/site-packages/pymongo/pool.py  /usr/lib64/python2.7/site-packages/pymongo/pool.py.old 
19c19
< import time
---
> 
567d566
< 		    time.sleep(.1)

3) enqueue() {
       celery --app=pulp.server.async.app call \
           --exchange=C.dq \
           --routing-key=reserved_resource_worker-2@<sat-host> \
           pulp.server.async.tasks._release_resource \
           '--args=["test"]'
   }
   while true; do
       for i in $(seq 1 5); do
           for j in $(seq 1 20); do enqueue & done
           sleep 1
       done
       wait
   done

4) Monitored journalctl -f | grep 'succeeded in'
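The time.sleep(.1) patch in step 2 widens the window during which pymongo's pool lock is held, making the race easy to hit. The underlying failure mode is generic: when a process forks (as happens when PULP_MAX_TASKS_PER_CHILD recycles a worker) while another thread holds a lock, the child inherits the lock in its locked state with no thread left to release it. A minimal, self-contained Python illustration of that mechanism (not Pulp's code):

```python
import os
import threading
import time

lock = threading.Lock()

def hold_lock():
    # A background thread holds the lock for a while, much as a
    # connection pool does while checking sockets in and out.
    with lock:
        time.sleep(0.5)

t = threading.Thread(target=hold_lock)
t.start()
time.sleep(0.1)  # ensure the thread has acquired the lock

pid = os.fork()  # fork while the lock is held (POSIX only)
if pid == 0:
    # Child: the lock was copied in its locked state, but the thread
    # that would release it does not exist here. A blocking acquire()
    # would hang forever; a non-blocking attempt demonstrates why.
    os._exit(0 if not lock.acquire(False) else 1)
else:
    _, status = os.waitpid(pid, 0)
    child_saw_locked = os.WEXITSTATUS(status) == 0
    t.join()
    print("child inherited a locked lock:", child_saw_locked)
```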

Logs:
...
Jun 20 03:02:57 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[0f1f9436-12d7-46ff-ad76-bb9aa2cd84f8] succeeded in 0.788953678s: None
Jun 20 03:02:58 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[9f7de4b4-fc41-4bda-9bf5-01be76f38c50] succeeded in 0.221101978001s: None
Jun 20 03:03:01 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[3296028c-ffb0-43e1-97c5-429a077bd7c0] succeeded in 1.214786596s: None
Jun 20 03:03:02 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[e852fed0-ac2c-4e37-bfc6-9f60c15d729c] succeeded in 0.391580195999s: None
Jun 20 03:03:06 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[770ef9bd-311a-442f-ae73-fced0a35c8c7] succeeded in 1.352345535s: None
Jun 20 03:03:06 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[eadd0ab4-4e24-40e5-bb25-e1d861de6ee9] succeeded in 0.440672064999s: None
...

Jun 20 05:29:20 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[adf36f9e-b3ae-48ac-b360-0ece90f6417f] succeeded in 0.322862540001s: None
Jun 20 05:29:20 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[7ed37e24-fd51-4de8-a23d-35eff7e17508] succeeded in 0.204538307s: None
Jun 20 05:29:22 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[6751f8dc-acb1-4dd5-be10-90f6f749e500] succeeded in 0.322744957997s: None
Jun 20 05:29:22 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[3d21f636-7d58-46e4-b6ad-d17f9582573e] succeeded in 0.204259102s: None


Ran it for approximately 3 hours; no deadlock was encountered.

Comment 16 errata-xmlrpc 2018-06-20 17:09:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1956

