Bug 1590906

Summary: [deadlock] pulp workers appear idle even though many pulp tasks are in 'waiting' status
Product: Red Hat Satellite Reporter: Pavel Moravec <pmoravec>
Component: Pulp Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA QA Contact: Sanket Jagtap <sjagtap>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 6.3.1 CC: adprice, ajoseph, andrew.schofield, aperotti, bbuckingham, bkearney, bmbouter, brubisch, cduryee, daniele, daviddavis, dkliban, ggainey, ipanova, jcallaha, jentrena, jjansky, kabbott, ktordeur, mhrivnak, mmccune, pcreech, pdwyer, peter.vreman, pmoravec, rchan, satellite6-bugs, sjagtap, sreber, stbenjam, tbrisker, tstrachota, ttereshc, xdmoon, zhunting
Target Milestone: Unspecified Keywords: FieldEngineering, PrioBumpField, Triaged
Target Release: Unused   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: pulp-2.13.4.10-1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1491032 Environment:
Last Closed: 2018-06-20 17:09:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1491032    
Bug Blocks: 1122832    

Comment 2 Pavel Moravec 2018-06-13 15:46:42 UTC
BZ 1491032 was fixed only in 6.2.z, not in 6.3, so Satellite 6.3.0 and 6.3.1 are still affected by https://pulp.plan.io/issues/2979.

Please backport the fix to a 6.3 z-stream.

Comment 3 pulp-infra@redhat.com 2018-06-13 20:39:11 UTC
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

Comment 4 pulp-infra@redhat.com 2018-06-13 20:39:19 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

Comment 5 pulp-infra@redhat.com 2018-06-13 20:47:53 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

Comment 14 Sanket Jagtap 2018-06-20 09:30:41 UTC
Build: Satellite 6.3.2.1 snap1


Verified using steps from https://bugzilla.redhat.com/show_bug.cgi?id=1491032#c27

1) PULP_MAX_TASKS_PER_CHILD is uncommented in /etc/default/pulp_workers

2) pymongo's pool.py is patched to add a time.sleep(.1); diff against the saved original:
diff /usr/lib64/python2.7/site-packages/pymongo/pool.py /usr/lib64/python2.7/site-packages/pymongo/pool.py.old
19c19
< import time
---
> 
567d566
< 		    time.sleep(.1)

3) enqueue(){ celery --app=pulp.server.async.app call --exchange=C.dq --routing-key=reserved_resource_worker-2@<sat-host> pulp.server.async.tasks._release_resource '--args=["test"]'; }; while true; do for i in $(seq 1 5); do for j in $(seq 1 20); do enqueue & done; sleep 1; done; wait; done

4) Monitored task completions with: journalctl -f | grep 'succeeded in'
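
For readers who find the one-liner in step 3 hard to follow, here is a rough Python sketch of the same load pattern: bursts of 20 background `celery call` invocations, one second apart, with a wait after every 5 bursts. The function names (`build_enqueue_command`, `run_round`) and the `spawn`/`pause` parameters are illustrative assumptions, not part of Pulp or Celery. The celery arguments are copied from step 3, and `worker_host` stands in for the `<sat-host>` placeholder.

```python
import subprocess
import time


def build_enqueue_command(worker_host):
    """Return the argv for one `celery call`, mirroring enqueue() in step 3."""
    return [
        "celery",
        "--app=pulp.server.async.app",
        "call",
        "--exchange=C.dq",
        "--routing-key=reserved_resource_worker-2@" + worker_host,
        "pulp.server.async.tasks._release_resource",
        '--args=["test"]',
    ]


def run_round(worker_host, bursts=5, per_burst=20,
              spawn=subprocess.Popen, pause=time.sleep):
    """One pass of the outer loop from step 3.

    Starts `per_burst` background calls, pauses 1 second, repeats this
    `bursts` times, then waits for every started process to exit.
    `spawn` and `pause` are injectable (hypothetical hooks) so the logic
    can be exercised without a running Pulp/Celery installation.
    """
    procs = []
    for _ in range(bursts):
        for _ in range(per_burst):
            procs.append(spawn(build_enqueue_command(worker_host)))
        pause(1)
    for proc in procs:
        proc.wait()
    return len(procs)
```

Calling `run_round(host)` repeatedly in an endless loop would correspond to the `while true` wrapper in step 3.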

Logs:
....
Jun 20 03:02:57 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[0f1f9436-12d7-46ff-ad76-bb9aa2cd84f8] succeeded in 0.788953678s: None
Jun 20 03:02:58 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[9f7de4b4-fc41-4bda-9bf5-01be76f38c50] succeeded in 0.221101978001s: None
Jun 20 03:03:01 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[3296028c-ffb0-43e1-97c5-429a077bd7c0] succeeded in 1.214786596s: None
Jun 20 03:03:02 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[e852fed0-ac2c-4e37-bfc6-9f60c15d729c] succeeded in 0.391580195999s: None
Jun 20 03:03:06 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[770ef9bd-311a-442f-ae73-fced0a35c8c7] succeeded in 1.352345535s: None
Jun 20 03:03:06 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[eadd0ab4-4e24-40e5-bb25-e1d861de6ee9] succeeded in 0.440672064999s: None
....

Jun 20 05:29:20 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[adf36f9e-b3ae-48ac-b360-0ece90f6417f] succeeded in 0.322862540001s: None
Jun 20 05:29:20 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[7ed37e24-fd51-4de8-a23d-35eff7e17508] succeeded in 0.204538307s: None
Jun 20 05:29:22 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[6751f8dc-acb1-4dd5-be10-90f6f749e500] succeeded in 0.322744957997s: None
Jun 20 05:29:22 qe-sat6-feature-rhel7.satqe.lab.eng.rdu2.redhat.com pulp[7605]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[3d21f636-7d58-46e4-b6ad-d17f9582573e] succeeded in 0.204259102s: None


Ran it for approx. 3 hours; no deadlock encountered.
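
One way to back up the "no deadlock" observation without watching the log by eye is to compute the longest pause between consecutive 'succeeded in' lines in a saved copy of the journal output. The sketch below does that, assuming the syslog-style line format shown in the logs above. The helper names (`parse_completions`, `largest_gap`) are made up for illustration, and the year is an assumption because the log timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches lines such as:
# Jun 20 03:02:57 <host> pulp[7605]: celery.worker.job:INFO: Task <name>[<uuid>] succeeded in 0.78s: None
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*"
    r"Task (?P<task>[\w.]+)\[(?P<id>[0-9a-f-]+)\] "
    r"succeeded in (?P<secs>[0-9.]+)s"
)


def parse_completions(lines, year=2018):
    """Return (timestamp, runtime_seconds) for every 'succeeded in' line.

    `year` is an assumption: syslog-style timestamps omit it.
    Lines that do not match the expected format are skipped.
    """
    completions = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue
        # Collapse the double space syslog uses for single-digit days.
        stamp = " ".join(match.group("stamp").split())
        when = datetime.strptime("%d %s" % (year, stamp), "%Y %b %d %H:%M:%S")
        completions.append((when, float(match.group("secs"))))
    return completions


def largest_gap(completions):
    """Return the longest pause, in seconds, between consecutive completions."""
    times = sorted(when for when, _ in completions)
    if len(times) < 2:
        return 0.0
    return max((later - earlier).total_seconds()
               for earlier, later in zip(times, times[1:]))
```

This is only meaningful on the complete, unabridged log. On the excerpt above, the elided stretch between 03:03:06 and 05:29:20 would show up as one large gap even though tasks kept completing.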

Comment 16 errata-xmlrpc 2018-06-20 17:09:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1956