Description of problem:
Syncing a repo that has lots of missing rpms such as:
(which is missing all 30,000 of its rpms) results in a sync that can't be cancelled and workers that can't be restarted.
Cancelling the task seems to do nothing, and restarting the pulp_workers shuts down all but one worker and then hangs waiting for the last worker to die.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a repo with this feed http://japan.proximity.on.ca/kojifiles/repos/f20-build/latest/armv6hl/
2. Initiate the sync of the repo
3. Attempt to cancel the sync task
4. Attempt to restart pulp_workers
Actual results:
Task cannot be cancelled.
Restarting pulp_workers hangs when trying to shut down one of the workers.
Expected results:
Task can be cancelled, pulp_workers restarts fine.
This appears to be working with the 2.4 release, although there is an extended delay after the task has been marked as cancelled, during which we are still waiting for the worker to die. Eventually the log file shows a traceback for a CancelException (used internally by the pulp_rpm sync code when a cancel occurs) and things restart properly. The delay was on the order of 1-2 minutes.
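For anyone following along, a delay like this would be consistent with a cooperative cancel, where the cancel request is only noticed between units of work. The following is a minimal sketch under that assumption and is not the pulp_rpm code: sync_units, download and cancel_event are hypothetical names, and CancelException is defined locally only to mirror the name seen in the log.

```python
import threading


class CancelException(Exception):
    """Hypothetical stand-in for the exception named in the log above."""


def sync_units(urls, download, cancel_event):
    """Download each URL in turn, checking for a cancel request between units.

    Returns the URLs whose download failed (for example, missing rpms).
    """
    failed = []
    for index, url in enumerate(urls):
        # The flag is only consulted here, so a cancel that arrives while
        # download() is running takes effect after that call returns.
        if cancel_event.is_set():
            pending = len(urls) - index
            raise CancelException("sync cancelled, %d units pending" % pending)
        try:
            download(url)
        except IOError:
            # A missing rpm is recorded and the loop moves on.
            failed.append(url)
    return failed


def demo():
    """Simulate a cancel arriving during the third (failing) download."""
    cancel = threading.Event()
    attempted = []

    def fake_download(url):
        attempted.append(url)
        if len(attempted) == 3:
            cancel.set()
        raise IOError("not found: " + url)

    try:
        sync_units(["pkg-%d.rpm" % i for i in range(10)], fake_download, cancel)
    except CancelException:
        pass
    return len(attempted)
```

In this sketch the cancel only takes effect at the next check, so the time between marking the task cancelled and the worker actually stopping depends on how long the in-flight download takes to return.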
I did find an issue after the rpms start downloading. The linked PR is for that item. After it was fixed I have been unable to get the cancel to hang.
Barnaby reported that this seems to work in 2.4.0-1. The fix he applied in comment 2 was for development work that happened after 2.4.0-1, so I cannot cherry-pick it into the new 2.4.1 we are building.
I'm moving this back to assigned so we can determine if we need to do anything for 2.4.1 or not. Justin, can you confirm whether this is an issue with 2.4.0-1?
Found a nectar bug in 2.4.0-01. The fix is included in PR: https://github.com/pulp/pulp/pull/1120
This was fixed in pulp-2.4.1-0.3.beta.
[root@cloud-qe-13 ~]# rpm -qa pulp-server
[root@ibm-x3550m3-12 ~]# rpm -qa pulp-server
Tried multiple times to cancel the task and to restart pulp_workers.
This is fixed in Pulp-2.4.1-1.