Bug 1127298

Summary: Syncing a repo with lots of errors results in a non-cancelable task and a stuck worker
Product: [Retired] Pulp
Reporter: Justin Sherrill <jsherril>
Component: rpm-support
Assignee: Barnaby Court <bcourt>
Status: CLOSED CURRENTRELEASE
QA Contact: Preethi Thomas <pthomas>
Severity: high
Priority: medium
Version: 2.4 Beta
CC: bcourt, jsherril, mhrivnak, pthomas, rbarlow
Keywords: Triaged
Target Release: 2.4.1
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Clones: 1132138
Last Closed: 2014-09-23 17:54:22 UTC
Type: Bug
Bug Blocks: 1131719, 1132138

Description Justin Sherrill 2014-08-06 14:54:41 UTC
Description of problem:

Syncing a repo that is missing many of its RPMs, such as:

http://japan.proximity.on.ca/kojifiles/repos/f20-build/latest/armv6hl/

(which is missing all 30,000 of its RPMs) results in a sync that can't be cancelled and workers that can't be restarted.

Cancelling the task seems to do nothing, and restarting the pulp_workers shuts down all but one worker and then hangs waiting for the last worker to die.

Version-Release number of selected component (if applicable):

2.4.0-0.29

How reproducible:
Always

Steps to Reproduce:
1. Create a repo with this feed: http://japan.proximity.on.ca/kojifiles/repos/f20-build/latest/armv6hl/
2. Initiate a sync of the repo.
3. Attempt to cancel the sync task.
4. Attempt to restart pulp_workers.

Actual results:

Task cannot be cancelled.
Restarting pulp_workers hangs while trying to shut down one of the workers.


Expected results:

The task can be cancelled, and pulp_workers restarts cleanly.

Comment 1 Barnaby Court 2014-08-12 19:32:35 UTC
This appears to be working with the 2.4 release. There is an extended delay after the task has been marked as canceled, during which we are still waiting for the worker to die. Eventually the log file shows a traceback for a CancelException (used internally by the pulp_rpm sync code when a cancel occurs), and things restart properly. The delay was on the order of 1-2 minutes.
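The behaviour described above (the task is marked canceled first, and the worker only exits later via a CancelException) is consistent with cooperative cancellation, where a cancel request merely sets a flag that the sync loop checks between units of work. The following is a minimal, hypothetical Python sketch of that general pattern, not pulp_rpm's actual code: `SyncTask`, `cancel()`, `run()` and the `fetch` callback are illustrative names, and the `CancelException` class here only borrows its name from the log message mentioned above.

```python
import threading


class CancelException(Exception):
    """Hypothetical: raised inside the sync loop once a cancel request is noticed."""


class SyncTask:
    """Hypothetical stand-in for a repo sync task (not Pulp's implementation)."""

    def __init__(self, units):
        self.units = units
        self.cancel_event = threading.Event()
        self.processed = 0

    def cancel(self):
        # Cancelling only sets a flag; the worker reacts at its next check.
        self.cancel_event.set()

    def run(self, fetch):
        for unit in self.units:
            # The flag is checked only between units, so a slow or
            # long-retrying fetch() delays the point where cancel takes effect.
            if self.cancel_event.is_set():
                raise CancelException(
                    "sync cancelled after %d units" % self.processed)
            fetch(unit)
            self.processed += 1
        return self.processed
```

In this sketch, if `fetch` keeps retrying a missing package for a long time, the flag check at the top of the loop is reached late, which is one plausible way a cancel can appear to hang even though the task has already been marked canceled.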

Comment 2 Barnaby Court 2014-08-14 19:36:02 UTC
I did find an issue that occurs once the RPMs start downloading. The linked PR fixes that issue. After applying the fix I have been unable to get the cancel to hang.

PR: https://github.com/pulp/pulp_rpm/pull/545

Comment 3 Randy Barlow 2014-08-20 18:50:22 UTC
Barnaby reported that this seems to work in 2.4.0-1. The fix he applied in comment 2 was for development work that happened after 2.4.0-1, so I cannot cherry-pick it into the new 2.4.1 we are building.

I'm moving this back to assigned so we can determine if we need to do anything for 2.4.1 or not. Justin, can you confirm whether this is an issue with 2.4.0-1?

Comment 4 Barnaby Court 2014-08-21 18:36:35 UTC
Found a nectar bug in 2.4.0-1. The fix is included in PR: https://github.com/pulp/pulp/pull/1120

Comment 5 Randy Barlow 2014-08-23 03:24:53 UTC
This was fixed in pulp-2.4.1-0.3.beta.

Comment 6 Preethi Thomas 2014-08-26 03:00:19 UTC
Verified.

[root@cloud-qe-13 ~]# rpm -qa pulp-server
pulp-server-2.4.1-0.3.beta.el6.noarch
[root@cloud-qe-13 ~]# 
[root@ibm-x3550m3-12 ~]# rpm -qa pulp-server
pulp-server-2.4.1-0.3.beta.el7.noarch
[root@ibm-x3550m3-12 ~]# 


Tried multiple times to cancel the task and to restart pulp_workers.

Comment 7 Randy Barlow 2014-09-23 17:54:22 UTC
This is fixed in Pulp-2.4.1-1.