Bug 1127298 - Syncing repo with lots of errors results in non-cancelable task and stuck worker
Summary: Syncing repo with lots of errors results in non-cancelable task and stuck worker
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Pulp
Classification: Retired
Component: rpm-support
Version: 2.4 Beta
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 2.4.1
Assignee: Barnaby Court
QA Contact: Preethi Thomas
URL:
Whiteboard:
Depends On:
Blocks: 1131719 1132138
Reported: 2014-08-06 14:54 UTC by Justin Sherrill
Modified: 2014-09-23 17:54 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Cloned To: 1132138
Environment:
Last Closed: 2014-09-23 17:54:22 UTC
Embargoed:


Attachments: None

Description Justin Sherrill 2014-08-06 14:54:41 UTC
Description of problem:

Syncing a repo that has lots of missing RPMs, such as:

http://japan.proximity.on.ca/kojifiles/repos/f20-build/latest/armv6hl/

(which is missing all 30,000 of its RPMs), results in a sync that can't be cancelled and workers that can't be restarted.

Cancelling the task appears to do nothing, and restarting pulp_workers shuts down all but one worker and then hangs waiting for the last worker to exit.

Version-Release number of selected component (if applicable):

2.4.0-0.29

How reproducible:
Always

Steps to Reproduce:
1. Create a repo with this feed: http://japan.proximity.on.ca/kojifiles/repos/f20-build/latest/armv6hl/
2. Initiate a sync of the repo.
3. Attempt to cancel the sync task.
4. Attempt to restart pulp_workers.

Actual results:

The task cannot be cancelled.
Restarting pulp_workers hangs while trying to shut down one of the workers.


Expected results:

The task can be cancelled, and pulp_workers restarts cleanly.

Comment 1 Barnaby Court 2014-08-12 19:32:35 UTC
This appears to work with the 2.4 release. There is an extended delay, on the order of 1-2 minutes, after the task has been marked as canceled, during which we are still waiting for the worker to exit. Eventually the log shows a traceback for a CancelException (used internally by the pulp_rpm sync code when a cancel occurs), and the workers restart properly.
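The delay described above is typical of cooperative cancellation, where a cancel request only sets a flag and the running sync notices it at its next checkpoint. The following is a minimal sketch of that general pattern, not pulp_rpm's actual code: the `sync` and `download` functions and the `cancel_event` parameter are hypothetical, and `CancelException` here is only a stand-in for the exception named in the comment.

```python
import threading


class CancelException(Exception):
    """Hypothetical stand-in for the exception raised when a sync is cancelled."""


def sync(units, cancel_event, download):
    """Download each unit, checking for cancellation only between downloads.

    Hypothetical sketch: if a single download is slow (for example, it keeps
    retrying a missing file), the cancel is not noticed until it returns.
    """
    done = []
    for unit in units:
        # Checkpoint: the only place this loop reacts to a cancel request.
        if cancel_event.is_set():
            raise CancelException("sync cancelled after %d units" % len(done))
        download(unit)
        done.append(unit)
    return done


if __name__ == "__main__":
    cancel = threading.Event()

    def fake_download(unit):
        # Simulate a user cancelling partway through the second download.
        if unit == 2:
            cancel.set()

    try:
        sync([1, 2, 3, 4], cancel, fake_download)
    except CancelException as exc:
        print(exc)
```

In this sketch the cancel is requested during the second download but only takes effect before the third, which illustrates why a task can be marked canceled well before the worker actually stops.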

Comment 2 Barnaby Court 2014-08-14 19:36:02 UTC
I did find an issue that occurs after the RPMs start downloading; the linked PR addresses it. Since applying that fix, I have been unable to get the cancel to hang.

PR: https://github.com/pulp/pulp_rpm/pull/545

Comment 3 Randy Barlow 2014-08-20 18:50:22 UTC
Barnaby reported that this seems to work in 2.4.0-1. The fix he applied in comment 2 was for development work that happened after 2.4.0-1, so I cannot cherry pick it into the new 2.4.1 we are building.

I'm moving this back to assigned so we can determine if we need to do anything for 2.4.1 or not. Justin, can you confirm whether this is an issue with 2.4.0-1?

Comment 4 Barnaby Court 2014-08-21 18:36:35 UTC
Found a bug in nectar (the download library used by Pulp) in 2.4.0-01. The fix is included in PR: https://github.com/pulp/pulp/pull/1120

Comment 5 Randy Barlow 2014-08-23 03:24:53 UTC
This was fixed in pulp-2.4.1-0.3.beta.

Comment 6 Preethi Thomas 2014-08-26 03:00:19 UTC
Verified.

[root@cloud-qe-13 ~]# rpm -qa pulp-server
pulp-server-2.4.1-0.3.beta.el6.noarch
[root@cloud-qe-13 ~]# 
[root@ibm-x3550m3-12 ~]# rpm -qa pulp-server
pulp-server-2.4.1-0.3.beta.el7.noarch
[root@ibm-x3550m3-12 ~]# 


Tried cancelling the task and restarting pulp_workers multiple times.

Comment 7 Randy Barlow 2014-09-23 17:54:22 UTC
This is fixed in Pulp-2.4.1-1.

