Bug 1208615

Summary: Content view publish randomly fails with 404 on fetching task

Product: Red Hat Satellite
Component: Pulp
Version: 6.1.0
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Reporter: Justin Sherrill <jsherril>
Assignee: Partha Aji <paji>
QA Contact: Tazim Kolhar <tkolhar>
CC: bbuckingham, bmbouter, cwelton, daviddavis, dkliban, ehelms, ggainey, ipanova, jmatthew, jsherril, mhrivnak, mmccune, omaciel, paji, pcreech, rchan, tkolhar, tsanders, ttereshc
Keywords: Triaged
Target Milestone: Unspecified
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-08-12 13:55:30 UTC
Bug Blocks: 1212602
Attachments: patch for satellite 6.1

Description Justin Sherrill 2015-04-02 16:56:21 UTC
Description of problem:

When publishing a content view, we generate many tasks in Pulp and then wait on them. Occasionally, as soon as we initiate a task and request its status for the first time, Pulp throws a 404.

Later requests for the same task work fine.
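
For illustration, a minimal sketch of the polling pattern (Python with requests; the retry-on-404 behavior is a possible client-side workaround sketched under assumptions, not what Runcible actually does):

    import time
    import requests

    def poll_task(base_url, task_id, retries=5, delay=1.0):
        # Poll a Pulp v2 task, tolerating a transient 404 on early polls.
        url = "%s/pulp/api/v2/tasks/%s/" % (base_url, task_id)
        for attempt in range(retries):
            resp = requests.get(url, verify=False)  # Pulp often runs with a self-signed cert
            if resp.status_code == 404 and attempt < retries - 1:
                time.sleep(delay)  # task record may not be visible yet; retry
                continue
            resp.raise_for_status()
            return resp.json()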

Version-Release number of selected component (if applicable):
2.6.0-1.1.beta

How reproducible:
Randomly

Steps to Reproduce:
A reproducer script is in the works; it is currently unclear how to reproduce this reliably.


Pulp logs:

Apr  1 20:52:14 qe-blade-16 pulp: pulp.server.webservices.middleware.exception:INFO: Missing resource(s): resource_id=6f29bc57-55f0-49eb-97ee-1978407be409
Apr  1 20:52:14 qe-blade-16 pulp: celery.worker.strategy:INFO: Received task: pulp.server.managers.repo.unit_association.associate_from_repo[6f29bc57-55f0-49eb-97ee-1978407be409]
Apr  1 20:52:14 qe-blade-16 pulp: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._release_resource[7b3ce53b-d3ae-4101-bbd3-15776e975d52]
Apr  1 20:52:14 qe-blade-16 pulp: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[01c536c8-f0cf-444c-a799-c65a5f5413da] succeeded in 0.702265332002s: 


Apache logs:


10.16.76.78 - - [01/Apr/2015:20:52:13 -0400] "POST /pulp/api/v2/repositories/Default_Organization-rhel-7-cv-1_0-Satellite_Tools_Packages-RHEL_7/actions/associate/ HTTP/1.1" 202 172 "-" "Ruby"
10.16.76.78 - - [01/Apr/2015:20:52:14 -0400] "GET /pulp/api/v2/tasks/6f29bc57-55f0-49eb-97ee-1978407be409/ HTTP/1.1" 404 533 "-" "Ruby"

Comment 2 Justin Sherrill 2015-04-02 17:26:15 UTC
Ruby traceback showing that the 404 comes from polling a task:


- /opt/rh/ruby193/root/usr/share/gems/gems/rest-client-1.6.7/lib/restclient/abstract_response.rb:48:in
  `return!'
- /opt/rh/ruby193/root/usr/share/gems/gems/runcible-1.3.0/lib/runcible/base.rb:92:in
  `block in get_response'
- /opt/rh/ruby193/root/usr/share/gems/gems/rest-client-1.6.7/lib/restclient/request.rb:228:in
  `call'
- /opt/rh/ruby193/root/usr/share/gems/gems/rest-client-1.6.7/lib/restclient/request.rb:228:in
  `process_result'
- /opt/rh/ruby193/root/usr/share/gems/gems/rbovirt-0.0.29/lib/restclient_ext/request.rb:50:in
  `block in transmit'
- /opt/rh/ruby193/root/usr/share/ruby/net/http.rb:746:in `start'
- /opt/rh/ruby193/root/usr/share/gems/gems/rbovirt-0.0.29/lib/restclient_ext/request.rb:44:in
  `transmit'
- /opt/rh/ruby193/root/usr/share/gems/gems/rest-client-1.6.7/lib/restclient/request.rb:64:in
  `execute'
- /opt/rh/ruby193/root/usr/share/gems/gems/rest-client-1.6.7/lib/restclient/request.rb:33:in
  `execute'
- /opt/rh/ruby193/root/usr/share/gems/gems/rest-client-1.6.7/lib/restclient/resource.rb:51:in
  `get'
- /opt/rh/ruby193/root/usr/share/gems/gems/runcible-1.3.0/lib/runcible/base.rb:91:in
  `get_response'
- /opt/rh/ruby193/root/usr/share/gems/gems/runcible-1.3.0/lib/runcible/base.rb:82:in
  `call'
- /opt/rh/ruby193/root/usr/share/gems/gems/runcible-1.3.0/lib/runcible/resources/task.rb:41:in
  `poll'

Comment 3 Justin Sherrill 2015-04-02 17:26:36 UTC
Workaround:

Go to the failed task under "Monitor > Tasks" and click 'Resume'.

Comment 4 Justin Sherrill 2015-04-23 18:06:15 UTC
Moving to needinfo, as this appears to be a Pulp issue (possibly a race condition).

Comment 5 Justin Sherrill 2015-04-23 18:08:08 UTC
Also noting that RHCI can reproduce this fairly easily.

Comment 6 Michael Hrivnak 2015-04-23 19:23:14 UTC
Does this only happen for the unit association task? Or does it ever happen for other tasks?

This is a mysterious problem. The code that creates the task record saves it to the database and then reads it back before returning anything, so the record should be available in Mongo before the task ID is even returned to the client.

https://github.com/pulp/pulp/blob/pulp-2.6.0-1/server/pulp/server/async/task_status_manager.py#L54

I'm not sure what to make of this. What you're seeing should theoretically be impossible, yet there it is. :)
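
For reference, a minimal sketch of the save-then-read pattern described above (Python with pymongo; the collection and field names are illustrative, not Pulp's actual schema):

    from pymongo import MongoClient

    task_statuses = MongoClient().pulp_database.task_status  # illustrative names

    def create_task_status(task_id):
        # Save the record, then read it back before returning, as the linked code does.
        task_statuses.insert_one({'task_id': task_id, 'state': 'waiting'})
        created = task_statuses.find_one({'task_id': task_id})
        # In theory this read-back guarantees the record is visible before the
        # task ID is returned to the client, yet the 404s above suggest otherwise.
        return created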

Comment 7 Brad Buckingham 2015-04-29 12:25:03 UTC
I am not sure if the needinfo is for someone in particular; however, when I have seen this issue it is typically on a task that is part of either a copy or a remove (i.e., associate or unassociate), and it is most common when the unit type is rpm or errata.

For RHCI, I most frequently see the error during a 're-deploy', which involves publishing a new version of an existing content view and then promoting it to an environment.

Comment 8 pulp-infra@redhat.com 2015-05-11 17:30:17 UTC
The Pulp upstream bug status is at NEW. Updating the external tracker on this bug.

Comment 9 pulp-infra@redhat.com 2015-05-11 17:30:19 UTC
The Pulp upstream bug priority is at High. Updating the external tracker on this bug.

Comment 10 Eric Helms 2015-05-19 21:23:28 UTC
We have an upstream Katello issue tracking this same problem (http://projects.theforeman.org/issues/10107). The user who reported seeing it in multiple scenarios tested this change (https://github.com/pulp/pulp/pull/1864) in their local environment and has reported no occurrences since applying the patch.

Comment 11 pulp-infra@redhat.com 2015-05-22 16:00:18 UTC
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.

Comment 12 Dennis Kliban 2015-06-03 13:06:51 UTC
Created attachment 1034314 [details]
patch for satellite 6.1

Comment 15 Tazim Kolhar 2015-06-11 10:22:16 UTC
Hi,

Please provide verification steps.

thanks

Comment 16 Justin Sherrill 2015-06-11 13:27:22 UTC
There really isn't a simple set of verification steps, because the failure occurred randomly.

The only thing you can really do is attempt to publish a content view with many repositories full of content; the more repositories there are, the more likely the issue is to reproduce. Try publishing that content view a few times, and if the issue does not occur, mark this as verified.
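
One way to script the repeated publish (a sketch assuming the hammer CLI and an illustrative content view ID):

    import subprocess

    CV_ID = "42"  # illustrative content view ID

    # Publish the content view several times; the bug, when present, surfaces randomly.
    for attempt in range(5):
        subprocess.run(
            ["hammer", "content-view", "publish", "--id", CV_ID],
            check=True,  # a failed publish raises CalledProcessError
        )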

Comment 17 Tazim Kolhar 2015-06-12 09:36:00 UTC
VERIFIED:

 rpm -qa | grep foreman
ruby193-rubygem-foreman_discovery-2.0.0.15-1.el7sat.noarch
foreman-libvirt-1.7.2.27-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
ruby193-rubygem-foreman_docker-1.2.0.14-1.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
foreman-selinux-1.7.2.13-1.el7sat.noarch
dell-pe1955-02.rhts.eng.bos.redhat.com-foreman-proxy-1.0-2.noarch
foreman-compute-1.7.2.27-1.el7sat.noarch
foreman-gce-1.7.2.27-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.2.0-8.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.14-1.el7sat.noarch
foreman-debug-1.7.2.27-1.el7sat.noarch
foreman-vmware-1.7.2.27-1.el7sat.noarch
ruby193-rubygem-foreman-tasks-0.6.12.8-1.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.4-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.7-1.el7sat.noarch
foreman-proxy-1.7.2.5-1.el7sat.noarch
dell-pe1955-02.rhts.eng.bos.redhat.com-foreman-client-1.0-1.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch
dell-pe1955-02.rhts.eng.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
foreman-ovirt-1.7.2.27-1.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
foreman-1.7.2.27-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
foreman-postgresql-1.7.2.27-1.el7sat.noarch

Steps:
Tried publishing the content view a few times.
No issues occurred.

Comment 18 Bryan Kearney 2015-08-11 13:18:39 UTC
This bug is slated to be released with Satellite 6.1.

Comment 19 Bryan Kearney 2015-08-12 13:55:30 UTC
This bug was fixed in version 6.1.1 of Satellite, which was released on 12 August 2015.

Comment 20 pulp-infra@redhat.com 2015-10-07 20:12:01 UTC
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

Comment 21 pulp-infra@redhat.com 2015-10-12 16:00:38 UTC
The Pulp upstream bug status is at ON_QA. Updating the external tracker on this bug.

Comment 22 Dennis Kliban 2015-10-12 16:38:01 UTC
Originally, Satellite received a custom patch to fix this issue, but the patch was not part of the Pulp 2.6.x stream. Pulp 2.7.0 will contain a more general fix that addresses this issue.

Comment 23 pulp-infra@redhat.com 2015-11-06 20:00:41 UTC
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.