Bug 1134594 - Trying to add a repo with a missing/unresponsive capsule hangs indefinitely/for a very long time.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Content Management
Version: 6.0.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: Unspecified
Assignee: Justin Sherrill
QA Contact: Tazim Kolhar
URL: http://projects.theforeman.org/issues...
Whiteboard:
Depends On:
Blocks:
Reported: 2014-08-27 21:25 UTC by Corey Welton
Modified: 2019-09-12 07:58 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-12 14:02:12 UTC
Target Upstream Version:
Embargoed:



Description Corey Welton 2014-08-27 21:25:20 UTC
Description of problem:
Creating a product repo hangs indefinitely when a capsule disappears or is non-responsive.  Removing the capsule from Content Hosts/Hosts/Capsules does not seem to resolve it.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.  Create and sync a capsule -- generally ensure it is working.
2.  Take it down -- wiping the box would work, but I suspect simply removing network connectivity would work too. Note: we're emulating a severe issue here, so don't remove the capsule from Content Hosts/Hosts/Capsules (yet).
3.  Create a product, "foobar"
4.  Attempt to add rpm repo to product "foobar"
5.  Wait.... and wait.... and wait.

Actual results:

Create repository 'foobar'; product 'foobar-repo'; organization 'Default_Organization'	running	pending

Relevant lines seen in dynflow console
21: Actions::Pulp::Consumer::SyncNode (suspended) [ 1603.83s / 5.25s ]  Cancel
26: Actions::Pulp::Repository::DistributorPublish (pending)

If you try to resolve it by deleting the content host/host/capsule, it still hangs.  It is possible it will eventually time out in pulp after a very long time, but that timeframe is presently unknown.

It is also presently unknown whether restarting any services will get us out of this wedged state.

Expected results:

A reasonable timeout?
Something that doesn't hang when a capsule is forcibly removed from network?
....?
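
For illustration only, the "reasonable timeout" asked for above could look roughly like the sketch below: a polling loop that gives up once a deadline passes instead of waiting forever on a task stuck in `waiting`. This is not the actual Katello/dynflow code or the fix that was shipped. The function name `wait_for_task`, the `fetch_state` callback, the `TaskTimeout` exception and the set of terminal state names are all assumptions made for the example.

```python
import time
from typing import Callable


class TaskTimeout(Exception):
    """Raised when a polled task does not finish before the deadline."""


def wait_for_task(
    fetch_state: Callable[[], str],
    timeout_s: float,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_state() until it reports a terminal state or the deadline passes."""
    # Assumed terminal state names; the report itself only shows "waiting".
    terminal = {"finished", "error", "canceled"}
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        state = fetch_state()
        attempts += 1
        if state in terminal:
            return state
        if clock() >= deadline:
            # Give up instead of polling indefinitely.
            raise TaskTimeout(f"task still '{state}' after {attempts} polls")
        sleep(interval_s)
```

A caller would pass a `fetch_state` function that looks up the task state however is appropriate, and treat `TaskTimeout` as a failed step rather than leaving the whole task pending.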

Additional info:

Comment 1 Corey Welton 2014-08-27 21:31:26 UTC
More details for 

21: Actions::Pulp::Consumer::SyncNode (suspended) [ 1603.83s / 5.25s ]  Cancel


Started at: 2014-08-27 20:55:16 UTC

Ended at: 2014-08-27 21:29:14 UTC

Real time: 2037.27s

Execution time (excluding suspended state): 6.48s

Input:

---
consumer_uuid: 869aac38-e4cb-4583-9ef2-ba138debd192
skip_content: true
remote_user: admin-f87eddb1
locale: en
Output:

---
pulp_tasks:
- exception: 
  task_type: 
  _href: /pulp/api/v2/tasks/0738bc06-3c21-4527-8ebc-ad621617fd2d/
  task_id: 0738bc06-3c21-4527-8ebc-ad621617fd2d
  tags:
  - pulp:consumer:869aac38-e4cb-4583-9ef2-ba138debd192
  - pulp:action:unit_update
  finish_time: 
  _ns: task_status
  start_time: 
  traceback: 
  spawned_tasks: []
  progress_report: {}
  queue: agent
  state: waiting
  result: 
  error: 
  _id:
    $oid: 53fe45b4f1cfaa479b5d51a7
  id: 53fe45b4f1cfaa479b5d51a7
poll_attempts:
  total: 147
  failed: 0
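
As a quick sanity check on the numbers above, the following sketch recomputes the elapsed time from the two timestamps and derives the average gap between polls. All values are copied from this comment; the average is only a mean, since the report does not say whether the polling interval is fixed or backs off over time.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps and counters copied from the step 21 details above.
started = datetime.strptime("2014-08-27 20:55:16", FMT)
ended = datetime.strptime("2014-08-27 21:29:14", FMT)
real_time_s = 2037.27
poll_total = 147

# Wall-clock difference between the two (whole-second) timestamps.
elapsed_s = (ended - started).total_seconds()

# Mean time between polls over the whole suspended period.
avg_interval_s = real_time_s / poll_total

print(f"elapsed from timestamps: {elapsed_s:.0f}s")
print(f"average gap between polls: {avg_interval_s:.2f}s")
```

The timestamps are about 34 minutes apart, which matches the reported real time, and 147 polls over that period works out to roughly 14 seconds between polls on average, none of which failed.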


More details for 

26: Actions::Pulp::Repository::DistributorPublish (pending)

Started at:

Ended at:

Real time: 0.00s

Execution time (excluding suspended state): 0.00s

Input:

---
pulp_id: Default_Organization-sat6-sat6
distributor_type_id: yum_distributor
source_pulp_id: 
dependency: 
remote_user: admin-f87eddb1
locale: en
Output:

--- {}

Comment 3 Corey Welton 2014-08-27 21:35:00 UTC
Hitting Cancel in dynflow for step 21 gets us out of the bind.  I still think this is pretty gross though.

Comment 4 Mike McCune 2014-08-28 01:18:24 UTC
DOCS:

Users should note that when the Satellite and Capsule are geographically far apart, with low or slow bandwidth between the systems, creating a custom repository may take a significant amount of time.

Comment 5 Corey Welton 2014-08-28 13:42:40 UTC
Additional DOC:

If a capsule is physically removed or becomes unresponsive, outside of Satellite 6 itself, creation of repos may hang indefinitely. To recover, first resolve the situation with the unresponsive capsule, removing it from the list of available capsules if necessary.  After this, use dynflow to cancel the hung process.

Comment 10 Steve Loranz 2015-04-02 15:15:57 UTC
Created redmine issue http://projects.theforeman.org/issues/10009 from this bug

Comment 11 Justin Sherrill 2015-05-04 16:50:44 UTC
This was actually fixed as part of http://projects.theforeman.org/issues/10229 and https://bugzilla.redhat.com/show_bug.cgi?id=1192500

Moving to modified.

Comment 12 Tazim Kolhar 2015-05-14 11:09:58 UTC
VERIFIED:
# rpm -qa | grep foreman
foreman-1.7.2.21-1.el7sat.noarch
ruby193-rubygem-foreman_discovery-2.0.0.13-1.el7sat.noarch
foreman-libvirt-1.7.2.21-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
foreman-postgresql-1.7.2.21-1.el7sat.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch
dell-pem710-01.rhts.eng.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
foreman-ovirt-1.7.2.21-1.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.11-1.el7sat.noarch
foreman-selinux-1.7.2.13-1.el7sat.noarch
foreman-gce-1.7.2.21-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.1.0-1.el7sat.noarch
ruby193-rubygem-foreman-tasks-0.6.12.5-1.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.4-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.6-1.el7sat.noarch
ruby193-rubygem-foreman_docker-1.2.0.12-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
foreman-proxy-1.7.2.4-1.el7sat.noarch
dell-pem710-01.rhts.eng.bos.redhat.com-foreman-client-1.0-1.noarch
dell-pem710-01.rhts.eng.bos.redhat.com-foreman-proxy-1.0-2.noarch
foreman-vmware-1.7.2.21-1.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
foreman-compute-1.7.2.21-1.el7sat.noarch
foreman-debug-1.7.2.21-1.el7sat.noarch

steps:

1.  Create and sync a capsule -- generally ensure it is working.
2.  ifdown eth0 on capsule
3.  Create a product, "foobar"
4.  Attempt to add rpm repo to product "foobar"
5.  Repo added successfully

Comment 13 Bryan Kearney 2015-08-11 13:36:53 UTC
This bug is slated to be released with Satellite 6.1.

Comment 14 Bryan Kearney 2015-08-12 14:02:12 UTC
This bug was fixed in version 6.1.1 of Satellite which was released on 12 August, 2015.

