Description of problem:
Creating a repository in a product hangs indefinitely when a Capsule disappears or becomes non-responsive. Removing the Capsule from Content Hosts/Hosts/Capsules does not appear to resolve the hang.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create and sync a Capsule -- generally assure it is working.
2. Take it down -- wiping the box would work, but I suspect simply removing network connectivity would be enough. Note: we are emulating a severe failure here, so do not remove it from Content Hosts/Hosts/Capsules (yet).
3. Create a product, "foobar".
4. Attempt to add an RPM repo to product "foobar".
5. Wait... and wait... and wait.

Actual results:
Create repository 'foobar'; product 'foobar-repo'; organization 'Default_Organization' -- running, pending

Relevant lines seen in the dynflow console:
21: Actions::Pulp::Consumer::SyncNode (suspended) [ 1603.83s / 5.25s ] Cancel
26: Actions::Pulp::Repository::DistributorPublish (pending)

Attempting to resolve the hang by deleting the content host/host/capsule does not help; it still hangs. It is possible pulp will eventually time out after a very long time, but that timeframe is presently unknown. It is also presently unknown whether restarting any services will get us out of this wedged state.

Expected results:
A reasonable timeout? Something that doesn't hang when a Capsule is forcibly removed from the network? ...?

Additional info:
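The "reasonable timeout" expected above amounts to putting an upper bound on the poll loop instead of polling a vanished Capsule forever. A minimal sketch of that idea (hypothetical helper names and states, not Katello's or dynflow's actual code):

```python
import time


class TaskTimeoutError(Exception):
    """Raised when a remote task does not finish within the allowed window."""


def wait_for_task(poll_fn, timeout=600, interval=5):
    # poll_fn() returns the remote task state, e.g. "waiting", "running",
    # "finished". Rather than polling indefinitely, give up after `timeout`
    # seconds so an unreachable capsule cannot wedge the whole workflow.
    deadline = time.monotonic() + timeout
    state = None
    while time.monotonic() < deadline:
        state = poll_fn()
        if state in ("finished", "error", "canceled"):
            return state
        time.sleep(interval)
    raise TaskTimeoutError("task still %r after %ss" % (state, timeout))
```

With such a bound, the SyncNode step above would fail with an actionable error after a fixed window instead of sitting suspended for 1600+ seconds.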
More details for 21: Actions::Pulp::Consumer::SyncNode (suspended) [ 1603.83s / 5.25s ] Cancel
Started at: 2014-08-27 20:55:16 UTC
Ended at: 2014-08-27 21:29:14 UTC
Real time: 2037.27s
Execution time (excluding suspended state): 6.48s

Input:
---
consumer_uuid: 869aac38-e4cb-4583-9ef2-ba138debd192
skip_content: true
remote_user: admin-f87eddb1
locale: en

Output:
---
pulp_tasks:
- exception:
  task_type:
  _href: /pulp/api/v2/tasks/0738bc06-3c21-4527-8ebc-ad621617fd2d/
  task_id: 0738bc06-3c21-4527-8ebc-ad621617fd2d
  tags:
  - pulp:consumer:869aac38-e4cb-4583-9ef2-ba138debd192
  - pulp:action:unit_update
  finish_time:
  _ns: task_status
  start_time:
  traceback:
  spawned_tasks: []
  progress_report: {}
  queue: agent
  state: waiting
  result:
  error:
  _id:
    $oid: 53fe45b4f1cfaa479b5d51a7
  id: 53fe45b4f1cfaa479b5d51a7
poll_attempts:
  total: 147
  failed: 0

More details for 26: Actions::Pulp::Repository::DistributorPublish (pending)
Started at:
Ended at:
Real time: 0.00s
Execution time (excluding suspended state): 0.00s

Input:
---
pulp_id: Default_Organization-sat6-sat6
distributor_type_id: yum_distributor
source_pulp_id:
dependency:
remote_user: admin-f87eddb1
locale: en

Output:
---
{}
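Note the poll_attempts total of 147 with the task still in state "waiting": the Capsule is being polled repeatedly even though it is gone. One way to avoid dispatching work to a dead Capsule in the first place is a cheap reachability probe before queuing the task. A sketch under the assumption that a TCP connect to the Capsule's HTTPS port is a reasonable liveness signal (hypothetical helper, not Katello's actual code):

```python
import socket


def capsule_reachable(host, port=443, timeout=5):
    # Cheap liveness probe: attempt a TCP connect with a short timeout
    # before dispatching work, instead of letting a pulp task hang in
    # "waiting" against an unreachable capsule.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

This would not replace a task-level timeout, but it turns the common case (Capsule host is down or unroutable) into a fast, explicit failure.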
Hitting Cancel in dynflow for step 21 gets us out of the bind. I still think this is pretty gross, though.
DOCS: Users should note that when the Satellite and Capsule are geographically far apart, or the link between them has low bandwidth or high latency, creating a custom repository may take a significant amount of time.
Additional DOC: If a Capsule is physically removed or becomes unresponsive for reasons outside Satellite 6 itself, repository creation may hang indefinitely. The solution is to resolve the situation with the unresponsive Capsule, removing it from the list of available Capsules if necessary, and then use dynflow to cancel the stuck task.
Created redmine issue http://projects.theforeman.org/issues/10009 from this bug
This was actually fixed as part of http://projects.theforeman.org/issues/10229 and https://bugzilla.redhat.com/show_bug.cgi?id=1192500. Moving to MODIFIED.
VERIFIED:

# rpm -qa | grep foreman
foreman-1.7.2.21-1.el7sat.noarch
ruby193-rubygem-foreman_discovery-2.0.0.13-1.el7sat.noarch
foreman-libvirt-1.7.2.21-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
foreman-postgresql-1.7.2.21-1.el7sat.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch
dell-pem710-01.rhts.eng.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
foreman-ovirt-1.7.2.21-1.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.11-1.el7sat.noarch
foreman-selinux-1.7.2.13-1.el7sat.noarch
foreman-gce-1.7.2.21-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.1.0-1.el7sat.noarch
ruby193-rubygem-foreman-tasks-0.6.12.5-1.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.4-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.6-1.el7sat.noarch
ruby193-rubygem-foreman_docker-1.2.0.12-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
foreman-proxy-1.7.2.4-1.el7sat.noarch
dell-pem710-01.rhts.eng.bos.redhat.com-foreman-client-1.0-1.noarch
dell-pem710-01.rhts.eng.bos.redhat.com-foreman-proxy-1.0-2.noarch
foreman-vmware-1.7.2.21-1.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
foreman-compute-1.7.2.21-1.el7sat.noarch
foreman-debug-1.7.2.21-1.el7sat.noarch

Steps:
1. Create and sync a capsule -- generally assure it is working.
2. ifdown eth0 on the capsule.
3. Create a product, "foobar".
4. Attempt to add an RPM repo to product "foobar".
5. Repo added successfully.
This bug is slated to be released with Satellite 6.1.
This bug was fixed in version 6.1.1 of Satellite which was released on 12 August, 2015.