Bug 1102763 - capsule: synchronize command never times out/silently fails.
Summary: capsule: synchronize command never times out/silently fails.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Foreman Proxy
Version: 6.0.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: Unspecified
Assignee: Mike McCune
QA Contact: Tazim Kolhar
URL: http://projects.theforeman.org/issues...
Whiteboard:
Duplicates: 1213816
Depends On:
Blocks: GSS_Sat6Beta_Tracker, GSS_Sat6_Tracker
 
Reported: 2014-05-29 13:57 UTC by Corey Welton
Modified: 2019-08-15 03:50 UTC
CC: 11 users

Fixed In Version:
Doc Type: Release Note
Doc Text:
In certain cases, synchronization will fail with no indication in the UI. If this is seen, please run foreman-debug on the server and submit a support request with the output of that command.
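A minimal sketch of gathering that output on the Satellite server (foreman-debug ships with the server packages and should print the location of the archive it creates when it finishes):

# foreman-debug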
Clone Of:
Environment:
Last Closed: 2015-08-12 13:58:49 UTC
Target Upstream Version:
Embargoed:



Description Corey Welton 2014-05-29 13:57:15 UTC
Description of problem:
If a capsule runs into an issue that prevents syncs from completing, there is nothing to indicate this to the user.

Version-Release number of selected component (if applicable):

Satellite-6.0.3-RHEL-6-20140528.4

How reproducible:

unsure.

Steps to Reproduce:
1. Attempt to sync Satellite content to a capsule (an example CLI invocation is shown after these steps). It may (or may not) help to reproduce this if the two servers are in widely separated geographical locations.
2. Wait.
3. View the results.
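Step 1 can be driven from the CLI; a minimal sketch (the capsule id and lifecycle environment id are illustrative, matching the invocation in comment 3 below):

# hammer capsule content synchronize --id 3 --environment-id 5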

Actual results:

In the synchronize process the user sees essentially nothing, other than the progress bar never moving -- in my case it is stuck at 50%.

In the pulp logs on the Satellite server we see:

May 29 15:48:44 ibm-x3550m3-07 pulp: pulp.server.async.scheduler:ERROR: Workers 'reserved_resource_worker-23.eng.brq.redhat.com' has gone missing, removing from list of workers
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR: 7461b684-4048-4e72-94dd-3b82956c6fab
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR: Traceback (most recent call last):
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/gofer/transport/qpid/consumer.py", line 113, in get
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:     return self.__receiver.fetch(timeout=timeout)
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:   File "<string>", line 6, in fetch
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1030, in fetch
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:     self._ecwait(lambda: self.linked)
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:     result = self._ewait(lambda: self.closed or predicate(), timeout)
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 994, in _ewait
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:     self.check_error()
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 983, in check_error
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:     raise self.error
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR: NotFound: no such queue: pulp.task

Expected results:
The log above may have something to do with the underlying cause, but that is not the main issue; rather, the issue is that the synchronization never "gives up" or indicates a problem.

Additional info:

Comment 2 Brad Buckingham 2014-08-19 22:06:42 UTC
Created redmine issue http://projects.theforeman.org/issues/7162 from this bug

Comment 3 Brad Buckingham 2014-08-20 17:52:41 UTC
Much has changed in the code since this bug was initially raised, and unfortunately I wasn't able to recreate the scenario as described.  I did, however, attempt to simulate a scenario where the server could not reach the capsule, resulting in a timeout during the sync.  In that case, Satellite would report 'success' to the user; however, the task (from pulp) would actually fail.

In order to simulate this scenario:
- from the capsule:
  service pulp_celerybeat stop
  service pulp_resource_manager stop

- from the satellite:
hammer> capsule content synchronize --id 3 --environment-id 5
[.................................................................                                                                  ] [50%]
Task 2246bfb5-131f-4171-a7c3-6e16e3276ddd: error

The following katello PR has the proposed fix for this scenario:
   https://github.com/Katello/katello/pull/4595

Comment 7 Corey Welton 2014-09-02 15:43:47 UTC
Bouncing back to dev for 6.1

Comment 8 Steve Loranz 2015-04-02 15:30:26 UTC
Connecting redmine issue http://projects.theforeman.org/issues/7162 from this bug

Comment 9 Brad Buckingham 2015-04-21 16:31:16 UTC
*** Bug 1213816 has been marked as a duplicate of this bug. ***

Comment 10 Mike McCune 2015-06-09 17:18:26 UTC
New fix upstream:

https://github.com/Katello/katello/pull/5278

Comment 13 Justin Sherrill 2015-06-15 19:13:10 UTC
https://github.com/Katello/katello/pull/5304

Comment 14 Tazim Kolhar 2015-06-16 08:15:21 UTC
VERIFIED:

# rpm -qa | grep foreman
ruby193-rubygem-foreman_discovery-2.0.0.15-1.el7sat.noarch
foreman-vmware-1.7.2.27-1.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
foreman-debug-1.7.2.27-1.el7sat.noarch
foreman-libvirt-1.7.2.27-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
foreman-compute-1.7.2.27-1.el7sat.noarch
foreman-gce-1.7.2.27-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.2.0-8.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.4-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.7-1.el7sat.noarch
puppet-foreman_scap_client-0.3.3-9.el7sat.noarch
foreman-1.7.2.27-1.el7sat.noarch
ruby193-rubygem-foreman_docker-1.2.0.14-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.14-1.el7sat.noarch
foreman-selinux-1.7.2.13-1.el7sat.noarch
foreman-proxy-1.7.2.5-1.el7sat.noarch
foreman-postgresql-1.7.2.27-1.el7sat.noarch
rhsm-qe-2.rhq.lab.eng.bos.redhat.com-foreman-client-1.0-1.noarch
rhsm-qe-2.rhq.lab.eng.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
rhsm-qe-2.rhq.lab.eng.bos.redhat.com-foreman-proxy-1.0-1.noarch
foreman-ovirt-1.7.2.27-1.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
ruby193-rubygem-foreman-tasks-0.6.12.8-1.el7sat.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch

steps:
# hammer -u admin -p changeme capsule content synchronize --id=2
Could not synchronize capsule content:
  Couldn't find SmartProxy with id=2 [WHERE "features"."name" IN ('Pulp Node')]

An appropriate error is displayed.

Comment 15 Justin Sherrill 2015-06-16 12:40:18 UTC
I'm going to move this back to 'ON_QA' as I'm not sure it was verified properly.  From the error it seems there was no capsule with id 2 or that capsule did not have the pulp node feature.
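For re-verification, one way to confirm beforehand that the target capsule exists and provides the Pulp Node feature (a sketch; the id is illustrative):

# hammer capsule list
# hammer capsule info --id 2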

Comment 16 Tazim Kolhar 2015-07-08 10:37:13 UTC
FAILEDQA:

# rpm -qa | grep foreman
ruby193-rubygem-foreman-tasks-0.6.12.8-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.9-1.el7sat.noarch
foreman-debug-1.7.2.29-1.el7sat.noarch
foreman-postgresql-1.7.2.29-1.el7sat.noarch
foreman-vmware-1.7.2.29-1.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
foreman-selinux-1.7.2.13-1.el7sat.noarch
foreman-1.7.2.29-1.el7sat.noarch
foreman-ovirt-1.7.2.29-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
foreman-proxy-1.7.2.5-1.el7sat.noarch
ibm-x3655-03.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-1.0-2.noarch
foreman-compute-1.7.2.29-1.el7sat.noarch
foreman-gce-1.7.2.29-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.2.0-8.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.14-1.el7sat.noarch
foreman-libvirt-1.7.2.29-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
ibm-x3655-03.ovirt.rhts.eng.bos.redhat.com-foreman-client-1.0-1.noarch
ibm-x3655-03.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch
ruby193-rubygem-foreman_docker-1.2.0.18-1.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.4-1.el7sat.noarch
ruby193-rubygem-foreman_discovery-2.0.0.15-1.el7sat.noarch

steps:

On the Capsule Server I executed:
# service pulp_celerybeat stop
celery init v10.0.
Using configuration: /etc/default/pulp_workers, /etc/default/pulp_celerybeat
Stopping pulp_celerybeat... OK

# service pulp_resource_manager stop
celery init v10.0.
Using config script: /etc/default/pulp_resource_manager
celery multi v3.1.11 (Cipater)
> Stopping nodes...
	> resource_manager.eng.bos.redhat.com: QUIT -> 9155
> Waiting for 1 node -> 9155.....
	> resource_manager.eng.bos.redhat.com: OK

# service pulp_celerybeat status
celery init v10.0.
Using configuration: /etc/default/pulp_workers, /etc/default/pulp_celerybeat
pulp_celerybeat is stopped.

# service pulp_resource_manager status
celery init v10.0.
Using config script: /etc/default/pulp_resource_manager
node resource_manager is stopped...

On the Satellite 6 Server I executed:

# hammer capsule content synchronize --id 2
[Foreman] Username: admin
[Foreman] Password for admin: 
[......................................................................] [100%]


It does not show any error message; it only shows 100% complete.
Please let me know if anything else has to be added here.

Comment 17 Tazim Kolhar 2015-07-09 07:47:29 UTC
VERIFIED:

# rpm -qa | grep foreman
ruby193-rubygem-foreman-tasks-0.6.12.8-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.9-1.el7sat.noarch
foreman-selinux-1.7.2.13-1.el7sat.noarch
foreman-ovirt-1.7.2.30-1.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
foreman-debug-1.7.2.30-1.el7sat.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch
foreman-1.7.2.30-1.el7sat.noarch
ruby193-rubygem-foreman_docker-1.2.0.18-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.2.0-8.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
foreman-proxy-1.7.2.5-1.el7sat.noarch
ibm-x3755-02.ovirt.rhts.eng.bos.redhat.com-foreman-client-1.0-1.noarch
ibm-x3755-02.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
foreman-compute-1.7.2.30-1.el7sat.noarch
foreman-vmware-1.7.2.30-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.14-1.el7sat.noarch
foreman-libvirt-1.7.2.30-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
ibm-x3755-02.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-1.0-1.noarch
foreman-gce-1.7.2.30-1.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.4-1.el7sat.noarch
ruby193-rubygem-foreman_discovery-2.0.0.17-1.el7sat.noarch
foreman-postgresql-1.7.2.30-1.el7sat.noarch


steps:
# service pulp_celerybeat stop
Redirecting to /bin/systemctl stop  pulp_celerybeat.service

# service pulp_resource_manager stop
Redirecting to /bin/systemctl stop  pulp_resource_manager.service

# hammer capsule content synchronize --id 2
[Foreman] Username: admin
[Foreman] Password for admin: 
[..........................................................................    ] [95%]
Host did not respond within 20 seconds. Is katello-agent installed and goferd running on the Host?
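
After verifying, the stopped pulp services on the capsule would presumably be restarted; a minimal sketch:

# service pulp_celerybeat start
# service pulp_resource_manager start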

Comment 19 Bryan Kearney 2015-08-11 13:33:24 UTC
This bug is slated to be released with Satellite 6.1.

Comment 20 Bryan Kearney 2015-08-12 13:58:49 UTC
This bug was fixed in version 6.1.1 of Satellite, which was released on 12 August 2015.

