Bug 1102763

Summary: capsule: synchronize command never times out/silently fails.
Product: Red Hat Satellite
Reporter: Corey Welton <cwelton>
Component: Foreman Proxy
Assignee: Mike McCune <mmccune>
Status: CLOSED CURRENTRELEASE
QA Contact: Tazim Kolhar <tkolhar>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.0.3
CC: ahumbe, bbuckingham, bkearney, cwelton, jmontleo, jsherril, kabbott, mmccune, nshaik, shughes, tkolhar
Target Milestone: Unspecified
Keywords: Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
URL: http://projects.theforeman.org/issues/7162
Whiteboard:
Fixed In Version:
Doc Type: Release Note
Doc Text:
In certain cases, the capsule synchronization task fails with no indication in the UI. If this is seen, run foreman-debug on the server and submit a support request with the output of that command.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-08-12 13:58:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1115190    

Description Corey Welton 2014-05-29 13:57:15 UTC
Description of problem:
If a capsule runs into an issue that keeps syncs from completing, nothing in the UI indicates this.

Version-Release number of selected component (if applicable):

Satellite-6.0.3-RHEL-6-20140528.4

How reproducible:

Unsure.

Steps to Reproduce:
1. Attempt to sync Satellite content to a capsule. Having the Satellite and capsule servers in widely separated geographic locations may (or may not) make this easier to reproduce.
2. Wait.
3. View results.

Actual results:

During the synchronize process the user sees essentially nothing, other than a progress bar that never moves -- in my case it stayed at 50%.

In the pulp logs on the Satellite server we see:

May 29 15:48:44 ibm-x3550m3-07 pulp: pulp.server.async.scheduler:ERROR: Workers 'reserved_resource_worker-23.eng.brq.redhat.com' has gone missing, removing from list of workers
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR: 7461b684-4048-4e72-94dd-3b82956c6fab
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR: Traceback (most recent call last):
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/gofer/transport/qpid/consumer.py", line 113, in get
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:     return self.__receiver.fetch(timeout=timeout)
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:   File "<string>", line 6, in fetch
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1030, in fetch
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:     self._ecwait(lambda: self.linked)
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:     result = self._ewait(lambda: self.closed or predicate(), timeout)
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 994, in _ewait
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:     self.check_error()
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 983, in check_error
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR:     raise self.error
May 29 15:48:52 ibm-x3550m3-07 pulp: gofer.transport.qpid.consumer:ERROR: NotFound: no such queue: pulp.task

Expected results:
The errors above may be related to the root cause, but they are not the main issue; the main issue is that the synchronizer never gives up, times out, or reports a problem to the user.
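
For illustration, the kind of behavior I'd expect is a simple poll-with-timeout. A rough Python sketch follows; the get_task_state helper, the state names, and the timeout value are placeholders, not the actual Katello/pulp API:

import time

SYNC_TIMEOUT = 60 * 60   # placeholder: give up after an hour
POLL_INTERVAL = 30       # placeholder: seconds between task-state checks

def wait_for_sync(get_task_state, timeout=SYNC_TIMEOUT):
    """Poll a sync task until it finishes, fails, or the timeout expires.

    get_task_state is a placeholder callable returning one of
    'running', 'finished', or 'error'.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        state = get_task_state()
        if state == 'finished':
            return True
        if state == 'error':
            # surface the failure instead of sitting at 50% forever
            raise RuntimeError('capsule sync task reported an error')
        time.sleep(POLL_INTERVAL)
    # this is the missing behavior: stop waiting and tell the user
    raise RuntimeError('capsule sync did not complete within %d seconds' % timeout)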

Additional info:

Comment 2 Brad Buckingham 2014-08-19 22:06:42 UTC
Created redmine issue http://projects.theforeman.org/issues/7162 from this bug

Comment 3 Brad Buckingham 2014-08-20 17:52:41 UTC
Much has changed in the code since this bug was initially raised, and unfortunately I wasn't able to recreate the scenario as described. I did, however, simulate a scenario where the server could not reach the capsule, resulting in a timeout during the sync. In that case, Satellite would report 'success' to the user even though the underlying pulp task had actually failed.

In order to simulate this scenario:
- from the capsule:
  service pulp_celerybeat stop
  service pulp_resource_manager stop

- from the satellite:
hammer> capsule content synchronize --id 3 --environment-id 5
[.................................................................                                                                  ] [50%]
Task 2246bfb5-131f-4171-a7c3-6e16e3276ddd: error

The following katello PR has the proposed fix for this scenario:
   https://github.com/Katello/katello/pull/4595

Comment 7 Corey Welton 2014-09-02 15:43:47 UTC
Bouncing back to dev for 6.1

Comment 8 Steve Loranz 2015-04-02 15:30:26 UTC
Connecting redmine issue http://projects.theforeman.org/issues/7162 from this bug

Comment 9 Brad Buckingham 2015-04-21 16:31:16 UTC
*** Bug 1213816 has been marked as a duplicate of this bug. ***

Comment 10 Mike McCune 2015-06-09 17:18:26 UTC
New fix upstream:

https://github.com/Katello/katello/pull/5278

Comment 13 Justin Sherrill 2015-06-15 19:13:10 UTC
https://github.com/Katello/katello/pull/5304

Comment 14 Tazim Kolhar 2015-06-16 08:15:21 UTC
VERIFIED:

# rpm -qa | grep foreman
ruby193-rubygem-foreman_discovery-2.0.0.15-1.el7sat.noarch
foreman-vmware-1.7.2.27-1.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
foreman-debug-1.7.2.27-1.el7sat.noarch
foreman-libvirt-1.7.2.27-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
foreman-compute-1.7.2.27-1.el7sat.noarch
foreman-gce-1.7.2.27-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.2.0-8.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.4-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.7-1.el7sat.noarch
puppet-foreman_scap_client-0.3.3-9.el7sat.noarch
foreman-1.7.2.27-1.el7sat.noarch
ruby193-rubygem-foreman_docker-1.2.0.14-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.14-1.el7sat.noarch
foreman-selinux-1.7.2.13-1.el7sat.noarch
foreman-proxy-1.7.2.5-1.el7sat.noarch
foreman-postgresql-1.7.2.27-1.el7sat.noarch
rhsm-qe-2.rhq.lab.eng.bos.redhat.com-foreman-client-1.0-1.noarch
rhsm-qe-2.rhq.lab.eng.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
rhsm-qe-2.rhq.lab.eng.bos.redhat.com-foreman-proxy-1.0-1.noarch
foreman-ovirt-1.7.2.27-1.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
ruby193-rubygem-foreman-tasks-0.6.12.8-1.el7sat.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch

steps:
# hammer -u admin -p changeme capsule content synchronize --id=2
Could not synchronize capsule content:
  Couldn't find SmartProxy with id=2 [WHERE "features"."name" IN ('Pulp Node')]

An appropriate error is displayed.

Comment 15 Justin Sherrill 2015-06-16 12:40:18 UTC
I'm going to move this back to 'ON_QA' as I'm not sure it was verified properly. From the error it seems there was no capsule with id 2, or that capsule did not have the Pulp Node feature.
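
For reference, the capsule id and the features registered for it can be checked with hammer before re-running the verification (a quick sanity check; exact output columns vary by hammer version):

# hammer capsule list
# hammer capsule info --id 2

The info output should include the Pulp Node feature for a capsule that content can be synchronized to.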

Comment 16 Tazim Kolhar 2015-07-08 10:37:13 UTC
FAILEDQA:

# rpm -qa | grep foreman
ruby193-rubygem-foreman-tasks-0.6.12.8-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.9-1.el7sat.noarch
foreman-debug-1.7.2.29-1.el7sat.noarch
foreman-postgresql-1.7.2.29-1.el7sat.noarch
foreman-vmware-1.7.2.29-1.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
foreman-selinux-1.7.2.13-1.el7sat.noarch
foreman-1.7.2.29-1.el7sat.noarch
foreman-ovirt-1.7.2.29-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
foreman-proxy-1.7.2.5-1.el7sat.noarch
ibm-x3655-03.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-1.0-2.noarch
foreman-compute-1.7.2.29-1.el7sat.noarch
foreman-gce-1.7.2.29-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.2.0-8.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.14-1.el7sat.noarch
foreman-libvirt-1.7.2.29-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
ibm-x3655-03.ovirt.rhts.eng.bos.redhat.com-foreman-client-1.0-1.noarch
ibm-x3655-03.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch
ruby193-rubygem-foreman_docker-1.2.0.18-1.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.4-1.el7sat.noarch
ruby193-rubygem-foreman_discovery-2.0.0.15-1.el7sat.noarch

steps:

On the Capsule server I executed:
# service pulp_celerybeat stop
celery init v10.0.
Using configuration: /etc/default/pulp_workers, /etc/default/pulp_celerybeat
Stopping pulp_celerybeat... OK

# service pulp_resource_manager stop
celery init v10.0.
Using config script: /etc/default/pulp_resource_manager
celery multi v3.1.11 (Cipater)
> Stopping nodes...
	> resource_manager.eng.bos.redhat.com: QUIT -> 9155
> Waiting for 1 node -> 9155.....
	> resource_manager.eng.bos.redhat.com: OK

# service pulp_celerybeat status
celery init v10.0.
Using configuration: /etc/default/pulp_workers, /etc/default/pulp_celerybeat
pulp_celerybeat is stopped.

# service pulp_resource_manager status
celery init v10.0.
Using config script: /etc/default/pulp_resource_manager
node resource_manager is stopped...

On the Satellite 6 server I executed:

# hammer capsule content synchronize --id 2
[Foreman] Username: admin
[Foreman] Password for admin: 
[......................................................................] [100%]


It does not show any error message; it only shows 100% complete.
Please let me know if anything else has to be added here.

Comment 17 Tazim Kolhar 2015-07-09 07:47:29 UTC
VERIFIED:

# rpm -qa | grep foreman
ruby193-rubygem-foreman-tasks-0.6.12.8-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.9-1.el7sat.noarch
foreman-selinux-1.7.2.13-1.el7sat.noarch
foreman-ovirt-1.7.2.30-1.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
foreman-debug-1.7.2.30-1.el7sat.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch
foreman-1.7.2.30-1.el7sat.noarch
ruby193-rubygem-foreman_docker-1.2.0.18-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.2.0-8.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
foreman-proxy-1.7.2.5-1.el7sat.noarch
ibm-x3755-02.ovirt.rhts.eng.bos.redhat.com-foreman-client-1.0-1.noarch
ibm-x3755-02.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
foreman-compute-1.7.2.30-1.el7sat.noarch
foreman-vmware-1.7.2.30-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.14-1.el7sat.noarch
foreman-libvirt-1.7.2.30-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
ibm-x3755-02.ovirt.rhts.eng.bos.redhat.com-foreman-proxy-1.0-1.noarch
foreman-gce-1.7.2.30-1.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.4-1.el7sat.noarch
ruby193-rubygem-foreman_discovery-2.0.0.17-1.el7sat.noarch
foreman-postgresql-1.7.2.30-1.el7sat.noarch


steps:
# service pulp_celerybeat stop
Redirecting to /bin/systemctl stop  pulp_celerybeat.service

# service pulp_resource_manager stop
Redirecting to /bin/systemctl stop  pulp_resource_manager.service

# hammer capsule content synchronize --id 2
[Foreman] Username: admin
[Foreman] Password for admin: 
[..........................................................................    ] [95%]
Host did not respond within 20 seconds. Is katello-agent installed and goferd running on the Host?

Comment 19 Bryan Kearney 2015-08-11 13:33:24 UTC
This bug is slated to be released with Satellite 6.1.

Comment 20 Bryan Kearney 2015-08-12 13:58:49 UTC
This bug was fixed in version 6.1.1 of Satellite, which was released on 12 August 2015.