Description of problem:
Creating a repository in a product hangs indefinitely when a Capsule disappears or becomes non-responsive. Removing the Capsule from Content Hosts/Hosts/Capsules does not appear to resolve the hang.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create and sync a Capsule -- generally assure it is working.
2. Take it down -- wiping the box would work, but I suspect simply removing network connectivity would be enough. Note: we are emulating a severe failure here, so do not remove it from Content Hosts/Hosts/Capsules (yet).
3. Create a product, "foobar".
4. Attempt to add an RPM repo to product "foobar".
5. Wait... and wait... and wait.

Actual results:
Create repository 'foobar'; product 'foobar-repo'; organization 'Default_Organization' -- running, pending

Relevant lines seen in the dynflow console:
21: Actions::Pulp::Consumer::SyncNode (suspended) [ 1603.83s / 5.25s ] Cancel
26: Actions::Pulp::Repository::DistributorPublish (pending)

Attempting to resolve the hang by deleting the content host/host/capsule does not help; it still hangs. It is possible pulp will eventually time out after a very long time, but that timeframe is presently unknown. It is also presently unknown whether restarting any services will get us out of this wedged state.

Expected results:
A reasonable timeout? Something that doesn't hang when a Capsule is forcibly removed from the network? ...?

Additional info:
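The "reasonable timeout" expected above amounts to putting an upper bound on the poll loop instead of polling a vanished Capsule forever. A minimal sketch of that idea (hypothetical helper names and states, not Katello's or dynflow's actual code):

```python
import time


class TaskTimeoutError(Exception):
    """Raised when a remote task does not finish within the allowed window."""


def wait_for_task(poll_fn, timeout=600, interval=5):
    # poll_fn() returns the remote task state, e.g. "waiting", "running",
    # "finished". Rather than polling indefinitely, give up after `timeout`
    # seconds so an unreachable capsule cannot wedge the whole workflow.
    deadline = time.monotonic() + timeout
    state = None
    while time.monotonic() < deadline:
        state = poll_fn()
        if state in ("finished", "error", "canceled"):
            return state
        time.sleep(interval)
    raise TaskTimeoutError("task still %r after %ss" % (state, timeout))
```

With such a bound, the SyncNode step above would fail with an actionable error after a fixed window instead of sitting suspended for 1600+ seconds.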
More details for 21: Actions::Pulp::Consumer::SyncNode (suspended) [ 1603.83s / 5.25s ] Cancel
Started at: 2014-08-27 20:55:16 UTC
Ended at: 2014-08-27 21:29:14 UTC
Real time: 2037.27s
Execution time (excluding suspended state): 6.48s

Input:
---
consumer_uuid: 869aac38-e4cb-4583-9ef2-ba138debd192
skip_content: true
remote_user: admin-f87eddb1
locale: en

Output:
---
pulp_tasks:
- exception:
  task_type:
  _href: /pulp/api/v2/tasks/0738bc06-3c21-4527-8ebc-ad621617fd2d/
  task_id: 0738bc06-3c21-4527-8ebc-ad621617fd2d
  tags:
  - pulp:consumer:869aac38-e4cb-4583-9ef2-ba138debd192
  - pulp:action:unit_update
  finish_time:
  _ns: task_status
  start_time:
  traceback:
  spawned_tasks: []
  progress_report: {}
  queue: agent
  state: waiting
  result:
  error:
  _id:
    $oid: 53fe45b4f1cfaa479b5d51a7
  id: 53fe45b4f1cfaa479b5d51a7
poll_attempts:
  total: 147
  failed: 0

More details for 26: Actions::Pulp::Repository::DistributorPublish (pending)
Started at:
Ended at:
Real time: 0.00s
Execution time (excluding suspended state): 0.00s

Input:
---
pulp_id: Default_Organization-sat6-sat6
distributor_type_id: yum_distributor
source_pulp_id:
dependency:
remote_user: admin-f87eddb1
locale: en

Output:
---
{}
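Note the poll_attempts total of 147 with the task still in state "waiting": the Capsule is being polled repeatedly even though it is gone. One way to avoid dispatching work to a dead Capsule in the first place is a cheap reachability probe before queuing the task. A sketch under the assumption that a TCP connect to the Capsule's HTTPS port is a reasonable liveness signal (hypothetical helper, not Katello's actual code):

```python
import socket


def capsule_reachable(host, port=443, timeout=5):
    # Cheap liveness probe: attempt a TCP connect with a short timeout
    # before dispatching work, instead of letting a pulp task hang in
    # "waiting" against an unreachable capsule.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

This would not replace a task-level timeout, but it turns the common case (Capsule host is down or unroutable) into a fast, explicit failure.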
Hitting Cancel in dynflow for step 21 gets us out of the bind. I still think this is pretty gross, though.
DOCS: Users should note that when the Satellite and Capsule are geographically far apart, or the link between them has low bandwidth or high latency, creating a custom repository may take a significant amount of time.
Additional DOC: If a Capsule is physically removed or becomes unresponsive for reasons outside Satellite 6 itself, repository creation may hang indefinitely. The solution is to resolve the situation with the unresponsive Capsule, removing it from the list of available Capsules if necessary, and then use dynflow to cancel the stuck task.
Created redmine issue http://projects.theforeman.org/issues/10009 from this bug
This was actually fixed as part of http://projects.theforeman.org/issues/10229 and https://bugzilla.redhat.com/show_bug.cgi?id=1192500. Moving to MODIFIED.
VERIFIED:

# rpm -qa | grep foreman
foreman-1.7.2.21-1.el7sat.noarch
ruby193-rubygem-foreman_discovery-2.0.0.13-1.el7sat.noarch
foreman-libvirt-1.7.2.21-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
foreman-postgresql-1.7.2.21-1.el7sat.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.13-1.el7sat.noarch
dell-pem710-01.rhts.eng.bos.redhat.com-foreman-proxy-client-1.0-1.noarch
foreman-ovirt-1.7.2.21-1.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.11-1.el7sat.noarch
foreman-selinux-1.7.2.13-1.el7sat.noarch
foreman-gce-1.7.2.21-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.1.0-1.el7sat.noarch
ruby193-rubygem-foreman-tasks-0.6.12.5-1.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.4-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.6-1.el7sat.noarch
ruby193-rubygem-foreman_docker-1.2.0.12-1.el7sat.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
foreman-proxy-1.7.2.4-1.el7sat.noarch
dell-pem710-01.rhts.eng.bos.redhat.com-foreman-client-1.0-1.noarch
dell-pem710-01.rhts.eng.bos.redhat.com-foreman-proxy-1.0-2.noarch
foreman-vmware-1.7.2.21-1.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
foreman-compute-1.7.2.21-1.el7sat.noarch
foreman-debug-1.7.2.21-1.el7sat.noarch

Steps:
1. Create and sync a capsule -- generally assure it is working.
2. ifdown eth0 on the capsule.
3. Create a product, "foobar".
4. Attempt to add an RPM repo to product "foobar".
5. Repo added successfully.
This bug is slated to be released with Satellite 6.1.
This bug was fixed in version 6.1.1 of Satellite which was released on 12 August, 2015.