Bug 1628145

Summary: Ansible Remote Execution Job Stalls at Scale
Product: Red Hat Satellite
Reporter: sbadhwar
Component: Remote Execution
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED DUPLICATE
QA Contact:
Severity: high
Docs Contact:
Priority: high
Version: 6.4
CC: aruzicka, inecas, jhutar, mmccune, psuriset, sbadhwar
Target Milestone: Unspecified
Keywords: Performance, Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-18 11:23:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1628505, 1646745
Bug Blocks:

Description sbadhwar 2018-09-12 11:28:10 UTC
Description of problem:
While running an Ansible package install command at scale (45K hosts, 1 package to install), the job stalled and made no further progress. On closer inspection, the smart_proxy service on one of the capsules did not appear to be processing any tasks.

Version-Release number of selected component (if applicable):
Satellite 6.4 Snap 18

How reproducible:
Not sure

Steps to Reproduce:
1. Create a new remote execution job with the Ansible package command
2. Execute the job at large scale (35k or more hosts)

Actual results:
The job stalls at some point and makes no further progress

Expected results:
The job execution finishes successfully

Additional info:
Foreman debug satellite: http://debugs.theforeman.org/foreman-debug-XHcxQ.tar.xz
Foreman debug capsule: http://debugs.theforeman.org/foreman-debug-a5qE7.tar.xz

Comment 1 Adam Ruzicka 2018-09-12 11:37:58 UTC
Additional info:
There were a number of tasks on the capsule. Dynflow status showed roughly 850 events in the queue and 0/5 free workers, yet the process was completely idle.
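
The symptom described above (events queued, no free workers, process idle) could be expressed as a simple check, sketched below. This is only an illustration: the `ExecutorSnapshot` type, its field names, the `looks_stalled` helper, and the CPU threshold are all hypothetical and are not part of Dynflow, smart_proxy, or Satellite. The values would have to be collected by hand or by some separate tooling.

```python
from dataclasses import dataclass


@dataclass
class ExecutorSnapshot:
    """One observation of an executor. All field names are hypothetical."""
    queued_events: int   # events waiting in the queue
    free_workers: int    # workers reported as free
    total_workers: int   # size of the worker pool
    cpu_percent: float   # CPU usage of the process, measured separately


def looks_stalled(snapshots, idle_cpu_threshold=1.0):
    """Return True if every snapshot shows queued work, no free workers,
    and a process that is effectively idle (CPU at or below the threshold)."""
    if not snapshots:
        return False
    return all(
        s.queued_events > 0
        and s.free_workers == 0
        and s.cpu_percent <= idle_cpu_threshold
        for s in snapshots
    )
```

Requiring the condition to hold across several snapshots, rather than one, is meant to avoid flagging a process that is merely between bursts of work; the 1% CPU threshold is an arbitrary assumption.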