Bug 1431956
| Summary: | about 22 sub-tasks of remote execution task on 5000 systems were left in pending after 7.5 hours | ||
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Jan Hutař <jhutar> |
| Component: | Remote Execution | Assignee: | Adam Ruzicka <aruzicka> |
| Status: | CLOSED ERRATA | QA Contact: | Roman Plevka <rplevka> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 6.2.8 | CC: | aruzicka, inecas, jcallaha, ktordeur, pcreech, rdrazny, rplevka |
| Target Milestone: | 6.4.0 | Keywords: | Triaged |
| Target Release: | Unused | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | tfm-rubygem-foreman_remote_execution-1.5.4 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-10-16 19:27:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
I can use the "Cancel" button to cancel individual sub-tasks as well.

Oh, it looks like I have to click the "Cancel" button twice to cancel a sub-task.

Upstream bug assigned to aruzicka

*** Bug 1596642 has been marked as a duplicate of this bug. ***

Proposing for 6.4: we also got another report of this, https://bugzilla.redhat.com/show_bug.cgi?id=1595081, where a simpler reproducer is described. The failure can be simulated by restarting the smart_proxy_dynflow_core service while the job is running. After the fix, the task should time out after 10 minutes.

VERIFIED on sat6.4.0-21: ran the `ls` command over SSH on 5000 hosts with 4 workers. The task finished successfully with a 100% success rate and no pending sub-tasks left.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2927
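The timeout behaviour mentioned in the comments above (a task that stops reporting should time out after 10 minutes) can be illustrated with a minimal sketch. This is not the actual fix shipped in foreman_remote_execution; `SubTask`, `expire_stale`, and the `last_update` field are hypothetical names chosen for illustration, and only the 10-minute figure comes from the comment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# The 10-minute figure is taken from the comment; everything else is assumed.
TIMEOUT = timedelta(minutes=10)


@dataclass
class SubTask:
    """Hypothetical sub-task record, not a Foreman or Dynflow class."""
    host: str
    state: str             # "running" or "stopped"
    last_update: datetime  # last time anything was heard about this sub-task
    result: str = "pending"


def expire_stale(subtasks, now):
    """Stop every unfinished sub-task that has been silent longer than TIMEOUT.

    Returns the hosts whose sub-tasks were expired.
    """
    expired = []
    for task in subtasks:
        if task.state != "stopped" and now - task.last_update > TIMEOUT:
            task.state = "stopped"
            task.result = "timeout"
            expired.append(task.host)
    return expired


if __name__ == "__main__":
    now = datetime(2018, 1, 1, 12, 0, tzinfo=timezone.utc)
    demo = [
        SubTask("host-a", "running", now - timedelta(minutes=11)),
        SubTask("host-b", "running", now - timedelta(minutes=5)),
    ]
    print(expire_stale(demo, now))
```

The point of the sketch is only that a sub-task whose executor disappears (for example after a service restart) ends in a definite state instead of staying pending indefinitely.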
Description of problem:
About 22 sub-tasks of a remote execution task on 5000 systems were left in pending after 7.5 hours.

Version-Release number of selected component (if applicable):
Sat: satellite-6.2.8-4.0.el7sat.noarch
Capsule: satellite-capsule-6.2.8-4.0.el7sat.noarch

How reproducible:
often

Steps to Reproduce:
1. Run ReX `date` on 5000 systems

Actual results:
Sub-task is still in pending:

Id: 2123a83d-2325-4d5c-befb-57f2105fc78c
Label: Actions::RemoteExecution::RunHostJob
Name: Remote action:
Owner:
Execution type: Delayed
Start at: 2017-03-14 00:30:33 +0100
Start before: -
Started at: 2017-03-14 00:30:33 +0100
Ended at:
State: running
Result: -
Params: Run date on gprfc028container342.example.com

A copy and paste of the task's "Running Steps" tab says it is suspended:

Action: Actions::RemoteExecution::RunProxyCommand
State: suspended
Input: {"effective_user"=>"root", "ssh_user"=>"root", "effective_user_method"=>"sudo", "hostname"=>"172.22.57.86", "script"=>"date", "connection_options"=>{"retry_interval"=>15, "retry_count"=>4, "timeout"=>60}, "proxy_url"=>"https://gprfc017capsule7....:9090", "locale"=>"en"}
Output: {"metadata"=>{"timeout"=>"2017-03-13 21:59:24 -0400"}, "proxy_task_id"=>"ab94b984-39cd-42a2-b51b-357a8de5a0df"}

Expected results:
Should work, ReX should be reliable
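The timestamps in the task dump above use different UTC offsets, which makes them hard to compare by eye. The following sketch normalises them and computes how far the observation point lies past the recorded `metadata.timeout`. Two assumptions are made here: that "after 7.5 hours" is counted from the "Started at" value, and that the `timeout` field is a deadline for the step (the report does not state its exact semantics). `overdue_by` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Format of the timestamps as they appear in the task dump.
FMT = "%Y-%m-%d %H:%M:%S %z"


def overdue_by(timeout_str, now):
    """Return how far `now` is past the recorded timeout (negative if not reached)."""
    return now - datetime.strptime(timeout_str, FMT)


# Values copied from the report.
started_at = datetime.strptime("2017-03-14 00:30:33 +0100", FMT)
recorded_timeout = "2017-03-13 21:59:24 -0400"

# Assumption: the 7.5 hours in the summary are measured from "Started at".
observed_at = started_at + timedelta(hours=7.5)

delta = overdue_by(recorded_timeout, observed_at)
print("past recorded timeout by:", delta)
```

Under these assumptions the sub-task was still suspended several hours after its recorded timeout, which is consistent with the report that it never left the pending state on its own.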