Description of problem:
After issuing a host reboot job through SSH-based REX, the job may remain in Pending state on Satellite. The host reboots as expected, but the task still shows Pending even after 18+ hours.

Version-Release number of selected component (if applicable):
Verified on Satellite 6.2.14 (package versions below). Not checked in earlier versions.

## Sat 6.2.14
foreman-proxy-1.11.0.7-1.el7sat.noarch
rubygem-smart_proxy_dynflow-0.1.3.1-1.el7sat.noarch
rubygem-smart_proxy_remote_execution_ssh-0.1.2.6-1.el7sat.noarch
tfm-rubygem-dynflow-0.8.13.6-1.el7sat.noarch
tfm-rubygem-foreman_remote_execution-0.3.0.19-1.el7sat.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.1.3.1-1.el7sat.noarch
tfm-rubygem-smart_proxy_remote_execution_ssh_core-0.1.2.6-1.el7sat.noarch

How reproducible:
Most of the time. Tested on VMs only.

Steps to Reproduce:
1. On the webUI, run a remote execution job.
2. Select "Power Action - SSH Default" as Job Template, then insert target host(s) and set "restart" for action.
3. Execute it immediately.

Actual results:
Most of the time the job stays in Pending state. Less than 20% of the time the job transitions to stopped-success state. The remote host reboots successfully 100% of the time. No errors are logged to foreman-tasks or the dynflow console.

Expected results:
If the host reboots successfully, the task should enter stopped-success state.

Additional info:
Tests were performed using Satellite's internal capsule.
Example of one such job, still pending over 30 minutes after the host was successfully and completely rebooted:

irb(main):003:0> ForemanTasks::Task.find( "136e7e93-a2fc-466c-a916-9ffa880e0fb0")
=> #<ForemanTasks::Task::DynflowTask id: "136e7e93-a2fc-466c-a916-9ffa880e0fb0", type: "ForemanTasks::Task::DynflowTask", label: "Actions::RemoteExecution::RunHostsJob", started_at: "2018-02-08 20:46:37", ended_at: nil, state: "running", result: "pending", external_id: "95ecf03e-1ffb-4847-a44f-dfc4f5f93913", parent_task_id: nil, start_at: "2018-02-08 20:46:37", start_before: nil>
I see the same thing when trying to restart RHEL 7 clients; the reboot works but the job never succeeds. As a workaround, I've cloned the "Power Action - SSH Default" template and modified it to do this:

echo <%= input('action') %> host && sleep 3
<%= case input('action')
    when 'restart'
      'shutdown -r +1'
    else
      'shutdown -h now'
    end %>

This seems to work fine, albeit with a 1 minute delay.
The technical cause of the never-completed task is simply that the "reboot" command might not return a success exit status before the network goes down during the already-initiated shutdown/reboot. Therefore "shutdown -r +1", or any other modification _ensuring_ that the last command _always_ returns success, is a valid workaround/solution.
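For illustration, the command selection used by the workaround template in the earlier comment can be sketched in plain Ruby. This is only a sketch, not the shipped template: power_command, script_for and the delay_minutes parameter are hypothetical names introduced here, and the non-restart branch is kept as 'shutdown -h now', exactly as in that comment. It assumes that a scheduled "shutdown -r +N" returns to the caller right away, so the SSH session can report an exit status before the network goes down.

```ruby
# Hypothetical helper mirroring the case expression in the cloned
# "Power Action - SSH Default" template from the workaround comment.
# delay_minutes is an assumed parameter; the comment hard-codes +1.
def power_command(action, delay_minutes: 1)
  case action
  when 'restart'
    # Scheduling the reboot (instead of rebooting immediately) is what
    # gives the command a chance to return success first.
    "shutdown -r +#{delay_minutes}"
  else
    # Left unchanged, as in the workaround comment.
    'shutdown -h now'
  end
end

# Hypothetical helper: builds the two-line script the template renders.
def script_for(action)
  "echo #{action} host && sleep 3\n#{power_command(action)}"
end

puts script_for('restart')
```

The delay is the trade-off mentioned above: the job can finish cleanly, but the host goes down about a minute later.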
I created a Foreman issue for it and will open a PR. Let's see what feedback the upstream community gives. IMHO every solution has its own limitations.
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/22679 has been resolved.
Verified: @satellite 6.4.0 snap 16

Steps:
1. On the webUI, run a remote execution job.
2. Select "Power Action - SSH Default" as Job Template, then insert target host(s) and set "restart" for action.
3. Execute it immediately.

Observation: the host reboots successfully and the task enters stopped-success state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2927