Bug 1543636 - Remote Execution SSH-based Power Action remains pending despite having successfully rebooted the host
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Remote Execution
Version: 6.2.14
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: 6.4.0
Assignee: satellite6-bugs
QA Contact: Jameer Pathan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-02-08 21:24 UTC by Pablo Hess
Modified: 2021-12-10 15:39 UTC
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-16 18:53:28 UTC
Target Upstream Version:
Embargoed:


Links
- Foreman Issue Tracker 22679 (Priority: Normal, Status: Closed): REX task using job template for reboot might hang despite reboot succeeded. Last updated 2020-02-26 16:47:13 UTC
- Red Hat Knowledge Base (Solution) 3350801. Last updated 2018-02-12 13:43:09 UTC

Description Pablo Hess 2018-02-08 21:24:12 UTC
Description of problem:
After issuing a host reboot job through SSH-based REX, the job may remain in Pending state on Satellite.  The host reboots just fine as expected but even after 18+ hours the task still shows Pending.


Version-Release number of selected component (if applicable):
Verified on Satellite 6.2.14 (package versions below). Not checked in earlier versions.

## Sat 6.2.14
foreman-proxy-1.11.0.7-1.el7sat.noarch
rubygem-smart_proxy_dynflow-0.1.3.1-1.el7sat.noarch
rubygem-smart_proxy_remote_execution_ssh-0.1.2.6-1.el7sat.noarch
tfm-rubygem-dynflow-0.8.13.6-1.el7sat.noarch
tfm-rubygem-foreman_remote_execution-0.3.0.19-1.el7sat.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.1.3.1-1.el7sat.noarch
tfm-rubygem-smart_proxy_remote_execution_ssh_core-0.1.2.6-1.el7sat.noarch



How reproducible:
Most of the time. Tested on VMs only.

Steps to Reproduce:
1. On the webUI, run a remote execution job.
2. Select "Power Action - SSH Default" as Job Template, then insert target host(s) and set "restart" for action.
3. Execute it immediately.


Actual results:
Most of the time, the action enters Pending state. In fewer than 20% of attempts, the job transitions to stopped-success state. The remote host reboots successfully 100% of the time. No errors are logged to foreman-tasks or the dynflow console.



Expected results:
If the host reboots successfully, the task should enter stopped-success state.


Additional info:
Tests were performed using Satellite's internal capsule.

Example of one such job, still pending over 30 minutes after the host was successfully and completely rebooted:

irb(main):003:0> ForemanTasks::Task.find( "136e7e93-a2fc-466c-a916-9ffa880e0fb0")                                                                          
=> #<ForemanTasks::Task::DynflowTask id: "136e7e93-a2fc-466c-a916-9ffa880e0fb0", type: "ForemanTasks::Task::DynflowTask", label: "Actions::RemoteExecution::RunHostsJob", started_at: "2018-02-08 20:46:37", ended_at: nil, state: "running", result: "pending", external_id: "95ecf03e-1ffb-4847-a44f-dfc4f5f93913", parent_task_id: nil, start_at: "2018-02-08 20:46:37", start_before: nil>
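To spot other jobs stuck the same way, the condition visible in the output above (label `Actions::RemoteExecution::RunHostsJob`, state `running`, result `pending`, no `ended_at`) can be checked against a time threshold. The sketch below is a hypothetical helper working on plain Ruby structs that mirror those attributes, not on the real `ForemanTasks::Task` model; the 30-minute threshold comes from the example above, and `started_at` is assumed to be UTC.

```ruby
# Hypothetical stand-in for the task attributes printed by the irb query above.
TaskRecord = Struct.new(:id, :label, :started_at, :ended_at, :state, :result, keyword_init: true)

RUN_HOSTS_JOB = 'Actions::RemoteExecution::RunHostsJob'

# Select REX host jobs still running/pending for longer than max_age_s seconds.
def stuck_rex_tasks(tasks, now:, max_age_s: 30 * 60)
  tasks.select do |t|
    t.label == RUN_HOSTS_JOB &&
      t.state == 'running' &&
      t.result == 'pending' &&
      t.ended_at.nil? &&
      now - t.started_at > max_age_s
  end
end

tasks = [
  TaskRecord.new(id: '136e7e93-a2fc-466c-a916-9ffa880e0fb0', label: RUN_HOSTS_JOB,
                 started_at: Time.utc(2018, 2, 8, 20, 46, 37), ended_at: nil,
                 state: 'running', result: 'pending')
]

# Time the report was filed; the task had then been pending for about 37 minutes.
p stuck_rex_tasks(tasks, now: Time.utc(2018, 2, 8, 21, 24, 12)).map(&:id)
# => ["136e7e93-a2fc-466c-a916-9ffa880e0fb0"]
```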

Comment 3 Mark Watts 2018-02-14 09:50:32 UTC
I see the same thing when trying to restart RHEL 7 clients; the reboot works but the job never succeeds.

As a workaround, I've cloned the "Power Action - SSH Default" template and modified it to do this:

echo <%= input('action') %> host && sleep 3
<%= case input('action')
      when 'restart'
        'shutdown -r +1'
      else
        'shutdown -h now'
      end %>


This seems to work fine, albeit with a 1 minute delay.
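A quick way to check what the cloned template produces for each action is to render it with Ruby's standard ERB library. This is a minimal sketch: `FakeTemplateContext` is a hypothetical stub that supplies `input` from a hash, and it does not reproduce Foreman's actual template rendering environment.

```ruby
require 'erb'

# The workaround template from this comment, verbatim.
TEMPLATE = <<~'ERB'
  echo <%= input('action') %> host && sleep 3
  <%= case input('action')
        when 'restart'
          'shutdown -r +1'
        else
          'shutdown -h now'
        end %>
ERB

# Hypothetical stub: provides input(name) from a plain hash for rendering.
class FakeTemplateContext
  def initialize(inputs)
    @inputs = inputs
  end

  def input(name)
    @inputs.fetch(name)
  end

  # Evaluate the template with this object's binding so input() resolves here.
  def render(template)
    ERB.new(template).result(binding)
  end
end

puts FakeTemplateContext.new('action' => 'restart').render(TEMPLATE)
```

For `restart` the rendered script is `echo restart host && sleep 3` followed by `shutdown -r +1`: the scheduling command returns immediately, so its exit status can be reported before the host actually goes down a minute later.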

Comment 4 Pavel Moravec 2018-02-20 08:06:59 UTC
The technical cause of the never-completed task is simply that the "reboot" command might not return its success exit status before the network goes down during the already-initiated shutdown/reboot. Therefore "shutdown -r +1", or any other modification _ensuring_ that the last command _always_ returns success, is a valid workaround/solution.
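The race described above can be illustrated with a toy model in which the job is only marked finished once an exit status arrives. All names here (`run_power_action`, `Outcome`, `schedules_reboot_later`, `network_down_first`) are hypothetical and this is not how Foreman, Dynflow or the smart proxy are implemented. The timeout stands in for "waits indefinitely" so the sketch terminates, and the state/result pairs mirror the task fields shown in the description.

```ruby
require 'timeout'

# Toy model only (hypothetical names, not Foreman, Dynflow or smart proxy code).
# The job is marked finished only once the remote command's exit status arrives.
Outcome = Struct.new(:state, :result)

def run_power_action(schedules_reboot_later:, network_down_first:, timeout_s: 0.5)
  status_channel = Queue.new

  Thread.new do
    if schedules_reboot_later
      # Like `shutdown -r +1`: the command itself exits 0 at once and the
      # reboot happens later, after the status has been delivered.
      status_channel << 0
    elsif !network_down_first
      # Plain `reboot` that happens to report its status before the network drops.
      status_channel << 0
    end
    # Otherwise the network goes down first and no exit status is ever sent.
  end

  status = Timeout.timeout(timeout_s) { status_channel.pop }
  Outcome.new('stopped', status.zero? ? 'success' : 'error')
rescue Timeout::Error
  # No status arrived: the task stays running/pending, as in the description.
  Outcome.new('running', 'pending')
end

p run_power_action(schedules_reboot_later: false, network_down_first: true)
p run_power_action(schedules_reboot_later: true, network_down_first: true)
```

Only the ordering of "status delivered" versus "network gone" matters in this model, which is why deferring the reboot (rather than making the job runner smarter) is enough to make the task complete.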

Comment 6 Pavel Moravec 2018-02-20 09:51:07 UTC
I created a Foreman issue for it and will open a PR; let's see what the upstream community feedback will be.

IMHO every solution has its own limitations.

Comment 7 Satellite Program 2018-03-05 09:05:41 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/22679 has been resolved.

Comment 9 Jameer Pathan 2018-08-10 11:32:46 UTC
Verified:
 
@satellite 6.4.0 snap 16

Steps:

1. On the webUI, run a remote execution job.
2. Select "Power Action - SSH Default" as Job Template, then insert target host(s) and set "restart" for action.
3. Execute it immediately.

Observation

- The host reboots successfully and the task enters stopped-success state.

Comment 16 Bryan Kearney 2018-10-16 18:53:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2927

