Description of problem:
The customer needs to run many long-running job invocations at the same time on multiple machines. These machines are located in a network with low bandwidth, so keeping many connections alive isn't possible, as some jobs could take a long time (e.g. reposync, yum update). These connections also waste resources on the client hosts, which are not very powerful machines. This could be implemented by adding a provider other than SSH, or possibly by making SSH start the job and return right away (the capsule could then check the status of the job somehow).

Additional info:
Currently they are running their own custom remote execution scripts, which use Ansible core libraries to make calls asynchronously and poll for the status of the execution. The solution provided by Satellite does not necessarily have to poll for the status, but it would need to provide a way to check its status.
Created redmine issue http://projects.theforeman.org/issues/17514 from this bug
Upstream bug assigned to aruzicka
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/17514 has been resolved.
Has Foreman issue 17514 been ported to Satellite yet? We're on Satellite 6.2.14 and the "--async" option doesn't seem to have any effect, as per:

# hammer job-invocation create --job-template 'Run Command - SSH Default' --inputs 'command=ls' --search-query name=play01273.example.com --async
Job invocation 31 created
[...................................................................................................................................................................................................................................] [100%]
1 task(s), 1 success, 0 fail

It would be really useful to have the same type of async operation as when using:

# hammer host errata apply --errata-ids $errataList --host $host --async
@Mark: Please note this feature has nothing to do with hammer's --async flag. Hammer's --async flag tells hammer not to wait for the job invocation to finish.

Preliminary steps:
This feature can be enabled on a per-proxy basis by setting :async_ssh to true in /etc/smart_proxy_dynflow_core/settings.d/remote_execution_ssh.yml. The interval for checking on the remote jobs can be set in the same file under the runner_refresh_interval key. Apparently the setting is not exposed in the installer, so it needs to be uncommented and toggled in the file by hand.

Steps to reproduce:
1) Complete the preliminary steps
2) Run a remote execution job which will take some time (sleep 600)
3) Log in to the server and use ss or netstat to look for open SSH connections
4) (note) It may take up to a minute (iirc) for the kernel to completely "forget" the TCP connection

Expected results:
There should NOT be a persistent connection open to the remote host
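One way to confirm that the preliminary steps took effect is to check the settings file for an uncommented async_ssh line. This is only a minimal sketch: the helper name async_ssh_enabled is hypothetical, and it assumes the key is written on a single line in the form `:async_ssh: true`.

```shell
# Hypothetical helper (not part of Satellite): succeeds if the given
# settings file contains an uncommented ":async_ssh: true" line.
# Assumption: the key and value sit on one line, as in ":async_ssh: true".
async_ssh_enabled() {
    grep -Eq '^[[:space:]]*:async_ssh:[[:space:]]*true[[:space:]]*$' "$1"
}

# Example use (path taken from the preliminary steps above):
#   async_ssh_enabled /etc/smart_proxy_dynflow_core/settings.d/remote_execution_ssh.yml \
#       && echo "async SSH is enabled"
```

A commented-out line such as `#:async_ssh: true` does not match, which is the state the file is apparently shipped in.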
Verified in Satellite 6.3 Snap 35.

Negative Test:
Kicked off a job that executed the command `sleep 600`. Satellite immediately opened a connection. While the command was running (sleeping), the connection was maintained. Finally, Satellite closed the connection once the job was complete.

Every 2.0s: ss | grep ssh    Wed Feb 14 15:29:49 2018
tcp ESTAB 0 0 <host>:ssh <satellite>:37704
tcp ESTAB 0 0 <host>:ssh <self>:44772

Every 2.0s: ss | grep ssh    Wed Feb 14 15:36:22 2018
tcp ESTAB 0 0 <host>:ssh <satellite>:37704
tcp ESTAB 0 0 <host>:ssh <self>:44772

Every 2.0s: ss | grep ssh    Wed Feb 14 15:40:26 2018
tcp ESTAB 0 0 <host>:ssh <self>:44772

Positive Test:
Added `:async_ssh: true` to /etc/smart_proxy_dynflow_core/settings.d/remote_execution_ssh.yml
Restarted Satellite.
Kicked off the job from before (sleep 600). Satellite checked in on the host, then disconnected. Satellite then periodically checked in until the job completed.

# for i in {1..60}; do ss | grep ssh >> connections.txt && sleep 2; done
# cat connections.txt
...
tcp ESTAB 0 0 <host>:ssh <self>:44772
tcp ESTAB 0 0 <host>:ssh <self>:44772
tcp ESTAB 0 0 <host>:ssh <self>:44772
tcp ESTAB 0 52 <host>:ssh <satellite>:40490
tcp ESTAB 0 0 <host>:ssh <self>:44772
tcp ESTAB 0 0 <host>:ssh <self>:44772
tcp ESTAB 0 0 <host>:ssh <self>:44772
...

In both cases the job completed successfully. However, the job ran asynchronously as expected only after the settings change was made.
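The manual inspection of ss output above could be narrowed to a single peer with a small filter. This is a sketch rather than anything shipped with Satellite: the function name count_peer_ssh is hypothetical, and it assumes the six-column layout shown in the captured output (netid, state, Recv-Q, Send-Q, local address, peer address) with the local port resolved to the name "ssh".

```shell
# Hypothetical helper: reads ss output on stdin and prints how many
# established connections to the local ssh port come from the given peer.
# Assumption: fields are "netid state recv-q send-q local peer", as in the
# output captured in this comment.
count_peer_ssh() {
    awk -v peer="$1" '
        $2 == "ESTAB" && $5 ~ /:ssh$/ && index($6, peer ":") == 1 { n++ }
        END { print n + 0 }
    '
}

# Example use on the managed host, with the Satellite address as the peer:
#   ss | count_peer_ssh "$satellite_address"
```

With :async_ssh enabled, the count would be expected to stay at 0 for most samples and rise only briefly while Satellite checks in on the job.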
Ok, I'm misunderstanding what the --async flag for "hammer job-invocation" is doing here then. My expectation was that:

# time hammer job-invocation create --job-template 'Run Command - SSH Default' --inputs 'command=ls' --search-query name=play01273.example.com --async
Job invocation 32 created
[...................................................................................................................................................................................................................................] [100%]
1 task(s), 1 success, 0 fail

real 3m17.119s
user 0m1.752s
sys 0m0.644s

Would return to the console immediately, which it does not.
(In reply to Mark Watts from comment #17)
> My expectation was that:
>
> Would return to the console immediately, which it does not.

Your expectation was right; however, this feature was broken for quite some time and should be fixed in 6.3. Please see the BZ [1] for it.

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1440962
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0336