Bug 1517048 - Remote Execution is slow when running on Scale
Summary: Remote Execution is slow when running on Scale
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Remote Execution
Version: 6.3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact:
URL:
Whiteboard: scale_lab
Duplicates: 1517559 (view as bug list)
Depends On:
Blocks:
 
Reported: 2017-11-24 04:41 UTC by sbadhwar
Modified: 2018-02-21 16:54 UTC (History)
CC: 11 users

Fixed In Version: tfm-rubygems-dynflow-0.8.34
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-21 16:54:37 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 21980 0 High Closed Inefficient polling for tasks update 2020-06-26 14:41:58 UTC
Red Hat Bugzilla 1520487 0 high CLOSED Remote Execution does not resume properly 2021-02-22 00:41:40 UTC

Internal Links: 1520487

Description sbadhwar 2017-11-24 04:41:08 UTC
Description of problem:
Remote execution is very slow when running at scale. This happens even for the simplest of commands, such as running the date command via ReX.

Here are some data:
ReX running the date command with sqlite as the dynflow database:
Total number of hosts: 29902
Total time taken: 22hrs+

ReX running the date command with an in-memory dynflow database:
Total number of hosts: 29902
Total time taken: 15hrs
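
For comparison, the totals above work out to roughly 2.65 s per host for the sqlite run (treating 22hrs as a lower bound) and about 1.81 s per host for the in-memory run. A minimal Ruby sketch of that arithmetic (the seconds_per_host helper is purely illustrative, not Satellite code):

```ruby
# Hypothetical helper to turn the run totals above into per-host throughput.
def seconds_per_host(total_seconds, host_count)
  (total_seconds.to_f / host_count).round(2)
end

hosts = 29_902
puts seconds_per_host(22 * 3600, hosts) # sqlite run: 2.65 s/host (lower bound, "22hrs+")
puts seconds_per_host(15 * 3600, hosts) # in-memory run: 1.81 s/host
```

So even removing database persistence entirely only brings the per-host cost down to roughly 1.8 seconds, which suggests the bottleneck is not only the database.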

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Schedule a remote execution for a large number of hosts (e.g. 30k)
2. Run date command under remote execution
3.

Actual results:


Expected results:
The ReX execution time should be lower, at least for simple commands such as date.

Additional info:

Comment 1 Ivan Necas 2017-11-24 06:51:34 UTC
Please provide a task export containing the tasks from the invocation for further analysis and investigation.

Comment 2 sbadhwar 2017-11-24 06:59:08 UTC
(In reply to Ivan Necas from comment #1)
> Please provide a task export containing the tasks from the invocation for
> further analysis and investigation

Hello Ivan,

I tried to export the tasks using the following command:
foreman-rake foreman_tasks:export_tasks

But it seems the command aborts with the following error message:
[root@c10-h17-r730xd-vm1 ~]# foreman-rake foreman_tasks:export_tasks
Gathering 172056 tasks.
rake aborted!
Errno::ENOENT: No such file or directory @ rb_sysopen - /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.30/web/assets/vendor/google-code-prettify/run_prettify.js
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.9.6/lib/foreman_tasks/tasks/export_tasks.rake:217:in `block in copy_assets'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.9.6/lib/foreman_tasks/tasks/export_tasks.rake:214:in `each'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.9.6/lib/foreman_tasks/tasks/export_tasks.rake:214:in `copy_assets'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.9.6/lib/foreman_tasks/tasks/export_tasks.rake:251:in `block (3 levels) in <top (required)>'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.9.6/lib/foreman_tasks/tasks/export_tasks.rake:250:in `block (2 levels) in <top (required)>'
Tasks: TOP => foreman_tasks:export_tasks
(See full trace by running task with --trace)

Comment 3 Adam Ruzicka 2017-11-24 07:45:44 UTC
(In reply to sbadhwar from comment #2)
Hello, you're most likely hitting this https://bugzilla.redhat.com/show_bug.cgi?id=1512562 . Until that is resolved, could you please get us foreman-debug? It should contain raw dump of dynflow's db.

Comment 4 Ivan Necas 2017-11-24 08:12:21 UTC
Also note the fix for #1512562 is actually a one-liner: you could just remove the specific line https://github.com/theforeman/foreman-tasks/pull/296 and the export should work then.

Comment 5 Ivan Necas 2017-11-24 08:12:34 UTC
But +1 for foreman-debug

Comment 7 Adam Ruzicka 2017-11-29 09:12:33 UTC
*** Bug 1517559 has been marked as a duplicate of this bug. ***

Comment 8 Adam Ruzicka 2017-11-29 09:15:47 UTC
Reposting data from the duplicate BZ here for completeness.

Description of problem:
Using ssh in a loop (serialized) is 3 times faster than the same action with Remote Execution on clients evenly distributed over 10 capsules (with the dynflow database set to "in memory" on the Satellite and capsules).

Satellite is a VM with 20 cores and 47 GB of RAM. 10 capsules (again VMs) have 8 CPUs and 16 GB RAM. All (including hosts) on 10G network.


Version-Release number of selected component (if applicable):
satellite-6.3.0-21.0.beta.el7sat.noarch


How reproducible:
always


Steps to Reproduce:
1. Run ReX job on 30k hosts with command
   `systemctl stop rhsmcertd; systemctl disable rhsmcertd`
2. Try with a subset with simple loop and ssh


Actual results:
The job has now been running for 23 hours, 24 minutes and reports 24800 hosts as done so far, i.e. more than 3 seconds per host.

I have tried to run a simple loop with ssh (it is this complicated only because of the IP ranges we are using):

# time \
    for ip1 in $( seq 0 30 ); do
        for ip2 in 0 1 2 3; do
            ip=$( expr $ip1 \* 8 + $ip2 )
            ssh -o "StrictHostKeyChecking no" \
                -i /root/id_rsa_perf \
                root.$ip.100 \
                "systemctl stop rhsmcertd; systemctl disable rhsmcertd"
        done
    done

This ran the command on 124 hosts and finished in 2m4.953s, i.e. slightly over 1 second per host.
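
Putting the two measurements above side by side (assuming the ReX job kept a roughly constant rate over the 24800 completed hosts), the gap is about 3.4x; a quick Ruby sketch of the arithmetic:

```ruby
# Per-host comparison of the two runs described above.
rex_seconds = 23 * 3600 + 24 * 60      # ReX job: 23h24m so far
rex_rate    = rex_seconds / 24_800.0   # hosts completed via ReX
ssh_seconds = 2 * 60 + 4.953           # plain ssh loop: 2m4.953s total
ssh_rate    = ssh_seconds / 124        # hosts completed via ssh
puts format('ReX: %.2f s/host, ssh: %.2f s/host', rex_rate, ssh_rate)
# prints "ReX: 3.40 s/host, ssh: 1.01 s/host"
```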


Expected results:
I know the Satellite and capsules do much more than just ssh to the clients (e.g. storing results for later auditing). However, since the load should be distributed among 10 capsules, and since we have already tuned the database on the Satellite and capsules to be in-memory only, I would expect this action to be close to the speed I can achieve with plain ssh, or faster.

Comment 13 Adam Ruzicka 2017-12-04 15:12:13 UTC
(In reply to sbadhwar from comment #10)
This was a new bug; I created a BZ[1] for it.

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1520487

Comment 14 Ivan Necas 2017-12-14 17:21:36 UTC
Created redmine issue http://projects.theforeman.org/issues/21980 from this bug

Comment 15 Ivan Necas 2017-12-14 17:29:28 UTC
We've found a possible regression that could cause the performance degradation in 6.3: see the attached issue. With this improvement, we should see somewhat better numbers. There are certainly more things we could do to improve performance, but I would leave those for later, given the advanced phase of the 6.3 release. The main goal is to make sure the performance of 6.3 is better than (or at least the same as) 6.2.

Will this change make ReX ultimately fast? Most probably not.
Will it make it faster than it is now? Most probably yes.
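
The linked Foreman issue (21980, "Inefficient polling for tasks update") points at per-task status polling as a bottleneck. A minimal sketch of the general idea behind such a fix, batching many status lookups into a single query instead of one round trip per task; the TaskStore class and its methods are hypothetical illustrations, not dynflow's actual interface:

```ruby
# Hypothetical in-memory store standing in for the tasks table; the real
# fix lives in dynflow's execution-plan polling, not in a class like this.
class TaskStore
  def initialize(states)
    @states = states # { task_id => :running | :done }
  end

  # Inefficient pattern: one lookup per task (N round trips to the DB).
  def poll_each(ids)
    ids.map { |id| @states.fetch(id) }
  end

  # Batched pattern: one lookup returning all requested states at once.
  def poll_batch(ids)
    @states.fetch_values(*ids)
  end
end

store = TaskStore.new(1 => :done, 2 => :running, 3 => :done)
store.poll_batch([1, 2, 3]) # => [:done, :running, :done]
```

With tens of thousands of tasks in flight, collapsing N lookups into one batched query is exactly the kind of change that helps at scale while being invisible on small jobs.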

Comment 16 Ivan Necas 2017-12-14 17:36:42 UTC
Upstream release here https://github.com/theforeman/foreman-packaging/pull/1983

Comment 17 Satellite Program 2018-02-21 16:54:37 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0336

