Bug 1517048
| Summary: | Remote Execution is slow when running on Scale | | |
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | sbadhwar |
| Component: | Remote Execution | Assignee: | satellite6-bugs |
| Status: | CLOSED ERRATA | | |
| Severity: | medium | | |
| Priority: | high | | |
| Version: | 6.3.0 | CC: | aruzicka, bbuckingham, bkearney, cduryee, inecas, jhutar, lzap, mmccune, psuriset, sbadhwar, zhunting |
| Target Milestone: | Unspecified | Keywords: | Performance, PrioBumpQA, Triaged |
| Target Release: | Unused | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | scale_lab | | |
| Fixed In Version: | tfm-rubygems-dynflow-0.8.34 | Doc Type: | If docs needed, set a value |
| Last Closed: | 2018-02-21 16:54:37 UTC | Type: | Bug |
Description
sbadhwar
2017-11-24 04:41:08 UTC
Please provide a task export containing the tasks from the invocation for further analysis and more investigation.

(In reply to Ivan Necas from comment #1)
> Please provide a task export containing the tasks from the invocation for
> further analysis and more investigation

Hello Ivan,

I tried to export the tasks using the following command:

    foreman-rake foreman_tasks:export_tasks

But it seems the command is aborting with the following error message:

    [root@c10-h17-r730xd-vm1 ~]# foreman-rake foreman_tasks:export_tasks
    Gathering 172056 tasks.
    rake aborted!
    Errno::ENOENT: No such file or directory @ rb_sysopen - /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.30/web/assets/vendor/google-code-prettify/run_prettify.js
    /opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.9.6/lib/foreman_tasks/tasks/export_tasks.rake:217:in `block in copy_assets'
    /opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.9.6/lib/foreman_tasks/tasks/export_tasks.rake:214:in `each'
    /opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.9.6/lib/foreman_tasks/tasks/export_tasks.rake:214:in `copy_assets'
    /opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.9.6/lib/foreman_tasks/tasks/export_tasks.rake:251:in `block (3 levels) in <top (required)>'
    /opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.9.6/lib/foreman_tasks/tasks/export_tasks.rake:250:in `block (2 levels) in <top (required)>'
    Tasks: TOP => foreman_tasks:export_tasks
    (See full trace by running task with --trace)

(In reply to sbadhwar from comment #2)
Hello, you're most likely hitting https://bugzilla.redhat.com/show_bug.cgi?id=1512562. Until that is resolved, could you please get us a foreman-debug? It should contain a raw dump of dynflow's db.

Also note the fix for #1512562 is actually a one-liner: you could just remove the specific line (see https://github.com/theforeman/foreman-tasks/pull/296) and the export should work then.

But +1 for foreman-debug.

*** Bug 1517559 has been marked as a duplicate of this bug. ***

Reposting data from the duplicate BZ here for completeness.

Description of problem:
Using ssh in a serialized loop is 3 times faster than the same action with Remote Execution on clients equally distributed over 10 capsules (with the dynflow database set to "in memory" on the satellite and capsules).

The satellite is a VM with 20 cores and 47 GB of RAM. The 10 capsules (again VMs) have 8 CPUs and 16 GB of RAM. All of them (including the hosts) are on a 10G network.

Version-Release number of selected component (if applicable):
satellite-6.3.0-21.0.beta.el7sat.noarch

How reproducible:
always

Steps to Reproduce:
1. Run a ReX job on 30k hosts with the command `systemctl stop rhsmcertd; systemctl disable rhsmcertd`
2. Try with a subset using a simple loop and ssh

Actual results:
The job has now been running for 23 hours, 24 minutes and reports 24800 hosts as done so far, i.e. more than 3 seconds per host.

I have tried to run a simple loop with ssh (it is this complicated only because of the IP ranges we are using):

    # time \
      for ip1 in $( seq 0 30 ); do
        for ip2 in 0 1 2 3; do
          ip=$( expr $ip1 \* 8 + $ip2 )
          ssh -o "StrictHostKeyChecking no" \
              -i /root/id_rsa_perf root.$ip.100 "systemctl stop rhsmcertd; systemctl disable rhsmcertd"
        done
      done

This ran the command on 124 hosts and finished in 2m4.953s, i.e. slightly above 1 second per host.

Expected results:
I know the satellite and capsules are doing much more than just sshing to the clients (e.g. storing results for later auditing), but as the load should be somehow distributed among the 10 capsules, and as we have already tuned the database on the satellite and capsules to be in-memory only, I would expect the speed of this action to be close to what I'm able to achieve with ssh, or faster.

(In reply to sbadhwar from comment #10)
This was a new bug, created a BZ[1] for it.

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1520487

Created redmine issue http://projects.theforeman.org/issues/21980 from this bug

We've found a possible regression that could cause the performance degradation in 6.3: see the attached issue. With this improvement, we should get somewhat better numbers. There are for sure more things we could do to improve performance, but I would leave those until we are outside of the advanced phase of the 6.3 release. The main goal is to make sure the performance of 6.3 is better than (or at least the same as) 6.2.

Will this change make rex ultimately fast? Most probably no. Will it make it faster than it is now? Most probably yes.

Upstream release here: https://github.com/theforeman/foreman-packaging/pull/1983

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0336
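
For reference, the per-host figures quoted under "Actual results" in the reposted duplicate can be re-derived with a few lines of shell. This is only a sketch of the arithmetic, using the numbers from that comment; the `per_host` helper and the variable names are made up for illustration and are not part of Satellite or any tool mentioned in this bug.

```shell
#!/bin/sh
# Sketch of the per-host arithmetic, using only figures quoted in this bug.
# per_host is a hypothetical helper: elapsed seconds divided by host count.
per_host() {
    # $1 = elapsed seconds, $2 = number of hosts
    LC_ALL=C awk -v s="$1" -v n="$2" 'BEGIN { printf "%.2f\n", s / n }'
}

# Hosts visited by the ssh loop: 31 values of ip1 (seq 0 30) times 4 values of ip2.
ssh_hosts=$(( 31 * 4 ))

# ReX job: 23 hours 24 minutes elapsed, 24800 hosts reported as done.
rex_seconds=$(( 23 * 3600 + 24 * 60 ))
rex_per_host=$(per_host "$rex_seconds" 24800)

# ssh loop: 2m4.953s elapsed for all hosts in the loop.
ssh_per_host=$(per_host 124.953 "$ssh_hosts")

echo "ReX: ${rex_per_host} s/host"
echo "ssh: ${ssh_per_host} s/host over ${ssh_hosts} hosts"
```

This prints roughly 3.40 s/host for the ReX job and 1.01 s/host for the ssh loop, matching the "more than 3 seconds" and "slightly above 1 second" estimates and the roughly 3x gap described in the report.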