Description of problem:
In Satellite 6.3, the time for everything to be ready to respond to HTTP requests or handle tasks is significantly longer than in 6.2. This leads to potential initial timeouts when trying to register a host after the installation finishes or after restarting services.

Steps to Reproduce (a):
1. systemctl restart httpd
2. wait until /users/login is available

Actual results:
It takes about 2 minutes to be ready.

Expected results:
On a similar setup on Satellite 6.2, it takes under a minute.

Steps to Reproduce (b):
1. systemctl restart foreman-tasks
2. watch /var/log/foreman/dynflow_executor.output until it contains 'Everything ready for world'

Actual results:
It takes about 3.5 minutes to be ready.

Expected results:
On a similar setup on Satellite 6.2, it takes under a minute.

Additional info:
I've narrowed it down to /opt/rh/rh-ror42/root/usr/share/gems/gems/railties-4.2.6/lib/rails/application/routes_reloader.rb, which runs twice during boot and seems to cause the significant slowdown at startup. Something appears to have changed between Rails 4.1 and 4.2 to cause the delays. The fix should make sure we don't reload the routes twice during initialization, as that is what causes the long startup time. Setting the severity to high, as it has an impact not only on response times after the installation finishes, but also when Passenger scales up the processes.
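The manual "wait until /users/login is available" step can be automated with a small readiness probe. This is a sketch, not part of Satellite; the helper name and defaults are mine, and for the real HTTPS endpoint with a self-signed certificate you would additionally disable TLS verification, as `curl -k` does:

```ruby
require 'net/http'
require 'uri'

# Poll a URL until the app answers, returning the elapsed seconds,
# or nil if the timeout is exceeded.
def seconds_until_ready(url, timeout: 300, interval: 1)
  uri = URI(url)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  loop do
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    return nil if elapsed > timeout                 # gave up
    begin
      res = Net::HTTP.get_response(uri)
      return elapsed if res.code.to_i < 500         # any non-5xx answer counts as "up"
    rescue SystemCallError, Net::OpenTimeout, Net::ReadTimeout, EOFError
      # service still starting; keep polling
    end
    sleep interval
  end
end

# Usage after `systemctl restart httpd`:
#   puts seconds_until_ready('http://localhost/users/login')
```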
We are seeing this bug block upgrades from 6.2 -> 6.3, as outlined here: https://bugzilla.redhat.com/show_bug.cgi?id=1495205
Another issue we saw with upgrade automation; I guess it's because of this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1499402
I was able to narrow down the slow startup issue to the following:

1. The slow startup of the environment is caused by how Rails prepares the route helpers: it iterates through all the defined routes and calls define_method for each on a module that is included into all the controllers (and other parts that use the helpers): https://github.com/rails/rails/blob/4-2-stable/actionpack/lib/action_dispatch/routing/route_set.rb#L342

2. While defining a method with define_method is usually a relatively cheap operation, it gets dramatically more expensive when the module is included into a lot of classes.

3. There was also a refactoring between Rails 4.1 and 4.2 that removed some caching of the used modules, which also increased the number of controllers being loaded: https://github.com/rails/rails/commit/2bbcca004cc232cef868cd0e301f274ce5638df0#diff-ccdef70ce0e95afc8816f47f89b56c60L400. There have been some fixes around this in Rails 5 (https://github.com/rails/rails/commit/226cd8a094c81edb72143173e4d0c88a2eb01d5f and https://github.com/rails/rails/commit/3b637802b26fd9b4b80e87d9df24e25580d1a85b), but even with those changes the load times were not significantly faster. There might be other changes involved that I was not able to track down.

4. There were also changes in Foreman and other plugins that use more helper modules in more classes than before, all of them probably contributing to the slow load times.

One workaround in foreman-tasks was turning off the eager loading of the classes, so that the majority of the controllers would not get loaded, which reduced the number of classes that include the module. With that, I was able to get back to the original startup times for foreman-tasks. However, since foreman-tasks runs multiple threads, this solution would carry some risk.
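Point 2 can be sketched with a minimal benchmark (the class count, method count, and names are illustrative, not taken from Foreman): defining methods on a module becomes more expensive once many classes include it, because each definition has to invalidate method caches along every inclusion chain.

```ruby
require 'benchmark'

# A module playing the role of the Rails URL helpers module,
# included into many classes the way controllers include the helpers.
url_helpers = Module.new
controllers = Array.new(2_000) { Class.new }
controllers.each { |klass| klass.include(url_helpers) }

standalone = Module.new  # same kind of module, but included nowhere

n = 1_000
t_included = Benchmark.realtime do
  n.times { |i| url_helpers.send(:define_method, "route_#{i}_path") { i } }
end
t_standalone = Benchmark.realtime do
  n.times { |i| standalone.send(:define_method, "route_#{i}_path") { i } }
end

puts format('included into 2000 classes: %.4fs', t_included)
puts format('standalone module:          %.4fs', t_standalone)
```

On a real Foreman install the effect is amplified further because the helper module is much larger and the definitions happen twice due to the double routes reload described above.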
The solution I would like to explore is, instead of turning off eager loading completely, disabling eager loading only for helpers and controllers, which are classes that should never be used from the foreman-tasks process. This should also give us some memory benefits. The slower startup time of the Rails application itself would still be there, though: I can't think of any workaround for that at the moment. It should, however, not affect the upgrade issues we are currently facing. An additional improvement could be made to the tasks-availability check, making it a bit more resilient to tasks reporting unavailability: in this case the tasks were starting, but since they were busy loading the additional code, the response times were slower than Katello expected.
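The proposed approach can be sketched as a simple path filter (the path names and variable names are illustrative, not the actual foreman-tasks patch): rather than disabling eager loading entirely, only controllers and helpers are dropped from the eager load paths for the tasks process, since those classes are never used there.

```ruby
# Paths a Rails app would normally eager-load (illustrative subset):
eager_load_paths = %w[
  app/models
  app/controllers
  app/helpers
  app/lib
]

# Classes the foreman-tasks process never uses, so it can skip them:
skipped_in_tasks_process = %w[app/controllers app/helpers]

tasks_eager_load_paths = eager_load_paths - skipped_in_tasks_process
puts tasks_eager_load_paths.inspect  # → ["app/models", "app/lib"]
```

Skipping those paths keeps the threaded executor safe (models and libs are still eagerly loaded) while avoiding the expensive helper-module inclusion into every controller class.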
*** Bug 1495205 has been marked as a duplicate of this bug. ***
*** Bug 1499402 has been marked as a duplicate of this bug. ***
Created attachment 1336764 [details] Patch to work around the slow start of the Rails environment in foreman-tasks

To apply:
cd /opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-*
patch -p1 < ~/eager_load.patch

For upgrades: this should be done right after yum update is called and the 6.3 packages are freshly installed on the system.
Created redmine issue http://projects.theforeman.org/issues/21295 from this bug
Upstream bug assigned to inecas
I've proposed a final patch upstream that should address both the slow start of Passenger as well as foreman-tasks: https://github.com/theforeman/foreman/pull/4908

Steps to verify:
service httpd restart; time curl -k https://localhost

On my setup, without the patch this takes 1m 15s; with the patch I get only 15s, which is similar to or even a bit faster than Sat 6.2.
Created attachment 1337521 [details] Patch for properly fixing the slow start of the Rails environment

This patch should properly fix the slow start of the Rails environment and was proposed upstream.

To apply:
cd /usr/share/foreman
patch -p1 < ~/foreman_routes.patch

As with the previous patch, this should be applied after yum update, before running the installer. This patch obsoletes the previous one.
Thanks Ivan. I tested the patch with regular upgrade automation and didn't see any issues during the Satellite upgrade. Now trying an upgrade with another DB and will update here.
With the patch:

# systemctl restart httpd; time curl -k https://localhost
<html><body>You are being <a href="https://localhost/users/login">redirected</a>.</body></html>
real 0m23.379s
user 0m0.034s
sys 0m0.045s

Without the patch:

# systemctl restart httpd; time curl -k https://localhost
<html><body>You are being <a href="https://localhost/users/login">redirected</a>.</body></html>
real 2m38.011s
user 0m0.050s
sys 0m0.036s

This is a massive improvement!
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/21295 has been resolved.
Verified with Sat 6.3 snap 21. I used the same DB with which I faced the reported issue, and this time with snap 21 I didn't see any issues with the upgrade. Just to ensure everything is working, I'm using two DBs for testing. With both DBs the upgrade completed successfully.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:0336