Bug 1163452
| Field | Value |
|---|---|
| Summary | puppet requests blocking UI |
| Product | Red Hat Satellite |
| Reporter | Alex Krzos <akrzos> |
| Component | Installation |
| Assignee | Chris Duryee <cduryee> |
| Status | CLOSED ERRATA |
| QA Contact | jcallaha |
| Severity | urgent |
| Priority | urgent |
| Version | 6.0.4 |
| CC | abalakht, bbuckingham, bkearney, cdonnell, cduryee, dgupte, ehelms, igreen, jcallaha, mburgerh, mjahangi, mmccune, mmello, mrichter, mtenheuv, perfbz, psuriset, rakumar, xdmoon |
| Target Milestone | Unspecified |
| Keywords | PrioBumpGSS |
| Target Release | Unused |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | foreman-installer-1.11.0.11-1, satellite-installer-6.2.0.13-1, rubygem-kafo-0.7.6.1-1 |
| Doc Type | Bug Fix |
| Clones | 1405533 (view as bug list) |
| Last Closed | 2017-01-26 10:47:38 UTC |
| Type | Bug |
| Bug Blocks | 1115190, 1405533 |
Since this issue was entered in Red Hat Bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release.

Satellite is currently configured to handle six simultaneous requests to Passenger. Additional requests are read by Apache but sit in the W state until a worker is available.

Here is how I tested: edit /usr/share/foreman/app/controllers/api/v2/users_controller.rb and change the "index" action to:

```ruby
def index
  sleep(20)
  @users = resource_scope_for_index
end
```

This makes API calls for the user list take 20 seconds, which gives plenty of time to load up the workers via hammer and watch the results. Then, make hammer calls:

```
hammer -u admin -p <password> user list &
```

Note that the first run will take a little longer, since additional workers need to spin up. After the first run, the first X calls each take roughly 20 seconds, but additional calls are only serviced after the first batch drains out, or they time out.

One fix for this is to have multiple pools of workers, but I think being able to alter the PassengerMaxPoolSize setting via the installer will be sufficient, with API and UI calls serviced from the same pool rather than two separate pools.

Upstream bug component is Provisioning

Upstream bug component is Installer

Connecting redmine issue http://projects.theforeman.org/issues/14127 from this bug

*** Bug 1329544 has been marked as a duplicate of this bug. ***

*** Bug 1333291 has been marked as a duplicate of this bug. ***

Example conf addition that guarantees a certain number of workers for web UI requests:

```
# cat /etc/httpd/conf.d/passenger_tunings.conf
# this file contains additional tunings for passenger
<IfModule mod_passenger.c>
  # PassengerMaxPoolSize is dependent on your memory and number of
  # cores. It is 6 by default.
  PassengerMaxPoolSize 10
  # PassengerMaxInstancesPerApp depends on server utilization, but 1/2 of
  # PassengerMaxPoolSize is a good starting point
  PassengerMaxInstancesPerApp 5
  PassengerMaxRequestQueueSize 250
  PassengerStatThrottleRate 120
</IfModule>
```

*** Bug 1356285 has been marked as a duplicate of this bug. ***

Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/14023 has been resolved.

*** Bug 1360406 has been marked as a duplicate of this bug. ***

I'm still seeing quite a bit of slowdown in Satellite 6.2.7 Snap 1. I ran 6 concurrent API host searches, as seen below, on a pretty beefy system with 100k+ hosts. During the searches, an example page went from about 1.1s load times to just over 12s. Is there some kind of configuration I should be doing, or is this fix supposed to be built in to the release? Attaching screenshot of load times.

```
-bash-4.2# for i in {1..6}; do docker run -d ch-d:rhel7 curl -X GET -k -u admin:changeme -H "Content-Type: application/json" "https://rhsm-qe-1.rhq.lab.eng.bos.redhat.com/api/v2/hosts?search=name~virt&per_page=1000"; done
3cb4e34f3a1fd33724b0dd8f15d8495f7b238ac61d67fb8c78bb0a1f05f55afb
14c05b2d075b267b51da1b8b4b8dcf1341207051135efd593a761ea20e106034
905b4b2b4592af1c437a3145d056db1143c1c325d590884a6be9794e632c7a7a
c4f09546053a6025944c49958aabde321a26dd9e38e87d54b1b0d02a136bbe4b
0dfc6b9662a45937196ed0d527ebb674685998d2c911b67742276535e66f1370
64914c57b7293b6567f92dcc43fe6271d98599161cd28f2df75b0853e08d4743
```

```json
{
  "total": 101906,
  "subtotal": 99609,
  "page": 1,
  "per_page": 1000,
  "search": "name~virt",
  "sort": {
    "by": null,
    "order": null
  },
  "results": [ ....
```

Created attachment 1241412 [details]
page load time
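The worker-pool saturation reported in this bug can be sketched with a toy model. This is an illustrative FIFO simulation, not Passenger's actual scheduler; the function name and numbers are hypothetical, chosen to mirror the default pool of 6 and the 20-second API calls from the reproducer:

```python
import heapq

def finish_times(num_workers, request_durations):
    """Simulate a FIFO queue in front of a fixed-size worker pool.

    Each request is dispatched to the worker that frees up earliest;
    returns the completion time of each request (all arriving at t=0).
    """
    workers = [0.0] * num_workers  # time at which each worker becomes free
    heapq.heapify(workers)
    done = []
    for duration in request_durations:
        free_at = heapq.heappop(workers)
        finish = free_at + duration
        heapq.heappush(workers, finish)
        done.append(finish)
    return done

# Six 20-second API calls saturate a pool of six workers, so a quick
# 1-second UI request queued behind them only completes at t=21s.
print(finish_times(6, [20] * 6 + [1])[-1])   # 21.0
# With a larger pool (e.g. PassengerMaxPoolSize 10) a worker stays free
# and the same UI request completes at t=1s.
print(finish_times(10, [20] * 6 + [1])[-1])  # 1.0
```

This matches the observed behavior: raising the pool size keeps a worker available for UI requests, which is why exposing PassengerMaxPoolSize through the installer addresses the blocking.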
Verified in Satellite 6.2.7 Snap 2.

Followed steps in #20, scaling the number of simultaneous requests to 6, 8, 10, 12, 14, and 16. For anything fewer than 12 requests, there was an occasional lag in page load time (6-12s) right after the requests were kicked off; immediately afterwards, page load times were back to normal. When I submitted 12 or more simultaneous requests, page load times jumped to upwards of a minute, which is expected since all of the Passenger workers were consumed. After upping PassengerMaxPoolSize to 20, performance returned to what was seen with fewer than 12 simultaneous requests.

Default configuration:

```
-bash-4.2# cat /etc/httpd/conf.modules.d/passenger_extra.conf
# The Passenger Apache module configuration file is being
# managed by Puppet and changes will be overwritten.
<IfModule mod_passenger.c>
  PassengerMaxPoolSize 12
  PassengerMaxInstancesPerApp 6
  PassengerMaxRequestQueueSize 250
  PassengerStatThrottleRate 120
</IfModule>
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0197

*** Bug 1384260 has been marked as a duplicate of this bug. ***

*** Bug 1377060 has been marked as a duplicate of this bug. ***
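The tuning comment earlier in this bug suggests starting PassengerMaxInstancesPerApp at roughly half of PassengerMaxPoolSize, which the shipped default (12/6) follows. A quick hypothetical helper to sanity-check a conf fragment against that guideline (the regex-based parser is ad hoc, for illustration only, not how Apache parses config):

```python
import re

CONF = """
<IfModule mod_passenger.c>
  PassengerMaxPoolSize 12
  PassengerMaxInstancesPerApp 6
  PassengerMaxRequestQueueSize 250
  PassengerStatThrottleRate 120
</IfModule>
"""

def passenger_settings(text):
    """Extract integer-valued Passenger directives from a conf fragment."""
    return {name: int(val)
            for name, val in re.findall(r"(Passenger\w+)\s+(\d+)", text)}

s = passenger_settings(CONF)
# Guideline from comment #12 above: MaxInstancesPerApp ~ MaxPoolSize / 2.
print(s["PassengerMaxInstancesPerApp"] == s["PassengerMaxPoolSize"] // 2)  # True
```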
Created attachment 956839 [details]
production.log output on sat6 server.

Description of problem:

Because the UI shares the same application processes that serve API requests, a long API request can block a UI transaction. I anticipate that any request that consumes a Foreman application process can block the UI as well.

Version-Release number of selected component (if applicable):

Sat6-GA-Post-Release-Compose3

```
candlepin-0.9.23.1-1.el7.noarch
candlepin-common-1.0.1-1.el7.noarch
candlepin-guice-3.0-2_redhat_1.el7.noarch
candlepin-scl-1-5.el7.noarch
candlepin-scl-quartz-2.1.5-6.el7.noarch
candlepin-scl-rhino-1.7R3-3.el7.noarch
candlepin-scl-runtime-1-5.el7.noarch
candlepin-selinux-0.9.23.1-1.el7.noarch
candlepin-tomcat-0.9.23.1-1.el7.noarch
elasticsearch-0.90.10-6.el7sat.noarch
katello-1.5.0-30.el7sat.noarch
katello-certs-tools-1.5.6-1.el7sat.noarch
katello-default-ca-1.0-1.noarch
katello-installer-0.0.64-1.el7sat.noarch
katello-server-ca-1.0-1.noarch
perfc-380g8-01.perf.lab.eng.rdu.redhat.com-qpid-broker-1.0-1.noarch
perfc-380g8-01.perf.lab.eng.rdu.redhat.com-qpid-client-cert-1.0-1.noarch
pulp-katello-0.3-4.el7sat.noarch
pulp-nodes-common-2.4.3-0.1.beta.el7sat.noarch
pulp-nodes-parent-2.4.3-0.1.beta.el7sat.noarch
pulp-puppet-plugins-2.4.3-1.el7sat.noarch
pulp-puppet-tools-2.4.3-1.el7sat.noarch
pulp-rpm-plugins-2.4.3-1.el7sat.noarch
pulp-selinux-2.4.3-1.el7sat.noarch
pulp-server-2.4.3-1.el7sat.noarch
python-gofer-qpid-1.3.0-1.el7sat.noarch
python-isodate-0.5.0-1.pulp.el7sat.noarch
python-kombu-3.0.15-12.pulp.el7sat.noarch
python-pulp-bindings-2.4.3-1.el7sat.noarch
python-pulp-common-2.4.3-1.el7sat.noarch
python-pulp-puppet-common-2.4.3-1.el7sat.noarch
python-pulp-rpm-common-2.4.3-1.el7sat.noarch
python-qpid-0.22-15.el7.noarch
python-qpid-qmf-0.22-37.el7.x86_64
qpid-cpp-client-0.22-42.el7.x86_64
qpid-cpp-server-0.22-42.el7.x86_64
qpid-cpp-server-linearstore-0.22-42.el7.x86_64
qpid-java-client-0.22-7.el7.noarch
qpid-java-common-0.22-7.el7.noarch
qpid-proton-c-0.7-2.el7.x86_64
qpid-qmf-0.22-37.el7.x86_64
qpid-tools-0.22-13.el7.noarch
ruby193-rubygem-katello-1.5.0-92.el7sat.noarch
rubygem-hammer_cli_katello-0.0.4-14.el7sat.noarch
rubygem-smart_proxy_pulp-1.0.1-1.1.el7sat.noarch
ruby193-rubygem-passenger-4.0.18-19.el7sat.x86_64
rubygem-passenger-native-4.0.18-19.el7sat.x86_64
rubygem-passenger-native-libs-4.0.18-19.el7sat.x86_64
ruby193-rubygem-passenger-native-4.0.18-19.el7sat.x86_64
rubygem-passenger-4.0.18-19.el7sat.x86_64
ruby193-rubygem-passenger-native-libs-4.0.18-19.el7sat.x86_64
mod_passenger-4.0.18-19.el7sat.x86_64
```

How reproducible: Always

Steps to Reproduce:
1. Create 6 long-running API requests, such as searching for a specific host with per_page set to 4000.
2. Attempt to access the Web UI.
3. Observe that all Foreman processes are consumed processing requests.

Actual results:
The UI request must wait until a Foreman application process is free to process it. The end user just waits for the page to load without any indication of why the page isn't loading or that the request has been queued. In my environment this generated page load times of more than 120 seconds:

```
Completed 200 OK in 132807ms (Views: 129532.0ms | ActiveRecord: 2641.3ms)
```

Expected results:
The UI should be very responsive.

Additional info:
Top output showing all Foreman processes consumed processing long-running API requests:

```
  PID  USER    PR NI VIRT    RES    SHR  S %CPU %MEM TIME+   COMMAND
37797 foreman 20 0 1314936 299420 2788 S 99.3 0.2  0:13.35 Passenger RackApp: /usr/share/foreman
37808 foreman 20 0 1315032 299220 2820 S 99.3 0.2  0:12.89 Passenger RackApp: /usr/share/foreman
37819 foreman 20 0 1315128 299124 2816 S 99.3 0.2  0:12.68 Passenger RackApp: /usr/share/foreman
36768 foreman 20 0 1579880 468856 4484 S 99.0 0.4  6:08.84 Passenger RackApp: /usr/share/foreman
37785 foreman 20 0 1314840 300084 2880 S 99.0 0.2  0:13.67 Passenger RackApp: /usr/share/foreman
```

Perhaps there are apache/passenger configuration tweaks to allow the UI to have its own set of processes to provide separate resources for UI requests.
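The closing suggestion, giving the UI its own set of processes, could in principle be done with Passenger's per-location app grouping rather than a shared pool. A hedged, untested sketch (directive names are from Passenger 4.x; the group name and queue size are hypothetical and should be verified against the installed Passenger version before use):

```
# Hypothetical sketch: route /api to a separate Passenger app group so that
# long API requests cannot consume the workers serving UI pages.
<Location /api>
  PassengerAppGroupName foreman-api
  PassengerMaxRequestQueueSize 100
</Location>
```

Note that the resolution actually shipped for this bug took the simpler route described above: a single shared pool with PassengerMaxPoolSize made tunable via the installer.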