Bug 1163452
| Field | Value |
|---|---|
| Summary | puppet requests blocking UI |
| Product | Red Hat Satellite |
| Reporter | Alex Krzos <akrzos> |
| Component | Installation |
| Assignee | Chris Duryee <cduryee> |
| Status | CLOSED ERRATA |
| QA Contact | jcallaha |
| Severity | urgent |
| Priority | urgent |
| Version | 6.0.4 |
| CC | abalakht, bbuckingham, bkearney, cdonnell, cduryee, dgupte, ehelms, igreen, jcallaha, mburgerh, mjahangi, mmccune, mmello, mrichter, mtenheuv, perfbz, psuriset, rakumar, xdmoon |
| Target Milestone | Unspecified |
| Keywords | PrioBumpGSS |
| Target Release | Unused |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | foreman-installer-1.11.0.11-1, satellite-installer-6.2.0.13-1, rubygem-kafo-0.7.6.1-1 |
| Doc Type | Bug Fix |
| Clones | 1405533 (view as bug list) |
| Last Closed | 2017-01-26 10:47:38 UTC |
| Type | Bug |
| Bug Blocks | 1115190, 1405533 |
Since this issue was entered in Red Hat Bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release.

Satellite is currently configured to handle six simultaneous requests to Passenger. Additional requests are read by Apache but sit in the W state until a worker is available.

Here is how I tested: edit /usr/share/foreman/app/controllers/api/v2/users_controller.rb and change the "index" action to:

```ruby
def index
  sleep(20)
  @users = resource_scope_for_index
end
```

This makes API calls for the user list take 20 seconds, which gives plenty of time to load up the workers via hammer and watch the results. Then, make hammer calls:

```
hammer -u admin -p <password> user list &
```

Note that the first run will take a little longer, since additional workers need to spin up. After the first run, the first X calls each take roughly 20 seconds, but additional calls are only serviced after the first batch drains out, or they time out.

One fix for this is to have multiple pools of workers, but I think being able to alter the PassengerMaxPoolSize setting via the installer will be sufficient, with API and UI calls serviced from the same pool rather than two separate pools.

Upstream bug component is Provisioning

Upstream bug component is Installer

Connecting redmine issue http://projects.theforeman.org/issues/14127 from this bug

*** Bug 1329544 has been marked as a duplicate of this bug. ***

*** Bug 1333291 has been marked as a duplicate of this bug. ***

Example conf addition that guarantees a certain number of workers for web UI requests:

```
# cat /etc/httpd/conf.d/passenger_tunings.conf
# this file contains additional tunings for passenger
<IfModule mod_passenger.c>
  # PassengerMaxPoolSize is dependent on your memory and number of
  # cores. It is 6 by default.
  PassengerMaxPoolSize 10
  # PassengerMaxInstancesPerApp depends on server utilization, but 1/2 of
  # PassengerMaxPoolSize is a good starting point
  PassengerMaxInstancesPerApp 5
  PassengerMaxRequestQueueSize 250
  PassengerStatThrottleRate 120
</IfModule>
```

*** Bug 1356285 has been marked as a duplicate of this bug. ***

Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/14023 has been resolved.

*** Bug 1360406 has been marked as a duplicate of this bug. ***

I'm still seeing quite a bit of slowdown in Satellite 6.2.7 Snap 1. I ran 6 concurrent API host searches, as seen below, on a pretty beefy system with 100k+ hosts. During the searches, an example page went from about 1.1s load times to just over 12s. Is there some kind of configuration I should be doing, or is this fix supposed to be built in to the release? Attaching screenshot of load times.

```
-bash-4.2# for i in {1..6}; do docker run -d ch-d:rhel7 curl -X GET -k -u admin:changeme -H "Content-Type: application/json" "https://rhsm-qe-1.rhq.lab.eng.bos.redhat.com/api/v2/hosts?search=name~virt&per_page=1000"; done
3cb4e34f3a1fd33724b0dd8f15d8495f7b238ac61d67fb8c78bb0a1f05f55afb
14c05b2d075b267b51da1b8b4b8dcf1341207051135efd593a761ea20e106034
905b4b2b4592af1c437a3145d056db1143c1c325d590884a6be9794e632c7a7a
c4f09546053a6025944c49958aabde321a26dd9e38e87d54b1b0d02a136bbe4b
0dfc6b9662a45937196ed0d527ebb674685998d2c911b67742276535e66f1370
64914c57b7293b6567f92dcc43fe6271d98599161cd28f2df75b0853e08d4743
```

```json
{
  "total": 101906,
  "subtotal": 99609,
  "page": 1,
  "per_page": 1000,
  "search": "name~virt",
  "sort": {
    "by": null,
    "order": null
  },
  "results": [ ....
```

Created attachment 1241412 [details]
page load time
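The worker-pool saturation reported in this bug can be sketched with a toy model. This is an illustrative FIFO simulation, not Passenger's actual scheduler; the function name and numbers are hypothetical, chosen to mirror the default pool of 6 and the 20-second API calls from the reproducer:

```python
import heapq

def finish_times(num_workers, request_durations):
    """Simulate a FIFO queue in front of a fixed-size worker pool.

    Each request is dispatched to the worker that frees up earliest;
    returns the completion time of each request (all arriving at t=0).
    """
    workers = [0.0] * num_workers  # time at which each worker becomes free
    heapq.heapify(workers)
    done = []
    for duration in request_durations:
        free_at = heapq.heappop(workers)
        finish = free_at + duration
        heapq.heappush(workers, finish)
        done.append(finish)
    return done

# Six 20-second API calls saturate a pool of six workers, so a quick
# 1-second UI request queued behind them only completes at t=21s.
print(finish_times(6, [20] * 6 + [1])[-1])   # 21.0
# With a larger pool (e.g. PassengerMaxPoolSize 10) a worker stays free
# and the same UI request completes at t=1s.
print(finish_times(10, [20] * 6 + [1])[-1])  # 1.0
```

This matches the observed behavior: raising the pool size keeps a worker available for UI requests, which is why exposing PassengerMaxPoolSize through the installer addresses the blocking.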
Verified in Satellite 6.2.7 Snap 2.

Followed steps in #20, scaling the number of simultaneous requests to 6, 8, 10, 12, 14, and 16. For anything fewer than 12 requests, there was an occasional lag in page load time (6-12s) right after the requests were kicked off; immediately afterwards, page load times were back to normal. When I submitted 12 or more simultaneous requests, page load times jumped to upwards of a minute, which is expected since all of the Passenger workers were consumed. After upping PassengerMaxPoolSize to 20, performance returned to what was seen with fewer than 12 simultaneous requests.

Default configuration:

```
-bash-4.2# cat /etc/httpd/conf.modules.d/passenger_extra.conf
# The Passenger Apache module configuration file is being
# managed by Puppet and changes will be overwritten.
<IfModule mod_passenger.c>
  PassengerMaxPoolSize 12
  PassengerMaxInstancesPerApp 6
  PassengerMaxRequestQueueSize 250
  PassengerStatThrottleRate 120
</IfModule>
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0197

*** Bug 1384260 has been marked as a duplicate of this bug. ***

*** Bug 1377060 has been marked as a duplicate of this bug. ***
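The tuning comment earlier in this bug suggests starting PassengerMaxInstancesPerApp at roughly half of PassengerMaxPoolSize, which the shipped default (12/6) follows. A quick hypothetical helper to sanity-check a conf fragment against that guideline (the regex-based parser is ad hoc, for illustration only, not how Apache parses config):

```python
import re

CONF = """
<IfModule mod_passenger.c>
  PassengerMaxPoolSize 12
  PassengerMaxInstancesPerApp 6
  PassengerMaxRequestQueueSize 250
  PassengerStatThrottleRate 120
</IfModule>
"""

def passenger_settings(text):
    """Extract integer-valued Passenger directives from a conf fragment."""
    return {name: int(val)
            for name, val in re.findall(r"(Passenger\w+)\s+(\d+)", text)}

s = passenger_settings(CONF)
# Guideline from comment #12 above: MaxInstancesPerApp ~ MaxPoolSize / 2.
print(s["PassengerMaxInstancesPerApp"] == s["PassengerMaxPoolSize"] // 2)  # True
```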
Created attachment 956839 [details]
production.log output on sat6 server.

Description of problem:

Because the UI shares the same application processes that serve API requests, a long API request can block a UI transaction. I anticipate that any request that consumes a Foreman application process can block the UI as well.

Version-Release number of selected component (if applicable):

Sat6-GA-Post-Release-Compose3

```
candlepin-0.9.23.1-1.el7.noarch
candlepin-common-1.0.1-1.el7.noarch
candlepin-guice-3.0-2_redhat_1.el7.noarch
candlepin-scl-1-5.el7.noarch
candlepin-scl-quartz-2.1.5-6.el7.noarch
candlepin-scl-rhino-1.7R3-3.el7.noarch
candlepin-scl-runtime-1-5.el7.noarch
candlepin-selinux-0.9.23.1-1.el7.noarch
candlepin-tomcat-0.9.23.1-1.el7.noarch
elasticsearch-0.90.10-6.el7sat.noarch
katello-1.5.0-30.el7sat.noarch
katello-certs-tools-1.5.6-1.el7sat.noarch
katello-default-ca-1.0-1.noarch
katello-installer-0.0.64-1.el7sat.noarch
katello-server-ca-1.0-1.noarch
perfc-380g8-01.perf.lab.eng.rdu.redhat.com-qpid-broker-1.0-1.noarch
perfc-380g8-01.perf.lab.eng.rdu.redhat.com-qpid-client-cert-1.0-1.noarch
pulp-katello-0.3-4.el7sat.noarch
pulp-nodes-common-2.4.3-0.1.beta.el7sat.noarch
pulp-nodes-parent-2.4.3-0.1.beta.el7sat.noarch
pulp-puppet-plugins-2.4.3-1.el7sat.noarch
pulp-puppet-tools-2.4.3-1.el7sat.noarch
pulp-rpm-plugins-2.4.3-1.el7sat.noarch
pulp-selinux-2.4.3-1.el7sat.noarch
pulp-server-2.4.3-1.el7sat.noarch
python-gofer-qpid-1.3.0-1.el7sat.noarch
python-isodate-0.5.0-1.pulp.el7sat.noarch
python-kombu-3.0.15-12.pulp.el7sat.noarch
python-pulp-bindings-2.4.3-1.el7sat.noarch
python-pulp-common-2.4.3-1.el7sat.noarch
python-pulp-puppet-common-2.4.3-1.el7sat.noarch
python-pulp-rpm-common-2.4.3-1.el7sat.noarch
python-qpid-0.22-15.el7.noarch
python-qpid-qmf-0.22-37.el7.x86_64
qpid-cpp-client-0.22-42.el7.x86_64
qpid-cpp-server-0.22-42.el7.x86_64
qpid-cpp-server-linearstore-0.22-42.el7.x86_64
qpid-java-client-0.22-7.el7.noarch
qpid-java-common-0.22-7.el7.noarch
qpid-proton-c-0.7-2.el7.x86_64
qpid-qmf-0.22-37.el7.x86_64
qpid-tools-0.22-13.el7.noarch
ruby193-rubygem-katello-1.5.0-92.el7sat.noarch
rubygem-hammer_cli_katello-0.0.4-14.el7sat.noarch
rubygem-smart_proxy_pulp-1.0.1-1.1.el7sat.noarch
ruby193-rubygem-passenger-4.0.18-19.el7sat.x86_64
rubygem-passenger-native-4.0.18-19.el7sat.x86_64
rubygem-passenger-native-libs-4.0.18-19.el7sat.x86_64
ruby193-rubygem-passenger-native-4.0.18-19.el7sat.x86_64
rubygem-passenger-4.0.18-19.el7sat.x86_64
ruby193-rubygem-passenger-native-libs-4.0.18-19.el7sat.x86_64
mod_passenger-4.0.18-19.el7sat.x86_64
```

How reproducible: Always

Steps to Reproduce:
1. Create 6 long-running API requests, such as searching for a specific host with per_page set to 4000.
2. Attempt to access the Web UI.
3. Observe that all Foreman processes are consumed processing requests.

Actual results:
The UI request must wait until a Foreman application process is free to process it. The end user just waits for the page to load without any indication of why the page isn't loading or that the request has been queued. In my environment this generated page load times of more than 120 seconds:

```
Completed 200 OK in 132807ms (Views: 129532.0ms | ActiveRecord: 2641.3ms)
```

Expected results:
The UI should be very responsive.

Additional info:
Top output showing all Foreman processes consumed processing long-running API requests:

```
  PID  USER    PR NI VIRT    RES    SHR  S %CPU %MEM TIME+   COMMAND
37797 foreman 20 0 1314936 299420 2788 S 99.3 0.2  0:13.35 Passenger RackApp: /usr/share/foreman
37808 foreman 20 0 1315032 299220 2820 S 99.3 0.2  0:12.89 Passenger RackApp: /usr/share/foreman
37819 foreman 20 0 1315128 299124 2816 S 99.3 0.2  0:12.68 Passenger RackApp: /usr/share/foreman
36768 foreman 20 0 1579880 468856 4484 S 99.0 0.4  6:08.84 Passenger RackApp: /usr/share/foreman
37785 foreman 20 0 1314840 300084 2880 S 99.0 0.2  0:13.67 Passenger RackApp: /usr/share/foreman
```

Perhaps there are apache/passenger configuration tweaks to allow the UI to have its own set of processes to provide separate resources for UI requests.
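The closing suggestion, giving the UI its own set of processes, could in principle be done with Passenger's per-location app grouping rather than a shared pool. A hedged, untested sketch (directive names are from Passenger 4.x; the group name and queue size are hypothetical and should be verified against the installed Passenger version before use):

```
# Hypothetical sketch: route /api to a separate Passenger app group so that
# long API requests cannot consume the workers serving UI pages.
<Location /api>
  PassengerAppGroupName foreman-api
  PassengerMaxRequestQueueSize 100
</Location>
```

Note that the resolution actually shipped for this bug took the simpler route described above: a single shared pool with PassengerMaxPoolSize made tunable via the installer.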