Description of problem:
Apache on a 6.1 capsule has about 5k open files; on 6.2 it has 38k. It seems suspicious.

Version-Release number of selected component (if applicable):
Capsule61: capsule-installer-2.3.25-1.el7sat.noarch
Capsule62: satellite-capsule-6.2.0-9.0.beta.el7sat.noarch

How reproducible:
always

Steps to Reproduce:
1. Restart services with `katello-services restart`
2. # lsof | wc -l

Actual results:
Capsule61: 19624
Capsule62: 51776

Expected results:
We should be sure this is expected.
[root@capsule61 ~]# lsof | cut -d ' ' -f 1 | sort | uniq -c | sort -n | tail
    201 ruby
    207 gmain
    344 tuned
    348 qdrouterd
    447 Passenger
    888 qpidd
   1628 mongod
   2970 python
   4119 httpd
   6636 celery
[root@capsule62 ~]# lsof | cut -d ' ' -f 1 | sort | uniq -c | sort -n | tail
    257 ruby-time
    344 tuned
    348 qdrouterd
    453 pulp_stre
    517 ruby
    571 Passenger
    920 qpidd
   3549 mongod
   6171 celery
  36966 httpd
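As a side note, lsof can emit one row per task/thread (note truncated thread names like ruby-time and pulp_stre above), so raw `lsof | wc -l` totals can overstate the number of unique file descriptors. A quick way to count actual fds per httpd worker, as a sketch not taken from this BZ, is to read /proc directly:

# Count real file descriptors per httpd PID via /proc (threads share the fd table,
# so this avoids lsof's per-task duplication); sorted by fd count.
for pid in $(pgrep httpd); do
  echo "$pid $(ls /proc/"$pid"/fd 2>/dev/null | wc -l)"
done | sort -k2 -n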
Offending processes:

[root@satellite ~]# lsof | grep httpd | awk '{print $2}' | sort | uniq --count
    121 11642
    121 15371
    121 15700
    121 15752
    124 6627
   5412 6648
   5346 6649
   5456 6650
   2790 6651
   2772 6652
   2772 6653
   4300 6654
   4320 6655
   4300 6656
      8 6657
     26 6660
      6 6668
    121 6675
    121 6676
    121 6677
    121 6678
    121 6679
    121 6680
    121 6681
    121 6682
    121 7904
    121 7933
    121 7966

apache  6648  0.0  0.4 1089156 74440 ?  Sl  Jun02  0:42 (wsgi:pulp)      -DFOREGROUND
apache  6649  0.0  0.4 1089160 71748 ?  Sl  Jun02  0:41 (wsgi:pulp)      -DFOREGROUND
apache  6650  0.0  0.4 1089152 69776 ?  Sl  Jun02  0:41 (wsgi:pulp)      -DFOREGROUND
apache  6651  0.0  0.2  684948 34280 ?  Sl  Jun02  0:02 (wsgi:pulp-cont  -DFOREGROUND
apache  6652  0.0  0.1  684948 31604 ?  Sl  Jun02  0:02 (wsgi:pulp-cont  -DFOREGROUND
apache  6653  0.0  0.1  816020 32160 ?  Sl  Jun02  0:02 (wsgi:pulp-cont  -DFOREGROUND
apache  6654  0.0  0.3  797128 59072 ?  Sl  Jun02  0:14 (wsgi:pulp_forg  -DFOREGROUND
apache  6655  0.0  0.3  862664 59072 ?  Sl  Jun02  0:14 (wsgi:pulp_forg  -DFOREGROUND
apache  6656  0.0  0.3  797128 59072 ?  Sl  Jun02  0:14 (wsgi:pulp_forg  -DFOREGROUND

whereas in 6.1 there was only 1 wsgi process. The same version of mod_wsgi is installed. Will peek into the httpd configs.
Likely the culprit:

6.2:
[root@satellite httpd]# grep WSGIProcessGroup /etc/httpd/conf.d/*
/etc/httpd/conf.d/pulp.conf:WSGIProcessGroup pulp
/etc/httpd/conf.d/pulp.conf:    WSGIProcessGroup pulp
/etc/httpd/conf.d/pulp_content.conf:WSGIProcessGroup pulp-content
/etc/httpd/conf.d/pulp_content.conf:    WSGIProcessGroup pulp-content
/etc/httpd/conf.d/pulp_puppet.conf:WSGIProcessGroup pulp_forge

6.1:
[root@sat-perf-02 6.2_conf]# grep WSGIProcessGroup /etc/httpd/conf.d/*
/etc/httpd/conf.d/pulp.conf:WSGIProcessGroup pulp
/etc/httpd/conf.d/pulp.conf:    WSGIProcessGroup pulp
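For context: WSGIProcessGroup only routes requests into a daemon process group; the number of worker processes comes from the matching WSGIDaemonProcess directive in the same conf file. A minimal illustrative fragment of what a 3-process app definition could look like (directive options and script path are assumptions, not copied from the shipped pulp.conf):

# Hypothetical /etc/httpd/conf.d/pulp.conf fragment; "processes=3" is what
# multiplies the per-app open-file count.
WSGIDaemonProcess pulp user=apache group=apache processes=3 display-name=%{GROUP}
WSGIProcessGroup pulp
WSGIScriptAlias /pulp/api /usr/share/pulp/wsgi/webservices.wsgi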
After chatting with mhrivnak, these changes have actually been in pulp for a while, but katello/satellite hadn't pulled them in. Pulp is now configured to use multiple wsgi processes for each app. There are 3 main apps:

- pulp (the API)
- pulp_content (app that handles content fetching, facilitates lazy sync)
- pulp_forge (serves puppet content via the forge API)

In 6.1 all of these were served by a single process, but now they each have 3 processes (which is why there are 9 total). This allows for concurrent requests to pulp. We could decrease the number of pulp_forge wsgi processes to just 1, as satellite really isn't using this feature very much, but I (and the pulp team) would recommend keeping the others as is. Bumping that down to 1 is likely not required for 6.2 and could be pushed to another release. It would likely free up ~12K files.
Created redmine issue http://projects.theforeman.org/issues/15841 from this bug
The incoming fix will keep 7 WSGI processes rather than the current 9. We will lower the number of pulp_puppet processes from 3 to 1, as we rarely use that functionality. The main pulp and pulp_content wsgi processes are much more important and will likely lead to some performance improvements over 6.1.
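A sketch of what the change amounts to, assuming a mod_wsgi daemon-process setup like the one hinted at by the WSGIProcessGroup grep above (the directive options shown are illustrative, not copied from the shipped pulp_puppet.conf):

# Hypothetical /etc/httpd/conf.d/pulp_puppet.conf fragment
# Before: processes=3 (the three (wsgi:pulp_forg) workers in the ps output above)
# After:  processes=1
WSGIDaemonProcess pulp_forge user=apache group=apache processes=1 display-name=%{GROUP}
WSGIProcessGroup pulp_forge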
Failed.

Version Tested: Satellite-6.3 Snap X (where 'X' is the snap number, e.g. Snap1)

While logging in to Satellite via the non-admin UI, a UI error is seen, as shown in the screenshot.
Sorry for the previous message - I was testing on a full-blown Satellite, not just a capsule.
FYI, currently seeing the following on Capsule63:

[root@cloud-qe-04 ~]# lsof | wc -l
58629
[root@cloud-qe-04 ~]# lsof | cut -d ' ' -f 1 | sort | uniq -c | sort -n | tail
    403 ruby-time
    455 named
    462 pulp_stre
    668 Passenger
    702 qpidd
   1233 ruby
   2160 libvirtd
   4700 mongod
   6703 celery
  37671 httpd

Failed in Snap 20.
Result on 6.3 snap 22:

[~]# lsof | wc -l
47129

[~]# lsof | cut -d ' ' -f 1 | sort | uniq -c | sort -n | tail
    262 ruby-time
    330 qdrouterd
    356 tuned
    462 pulp_stre
    568 Passenger
    600 ruby
   1200 qpidd
   4050 mongod
   8072 celery
  28476 httpd

Since it's ~11k less than comment #22 but only ~3k less than the numbers provided in the description, I'm not sure whether this is enough. Could you guys provide some guidance about the threshold we are looking for on this BZ?
Renzo,

What you are seeing is expected. In 6.1, only one wsgi process was used to handle requests per python app for pulp. In 6.2, we increased that to 3 for each app, at least tripling the number of open files held by apache. It was realized that for one of the apps we really only needed 1 process, so the count was only reduced by about 20% in theory. It looks like you're seeing a bit more than that. But regardless, we are not expected to reduce it back down to 6.1 levels.
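A rough back-of-envelope check (my own arithmetic, not from the BZ): going from 9 to 7 wsgi processes is a 2/9 ≈ 22% cut, which lines up with the httpd lsof counts reported above:

# httpd lsof lines: ~36966 on 6.2 GA (9 wsgi processes) vs ~28476 on 6.3 snap 22 (7 processes)
echo "scale=4; (36966 - 28476) / 36966 * 100" | bc   # ~22.97% drop, close to 2/9 = ~22.2%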
Justin, thanks for the quick answer. Thus, because of comment #26, I am moving this BZ to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0336