1417419 – Ruby memory is growing higher during remote execution at scale.


Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Satellite
Classification:	Red Hat
Component:	Remote Execution
Sub Component:
Version:	6.2.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	Unspecified
Assignee:	satellite6-bugs
QA Contact:
Docs Contact:
URL:
Whiteboard:	scale_lab
Depends On:
Blocks:

Reported:	2017-01-29 02:05 UTC by Pradeep Kumar Surisetty
Modified:	2017-08-21 19:46 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-08-21 19:46:22 UTC
Target Upstream Version:
Embargoed:

Attachments
ruby mem growth (71.37 KB, image/png) 2017-01-29 02:05 UTC, Pradeep Kumar Surisetty	no flags
Trends of memory usage over time (54.64 KB, text/plain) 2017-01-29 02:07 UTC, Pradeep Kumar Surisetty	no flags
Trends of memory usage over time (11.95 KB, text/plain) 2017-01-29 04:30 UTC, Pradeep Kumar Surisetty	no flags
UI (122.04 KB, image/png) 2017-01-29 04:34 UTC, Pradeep Kumar Surisetty	no flags
passenger growth (134.14 KB, image/png) 2017-01-29 05:35 UTC, Pradeep Kumar Surisetty	no flags
ruby @36G when passenger mem is close to that (75.64 KB, image/png) 2017-01-29 05:36 UTC, Pradeep Kumar Surisetty	no flags
Ruby memory growth during rex job: subscription-manager repos --list (78.53 KB, image/png) 2017-04-05 03:49 UTC, Pradeep Kumar Surisetty	no flags
pgsql memory growth during rex job: subscription-manager repos --list (70.56 KB, image/png) 2017-04-05 03:50 UTC, Pradeep Kumar Surisetty	no flags
passenger-foreman memory growth during rex job: subscription-manager repos --list (125.15 KB, image/png) 2017-04-05 03:50 UTC, Pradeep Kumar Surisetty	no flags

Description Pradeep Kumar Surisetty 2017-01-29 02:05:30 UTC

Created attachment 1245470
ruby mem growth

Description of problem:


Started a ReX (remote execution) job, a simple date command, on 6k nodes. During remote execution, Ruby memory grew from 1 GB to 18 GB, as shown in the attachment. This causes swapping and slowness.

My Satellite has 48 GB of memory.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Start a ReX job (a simple date command) on 6k nodes.

Actual results:




Expected results:


Additional info:

Comment 1 Pradeep Kumar Surisetty 2017-01-29 02:07:19 UTC

Created attachment 1245471
Trends of memory usage over time

Comment 2 Pradeep Kumar Surisetty 2017-01-29 04:30:54 UTC

Created attachment 1245500
Trends of memory usage over time

Comment 3 Pradeep Kumar Surisetty 2017-01-29 04:34:35 UTC

Created attachment 1245501
UI

Comment 4 Pradeep Kumar Surisetty 2017-01-29 04:43:18 UTC

Attached memory growth over a period of time, collected with:

while true; do (date && ps aux --sort -rss | head -n20) >> /var/log/foreman/ps-aux1.log; sleep 60; done

foreman, qpidd, and pgsql are using the most memory.
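
The log written by the loop above can be summarized per user. A minimal sketch, assuming each sample is a `date` line followed by `ps aux` output with RSS (in KiB) in the sixth column; the function name `peak_rss_by_user` is hypothetical and not part of Satellite or any tool mentioned in this report:

```shell
# Hypothetical helper: for each user, print the peak of the summed RSS
# (in MiB) seen in any single sample of the ps log, largest first.
# Assumes the log layout produced by the sampling loop: a date line,
# then the ps aux header, then up to 19 process rows per sample.
peak_rss_by_user() {
  awk '
    # Fold the finished sample into the per-user peaks, then reset it.
    function end_sample(   u) {
      for (u in cur) if (cur[u] > peak[u]) peak[u] = cur[u]
      split("", cur)   # portable way to empty an awk array
    }
    # A process row has at least 11 fields with numeric PID and RSS.
    NF >= 11 && $2 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ { cur[$1] += $6; next }
    # Any other line (date line, ps header) closes the current sample.
    { end_sample() }
    END {
      end_sample()
      for (u in peak) printf "%s %d\n", u, peak[u] / 1024
    }
  ' "$1" | sort -k2,2nr
}
```

Called as `peak_rss_by_user /var/log/foreman/ps-aux1.log`, it would print one line per user. Because only the top 20 lines of `ps` are logged, the sums cover just the largest processes in each sample.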

Comment 5 Pradeep Kumar Surisetty 2017-01-29 05:34:31 UTC

Passenger is the biggest contributing factor to this Ruby memory growth.

Comment 6 Pradeep Kumar Surisetty 2017-01-29 05:35:19 UTC

Created attachment 1245502
passenger growth

Comment 7 Pradeep Kumar Surisetty 2017-01-29 05:36:35 UTC

Created attachment 1245503
ruby @36G when passenger mem is close to that

Comment 12 Ivan Necas 2017-02-21 17:16:51 UTC

Was the testing performed as an admin or a non-admin user? I'm asking to check whether this could be related to https://bugzilla.redhat.com/show_bug.cgi?id=1422690

Comment 13 Pradeep Kumar Surisetty 2017-02-21 17:38:34 UTC

(In reply to Ivan Necas from comment #12)
> Was the testing performed on admin or a non-admin user? I'm asking to check
> if that could be related to this
> https://bugzilla.redhat.com/show_bug.cgi?id=1422690

admin user

Comment 14 Pradeep Kumar Surisetty 2017-02-24 06:31:59 UTC

Ruby memory is growing higher during remote execution at scale:

For ReX on 1K+ hosts: Ruby jumped from a few MB to 5 GB, passenger-foreman jumped from a few MB to 4 GB.
For ReX on 2K+ hosts: Ruby jumped from a few MB to 8 GB, passenger-foreman jumped from a few MB to 7 GB.

If growth continues at this rate, we might need a huge amount of memory for 40K hosts.
This will become another major memory concern, like the qpid memory issue.
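
The concern about 40K hosts can be made concrete with a rough linear extrapolation from the two data points above. This is only a sketch: it assumes "1K+" and "2K+" mean exactly 1000 and 2000 hosts and that memory keeps growing linearly, neither of which is established in this report. The function name `extrapolate_gb` is hypothetical.

```shell
# Hypothetical helper: straight-line extrapolation through two
# (hosts, GB) measurements, using integer shell arithmetic.
# Arguments: hosts1 gb1 hosts2 gb2 target_hosts
extrapolate_gb() {
  echo $(( $2 + ($4 - $2) * ($5 - $1) / ($3 - $1) ))
}

# Assumed data points: 1000 hosts and 2000 hosts (the report says "1K+"/"2K+").
extrapolate_gb 1000 5 2000 8 40000   # Ruby
extrapolate_gb 1000 4 2000 7 40000   # passenger-foreman
```

Under these assumptions the two calls print 122 and 121, that is, roughly 120 GB for each process group at 40K hosts, well above the 48 GB available on the original Satellite.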

Comment 16 Pradeep Kumar Surisetty 2017-04-05 03:46:21 UTC

These numbers are from a different setup (a 30k-host scale setup).

Started the ReX job `subscription-manager repos --list` on 22k hosts.

Ruby memory shot up to 98 GB
passenger-foreman up to 90 GB
postgresql 40 GB


This is killing most of the katello services.

Comment 17 Pradeep Kumar Surisetty 2017-04-05 03:49:35 UTC

Created attachment 1268848
Ruby memory growth during rex job: subscription-manager repos --list

Comment 18 Pradeep Kumar Surisetty 2017-04-05 03:50:03 UTC

Created attachment 1268849
pgsql memory growth during rex job: subscription-manager repos --list

Comment 19 Pradeep Kumar Surisetty 2017-04-05 03:50:36 UTC

Created attachment 1268850
passenger-foreman memory growth during rex job: subscription-manager repos --list

Comment 20 Ivan Necas 2017-04-05 15:13:25 UTC

Pradeep: we need to start distinguishing between different jobs: those that interact with Satellite and those that don't. Otherwise it isn't clear whether this is connected with https://bugzilla.redhat.com/show_bug.cgi?id=1434040.

For this bug, only scripts that do not interact with Satellite are valid. For scripts that do interact with Satellite, we need to track the issue against different components.

Comment 21 Pradeep Kumar Surisetty 2017-04-05 15:27:54 UTC

Sure. I will move this to a different BZ or check whether it's connected to 1434040.

Comment 22 Shimon Shtein 2017-04-06 08:49:19 UTC

IMHO we are creating a load test for the /rhsm/ endpoints:

By running `subscription-manager repos --list`, each host generates the following requests to Satellite:

/rhsm/consumers/:id/certificates/serials
/rhsm/consumers/:id
/rhsm/consumers/:id/content_overrides
/rhsm/consumers/:id/release

This means Satellite has to deal with 4*(number_of_hosts) requests in a very short time interval. No wonder its memory is growing: Passenger will probably spawn a huge number of processes to deal with those requests in parallel.
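
The 4*(number_of_hosts) estimate can be worked through for the 22k-host run. A minimal sketch: the endpoint list is copied from this comment, while the function names `rhsm_endpoints` and `total_rhsm_requests` are hypothetical, and it assumes every host issues each request exactly once.

```shell
# Endpoints each host hits, as listed in this comment
# (":id" is the consumer placeholder).
rhsm_endpoints() {
  cat <<'EOF'
/rhsm/consumers/:id/certificates/serials
/rhsm/consumers/:id
/rhsm/consumers/:id/content_overrides
/rhsm/consumers/:id/release
EOF
}

# Hypothetical helper: total requests for a given number of hosts,
# assuming one request per endpoint per host.
total_rhsm_requests() {
  echo $(( $1 * $(rhsm_endpoints | grep -c '^/rhsm/') ))
}

total_rhsm_requests 22000
```

For 22000 hosts this prints 88000, so the job amounts to tens of thousands of /rhsm/ requests arriving within whatever window the hosts take to run the command.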

Comment 23 Pradeep Kumar Surisetty 2017-04-06 12:53:16 UTC

Moving the `subscription-manager repos --list` on 22k hosts issue to a different bug (1439741) to avoid confusion.

Comment 31 Bryan Kearney 2017-08-11 13:41:23 UTC

Moving to high as we investigate.
