Bug 1417419 - Ruby memory is growing higher during remote execution at scale.
Summary: Ruby memory is growing higher during remote execution at scale.
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Remote Execution
Version: 6.2.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact:
URL:
Whiteboard: scale_lab
Depends On:
Blocks:
 
Reported: 2017-01-29 02:05 UTC by Pradeep Kumar Surisetty
Modified: 2017-08-21 19:46 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-21 19:46:22 UTC
Target Upstream Version:
Embargoed:


Attachments
ruby mem growth (71.37 KB, image/png)
2017-01-29 02:05 UTC, Pradeep Kumar Surisetty
Trends of memory usage over time (54.64 KB, text/plain)
2017-01-29 02:07 UTC, Pradeep Kumar Surisetty
Trends of memory usage over time (11.95 KB, text/plain)
2017-01-29 04:30 UTC, Pradeep Kumar Surisetty
UI (122.04 KB, image/png)
2017-01-29 04:34 UTC, Pradeep Kumar Surisetty
passenger growth (134.14 KB, image/png)
2017-01-29 05:35 UTC, Pradeep Kumar Surisetty
ruby @36G when passenger mem is close to that (75.64 KB, image/png)
2017-01-29 05:36 UTC, Pradeep Kumar Surisetty
Ruby memory growth during rex job: subscription-manager repos --list (78.53 KB, image/png)
2017-04-05 03:49 UTC, Pradeep Kumar Surisetty
pgsql memory growth during rex job: subscription-manager repos --list (70.56 KB, image/png)
2017-04-05 03:50 UTC, Pradeep Kumar Surisetty
passenger-foreman memory growth during rex job: subscription-manager repos --list (125.15 KB, image/png)
2017-04-05 03:50 UTC, Pradeep Kumar Surisetty

Description Pradeep Kumar Surisetty 2017-01-29 02:05:30 UTC
Created attachment 1245470 [details]
ruby mem growth

Description of problem:


Started ReX (remote execution) on 6k nodes with a simple `date` command. During remote execution, Ruby memory grew from 1 GB to 18 GB, as shown in the attachment. This causes swapping and slowness.

My Satellite has 48 GB of memory.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Start ReX on 6k nodes. 
2.
3.

Actual results:




Expected results:


Additional info:

Comment 1 Pradeep Kumar Surisetty 2017-01-29 02:07:19 UTC
Created attachment 1245471 [details]
Trends of memory usage over time

Comment 2 Pradeep Kumar Surisetty 2017-01-29 04:30:54 UTC
Created attachment 1245500 [details]
Trends of memory usage over time

Comment 3 Pradeep Kumar Surisetty 2017-01-29 04:34:35 UTC
Created attachment 1245501 [details]
UI

Comment 4 Pradeep Kumar Surisetty 2017-01-29 04:43:18 UTC
Attached memory growth over a period of time, captured with:
 
while true; do   (date && ps aux --sort -rss | head -n20) >> /var/log/foreman/ps-aux1.log;   sleep 60; done

foreman, qpidd, and pgsql show the highest memory usage.
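
A minimal sketch of how the log written by the loop above could be summarized per user. It assumes each sample is one `date` line followed by standard `ps aux` output, with RSS in the sixth column in KiB. The function name `rss_by_user` is hypothetical and not part of any Satellite tooling.

```python
from collections import defaultdict


def rss_by_user(log_text):
    """Return a list of (timestamp, {user: total_rss_kib}) samples.

    Assumed log layout (from `date && ps aux --sort -rss | head -n20`):
    one date line, one `ps` header line, then process rows.
    """
    samples = []
    current = None
    for line in log_text.splitlines():
        # ps aux has 10 fixed columns; the 11th is the full command line.
        fields = line.split(None, 10)
        if not fields:
            continue
        if fields[0] == "USER" and "RSS" in fields:
            continue  # ps header line
        is_process_row = (
            len(fields) == 11 and fields[1].isdigit() and fields[5].isdigit()
        )
        if is_process_row:
            if current is not None:
                current[1][fields[0]] += int(fields[5])  # RSS in KiB
        else:
            # Any other non-empty line is treated as the date line
            # that starts a new sample.
            current = (line.strip(), defaultdict(int))
            samples.append(current)
    return [(stamp, dict(totals)) for stamp, totals in samples]
```

For example, `rss_by_user(open("/var/log/foreman/ps-aux1.log").read())` would give one entry per minute. Because the loop keeps only the first 20 lines of `ps` output, the totals cover the largest processes only, not everything owned by a user.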

Comment 5 Pradeep Kumar Surisetty 2017-01-29 05:34:31 UTC
Passenger is the biggest contributing factor to this Ruby memory growth.

Comment 6 Pradeep Kumar Surisetty 2017-01-29 05:35:19 UTC
Created attachment 1245502 [details]
passenegr growth

Comment 7 Pradeep Kumar Surisetty 2017-01-29 05:36:35 UTC
Created attachment 1245503 [details]
ruby @36G when passenger mem is close to that

Comment 12 Ivan Necas 2017-02-21 17:16:51 UTC
Was the testing performed as an admin or a non-admin user? I'm asking to check if that could be related to this https://bugzilla.redhat.com/show_bug.cgi?id=1422690

Comment 13 Pradeep Kumar Surisetty 2017-02-21 17:38:34 UTC
(In reply to Ivan Necas from comment #12)
> Was the testing performed as an admin or a non-admin user? I'm asking to check
> if that could be related to this
> https://bugzilla.redhat.com/show_bug.cgi?id=1422690

admin user

Comment 14 Pradeep Kumar Surisetty 2017-02-24 06:31:59 UTC
 Ruby memory grows during remote execution at scale:

     For 1K+ ReX: Ruby jumped from a few MB to 5 GB, passenger-foreman jumped from a few MB to 4 GB
     For 2K+ ReX: Ruby jumped from a few MB to 8 GB, passenger-foreman jumped from a few MB to 7 GB

     If this trend continues, we might need a huge amount of memory for 40K hosts.
     This will become another major memory issue, like the qpid memory issue.
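
A rough sketch of what "continues like this" could mean for 40K hosts. It assumes memory grows linearly with host count and treats "1K+" and "2K+" as exactly 1000 and 2000 hosts. Both are assumptions: two approximate data points cannot confirm a linear trend. The function name `extrapolate_linear` is hypothetical.

```python
def extrapolate_linear(point_a, point_b, hosts):
    """Extend the straight line through two (hosts, memory_gb) points."""
    (x1, y1), (x2, y2) = point_a, point_b
    slope = (y2 - y1) / (x2 - x1)  # GB per additional host
    return y1 + slope * (hosts - x1)


# Data points from this comment, with host counts rounded down (assumption).
ruby_gb = extrapolate_linear((1000, 5), (2000, 8), 40000)
passenger_gb = extrapolate_linear((1000, 4), (2000, 7), 40000)

print(f"Ruby at 40K hosts: about {ruby_gb:.0f} GB")
print(f"passenger-foreman at 40K hosts: about {passenger_gb:.0f} GB")
```

Under these assumptions the growth is 3 GB per 1000 hosts, which gives roughly 122 GB for Ruby and 121 GB for passenger-foreman at 40K hosts.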

Comment 16 Pradeep Kumar Surisetty 2017-04-05 03:46:21 UTC
These numbers are from a different setup (a 30k-scale setup).

Started the ReX job `subscription-manager repos --list` on 22k hosts.

Ruby memory shot up to 98 GB
passenger-foreman up to 90 GB
postgresql up to 40 GB


This is killing most of the Katello services.

Comment 17 Pradeep Kumar Surisetty 2017-04-05 03:49:35 UTC
Created attachment 1268848 [details]
Ruby memory growth during rex job: subscription-manager repos --list

Comment 18 Pradeep Kumar Surisetty 2017-04-05 03:50:03 UTC
Created attachment 1268849 [details]
pgsql memory growth during rex job: subscription-manager repos --list

Comment 19 Pradeep Kumar Surisetty 2017-04-05 03:50:36 UTC
Created attachment 1268850 [details]
passenger-foreman memory growth during rex job: subscription-manager repos --list

Comment 20 Ivan Necas 2017-04-05 15:13:25 UTC
Pradeep: we need to start distinguishing between different jobs: those interacting with Satellite and those that don't. Otherwise it is not clear whether the growth is connected with https://bugzilla.redhat.com/show_bug.cgi?id=1434040.

For this bug, only scripts that do not interact with Satellite are valid. Scripts that interact with Satellite need to be tracked against different components.

Comment 21 Pradeep Kumar Surisetty 2017-04-05 15:27:54 UTC
Sure. I will move this to a different BZ or check whether it's connected to 1434040.

Comment 22 Shimon Shtein 2017-04-06 08:49:19 UTC
IMHO we are creating a load test for the /rhsm/ endpoints:

By running `subscription-manager repos --list`, each host generates the following requests to Satellite:

/rhsm/consumers/:id/certificates/serials
/rhsm/consumers/:id
/rhsm/consumers/:id/content_overrides
/rhsm/consumers/:id/release

This means Satellite has to deal with 4*(number_of_hosts) requests in a very small time interval. No wonder its memory is growing: Passenger will probably spawn a huge number of processes to deal with those requests in parallel.
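
A small sketch of the 4*(number_of_hosts) arithmetic, using the four endpoints listed above and the 22k hosts from comment 16. The time window passed to `average_request_rate` is a made-up parameter for illustration. The actual interval over which hosts report in was not measured in this bug.

```python
# Requests each host makes for `subscription-manager repos --list`,
# as listed in this comment.
RHSM_ENDPOINTS = [
    "/rhsm/consumers/:id/certificates/serials",
    "/rhsm/consumers/:id",
    "/rhsm/consumers/:id/content_overrides",
    "/rhsm/consumers/:id/release",
]


def total_requests(number_of_hosts, endpoints=RHSM_ENDPOINTS):
    """Total requests Satellite receives if every host hits every endpoint once."""
    return len(endpoints) * number_of_hosts


def average_request_rate(number_of_hosts, window_seconds):
    """Average requests per second over a hypothetical time window."""
    return total_requests(number_of_hosts) / window_seconds


print(total_requests(22000))  # 4 * 22000 = 88000 requests
```

For 22k hosts this is 88,000 requests. If they arrived within, say, 10 minutes (an assumed window), `average_request_rate(22000, 600)` would be about 147 requests per second.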

Comment 23 Pradeep Kumar Surisetty 2017-04-06 12:53:16 UTC
Moving the `subscription-manager repos --list` on 22k hosts issue to a different bug (1439741) to avoid confusion.

Comment 31 Bryan Kearney 2017-08-11 13:41:23 UTC
Moving to high as we investigate.

