Bug 1417419

Summary: Ruby memory is growing higher during remote execution at scale.
Product: Red Hat Satellite
Reporter: Pradeep Kumar Surisetty <psuriset>
Component: Remote Execution
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED INSUFFICIENT_DATA
QA Contact:
Severity: high
Docs Contact:
Priority: high
Version: 6.2.6
CC: bbuckingham, bkearney, cdonnell, cduryee, inecas, jcallaha, jhutar, mmccune, pmoravec, psuriset, sshtein
Target Milestone: Unspecified
Keywords: Performance, Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard: scale_lab
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-21 19:46:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description | Flags
ruby mem growth | none
Trends of memory usage over time | none
Trends of memory usage over time | none
UI | none
passenger growth | none
ruby @36G when passenger mem is close to that | none
Ruby memory growth during rex job: subscription-manager repos --list | none
pgsql memory growth during rex job: subscription-manager repos --list | none
passenger-foreman memory growth during rex job: subscription-manager repos --list | none

Description Pradeep Kumar Surisetty 2017-01-29 02:05:30 UTC
Created attachment 1245470 [details]
ruby mem growth

Description of problem:


Started ReX on 6k nodes (a simple date command). During remote execution, Ruby memory grew from 1 GB to 18 GB, as shown in the attachment. This causes swapping and slowness.

My Satellite has 48 GB of memory.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Start ReX on 6k nodes. 
2.
3.

Actual results:




Expected results:


Additional info:

Comment 1 Pradeep Kumar Surisetty 2017-01-29 02:07:19 UTC
Created attachment 1245471 [details]
Trends of memory usage over time

Comment 2 Pradeep Kumar Surisetty 2017-01-29 04:30:54 UTC
Created attachment 1245500 [details]
Trends of memory usage over time

Comment 3 Pradeep Kumar Surisetty 2017-01-29 04:34:35 UTC
Created attachment 1245501 [details]
UI

Comment 4 Pradeep Kumar Surisetty 2017-01-29 04:43:18 UTC
Attached memory growth over a period of time, collected with the following loop:
 
while true; do   (date && ps aux --sort -rss | head -n20) >> /var/log/foreman/ps-aux1.log;   sleep 60; done

foreman, qpidd, and pgsql show the highest memory usage.
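
A log collected by the loop above can be summarized by taking the peak resident set size per command. The following is only a sketch and not part of the original report: the function name `peak_rss_by_command` is hypothetical, it assumes the standard `ps aux` column layout (RSS in KiB in column 6, command starting in column 11), and it groups processes by the first word of the command only.

```shell
#!/bin/sh
# Hypothetical helper: report peak RSS (in MiB) per command from a log
# written by `(date && ps aux --sort -rss | head -n20) >> LOGFILE`.
peak_rss_by_command() {
    # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    # `date` lines and the ps header are skipped by requiring a numeric
    # PID (column 2) and a numeric RSS (column 6).
    awk '$2 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ && NF >= 11 {
             rss = $6 + 0
             if (rss > peak[$11] + 0) peak[$11] = rss
         }
         END {
             # Convert KiB to MiB (truncated) and print "MiB command".
             for (cmd in peak) printf "%d %s\n", peak[cmd] / 1024, cmd
         }' "$1" | sort -rn
}
```

It could be run as `peak_rss_by_command /var/log/foreman/ps-aux1.log`. Because only the first word of the command is used, all Passenger application processes would be grouped under a single name.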

Comment 5 Pradeep Kumar Surisetty 2017-01-29 05:34:31 UTC
Passenger is the biggest contributing factor to this Ruby memory growth.

Comment 6 Pradeep Kumar Surisetty 2017-01-29 05:35:19 UTC
Created attachment 1245502 [details]
passenger growth

Comment 7 Pradeep Kumar Surisetty 2017-01-29 05:36:35 UTC
Created attachment 1245503 [details]
ruby @36G when passenger mem is close to that

Comment 12 Ivan Necas 2017-02-21 17:16:51 UTC
Was the testing performed on admin or a non-admin user? I'm asking to check if that could be related to this https://bugzilla.redhat.com/show_bug.cgi?id=1422690

Comment 13 Pradeep Kumar Surisetty 2017-02-21 17:38:34 UTC
(In reply to Ivan Necas from comment #12)
> Was the testing performed on admin or a non-admin user? I'm asking to check
> if that could be related to this
> https://bugzilla.redhat.com/show_bug.cgi?id=1422690

admin user

Comment 14 Pradeep Kumar Surisetty 2017-02-24 06:31:59 UTC
Ruby memory is growing higher during remote execution at scale.

     For 1K+ ReX: Ruby jumped from a few MB to 5 GB; passenger-foreman jumped from a few MB to 4 GB.
     For 2K+ ReX: Ruby jumped from a few MB to 8 GB; passenger-foreman jumped from a few MB to 7 GB.

     If growth continues at this rate, we might need a huge amount of memory for 40K hosts.
     This will become another major memory concern, like the qpid memory issue.
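
To put a rough number on the 40K-host concern, the two measurements above can be extrapolated linearly. This is only a sketch under strong assumptions: that memory grows linearly with host count, and that "1K+" and "2K+" can be approximated as exactly 1000 and 2000 hosts. The function name `extrapolate_gb` is hypothetical.

```shell
#!/bin/sh
# Linear extrapolation through two (hosts, GB) measurements.
# Usage: extrapolate_gb HOSTS1 GB1 HOSTS2 GB2 TARGET_HOSTS
extrapolate_gb() {
    awk -v h1="$1" -v g1="$2" -v h2="$3" -v g2="$4" -v t="$5" 'BEGIN {
        slope = (g2 - g1) / (h2 - h1)          # GB per additional host
        printf "%.0f\n", g1 + slope * (t - h1) # estimate at the target size
    }'
}

# Ruby: 5 GB at ~1000 hosts, 8 GB at ~2000 hosts (assumed exact counts).
extrapolate_gb 1000 5 2000 8 40000   # prints 122
# passenger-foreman: 4 GB at ~1000 hosts, 7 GB at ~2000 hosts.
extrapolate_gb 1000 4 2000 7 40000   # prints 121
```

Under these assumptions both processes would need roughly 120 GB each at 40K hosts. Two data points cannot confirm that the growth is actually linear, so the figure is only an order-of-magnitude estimate.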

Comment 16 Pradeep Kumar Surisetty 2017-04-05 03:46:21 UTC
These numbers are from a different setup (30k scale setup).

Started the ReX job `subscription-manager repos --list` on 22k hosts.

Ruby memory shot up to 98 GB
passenger-foreman: up to 90 GB
postgresql: 40 GB


This is killing most of the Katello services.

Comment 17 Pradeep Kumar Surisetty 2017-04-05 03:49:35 UTC
Created attachment 1268848 [details]
Ruby memory growth during rex job: subscription-manager repos --list

Comment 18 Pradeep Kumar Surisetty 2017-04-05 03:50:03 UTC
Created attachment 1268849 [details]
pgsql memory growth during rex job: subscription-manager repos --list

Comment 19 Pradeep Kumar Surisetty 2017-04-05 03:50:36 UTC
Created attachment 1268850 [details]
passenger-foreman memory growth during rex job: subscription-manager repos --list

Comment 20 Ivan Necas 2017-04-05 15:13:25 UTC
Pradeep: we need to start distinguishing between two kinds of jobs: those that interact with Satellite and those that don't. Otherwise it is not clear whether the growth is connected with https://bugzilla.redhat.com/show_bug.cgi?id=1434040.

For this bug, only scripts that do not interact with Satellite are valid. Scripts that interact with Satellite need to be tracked against different components.

Comment 21 Pradeep Kumar Surisetty 2017-04-05 15:27:54 UTC
Sure. I will move this to a different BZ or check whether it's connected to 1434040.

Comment 22 Shimon Shtein 2017-04-06 08:49:19 UTC
IMHO we are creating a load test for /rhsm/ endpoints:

By running `subscription-manager repos --list`, each host generates the following requests to Satellite:

/rhsm/consumers/:id/certificates/serials
/rhsm/consumers/:id
/rhsm/consumers/:id/content_overrides
/rhsm/consumers/:id/release

which means Satellite has to deal with 4*(number_of_hosts) requests in a very short time interval. No wonder its memory is growing: Passenger will probably spawn a huge number of processes to handle those requests in parallel.
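
The request volume implied by this estimate can be worked out directly. The sketch below is illustrative only: the function names are hypothetical, and the 600-second window is an assumed value, since the thread does not state how long the job took to fan out.

```shell
#!/bin/sh
# Requests each host sends per `subscription-manager repos --list` run,
# as listed in this comment.
REQUESTS_PER_HOST=4

# Total /rhsm/ requests for a job running on the given number of hosts.
rhsm_request_count() {
    echo $(( REQUESTS_PER_HOST * $1 ))
}

# Average requests per second if the job completes within WINDOW seconds
# (integer division; the window length is an assumption).
rhsm_requests_per_second() {
    echo $(( REQUESTS_PER_HOST * $1 / $2 ))
}

rhsm_request_count 22000             # prints 88000
rhsm_requests_per_second 22000 600   # prints 146
```

For the 22k-host job this gives 88,000 requests in total. The average rate depends entirely on the assumed window, and real traffic is likely to be burstier than an average suggests.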

Comment 23 Pradeep Kumar Surisetty 2017-04-06 12:53:16 UTC
Moving the `subscription-manager repos --list` on 22k hosts issue to a different bug (1439741) to avoid confusion.

Comment 31 Bryan Kearney 2017-08-11 13:41:23 UTC
Moving to high as we investigate.