Bug 1417419
Summary: | Ruby memory is growing higher during remote execution at scale. | ||
---|---|---|---|
Product: | Red Hat Satellite | Reporter: | Pradeep Kumar Surisetty <psuriset> |
Component: | Remote Execution | Assignee: | satellite6-bugs <satellite6-bugs> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 6.2.6 | CC: | bbuckingham, bkearney, cdonnell, cduryee, inecas, jcallaha, jhutar, mmccune, pmoravec, psuriset, sshtein |
Target Milestone: | Unspecified | Keywords: | Performance, Triaged |
Target Release: | Unused | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | scale_lab | ||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-08-21 19:46:22 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Created attachment 1245471 [details]
Trends of memory usage over time
Created attachment 1245500 [details]
Trends of memory usage over time
Created attachment 1245501 [details]
UI
Attached memory growth over a period of time, collected with:

while true; do (date && ps aux --sort -rss | head -n20) >> /var/log/foreman/ps-aux1.log; sleep 60; done

foreman, qpidd, and pgsql are using the most memory. Passenger is the biggest contributing factor to this Ruby memory growth.
Created attachment 1245502 [details]
passenger growth
Created attachment 1245503 [details]
ruby @36G when passenger mem is close to that
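The sampling loop above logs the top 20 processes per minute. A minimal sketch of one way to reduce each sample to a single memory figure per user follows. It assumes the procps `ps` found on RHEL and that the Passenger workers run as the `foreman` user; the function name `rss_by_user_mib` is hypothetical and not part of Satellite.

```shell
# Hypothetical helper: reads "rss_kib user" pairs on stdin and prints
# "user total_mib", largest consumer first.
rss_by_user_mib() {
  awk '{ rss[$2] += $1 }
       END { for (u in rss) printf "%s %d\n", u, rss[u] / 1024 }' |
    sort -k2,2nr
}

# Usage (assumption: procps ps; the trailing "=" suppresses the column headers):
#   ps -eo rss=,user= | rss_by_user_mib
```

Summing per user rather than listing single processes makes it easier to see the combined footprint of many Passenger workers, which a `head -n20` snapshot can hide.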
Was the testing performed as an admin or a non-admin user? I'm asking to check whether this could be related to https://bugzilla.redhat.com/show_bug.cgi?id=1422690

(In reply to Ivan Necas from comment #12)
> Was the testing performed on admin or a non-admin user? I'm asking to check
> if that could be related to this
> https://bugzilla.redhat.com/show_bug.cgi?id=1422690

admin user

Ruby memory grows during remote execution at scale.
For 1K+ ReX: Ruby jumped from a few MB to 5 GB; passenger-foreman jumped from a few MB to 4 GB.
For 2K+ ReX: Ruby jumped from a few MB to 8 GB; passenger-foreman jumped from a few MB to 7 GB.
If this continues, we might need a huge amount of memory for 40K hosts. This will become another major memory issue, like the qpid memory issue.

These numbers are from a different setup (30K-scale setup).
Started ReX job `subscription-manager repos --list` on 22K hosts:
Ruby memory shot up to 98G
passenger-foreman up to 90G
postgresql 40G
This is killing most of the katello services.
Created attachment 1268848 [details]
Ruby memory growth during rex job: subscription-manager repos --list
Created attachment 1268849 [details]
pgsql memory growth during rex job: subscription-manager repos --list
Created attachment 1268850 [details]
passenger-foreman memory growth during rex job: subscription-manager repos --list
Pradeep: we need to start distinguishing between different jobs, those that interact with Satellite and those that don't, since otherwise it is unclear whether this is connected with https://bugzilla.redhat.com/show_bug.cgi?id=1434040. For this bug, only scripts that do not interact with Satellite are valid. Scripts that interact with Satellite need to be tracked against different components.

Sure. I will move this to a different BZ or check whether it is connected to 1434040.

IMHO we are creating a load test for the /rhsm/ endpoints: by running `subscription-manager repos --list`, each host generates the following requests to Satellite:
/rhsm/consumers/:id/certificates/serials
/rhsm/consumers/:id
/rhsm/consumers/:id/content_overrides
/rhsm/consumers/:id/release
which means Satellite has to deal with 4*(number_of_hosts) requests in a very short time interval. No wonder its memory is growing: passenger will probably spawn a huge number of processes to handle those requests in parallel.

Moving the `subscription-manager repos --list` on 22K hosts issue to a different bug (1439741) to avoid confusion.

Moving to high as we investigate. |
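To put the 4*(number_of_hosts) estimate in concrete terms, here is a rough sketch of the arithmetic for the 22K-host run. The function name `rhsm_request_count` is hypothetical, and the per-host count of 4 is taken from the endpoint list in the comment above, not from a measurement.

```shell
# Assumption: each host issues 4 /rhsm/ requests per run, as listed above.
REQUESTS_PER_HOST=4

rhsm_request_count() {
  # $1 = number of hosts targeted by the ReX job
  echo $(( $1 * REQUESTS_PER_HOST ))
}

rhsm_request_count 22000   # prints 88000
```

This counts requests only. How many Passenger processes actually get spawned depends on the pool configuration, which is not recorded in this bug.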
Created attachment 1245470 [details]
ruby mem growth

Description of problem:
Started ReX on 6K nodes (a simple date command). During remote execution, Ruby memory grew from 1 GB to 18 GB, as shown in the attachment. This causes swapping and slowness. My Satellite has 48G of memory.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start ReX on 6K nodes.
2.
3.

Actual results:

Expected results:

Additional info: