Created attachment 1245470 [details] ruby mem growth

Description of problem:
Started ReX on 6k nodes (a simple date command). During remote execution, Ruby memory grew from 1 GB to 18 GB, as shown in the attachment. This causes swapping and slowness. My Satellite has 48 GB of memory.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start ReX on 6k nodes.
2.
3.

Actual results:

Expected results:

Additional info:
Created attachment 1245471 [details] Trends of memory usage over time
Created attachment 1245500 [details] Trends of memory usage over time
Created attachment 1245501 [details] UI
Attached memory growth over a period of time, collected with:
while true; do (date && ps aux --sort -rss | head -n20) >> /var/log/foreman/ps-aux1.log; sleep 60; done
foreman, qpidd, and pgsql show higher memory usage.
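A minimal sketch of how the same sampling could be aggregated per process name, so that many Passenger or Ruby workers show up as a single total. The function name `sum_rss_by_command` and the log path in the usage comment are hypothetical and not part of the original report; it assumes the procps `ps` on Linux, which reports RSS in KiB.

```shell
# Sum resident memory (RSS) per command name.
# Reads "RSS_KiB COMMAND" pairs on stdin and prints "COMMAND TOTAL_MiB",
# largest total first. Only the first word of the command name is used.
sum_rss_by_command() {
  awk '{ rss[$2] += $1 }
       END { for (c in rss) printf "%s %d\n", c, rss[c] / 1024 }' |
    sort -k2,2nr
}

# Possible use in the sampling loop (hypothetical log path):
# while true; do
#   (date && ps -eo rss=,comm= | sum_rss_by_command | head -n5) \
#     >> /var/log/foreman/rss-by-command.log
#   sleep 60
# done
```

Grouping by command name hides per-process detail, so this would complement the per-process `ps aux` listing rather than replace it.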
Passenger is the biggest contributing factor to this Ruby memory growth.
Created attachment 1245502 [details] passenegr growth
Created attachment 1245503 [details] ruby @36G when passenger mem is close to that
Was the testing performed on admin or a non-admin user? I'm asking to check if that could be related to this https://bugzilla.redhat.com/show_bug.cgi?id=1422690
(In reply to Ivan Necas from comment #12)
> Was the testing performed on admin or a non-admin user? I'm asking to check
> if that could be related to this
> https://bugzilla.redhat.com/show_bug.cgi?id=1422690

admin user
Ruby memory grows much higher during remote execution at scale.
For 1K+ ReX: Ruby jumped from a few MB to 5 GB, passenger-foreman jumped from a few MB to 4 GB.
For 2K+ ReX: Ruby jumped from a few MB to 8 GB, passenger-foreman jumped from a few MB to 7 GB.
If this continues, we might need a huge amount of memory for 40K hosts. This will become another major memory issue, like the qpid memory issue.
These numbers are from a different setup (30k scale setup). Started ReX job `subscription-manager repos --list` on 22k hosts:
Ruby memory shot up to 98G
passenger-foreman up to 90G
postgresql 40G
This is killing most of the katello services.
Created attachment 1268848 [details] Ruby memory growth during rex job: subscription-manager repos --list
Created attachment 1268849 [details] pgsql memory growth during rex job: subscription-manager repos --list
Created attachment 1268850 [details] passenger-foreman memory growth during rex job: subscription-manager repos --list
Pradeep: we need to start distinguishing between different jobs: those that interact with Satellite and those that don't. Otherwise it is not clear whether a given result is connected with https://bugzilla.redhat.com/show_bug.cgi?id=1434040. For this bug, only scripts that do not interact with Satellite are valid. For scripts that do interact with Satellite, we need to track the issue against different components.
Sure. I will move this to a different BZ or check whether it's connected to 1434040.
IMHO we are creating a load test for /rhsm/ endpoints: by running `subscription-manager repos --list`, each host generates the following requests to Satellite:
/rhsm/consumers/:id/certificates/serials
/rhsm/consumers/:id
/rhsm/consumers/:id/content_overrides
/rhsm/consumers/:id/release
which means Satellite has to deal with 4*(number_of_hosts) requests in a very small time interval. No wonder its memory is growing: Passenger will probably spawn a huge number of processes to deal with those requests in parallel.
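To put a number on the 4*(number_of_hosts) estimate above, here is a rough sketch. The function name `estimate_rhsm_requests` is hypothetical, and the figure of four requests per host is taken from the endpoint list above, not measured independently.

```shell
# Estimate the number of /rhsm/ requests a job triggers, assuming every host
# issues the same fixed number of requests (default 4, per the list above).
estimate_rhsm_requests() {
  local hosts=$1
  local requests_per_host=${2:-4}
  echo $(( hosts * requests_per_host ))
}

# 22k hosts, as in the 30k-scale setup described earlier in this bug
estimate_rhsm_requests 22000   # prints 88000
```

This only counts requests. How many Passenger processes are actually spawned depends on how quickly the hosts run the command and on the Passenger pool settings, neither of which is given here.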
Moving the `subscription-manager repos --list` on 22k hosts issue to a different bug (1439741) to avoid confusion.
Moving to high as we investigate.