Bug 1976051
| Summary: | potential memory leak in puma | | |
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | matt jia <mjia> |
| Component: | Installation | Assignee: | Jonathon Turel <jturel> |
| Status: | CLOSED ERRATA | QA Contact: | Devendra Singh <desingh> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 6.9.0 | CC: | ahumbe, amasolov, egolov, ehelms, hyu, inecas, janarula, jhutar, jturel, ktordeur, ldelouw, mmccune, oezr, pdwyer, pmendezh, smallamp, wpinheir, yferszt, zhunting |
| Target Milestone: | 6.10.0 | Keywords: | AutomationBlocker, Performance, PrioBumpGSS, Regression, Triaged |
| Target Release: | Unused | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | foreman-installer-2.5.2.1-1, foreman-2.5.2.16-1 | Doc Type: | Known Issue |
Doc Text:

Cause: Satellite 6.10 Beta has an apparent memory leak that can occur during high-traffic API scenarios or on a long-running instance. Memory consumption of the Puma processes grows over time, and the problem is often first observed as a loss of the Candlepin service.

Consequence: The Satellite 6.10 Beta server runs out of RAM, and the OS OOM killer terminates the process consuming the most memory, which is often Candlepin or Puppetserver. This is not an issue in Candlepin itself; it most likely lies in the Puma + Ruby processes in Satellite.

Workaround (if any): Add additional RAM to the instance running the Beta, or restart the Satellite services via 'satellite-maintain service restart'.
| Story Points: | --- | | |
|---|---|---|---|
| Clone Of: | | | |
| : | 1998259 (view as bug list) | Environment: | |
| Last Closed: | 2021-11-16 14:12:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
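As a stopgap along the lines of the workaround in the Doc Text above, the following is a minimal sketch of how one might watch per-worker Puma memory and restart the services when it grows too large. Only `ps -aux | sort -nk5 | grep puma` and `satellite-maintain service restart` come from this report; the 2 GB threshold and the idea of automating the restart are illustrative assumptions, not part of the fix.

```bash
#!/bin/bash
# Hypothetical watchdog sketch: restart Satellite services if any Puma worker's
# RSS exceeds a chosen threshold. Threshold and automation are illustrative only.
THRESHOLD_KB=$((2 * 1024 * 1024))   # assumed limit: 2 GB per worker, in kB

# RSS is column 6 of `ps aux`; take the largest value among Puma cluster workers.
max_rss=$(ps aux | grep '[p]uma: cluster worker' | awk '{print $6}' | sort -n | tail -1)

if [ -n "$max_rss" ] && [ "$max_rss" -gt "$THRESHOLD_KB" ]; then
    echo "Largest Puma worker RSS ${max_rss} kB exceeds ${THRESHOLD_KB} kB, restarting services"
    satellite-maintain service restart
fi
```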
Description
matt jia
2021-06-25 04:47:48 UTC
In an effort to better understand why this is happening, and what side effects (if any) there might be from making this change the default, I opened an issue on the Puma project: https://github.com/puma/puma/issues/2665

Created redmine issue https://projects.theforeman.org/issues/33214 from this bug

Upstream bug assigned to ehelms

Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/33214 has been resolved.

Verified on 6.10 Snap14. The memory leak still persists, so marking this bug as FailedQA.

6.10 = ~1.6GB average per Puma worker (full 6.10 SNAP 18 `ps` output in the comparison below).

I saw the difference in memory consumption between 6.9.6 and 6.10 when I ran a client scale test against both versions that initiates ~20,000 client requests to the Satellite, measuring memory usage in Puma after a 'satellite-maintain service restart' before the start of each test. Both were running 12 workers with 5 threads:

```
Environment=FOREMAN_PUMA_THREADS_MIN=5
Environment=FOREMAN_PUMA_THREADS_MAX=5
Environment=FOREMAN_PUMA_WORKERS=12
```
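For reference, Environment overrides like the ones above are typically applied to the foreman service through a systemd drop-in; the sketch below shows one way to pin the worker/thread counts used in this test. The variable names and values come from this report, but the drop-in file path and name are assumptions for illustration; on an installer-managed Satellite these values are normally controlled by the installer rather than edited by hand.

```bash
# Hypothetical drop-in sketch for pinning Puma sizing on the foreman service.
mkdir -p /etc/systemd/system/foreman.service.d
cat > /etc/systemd/system/foreman.service.d/puma-tuning.conf <<'EOF'
[Service]
Environment=FOREMAN_PUMA_THREADS_MIN=5
Environment=FOREMAN_PUMA_THREADS_MAX=5
Environment=FOREMAN_PUMA_WORKERS=12
EOF

systemctl daemon-reload
systemctl restart foreman   # or: satellite-maintain service restart
```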
6.9.6:

```
# ps -aux | sort -nk5 | grep puma
foreman 6050 5.1 1.7 994124 545848 ? Ssl 12:18 1:33 puma 4.3.6 (unix:///run/foreman.sock) [foreman]
foreman 6866 5.2 1.8 1059488 581748 ? Sl 12:19 1:29 puma: cluster worker 10: 6050 [foreman]
foreman 6811 25.9 1.8 1066192 590964 ? Sl 12:19 7:21 puma: cluster worker 5: 6050 [foreman]
foreman 6776 14.9 1.8 1066456 580196 ? Sl 12:19 4:13 puma: cluster worker 1: 6050 [foreman]
foreman 6792 6.6 1.8 1068156 585952 ? Sl 12:19 1:52 puma: cluster worker 3: 6050 [foreman]
foreman 6799 6.2 1.8 1072952 582668 ? Sl 12:19 1:45 puma: cluster worker 4: 6050 [foreman]
foreman 6769 11.4 1.8 1117540 594192 ? Sl 12:19 3:14 puma: cluster worker 0: 6050 [foreman]
foreman 6854 21.2 1.8 1153740 605788 ? Sl 12:19 6:01 puma: cluster worker 9: 6050 [foreman]
foreman 6784 13.3 2.1 1170016 702908 ? Sl 12:19 3:47 puma: cluster worker 2: 6050 [foreman]
foreman 6840 8.0 2.1 1198652 696428 ? Sl 12:19 2:17 puma: cluster worker 8: 6050 [foreman]
foreman 6882 10.1 2.2 1259972 723060 ? Sl 12:19 2:52 puma: cluster worker 11: 6050 [foreman]
foreman 6828 6.1 2.2 1288596 729184 ? Sl 12:19 1:44 puma: cluster worker 7: 6050 [foreman]
foreman 6819 28.1 2.2 1292724 724892 ? Sl 12:19 7:57 puma: cluster worker 6: 6050 [foreman]
```

*** AVERAGE *** 640MB per process

6.10 SNAP 18:

```
# ps -aux | sort -nk5 | grep puma
foreman 26114 0.1 1.4 949552 465264 ? Ssl 00:20 0:51 puma 5.3.2 (unix:///run/foreman.sock) [foreman]
foreman 26974 5.1 3.1 1519468 1030552 ? Sl 00:21 38:39 puma: cluster worker 6: 26114 [foreman]
foreman 26967 5.8 4.9 2097720 1606144 ? Sl 00:21 43:53 puma: cluster worker 5: 26114 [foreman]
foreman 26961 6.6 4.9 2106348 1617172 ? Sl 00:21 49:45 puma: cluster worker 4: 26114 [foreman]
foreman 26937 8.2 5.0 2138660 1636544 ? Sl 00:21 61:53 puma: cluster worker 0: 26114 [foreman]
foreman 27005 5.3 5.0 2159172 1650008 ? Sl 00:21 39:49 puma: cluster worker 10: 26114 [foreman]
foreman 27016 6.0 4.9 2160456 1613920 ? Sl 00:21 44:54 puma: cluster worker 11: 26114 [foreman]
foreman 26955 5.3 4.6 2161060 1523992 ? Sl 00:21 39:38 puma: cluster worker 3: 26114 [foreman]
foreman 26943 5.0 5.0 2180196 1642116 ? Sl 00:21 37:33 puma: cluster worker 1: 26114 [foreman]
foreman 26980 5.9 5.1 2223732 1675512 ? Sl 00:21 44:36 puma: cluster worker 7: 26114 [foreman]
foreman 26998 5.5 5.4 2316848 1775980 ? Sl 00:21 41:07 puma: cluster worker 9: 26114 [foreman]
foreman 26985 5.9 5.3 2342232 1738144 ? Sl 00:21 44:27 puma: cluster worker 8: 26114 [foreman]
foreman 26949 5.5 5.5 2348560 1798304 ? Sl 00:21 41:37 puma: cluster worker 2: 26114 [foreman]
```

*** AVERAGE *** 1.6GB per process

The load test I ran above generated the following traffic from simulated rhsm/RHEL clients; COUNT is the number of requests for the test execution, which takes ~60 minutes.

| COUNT | URL |
|---|---|
| 103018 | GET /rhsm/consumers/UUID/content_overrides |
| 103016 | GET /rhsm/consumers/UUID/release |
| 68679 | GET /rhsm/consumers/UUID/accessible_content |
| 68663 | GET /rhsm/status |
| 34341 | GET /rhsm/consumers/UUID/certificates/serials |
| 34335 | GET /rhsm/ |
| 34324 | PUT /rhsm/consumers/UUID/profiles |
| 17155 | PUT /rhsm/consumers/UUID |
| 1822 | GET /rhsm/consumers/UUID/certificates?serials=8337128579608863933 |
| 1817 | GET /rhsm/consumers/UUID/certificates?serials=7149584401433911907 |
| 1814 | GET /rhsm/consumers/UUID/certificates?serials=8054784201157233495 |
| 1813 | GET /rhsm/consumers/UUID/certificates?serials=4363590632859785227 |
| 1812 | GET /rhsm/consumers/UUID/certificates?serials=7943123757126001110 |
| 1812 | GET /rhsm/consumers/UUID/certificates?serials=5930596121017259445 |
| 1809 | GET /rhsm/consumers/UUID/certificates?serials=2715746447103305874 |
| 1807 | GET /rhsm/consumers/UUID/certificates?serials=918995456737458903 |
| 1806 | GET /rhsm/consumers/UUID/certificates?serials=7425470875582694452 |
| 1806 | GET /rhsm/consumers/UUID/certificates?serials=5479988143725913015 |
| 1805 | GET /rhsm/consumers/UUID/certificates?serials=9056329964136729920 |
| 1805 | GET /rhsm/consumers/UUID/certificates?serials=8152045719415230802 |
| 1805 | GET /rhsm/consumers/UUID/certificates?serials=6052320734561193763 |
| 1805 | GET /rhsm/consumers/UUID/certificates?serials=2513067510536526130 |
| 1804 | GET /rhsm/consumers/UUID/certificates?serials=5547592078598993350 |
| 1800 | GET /rhsm/consumers/UUID/certificates?serials=4372919443647807873 |
| 1797 | GET /rhsm/consumers/UUID/certificates?serials=2306787689496807186 |
| 1794 | GET /rhsm/consumers/UUID/certificates?serials=2788435843577782173 |
| 1793 | GET /rhsm/consumers/UUID/certificates?serials=5126524993372817230 |
| 144 | POST /api/hosts/facts |
| 96 | POST /api/config_reports |
| 96 | GET /node/sat-r220-09.lab.eng.rdu2.redhat.com?format=yml |
| 96 | GET /node/dhcp-8-29-64.lab.eng.rdu2.redhat.com?format=yml |
| 15 | GET /rhsm/consumers/UUID |
| 14 | GET /rhsm/consumers/UUID/certificates?serials=4046405494583633843 |
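The "*** AVERAGE ***" figures quoted above can be reproduced mechanically from the same `ps` output; the snippet below is a minimal sketch (not taken from the report) that averages the RSS of the Puma cluster workers.

```bash
# Average RSS across Puma cluster workers; RSS is column 6 of `ps aux`, in kB.
ps aux | grep '[p]uma: cluster worker' \
  | awk '{ sum += $6; n++ } END { if (n) printf "%d workers, average RSS %.0f MB\n", n, sum / n / 1024 }'
```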
The high usage in 6.10 got identified as increased memory consumption of the new SettingsRegistry (https://projects.theforeman.org/issues/33640), which is completely unrelated to this BZ (which is about configuring Puma with better defaults). Can we please split these up, and verify both once the fix for https://projects.theforeman.org/issues/33640 is in?

Connecting redmine issue https://projects.theforeman.org/issues/33640 from this bug

Verified on 6.10 Snap23.

Verification points:

1- Didn't see any test failure in automation due to a memory leak with the default configuration.

2- Ran multiple repository syncs and content view publishes, and sent thousands of GET requests to list the installed packages, but didn't see any issue (see the request-loop sketch after this comment). Successive `ps` snapshots:

```
# ps -aux | sort -nk5 | grep puma
root 53160 0.0 0.0 112812 976 pts/1 S+ 03:29 0:00 grep --color=auto puma
foreman 1054 0.0 2.2 943544 456588 ? Ssl Oct14 4:05 puma 5.3.2 (unix:///run/foreman.sock) [foreman]
foreman 2306 0.0 3.3 1153708 672888 ? Sl Oct14 2:06 puma: cluster worker 0: 1054 [foreman]
foreman 2352 0.0 3.3 1157872 686436 ? Sl Oct14 1:54 puma: cluster worker 8: 1054 [foreman]
foreman 2321 0.0 3.3 1171632 686040 ? Sl Oct14 1:59 puma: cluster worker 2: 1054 [foreman]
foreman 2337 0.0 3.3 1173712 690012 ? Sl Oct14 1:59 puma: cluster worker 5: 1054 [foreman]
foreman 2328 0.0 3.3 1176076 675596 ? Sl Oct14 2:07 puma: cluster worker 3: 1054 [foreman]
foreman 2342 0.0 3.3 1176356 684116 ? Sl Oct14 2:10 puma: cluster worker 6: 1054 [foreman]
foreman 2333 0.0 3.3 1177256 685664 ? Sl Oct14 2:38 puma: cluster worker 4: 1054 [foreman]
foreman 2311 0.0 3.4 1188100 699148 ? Sl Oct14 2:15 puma: cluster worker 1: 1054 [foreman]
foreman 2350 0.1 3.5 1258108 722620 ? Sl Oct14 8:44 puma: cluster worker 7: 1054 [foreman]

# ps -aux | sort -nk5 | grep puma
root 53430 0.0 0.0 112812 972 pts/1 S+ 03:29 0:00 grep --color=auto puma
foreman 1054 0.0 2.2 943544 456588 ? Ssl Oct14 4:05 puma 5.3.2 (unix:///run/foreman.sock) [foreman]
foreman 2306 0.0 3.3 1153708 672888 ? Sl Oct14 2:13 puma: cluster worker 0: 1054 [foreman]
foreman 2352 0.0 3.3 1157872 686456 ? Sl Oct14 2:01 puma: cluster worker 8: 1054 [foreman]
foreman 2321 0.0 3.3 1171632 686040 ? Sl Oct14 2:06 puma: cluster worker 2: 1054 [foreman]
foreman 2337 0.0 3.3 1173712 690044 ? Sl Oct14 2:05 puma: cluster worker 5: 1054 [foreman]
foreman 2342 0.0 3.3 1176356 684116 ? Sl Oct14 2:16 puma: cluster worker 6: 1054 [foreman]
foreman 2333 0.0 3.3 1177256 690428 ? Sl Oct14 2:45 puma: cluster worker 4: 1054 [foreman]
foreman 2328 0.0 3.3 1179548 672540 ? Sl Oct14 2:14 puma: cluster worker 3: 1054 [foreman]
foreman 2311 0.0 3.4 1188100 699148 ? Sl Oct14 2:21 puma: cluster worker 1: 1054 [foreman]
foreman 2350 0.1 3.5 1258108 722620 ? Sl Oct14 8:50 puma: cluster worker 7: 1054 [foreman]

# ps -aux | sort -nk5 | grep puma
root 55798 0.0 0.0 112812 972 pts/1 S+ 03:49 0:00 grep --color=auto puma
foreman 1054 0.0 2.2 943544 451860 ? Ssl Oct14 4:06 puma 5.3.2 (unix:///run/foreman.sock) [foreman]
foreman 2306 0.0 3.2 1146532 668732 ? Sl Oct14 3:55 puma: cluster worker 0: 1054 [foreman]
foreman 2352 0.0 3.3 1163468 683860 ? Sl Oct14 3:53 puma: cluster worker 8: 1054 [foreman]
foreman 2337 0.0 3.4 1167588 696536 ? Sl Oct14 4:01 puma: cluster worker 5: 1054 [foreman]
foreman 2342 0.0 3.3 1168156 675032 ? Sl Oct14 3:56 puma: cluster worker 6: 1054 [foreman]
foreman 2321 0.0 3.3 1171632 682676 ? Sl Oct14 3:46 puma: cluster worker 2: 1054 [foreman]
foreman 2333 0.0 3.3 1177256 687728 ? Sl Oct14 4:25 puma: cluster worker 4: 1054 [foreman]
foreman 2328 0.0 3.2 1179548 668852 ? Sl Oct14 3:57 puma: cluster worker 3: 1054 [foreman]
foreman 2311 0.0 3.4 1188100 696656 ? Sl Oct14 4:03 puma: cluster worker 1: 1054 [foreman]
foreman 2350 0.1 3.5 1258108 718264 ? Sl Oct14 10:27 puma: cluster worker 7: 1054 [foreman]
```

3- Verified on the below fixed-in-version packages:

```
# rpm -qa|grep foreman-installer-2.5.
foreman-installer-2.5.2.9-1.el7sat.noarch
# rpm -qa|grep foreman-2.
foreman-2.5.2.17-1.el7sat.noarch
```
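The "thousands of GET requests" exercised in verification point 2 can be approximated with a simple request loop; the sketch below is not part of the original verification, and the hostname, credentials, request count, and choice of the `/api/hosts` list endpoint are placeholders/assumptions. Puma worker RSS can be watched in a second terminal with the `ps` command used throughout this report.

```bash
# Hypothetical load sketch: issue repeated GET requests against the Satellite API
# while monitoring Puma worker memory separately.
SAT="https://satellite.example.com"   # placeholder hostname
AUTH="admin:changeme"                 # placeholder credentials

for i in $(seq 1 5000); do
    curl -ks -u "$AUTH" "$SAT/api/hosts?per_page=100" -o /dev/null
done
```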
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Satellite 6.10 Release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4702