Bug 1976051
| Summary: | potential memory leak in puma | | |
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | matt jia <mjia> |
| Component: | Installation | Assignee: | Jonathon Turel <jturel> |
| Status: | CLOSED ERRATA | QA Contact: | Devendra Singh <desingh> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 6.9.0 | CC: | ahumbe, amasolov, egolov, ehelms, hyu, inecas, janarula, jhutar, jturel, ktordeur, ldelouw, mmccune, oezr, pdwyer, pmendezh, smallamp, wpinheir, yferszt, zhunting |
| Target Milestone: | 6.10.0 | Keywords: | AutomationBlocker, Performance, PrioBumpGSS, Regression, Triaged |
| Target Release: | Unused | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | foreman-installer-2.5.2.1-1, foreman-2.5.2.16-1 | Doc Type: | Known Issue |
Doc Text:

Cause: Satellite 6.10 Beta has an apparent memory leak that can occur during high-traffic API scenarios or on a long-running instance. Memory consumption of the Puma processes grows over time, and the problem is often first observed as a loss of the Candlepin service.

Consequence: The Satellite 6.10 Beta server runs out of RAM, and the OS OOM killer terminates the process consuming the most memory, which is often Candlepin or Puppetserver. This is not an issue in Candlepin itself; it most likely lies in the Puma + Ruby processes in Satellite.

Workaround (if any): Add additional RAM to the instance running the Beta, or restart the Satellite services via 'satellite-maintain service restart'.
| Story Points: | --- | | |
|---|---|---|---|
| Clone Of: | | | |
| : | 1998259 (view as bug list) | Environment: | |
| Last Closed: | 2021-11-16 14:12:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
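As a stopgap along the lines of the workaround in the Doc Text above, the following is a minimal sketch of how one might watch per-worker Puma memory and restart the services when it grows too large. Only `ps -aux | sort -nk5 | grep puma` and `satellite-maintain service restart` come from this report; the 2 GB threshold and the idea of automating the restart are illustrative assumptions, not part of the fix.

```bash
#!/bin/bash
# Hypothetical watchdog sketch: restart Satellite services if any Puma worker's
# RSS exceeds a chosen threshold. Threshold and automation are illustrative only.
THRESHOLD_KB=$((2 * 1024 * 1024))   # assumed limit: 2 GB per worker, in kB

# RSS is column 6 of `ps aux`; take the largest value among Puma cluster workers.
max_rss=$(ps aux | grep '[p]uma: cluster worker' | awk '{print $6}' | sort -n | tail -1)

if [ -n "$max_rss" ] && [ "$max_rss" -gt "$THRESHOLD_KB" ]; then
    echo "Largest Puma worker RSS ${max_rss} kB exceeds ${THRESHOLD_KB} kB, restarting services"
    satellite-maintain service restart
fi
```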
Description
matt jia
2021-06-25 04:47:48 UTC
In an effort to better understand why this is happening, and what side effects (if any) there might be from making this change the default, I opened an issue on the Puma project: https://github.com/puma/puma/issues/2665

Created redmine issue https://projects.theforeman.org/issues/33214 from this bug

Upstream bug assigned to ehelms

Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/33214 has been resolved.

Verified on 6.10 Snap14. The memory leak still persists, so marking this bug as FailedQA.

6.10 = ~1.6GB average per Puma worker (full 6.10 SNAP 18 `ps` output in the comparison below).

I saw the difference in memory consumption between 6.9.6 and 6.10 when I ran a client scale test against both versions that initiates ~20,000 client requests to the Satellite, measuring memory usage in Puma after a 'satellite-maintain service restart' before the start of each test. Both were running 12 workers with 5 threads:

```
Environment=FOREMAN_PUMA_THREADS_MIN=5
Environment=FOREMAN_PUMA_THREADS_MAX=5
Environment=FOREMAN_PUMA_WORKERS=12
```
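For reference, Environment overrides like the ones above are typically applied to the foreman service through a systemd drop-in; the sketch below shows one way to pin the worker/thread counts used in this test. The variable names and values come from this report, but the drop-in file path and name are assumptions for illustration; on an installer-managed Satellite these values are normally controlled by the installer rather than edited by hand.

```bash
# Hypothetical drop-in sketch for pinning Puma sizing on the foreman service.
mkdir -p /etc/systemd/system/foreman.service.d
cat > /etc/systemd/system/foreman.service.d/puma-tuning.conf <<'EOF'
[Service]
Environment=FOREMAN_PUMA_THREADS_MIN=5
Environment=FOREMAN_PUMA_THREADS_MAX=5
Environment=FOREMAN_PUMA_WORKERS=12
EOF

systemctl daemon-reload
systemctl restart foreman   # or: satellite-maintain service restart
```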
6.9.6:

```
# ps -aux | sort -nk5 | grep puma
foreman 6050 5.1 1.7 994124 545848 ? Ssl 12:18 1:33 puma 4.3.6 (unix:///run/foreman.sock) [foreman]
foreman 6866 5.2 1.8 1059488 581748 ? Sl 12:19 1:29 puma: cluster worker 10: 6050 [foreman]
foreman 6811 25.9 1.8 1066192 590964 ? Sl 12:19 7:21 puma: cluster worker 5: 6050 [foreman]
foreman 6776 14.9 1.8 1066456 580196 ? Sl 12:19 4:13 puma: cluster worker 1: 6050 [foreman]
foreman 6792 6.6 1.8 1068156 585952 ? Sl 12:19 1:52 puma: cluster worker 3: 6050 [foreman]
foreman 6799 6.2 1.8 1072952 582668 ? Sl 12:19 1:45 puma: cluster worker 4: 6050 [foreman]
foreman 6769 11.4 1.8 1117540 594192 ? Sl 12:19 3:14 puma: cluster worker 0: 6050 [foreman]
foreman 6854 21.2 1.8 1153740 605788 ? Sl 12:19 6:01 puma: cluster worker 9: 6050 [foreman]
foreman 6784 13.3 2.1 1170016 702908 ? Sl 12:19 3:47 puma: cluster worker 2: 6050 [foreman]
foreman 6840 8.0 2.1 1198652 696428 ? Sl 12:19 2:17 puma: cluster worker 8: 6050 [foreman]
foreman 6882 10.1 2.2 1259972 723060 ? Sl 12:19 2:52 puma: cluster worker 11: 6050 [foreman]
foreman 6828 6.1 2.2 1288596 729184 ? Sl 12:19 1:44 puma: cluster worker 7: 6050 [foreman]
foreman 6819 28.1 2.2 1292724 724892 ? Sl 12:19 7:57 puma: cluster worker 6: 6050 [foreman]
```

*** AVERAGE *** 640MB per process

6.10 SNAP 18:

```
# ps -aux | sort -nk5 | grep puma
foreman 26114 0.1 1.4 949552 465264 ? Ssl 00:20 0:51 puma 5.3.2 (unix:///run/foreman.sock) [foreman]
foreman 26974 5.1 3.1 1519468 1030552 ? Sl 00:21 38:39 puma: cluster worker 6: 26114 [foreman]
foreman 26967 5.8 4.9 2097720 1606144 ? Sl 00:21 43:53 puma: cluster worker 5: 26114 [foreman]
foreman 26961 6.6 4.9 2106348 1617172 ? Sl 00:21 49:45 puma: cluster worker 4: 26114 [foreman]
foreman 26937 8.2 5.0 2138660 1636544 ? Sl 00:21 61:53 puma: cluster worker 0: 26114 [foreman]
foreman 27005 5.3 5.0 2159172 1650008 ? Sl 00:21 39:49 puma: cluster worker 10: 26114 [foreman]
foreman 27016 6.0 4.9 2160456 1613920 ? Sl 00:21 44:54 puma: cluster worker 11: 26114 [foreman]
foreman 26955 5.3 4.6 2161060 1523992 ? Sl 00:21 39:38 puma: cluster worker 3: 26114 [foreman]
foreman 26943 5.0 5.0 2180196 1642116 ? Sl 00:21 37:33 puma: cluster worker 1: 26114 [foreman]
foreman 26980 5.9 5.1 2223732 1675512 ? Sl 00:21 44:36 puma: cluster worker 7: 26114 [foreman]
foreman 26998 5.5 5.4 2316848 1775980 ? Sl 00:21 41:07 puma: cluster worker 9: 26114 [foreman]
foreman 26985 5.9 5.3 2342232 1738144 ? Sl 00:21 44:27 puma: cluster worker 8: 26114 [foreman]
foreman 26949 5.5 5.5 2348560 1798304 ? Sl 00:21 41:37 puma: cluster worker 2: 26114 [foreman]
```

*** AVERAGE *** 1.6GB per process

The load test I ran above generated the following traffic from simulated rhsm/RHEL clients; COUNT is the number of requests for the test execution, which takes ~60 minutes.

| COUNT | URL |
|---|---|
| 103018 | GET /rhsm/consumers/UUID/content_overrides |
| 103016 | GET /rhsm/consumers/UUID/release |
| 68679 | GET /rhsm/consumers/UUID/accessible_content |
| 68663 | GET /rhsm/status |
| 34341 | GET /rhsm/consumers/UUID/certificates/serials |
| 34335 | GET /rhsm/ |
| 34324 | PUT /rhsm/consumers/UUID/profiles |
| 17155 | PUT /rhsm/consumers/UUID |
| 1822 | GET /rhsm/consumers/UUID/certificates?serials=8337128579608863933 |
| 1817 | GET /rhsm/consumers/UUID/certificates?serials=7149584401433911907 |
| 1814 | GET /rhsm/consumers/UUID/certificates?serials=8054784201157233495 |
| 1813 | GET /rhsm/consumers/UUID/certificates?serials=4363590632859785227 |
| 1812 | GET /rhsm/consumers/UUID/certificates?serials=7943123757126001110 |
| 1812 | GET /rhsm/consumers/UUID/certificates?serials=5930596121017259445 |
| 1809 | GET /rhsm/consumers/UUID/certificates?serials=2715746447103305874 |
| 1807 | GET /rhsm/consumers/UUID/certificates?serials=918995456737458903 |
| 1806 | GET /rhsm/consumers/UUID/certificates?serials=7425470875582694452 |
| 1806 | GET /rhsm/consumers/UUID/certificates?serials=5479988143725913015 |
| 1805 | GET /rhsm/consumers/UUID/certificates?serials=9056329964136729920 |
| 1805 | GET /rhsm/consumers/UUID/certificates?serials=8152045719415230802 |
| 1805 | GET /rhsm/consumers/UUID/certificates?serials=6052320734561193763 |
| 1805 | GET /rhsm/consumers/UUID/certificates?serials=2513067510536526130 |
| 1804 | GET /rhsm/consumers/UUID/certificates?serials=5547592078598993350 |
| 1800 | GET /rhsm/consumers/UUID/certificates?serials=4372919443647807873 |
| 1797 | GET /rhsm/consumers/UUID/certificates?serials=2306787689496807186 |
| 1794 | GET /rhsm/consumers/UUID/certificates?serials=2788435843577782173 |
| 1793 | GET /rhsm/consumers/UUID/certificates?serials=5126524993372817230 |
| 144 | POST /api/hosts/facts |
| 96 | POST /api/config_reports |
| 96 | GET /node/sat-r220-09.lab.eng.rdu2.redhat.com?format=yml |
| 96 | GET /node/dhcp-8-29-64.lab.eng.rdu2.redhat.com?format=yml |
| 15 | GET /rhsm/consumers/UUID |
| 14 | GET /rhsm/consumers/UUID/certificates?serials=4046405494583633843 |
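The "*** AVERAGE ***" figures quoted above can be reproduced mechanically from the same `ps` output; the snippet below is a minimal sketch (not taken from the report) that averages the RSS of the Puma cluster workers.

```bash
# Average RSS across Puma cluster workers; RSS is column 6 of `ps aux`, in kB.
ps aux | grep '[p]uma: cluster worker' \
  | awk '{ sum += $6; n++ } END { if (n) printf "%d workers, average RSS %.0f MB\n", n, sum / n / 1024 }'
```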
The high usage in 6.10 got identified as increased memory consumption of the new SettingsRegistry (https://projects.theforeman.org/issues/33640), which is completely unrelated to this BZ (which is about configuring Puma with better defaults). Can we please split these up, and verify both once the fix for https://projects.theforeman.org/issues/33640 is in?

Connecting redmine issue https://projects.theforeman.org/issues/33640 from this bug

Verified on 6.10 Snap23.

Verification points:

1- Didn't see any test failure in automation due to a memory leak with the default configuration.

2- Ran multiple repository syncs and content view publishes, and sent thousands of GET requests to list the installed packages, but didn't see any issue (see the request-loop sketch after this comment). Successive `ps` snapshots:

```
# ps -aux | sort -nk5 | grep puma
root 53160 0.0 0.0 112812 976 pts/1 S+ 03:29 0:00 grep --color=auto puma
foreman 1054 0.0 2.2 943544 456588 ? Ssl Oct14 4:05 puma 5.3.2 (unix:///run/foreman.sock) [foreman]
foreman 2306 0.0 3.3 1153708 672888 ? Sl Oct14 2:06 puma: cluster worker 0: 1054 [foreman]
foreman 2352 0.0 3.3 1157872 686436 ? Sl Oct14 1:54 puma: cluster worker 8: 1054 [foreman]
foreman 2321 0.0 3.3 1171632 686040 ? Sl Oct14 1:59 puma: cluster worker 2: 1054 [foreman]
foreman 2337 0.0 3.3 1173712 690012 ? Sl Oct14 1:59 puma: cluster worker 5: 1054 [foreman]
foreman 2328 0.0 3.3 1176076 675596 ? Sl Oct14 2:07 puma: cluster worker 3: 1054 [foreman]
foreman 2342 0.0 3.3 1176356 684116 ? Sl Oct14 2:10 puma: cluster worker 6: 1054 [foreman]
foreman 2333 0.0 3.3 1177256 685664 ? Sl Oct14 2:38 puma: cluster worker 4: 1054 [foreman]
foreman 2311 0.0 3.4 1188100 699148 ? Sl Oct14 2:15 puma: cluster worker 1: 1054 [foreman]
foreman 2350 0.1 3.5 1258108 722620 ? Sl Oct14 8:44 puma: cluster worker 7: 1054 [foreman]

# ps -aux | sort -nk5 | grep puma
root 53430 0.0 0.0 112812 972 pts/1 S+ 03:29 0:00 grep --color=auto puma
foreman 1054 0.0 2.2 943544 456588 ? Ssl Oct14 4:05 puma 5.3.2 (unix:///run/foreman.sock) [foreman]
foreman 2306 0.0 3.3 1153708 672888 ? Sl Oct14 2:13 puma: cluster worker 0: 1054 [foreman]
foreman 2352 0.0 3.3 1157872 686456 ? Sl Oct14 2:01 puma: cluster worker 8: 1054 [foreman]
foreman 2321 0.0 3.3 1171632 686040 ? Sl Oct14 2:06 puma: cluster worker 2: 1054 [foreman]
foreman 2337 0.0 3.3 1173712 690044 ? Sl Oct14 2:05 puma: cluster worker 5: 1054 [foreman]
foreman 2342 0.0 3.3 1176356 684116 ? Sl Oct14 2:16 puma: cluster worker 6: 1054 [foreman]
foreman 2333 0.0 3.3 1177256 690428 ? Sl Oct14 2:45 puma: cluster worker 4: 1054 [foreman]
foreman 2328 0.0 3.3 1179548 672540 ? Sl Oct14 2:14 puma: cluster worker 3: 1054 [foreman]
foreman 2311 0.0 3.4 1188100 699148 ? Sl Oct14 2:21 puma: cluster worker 1: 1054 [foreman]
foreman 2350 0.1 3.5 1258108 722620 ? Sl Oct14 8:50 puma: cluster worker 7: 1054 [foreman]

# ps -aux | sort -nk5 | grep puma
root 55798 0.0 0.0 112812 972 pts/1 S+ 03:49 0:00 grep --color=auto puma
foreman 1054 0.0 2.2 943544 451860 ? Ssl Oct14 4:06 puma 5.3.2 (unix:///run/foreman.sock) [foreman]
foreman 2306 0.0 3.2 1146532 668732 ? Sl Oct14 3:55 puma: cluster worker 0: 1054 [foreman]
foreman 2352 0.0 3.3 1163468 683860 ? Sl Oct14 3:53 puma: cluster worker 8: 1054 [foreman]
foreman 2337 0.0 3.4 1167588 696536 ? Sl Oct14 4:01 puma: cluster worker 5: 1054 [foreman]
foreman 2342 0.0 3.3 1168156 675032 ? Sl Oct14 3:56 puma: cluster worker 6: 1054 [foreman]
foreman 2321 0.0 3.3 1171632 682676 ? Sl Oct14 3:46 puma: cluster worker 2: 1054 [foreman]
foreman 2333 0.0 3.3 1177256 687728 ? Sl Oct14 4:25 puma: cluster worker 4: 1054 [foreman]
foreman 2328 0.0 3.2 1179548 668852 ? Sl Oct14 3:57 puma: cluster worker 3: 1054 [foreman]
foreman 2311 0.0 3.4 1188100 696656 ? Sl Oct14 4:03 puma: cluster worker 1: 1054 [foreman]
foreman 2350 0.1 3.5 1258108 718264 ? Sl Oct14 10:27 puma: cluster worker 7: 1054 [foreman]
```

3- Verified on the below fixed-in-version packages:

```
# rpm -qa|grep foreman-installer-2.5.
foreman-installer-2.5.2.9-1.el7sat.noarch
# rpm -qa|grep foreman-2.
foreman-2.5.2.17-1.el7sat.noarch
```
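The "thousands of GET requests" exercised in verification point 2 can be approximated with a simple request loop; the sketch below is not part of the original verification, and the hostname, credentials, request count, and choice of the `/api/hosts` list endpoint are placeholders/assumptions. Puma worker RSS can be watched in a second terminal with the `ps` command used throughout this report.

```bash
# Hypothetical load sketch: issue repeated GET requests against the Satellite API
# while monitoring Puma worker memory separately.
SAT="https://satellite.example.com"   # placeholder hostname
AUTH="admin:changeme"                 # placeholder credentials

for i in $(seq 1 5000); do
    curl -ks -u "$AUTH" "$SAT/api/hosts?per_page=100" -o /dev/null
done
```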
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Satellite 6.10 Release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4702