Bug 1460747 - Repeated invocation of out of memory on Satellite 6.2 [NEEDINFO]
Status: NEW
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Hosts
Version: 6.2.11
Hardware: Unspecified Unspecified
Priority: urgent  Severity: urgent
Target Milestone: Unspecified
Target Release: --
Assigned To: satellite6-bugs
Keywords: Regression, Triaged
Reported: 2017-06-12 11:02 EDT by Rajan Gupta
Modified: 2017-09-12 01:08 EDT (History)

Type: Bug
lzap: needinfo? (inecas)
inecas: needinfo? (rajgupta)


Attachments
Passenger Recycler script (1.98 KB, application/x-ruby)
2017-06-12 11:02 EDT, Rajan Gupta

Description Rajan Gupta 2017-06-12 11:02:41 EDT
Created attachment 1287066
Passenger Recycler script

Description of problem:
The oom-killer is invoked repeatedly on Satellite 6 processes.

The customer has applied the fix provided in BZ 1447958 and updated the system to the latest version.

However, OOM errors still occur on the Satellite system.

We have a temporary fix from a senior Satellite team member (Lukas Zapletal) that recycles Passenger worker processes once they use more than 2 GB of RAM. The customer has deployed it and runs it hourly from cron, so that processes are terminated cleanly rather than by a hard OOM kill.

https://gist.github.com/lzap/8dddbe66ec8d43cbd4277c1de7045c17
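
For readers without access to the attachment, the selection step described above can be sketched roughly as follows. This is not the attached script or the gist: the row format is assumed from the process tables pasted in the comments on this bug, the names ROW, LIMIT_MB and oversized_workers are hypothetical, and comparing the Private column against 2048 MB is only one reading of "more than 2 GB of RAM". The sketch only picks PIDs; it does not stop or restart anything.

```ruby
# Hypothetical sketch, not the attached passenger-recycler script.
# Assumes rows shaped like: "PID  <VMSize> MB  <Private> MB  Name".
ROW = /\A\s*(\d+)\s+([\d.]+) MB\s+([\d.]+) MB\s+(.+?)\s*\z/

# Assumed reading of "more than 2 GB of RAM", applied to the Private column.
LIMIT_MB = 2048.0

# Returns the PIDs of Foreman Passenger workers whose private memory
# exceeds the limit. Header lines and other processes are skipped.
def oversized_workers(table_text, limit_mb = LIMIT_MB)
  table_text.each_line.map do |line|
    m = ROW.match(line)
    next unless m
    pid = m[1].to_i
    private_mb = m[3].to_f
    name = m[4]
    next unless name.start_with?("Passenger RackApp: /usr/share/foreman")
    pid if private_mb > limit_mb
  end.compact
end

# A few rows copied from the "Previous numbers" table in comment 1.
sample = <<~TABLE
  PID    VMSize     Private    Name
  4970   301.6 MB   62.7 MB    Passenger RackApp: /etc/puppet/rack
  12529  7246.8 MB  5083.6 MB  Passenger RackApp: /usr/share/foreman
  12561  2830.8 MB  655.1 MB   Passenger RackApp: /usr/share/foreman
  13845  3471.0 MB  1278.0 MB  Passenger RackApp: /usr/share/foreman
  14548  7568.5 MB  5376.6 MB  Passenger RackApp: /usr/share/foreman
TABLE

p oversized_workers(sample) # prints [12529, 14548]
```

Keeping the selection separate from whatever stops the worker makes the threshold easy to check against saved output before anything is run from cron.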


Version-Release number of selected component (if applicable):
Red Hat Satellite 6.2.9


The customer would like this fixed as soon as possible.
Comment 1 Karl Abbott 2017-06-12 16:09:12 EDT
I have a customer using this who has reported the following about the script from the description (a significant improvement over the previous numbers):

Also,
 
After the script:
 
------ Passenger processes ------
PID    VMSize     Private   Name
---------------------------------
4969   2837.9 MB  655.6 MB  Passenger RackApp: /usr/share/foreman
4970   301.6 MB   67.4 MB   Passenger RackApp: /etc/puppet/rack
11386  216.1 MB   0.3 MB    PassengerWatchdog
11389  1526.0 MB  5.2 MB    PassengerHelperAgent
11395  216.3 MB   0.9 MB    PassengerLoggingAgent
12561  2830.8 MB  658.2 MB  Passenger RackApp: /usr/share/foreman
12602  2830.8 MB  644.3 MB  Passenger RackApp: /usr/share/foreman
12640  2830.9 MB  650.4 MB  Passenger RackApp: /usr/share/foreman
14864  2768.8 MB  591.5 MB  Passenger RackApp: /usr/share/foreman
15122  2831.4 MB  638.0 MB  Passenger RackApp: /usr/share/foreman
16977  2831.5 MB  618.4 MB  Passenger RackApp: /usr/share/foreman
20832  687.5 MB   210.1 MB  Passenger AppPreloader: /usr/share/foreman
22590  2773.9 MB  601.3 MB  Passenger RackApp: /usr/share/foreman
25490  2710.1 MB  500.0 MB  Passenger RackApp: /usr/share/foreman
28654  2709.8 MB  528.3 MB  Passenger RackApp: /usr/share/foreman
30521  2838.0 MB  604.9 MB  Passenger RackApp: /usr/share/foreman
 
I removed the 'exit 1' from it, which had meant it handled only one process per run!
--------

Previous numbers:

------ Passenger processes -------
PID    VMSize     Private    Name
----------------------------------
4970   301.6 MB   62.7 MB    Passenger RackApp: /etc/puppet/rack
11386  216.1 MB   0.3 MB     PassengerWatchdog
11389  1526.0 MB  4.6 MB     PassengerHelperAgent
11395  216.3 MB   0.9 MB     PassengerLoggingAgent
12529  7246.8 MB  5083.6 MB  Passenger RackApp: /usr/share/foreman
12561  2830.8 MB  655.1 MB   Passenger RackApp: /usr/share/foreman
12602  2830.8 MB  629.3 MB   Passenger RackApp: /usr/share/foreman
12640  2830.9 MB  648.2 MB   Passenger RackApp: /usr/share/foreman
13845  3471.0 MB  1278.0 MB  Passenger RackApp: /usr/share/foreman
14548  7568.5 MB  5376.6 MB  Passenger RackApp: /usr/share/foreman
14864  2768.8 MB  597.4 MB   Passenger RackApp: /usr/share/foreman
15122  2831.4 MB  644.9 MB   Passenger RackApp: /usr/share/foreman
16977  2829.5 MB  617.2 MB   Passenger RackApp: /usr/share/foreman
18277  7504.7 MB  5318.3 MB  Passenger RackApp: /usr/share/foreman
26818  7568.9 MB  5318.7 MB  Passenger RackApp: /usr/share/foreman
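
To put a number on the improvement, the Private column of the two tables can be totalled. The snippet below is only an illustrative sketch: the values are copied by hand from the "Passenger RackApp: /usr/share/foreman" rows above, and the names BEFORE_MB, AFTER_MB and summarize are made up for this example.

```ruby
# Private memory (MB) of the "Passenger RackApp: /usr/share/foreman" rows,
# copied by hand from the two tables in this comment.
BEFORE_MB = [5083.6, 655.1, 629.3, 648.2, 1278.0, 5376.6,
             597.4, 644.9, 617.2, 5318.3, 5318.7]
AFTER_MB  = [655.6, 658.2, 644.3, 650.4, 591.5, 638.0,
             618.4, 601.3, 500.0, 528.3, 604.9]

# Hypothetical helper: worker count, total and largest private memory in MB.
def summarize(values_mb)
  {
    workers: values_mb.size,
    total_mb: values_mb.sum.round(1),
    largest_mb: values_mb.max
  }
end

puts "before: #{summarize(BEFORE_MB)}"
puts "after:  #{summarize(AFTER_MB)}"
```

Both tables list eleven Foreman workers, so the totals are directly comparable.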
Comment 2 Karl Abbott 2017-06-13 08:09:18 EDT
Quick update from my customer overnight:

One of the processes increased its memory usage to 5611M.

# passenger-status
Version : 4.0.18
Date    : 2017-06-13 10:31:07 +0200
Instance: 11355
----------- General information -----------
Max pool size : 12
Processes     : 12
Requests in top-level queue : 0

----------- Application groups -----------
/usr/share/foreman#default:
  App root: /usr/share/foreman
  Requests in queue: 0
  * PID: 12561   Sessions: 1       Processed: 13622   Uptime: 18h 9m 33s
    CPU: 8%      Memory  : 664M    Last used: 1s ago
  * PID: 12602   Sessions: 0       Processed: 13340   Uptime: 18h 9m 32s
    CPU: 7%      Memory  : 654M    Last used: 2s ago
  * PID: 12640   Sessions: 1       Processed: 13651   Uptime: 18h 9m 32s
    CPU: 8%      Memory  : 1381M   Last used: 1s ago
  * PID: 14864   Sessions: 1       Processed: 13776   Uptime: 18h 9m 3s
    CPU: 8%      Memory  : 607M    Last used: 1s ago
  * PID: 16977   Sessions: 1       Processed: 12619   Uptime: 18h 8m 59s
    CPU: 7%      Memory  : 642M    Last used: 1s ago
  * PID: 28654   Sessions: 0       Processed: 8860    Uptime: 12h 45m 41s
    CPU: 7%      Memory  : 5611M   Last used: 1s ago
  * PID: 22590   Sessions: 0       Processed: 8197    Uptime: 12h 43m 4s
    CPU: 7%      Memory  : 628M    Last used: 2s ago
  * PID: 4793    Sessions: 0       Processed: 2977    Uptime: 4h 28m 46s
    CPU: 8%      Memory  : 691M    Last used: 2s ago
  * PID: 26786   Sessions: 1       Processed: 291     Uptime: 28m 46s
    CPU: 7%      Memory  : 524M    Last used: 1s ago
  * PID: 6874    Sessions: 1       Processed: 270     Uptime: 27m 4s
    CPU: 7%      Memory  : 523M    Last used: 2s ago
  * PID: 32696   Sessions: 1       Processed: 284     Uptime: 25m 4s
    CPU: 8%      Memory  : 611M    Last used: 1s ago

/etc/puppet/rack#default:
  App root: /etc/puppet/rack
  Requests in queue: 0
  * PID: 4970    Sessions: 0       Processed: 1013    Uptime: 18h 1m 2s
    CPU: 0%      Memory  : 68M     Last used: 54s ago

-------

then.....


-------

Looks like passenger-recycler.rb did its job here:

/usr/share/foreman#default:
  App root: /usr/share/foreman
  Requests in queue: 0
  * PID: 12561   Sessions: 0       Processed: 13936   Uptime: 18h 45m 54s
    CPU: 8%      Memory  : 1356M   Last used: 17s ago
  * PID: 12602   Sessions: 0       Processed: 13784   Uptime: 18h 45m 53s
    CPU: 7%      Memory  : 654M    Last used: 12s ago
  * PID: 12640   Sessions: 0       Processed: 14100   Uptime: 18h 45m 53s
    CPU: 9%      Memory  : 1387M   Last used: 26s ago
  * PID: 14864   Sessions: 0       Processed: 14212   Uptime: 18h 45m 24s
    CPU: 8%      Memory  : 610M    Last used: 20s ago
  * PID: 16977   Sessions: 0       Processed: 13188   Uptime: 18h 45m 20s
    CPU: 7%      Memory  : 642M    Last used: 6s ago
  * PID: 22590   Sessions: 0       Processed: 8728    Uptime: 13h 19m 25s
    CPU: 7%      Memory  : 677M    Last used: 20s ago
  * PID: 4793    Sessions: 0       Processed: 3361    Uptime: 5h 5m 7s
    CPU: 8%      Memory  : 691M    Last used: 21s ago
  * PID: 26786   Sessions: 0       Processed: 806     Uptime: 1h 5m 7s
    CPU: 9%      Memory  : 529M    Last used: 19s ago
  * PID: 6874    Sessions: 0       Processed: 690     Uptime: 1h 3m 25s
    CPU: 8%      Memory  : 527M    Last used: 21s ago
  * PID: 32696   Sessions: 0       Processed: 752     Uptime: 1h 1m 25s
    CPU: 8%      Memory  : 616M    Last used: 20s ago
  * PID: 20449   Sessions: 0       Processed: 35      Uptime: 6m 3s
    CPU: 6%      Memory  : 711M    Last used: 19s ago


--------
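
The comparison between the two passenger-status listings above can also be done mechanically. The following is a rough sketch, assuming the "* PID: ..." / "Memory  : ...M" layout shown in this comment; worker_memory is a hypothetical helper and only four of the entries are reproduced here.

```ruby
# Hypothetical helper: extract { pid => memory_mb } from passenger-status text.
# Assumes each worker is printed as a "* PID: N" line followed by a line
# containing "Memory  : NNNM", as in the output pasted in this comment.
def worker_memory(status_text)
  workers = {}
  current_pid = nil
  status_text.each_line do |line|
    if (m = line.match(/\*\s+PID:\s+(\d+)/))
      current_pid = m[1].to_i
    elsif current_pid && (m = line.match(/Memory\s*:\s*(\d+)M/))
      workers[current_pid] = m[1].to_i
      current_pid = nil
    end
  end
  workers
end

# Two entries from the first listing.
before = worker_memory(<<~STATUS)
  * PID: 12561   Sessions: 1       Processed: 13622   Uptime: 18h 9m 33s
    CPU: 8%      Memory  : 664M    Last used: 1s ago
  * PID: 28654   Sessions: 0       Processed: 8860    Uptime: 12h 45m 41s
    CPU: 7%      Memory  : 5611M   Last used: 1s ago
STATUS

# Two entries from the second listing.
after = worker_memory(<<~STATUS)
  * PID: 12561   Sessions: 0       Processed: 13936   Uptime: 18h 45m 54s
    CPU: 8%      Memory  : 1356M   Last used: 17s ago
  * PID: 20449   Sessions: 0       Processed: 35      Uptime: 6m 3s
    CPU: 6%      Memory  : 711M    Last used: 19s ago
STATUS

recycled = before.keys - after.keys # workers that disappeared
started = after.keys - before.keys  # workers that replaced them
puts "recycled: #{recycled.inspect}, started: #{started.inspect}"
# prints: recycled: [28654], started: [20449]
```

In the full listings, PID 28654 (5611M) is the only worker missing from the second one, and PID 20449 is the only new one.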

So the script is clearly working as a workaround. We are continuing to monitor to make sure this stays the case. The customer is collecting free, passenger-status, and load output every 5 minutes.

Are there other pieces of data that are needed to see why these processes are ballooning in memory?

Cheers,
Karl
Comment 6 Justin Sherrill 2017-06-16 15:54:32 EDT
Rajan,

Working on Karl's case, we discovered that if the user has lots of hostgroups and lots of associated puppetclasses, performing a search on the hostgroup page or via the API results in a massive memory spike.

To reproduce:
1.  Go to Configure > Host groups
2.  Type the name of some hostgroup, for example 'foobar', in the search box
3.  Hit search

The request takes a long time and causes a HUGE spike in memory.
Because I'm not sure whether it is the same case, I've opened another BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1462350

You might go back to the user and see if this triggers the problem. You can work around it by simply searching for 'title = value' instead of just 'value'.
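
For the API case, the difference between the two search strings looks roughly like this when passed as the search parameter of the hostgroups API. This is only a sketch: the host name is made up, and hostgroup_search_url is a hypothetical helper that builds the URL without sending any request.

```ruby
require "uri"

# Hypothetical Satellite host name, used only for illustration.
BASE = "https://satellite.example.com/api/hostgroups"

# Builds the request URL for a given search string; nothing is sent.
def hostgroup_search_url(query)
  "#{BASE}?#{URI.encode_www_form(search: query)}"
end

free_text = hostgroup_search_url("foobar")      # search used in the reproducer
scoped = hostgroup_search_url("title = foobar") # workaround from this comment

puts free_text # https://satellite.example.com/api/hostgroups?search=foobar
puts scoped    # https://satellite.example.com/api/hostgroups?search=title+%3D+foobar
```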
Comment 8 Lukas Zapletal 2017-07-11 05:43:26 EDT
We are working on getting this into the foreman-maintain package.

https://github.com/theforeman/foreman_maintain/pull/78

Ivan will have more details on when this will be available downstream.
Comment 9 Bryan Kearney 2017-08-08 15:21:19 EDT
Comment 6 has a workaround, so I am reducing this to high from urgent.
