Created attachment 1287066 [details]
Passenger Recycler script

Description of problem:
Repeated invocation of oom-killer on Satellite 6 processes.

The customer has applied the fix provided in BZ 1447958 and has updated the system to the latest release, but OOM errors still occur on the Satellite. We received a temporary fix from a senior Satellite engineer (Lukas Zapletal) that recycles Passenger worker processes once they consume more than 2 GB of RAM. The customer has deployed it and runs it hourly from cron so that oversized workers are terminated cleanly rather than via a hard OOM kill.

https://gist.github.com/lzap/8dddbe66ec8d43cbd4277c1de7045c17

Version-Release number of selected component (if applicable):
Red Hat Satellite 6.2.9

The customer would like this fixed as soon as possible.
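For anyone reading this BZ without access to the attachment, below is a minimal sketch of the recycling approach described above. It is an illustration only, assuming the output format of passenger-memory-stats; the linked gist may differ in thresholds, signal handling, and exit codes (a later comment notes that an exit status of 1 corresponds to one worker recycled per run). The 2 GB limit and the grace period are assumptions taken from the description.

-------- passenger recycler sketch (illustrative only) --------
#!/usr/bin/env ruby
# Minimal sketch of a Passenger worker recycler -- an assumption of how the
# linked gist behaves, not a copy of it. It parses `passenger-memory-stats`,
# finds the Rack worker with the largest private memory, and recycles it if
# it exceeds the limit, preferring a clean SIGTERM over a hard kill.

LIMIT_MB      = 2048   # hypothetical 2 GB threshold from the description
GRACE_SECONDS = 60     # how long to wait for a clean shutdown

workers = []
`passenger-memory-stats`.each_line do |line|
  # Example line: "12561  2830.8 MB  658.2 MB  Passenger RackApp: /usr/share/foreman"
  next unless line =~ /\A\s*(\d+)\s+[\d.]+\s+MB\s+([\d.]+)\s+MB\s+Passenger RackApp/
  workers << [Regexp.last_match(1).to_i, Regexp.last_match(2).to_f]
end

pid, private_mb = workers.max_by { |_, mb| mb }
exit 0 if pid.nil? || private_mb < LIMIT_MB

puts "Recycling PID #{pid} (#{private_mb.round} MB private)"
Process.kill('TERM', pid)
GRACE_SECONDS.times do
  sleep 1
  begin
    Process.kill(0, pid)   # raises Errno::ESRCH once the process has exited
  rescue Errno::ESRCH
    exit 1                 # exit status 1: one worker recycled this run
  end
end
Process.kill('KILL', pid)  # hard kill if it ignored SIGTERM
exit 1
----------------------------------------------------------------

Run hourly from cron as described above, e.g. an entry along the lines of "0 * * * * /usr/sbin/passenger-recycler.rb" (the path is an assumption).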
Have a customer using this who has reported the following around the script from comment 1 (it shows a significant improvement over the previous numbers).

After the script:

------ Passenger processes ------
PID    VMSize     Private    Name
---------------------------------
4969   2837.9 MB  655.6 MB   Passenger RackApp: /usr/share/foreman
4970   301.6 MB   67.4 MB    Passenger RackApp: /etc/puppet/rack
11386  216.1 MB   0.3 MB     PassengerWatchdog
11389  1526.0 MB  5.2 MB     PassengerHelperAgent
11395  216.3 MB   0.9 MB     PassengerLoggingAgent
12561  2830.8 MB  658.2 MB   Passenger RackApp: /usr/share/foreman
12602  2830.8 MB  644.3 MB   Passenger RackApp: /usr/share/foreman
12640  2830.9 MB  650.4 MB   Passenger RackApp: /usr/share/foreman
14864  2768.8 MB  591.5 MB   Passenger RackApp: /usr/share/foreman
15122  2831.4 MB  638.0 MB   Passenger RackApp: /usr/share/foreman
16977  2831.5 MB  618.4 MB   Passenger RackApp: /usr/share/foreman
20832  687.5 MB   210.1 MB   Passenger AppPreloader: /usr/share/foreman
22590  2773.9 MB  601.3 MB   Passenger RackApp: /usr/share/foreman
25490  2710.1 MB  500.0 MB   Passenger RackApp: /usr/share/foreman
28654  2709.8 MB  528.3 MB   Passenger RackApp: /usr/share/foreman
30521  2838.0 MB  604.9 MB   Passenger RackApp: /usr/share/foreman

I got the exit 1 from it, which meant it only went over one process per run!

Previous numbers:

------ Passenger processes -------
PID    VMSize     Private    Name
----------------------------------
4970   301.6 MB   62.7 MB    Passenger RackApp: /etc/puppet/rack
11386  216.1 MB   0.3 MB     PassengerWatchdog
11389  1526.0 MB  4.6 MB     PassengerHelperAgent
11395  216.3 MB   0.9 MB     PassengerLoggingAgent
12529  7246.8 MB  5083.6 MB  Passenger RackApp: /usr/share/foreman
12561  2830.8 MB  655.1 MB   Passenger RackApp: /usr/share/foreman
12602  2830.8 MB  629.3 MB   Passenger RackApp: /usr/share/foreman
12640  2830.9 MB  648.2 MB   Passenger RackApp: /usr/share/foreman
13845  3471.0 MB  1278.0 MB  Passenger RackApp: /usr/share/foreman
14548  7568.5 MB  5376.6 MB  Passenger RackApp: /usr/share/foreman
14864  2768.8 MB  597.4 MB   Passenger RackApp: /usr/share/foreman
15122  2831.4 MB  644.9 MB   Passenger RackApp: /usr/share/foreman
16977  2829.5 MB  617.2 MB   Passenger RackApp: /usr/share/foreman
18277  7504.7 MB  5318.3 MB  Passenger RackApp: /usr/share/foreman
26818  7568.9 MB  5318.7 MB  Passenger RackApp: /usr/share/foreman
Quick update from my customer overnight: one of the processes increased its memory usage to 5611M.

# passenger-status
Version : 4.0.18
Date    : 2017-06-13 10:31:07 +0200
Instance: 11355

----------- General information -----------
Max pool size : 12
Processes     : 12
Requests in top-level queue : 0

----------- Application groups -----------
/usr/share/foreman#default:
  App root: /usr/share/foreman
  Requests in queue: 0
  * PID: 12561   Sessions: 1   Processed: 13622   Uptime: 18h 9m 33s    CPU: 8%   Memory: 664M    Last used: 1s ago
  * PID: 12602   Sessions: 0   Processed: 13340   Uptime: 18h 9m 32s    CPU: 7%   Memory: 654M    Last used: 2s ago
  * PID: 12640   Sessions: 1   Processed: 13651   Uptime: 18h 9m 32s    CPU: 8%   Memory: 1381M   Last used: 1s ago
  * PID: 14864   Sessions: 1   Processed: 13776   Uptime: 18h 9m 3s     CPU: 8%   Memory: 607M    Last used: 1s ago
  * PID: 16977   Sessions: 1   Processed: 12619   Uptime: 18h 8m 59s    CPU: 7%   Memory: 642M    Last used: 1s ago
  * PID: 28654   Sessions: 0   Processed: 8860    Uptime: 12h 45m 41s   CPU: 7%   Memory: 5611M   Last used: 1s ago
  * PID: 22590   Sessions: 0   Processed: 8197    Uptime: 12h 43m 4s    CPU: 7%   Memory: 628M    Last used: 2s ago
  * PID: 4793    Sessions: 0   Processed: 2977    Uptime: 4h 28m 46s    CPU: 8%   Memory: 691M    Last used: 2s ago
  * PID: 26786   Sessions: 1   Processed: 291     Uptime: 28m 46s       CPU: 7%   Memory: 524M    Last used: 1s ago
  * PID: 6874    Sessions: 1   Processed: 270     Uptime: 27m 4s        CPU: 7%   Memory: 523M    Last used: 2s ago
  * PID: 32696   Sessions: 1   Processed: 284     Uptime: 25m 4s        CPU: 8%   Memory: 611M    Last used: 1s ago

/etc/puppet/rack#default:
  App root: /etc/puppet/rack
  Requests in queue: 0
  * PID: 4970    Sessions: 0   Processed: 1013    Uptime: 18h 1m 2s     CPU: 0%   Memory: 68M     Last used: 54s ago

...then, a little later, it looks like passenger-recycler.rb did its job here:

/usr/share/foreman#default:
  App root: /usr/share/foreman
  Requests in queue: 0
  * PID: 12561   Sessions: 0   Processed: 13936   Uptime: 18h 45m 54s   CPU: 8%   Memory: 1356M   Last used: 17s ago
  * PID: 12602   Sessions: 0   Processed: 13784   Uptime: 18h 45m 53s   CPU: 7%   Memory: 654M    Last used: 12s ago
  * PID: 12640   Sessions: 0   Processed: 14100   Uptime: 18h 45m 53s   CPU: 9%   Memory: 1387M   Last used: 26s ago
  * PID: 14864   Sessions: 0   Processed: 14212   Uptime: 18h 45m 24s   CPU: 8%   Memory: 610M    Last used: 20s ago
  * PID: 16977   Sessions: 0   Processed: 13188   Uptime: 18h 45m 20s   CPU: 7%   Memory: 642M    Last used: 6s ago
  * PID: 22590   Sessions: 0   Processed: 8728    Uptime: 13h 19m 25s   CPU: 7%   Memory: 677M    Last used: 20s ago
  * PID: 4793    Sessions: 0   Processed: 3361    Uptime: 5h 5m 7s      CPU: 8%   Memory: 691M    Last used: 21s ago
  * PID: 26786   Sessions: 0   Processed: 806     Uptime: 1h 5m 7s      CPU: 9%   Memory: 529M    Last used: 19s ago
  * PID: 6874    Sessions: 0   Processed: 690     Uptime: 1h 3m 25s     CPU: 8%   Memory: 527M    Last used: 21s ago
  * PID: 32696   Sessions: 0   Processed: 752     Uptime: 1h 1m 25s     CPU: 8%   Memory: 616M    Last used: 20s ago
  * PID: 20449   Sessions: 0   Processed: 35      Uptime: 6m 3s         CPU: 6%   Memory: 711M    Last used: 19s ago

So the script is clearly working as a workaround. Continuing to monitor to make sure this stays the case. The customer is collecting free, passenger-status and load output every 5 minutes. Are there other pieces of data that are needed to see why these processes are ballooning in memory?

Cheers,
Karl
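In case it helps anyone else gather the same data, a minimal sketch of such a 5-minute collector might look like the following. The log path, format and cron entry are assumptions; the customer's actual collection method is not attached to this BZ.

-------- monitoring collector sketch (illustrative only) --------
#!/usr/bin/env ruby
# Hypothetical collector for the data the customer is gathering every five
# minutes: free memory, passenger-status output, and load average.

require 'time'

File.open('/var/log/passenger-monitor.log', 'a') do |log|
  log.puts "==== #{Time.now.iso8601} ===="
  log.puts `free -m`
  log.puts `passenger-status 2>&1`
  log.puts File.read('/proc/loadavg')
end
------------------------------------------------------------------

It would be scheduled from cron, e.g. "*/5 * * * * /usr/local/bin/passenger-monitor.rb" (path again an assumption).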
Rajan,

While working on Karl's case, we discovered that if the user has lots of host groups with lots of associated Puppet classes, performing a search on the host group page or viewing it via the API results in a massive spike in memory usage. To reproduce:

1. Go to Configure > Host groups.
2. Type the name of some host group, e.g. 'foobar', in the search box.
3. Hit Search.

The request takes a long time and causes a huge spike in memory. Because I'm not sure whether it is the same case, I've opened another BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1462350

You might go back to the user and see if this triggers the problem. You can work around it by searching for 'title = value' instead of just 'value'.
We are working on getting this into the foreman-maintain package: https://github.com/theforeman/foreman_maintain/pull/78

Ivan will have more details on when this will be available downstream.
Comment 6 has a workaround, so I am reducing the severity of this from urgent to high.
@Ivan, do you still need the requested information? We have very few cases of this kind; the issue was mostly on the content host side, which recently got resolved by errata. Has this been worked on for Satellite 6.3?

Regards,
Rajan