Bug 1369059

Summary: ruby memory consumption growing without known reason
Product: Red Hat Satellite
Component: Infrastructure
Version: 6.2.0
Status: CLOSED INSUFFICIENT_DATA
Severity: medium
Priority: unspecified
Reporter: Jan Hutař <jhutar>
Assignee: satellite6-bugs <satellite6-bugs>
QA Contact: Katello QA List <katello-qa-list>
CC: adprice, bbuckingham, bkearney, brubisch, cduryee, inecas, jhutar, psuriset
Keywords: Performance
Target Milestone: Unspecified
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2017-01-03 12:27:27 UTC

Description Jan Hutař 2016-08-22 12:06:29 UTC
...

Comment 1 Jan Hutař 2016-08-22 20:05:19 UTC
Description of problem:
After some experiments (many of which failed for various reasons) on a big setup (10k clients), ruby on our Satellite started to consume lots of memory and to generate a continuous load of about 1.


Version-Release number of selected component (if applicable):
Satellite 6.2 @ RHEL 7.2


How reproducible:
always on our setup


Steps to Reproduce:
1. Did lots of things: thousands of systems registered and deleted, capsules installed and used, errata applications performed. Various failures were encountered along the way.
2. Restart Satellite and Capsules
3. Measure RSS memory consumed by ruby
4. Content -> Errata -> select one and Apply
   (in our case it was applicable to almost 9k systems named like
   "ip-10-1-*-0container*...")
5. In the search bar enter: "( name !~ ip-10-1-21 AND name !~ ip-10-1-22 ) AND ( name ~ ip-10-1-2 OR name ~ ip-10-1-3 OR name ~ ip-10-1-4 ) ip-10-1-40-0container100.example.com" - note that this is not a valid search pattern
6. Wait ~10 minutes and measure RSS memory consumed by ruby


Actual results:
In step 3:
# ps --no-headers -eo rss,comm>a; for comm in $( sed 's/^\s*[0-9]\+\s*\(.*\)$/\1/' a | sort -u ); do size=$( grep "\s$comm" a | sed 's/^\s*\([0-9]\+\)\s*.*$/\1/' | paste -sd+ - | bc ); echo "$size $comm"; done | sort -n | tail
51840 puppet
67544 systemd-journal
77600 systemd
180036 qdrouterd
220800 postgres
609248 httpd
813584 java
1083124 celery
1115768 ruby
3784328 qpidd

In step 6:
# ps --no-headers -eo rss,comm>a; for comm in $( sed 's/^\s*[0-9]\+\s*\(.*\)$/\1/' a | sort -u ); do size=$( grep "\s$comm" a | sed 's/^\s*\([0-9]\+\)\s*.*$/\1/' | paste -sd+ - | bc ); echo "$size $comm"; done | sort -n | tail
51840 puppet
67716 systemd-journal
77772 systemd
180088 qdrouterd
441724 postgres
612740 httpd
815328 java
1089396 celery
3672056 ruby


Expected results:
Ruby memory consumption should not grow this much.


Additional info:
I have also seen:
# ps --no-headers -eo rss,comm>a; for comm in $( sed 's/^\s*[0-9]\+\s*\(.*\)$/\1/' a | sort -u ); do size=$( grep "\s$comm" a | sed 's/^\s*\([0-9]\+\)\s*.*$/\1/' | paste -sd+ - | bc ); echo "$size $comm"; done | sort -n | tail
51728 puppet
64840 systemd-journal
72904 systemd
177912 qdrouterd
365300 postgres
548068 httpd
799376 java
1016524 celery
4044140 qpidd
47191732 ruby
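
For reference, the same numbers can be produced in a single awk pass (a sketch, untested on the reported system; it assumes comm names contain no spaces, and, unlike the grep "\s$comm" above, it cannot double-count a comm that is a prefix of another):

# ps --no-headers -eo rss,comm | awk '{rss[$2] += $1} END {for (c in rss) print rss[c], c}' | sort -n | tail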

Comment 5 Ivan Necas 2016-08-24 06:59:11 UTC
Would it be possible to use `ps --no-headers -eo rss,cmd` as the input for the script you use for reporting the numbers, to see exactly which processes grow and whether this might be connected to https://bugzilla.redhat.com/show_bug.cgi?id=1362168 or https://bugzilla.redhat.com/show_bug.cgi?id=1368103?

Comment 6 Jan Hutař 2016-08-24 09:11:35 UTC
Needed some more tweaks, but the result for a healthy Satellite that has done a lot of work (6k registrations, 6k errata updates) since its last restart is:

# export IFS=$'\x0A'$'\x0D'; ps --no-headers -eo rss,cmd>a; for comm in $( sed 's/^\s*[0-9]\+\s*\(.*\)$/\1/' a | sort -u ); do size=$( comm="$( echo "$comm" | sed -e 's/\[/./g' -e 's/\]/./g' )"; grep "\s$comm" a | sed 's/^\s*\([0-9]\+\)\s*.*$/\1/' | paste -sd+ - | bc ); echo "$size $comm"; done | sort -n | tail

193080 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
221504 /usr/sbin/httpd -DFOREGROUND
613168 Passenger RackApp: /usr/share/foreman
657288 /usr/bin/mongod --quiet -f /etc/mongod.conf run
693956 /usr/sbin/qdrouterd -c /etc/qpid-dispatch/qdrouterd.conf
762800 postgres: foreman foreman [local] idle
1008468 dynflow_executor
1147116 java -classpath /usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar -Dcatalina.base=/usr/share/tomcat -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/cache/tomcat/temp -Djava.util.logging.config.file=/usr/share/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager org.apache.catalina.startup.Bootstrap start
1700768 (wsgi:pulp)     -DFOREGROUND
14866176 /usr/sbin/qpidd --config /etc/qpid/qpidd.conf

Once Pradeep hits the issue again, we will gather this data again.

Comment 7 Jan Hutař 2016-08-24 09:13:16 UTC
Slightly better (it restores IFS afterwards, avoiding later WTH moments):

# OLD_IFS=$IFS; export IFS=$'\x0A'$'\x0D'; ps --no-headers -eo rss,cmd>a; for comm in $( sed 's/^\s*[0-9]\+\s*\(.*\)$/\1/' a | sort -u ); do size=$( comm="$( echo "$comm" | sed -e 's/\[/./g' -e 's/\]/./g' )"; grep "\s$comm" a | sed 's/^\s*\([0-9]\+\)\s*.*$/\1/' | paste -sd+ - | bc ); echo "$size $comm"; done | sort -n | tail; export IFS=$OLD_IFS
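
An awk equivalent avoids the IFS juggling and the bracket escaping entirely, since the full command line is used verbatim as an array key (a sketch, untested on the reported system):

# ps --no-headers -eo rss,cmd | awk '{c = $0; sub(/^ *[0-9]+ */, "", c); rss[c] += $1} END {for (c in rss) print rss[c], c}' | sort -n | tail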

Comment 8 Jan Hutař 2016-08-24 09:23:51 UTC
Putting the needinfo back. Sorry for the mess.

Comment 9 Jan Hutař 2016-09-05 12:59:16 UTC
Currently we do not have an environment to reproduce this :-/

Comment 10 Chris Duryee 2016-09-29 19:40:31 UTC
I was not able to reproduce this either, given the data set size, but it may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1380486.

For example, 'grep Completed var/log/foreman/production.log | sort -k 9n' shows:

2016-08-22 07:31:51 [app] [I] Completed 200 OK in 10362ms (Views: 1879.4ms | ActiveRecord: 8400.6ms)
2016-08-22 07:39:14 [app] [I] Completed 200 OK in 10385ms (Views: 1604.2ms | ActiveRecord: 8735.7ms)
2016-08-22 07:30:39 [app] [I] Completed 200 OK in 11041ms (Views: 2756.5ms | ActiveRecord: 8199.0ms)
2016-08-22 07:38:02 [app] [I] Completed 200 OK in 11110ms (Views: 1640.5ms | ActiveRecord: 9412.1ms)
2016-08-22 07:40:27 [app] [I] Completed 200 OK in 11663ms (Views: 1920.0ms | ActiveRecord: 9692.8ms)
2016-08-22 07:23:18 [app] [I] Completed 200 OK in 11780ms (Views: 1572.2ms | ActiveRecord: 10116.9ms)
2016-08-22 07:26:59 [app] [I] Completed 200 OK in 11845ms (Views: 1682.3ms | ActiveRecord: 10103.4ms)
2016-08-22 07:20:52 [app] [I] Completed 200 OK in 11936ms (Views: 1884.0ms | ActiveRecord: 9993.4ms)
2016-08-22 07:28:12 [app] [I] Completed 200 OK in 11942ms (Views: 1736.0ms | ActiveRecord: 10145.6ms)
2016-08-22 07:19:39 [app] [I] Completed 200 OK in 12008ms (Views: 1855.3ms | ActiveRecord: 10101.8ms)
2016-08-22 07:35:36 [app] [I] Completed 200 OK in 12141ms (Views: 2036.3ms | ActiveRecord: 9959.6ms)
2016-08-22 07:29:27 [app] [I] Completed 200 OK in 12220ms (Views: 1728.0ms | ActiveRecord: 10405.7ms)
2016-08-22 07:22:05 [app] [I] Completed 200 OK in 12224ms (Views: 1779.6ms | ActiveRecord: 10388.5ms)
2016-08-22 07:25:46 [app] [I] Completed 200 OK in 12401ms (Views: 2183.6ms | ActiveRecord: 10143.7ms)
2016-08-22 07:24:32 [app] [I] Completed 200 OK in 12413ms (Views: 1824.8ms | ActiveRecord: 10528.5ms)
2016-08-22 07:34:22 [app] [I] Completed 200 OK in 12530ms (Views: 2366.7ms | ActiveRecord: 10050.6ms)
2016-08-22 07:36:50 [app] [I] Completed 200 OK in 12548ms (Views: 2123.5ms | ActiveRecord: 10362.2ms)
2016-08-22 07:15:09 [app] [I] Completed 200 OK in 12574ms (Views: 2203.0ms | ActiveRecord: 10315.4ms)
2016-08-22 07:33:09 [app] [I] Completed 200 OK in 16162ms (Views: 2529.6ms | ActiveRecord: 13558.0ms)
2016-08-22 07:30:27 [app] [I] Completed 200 OK in 17618ms (Views: 28.4ms | ActiveRecord: 17564.5ms)
2016-08-22 15:58:51 [app] [I] Completed 200 OK in 17619ms (Views: 23.1ms | ActiveRecord: 17582.7ms)
2016-08-22 07:18:25 [app] [I] Completed 200 OK in 18526ms (Views: 2000.5ms | ActiveRecord: 16459.7ms)
2016-08-22 07:36:38 [app] [I] Completed 200 OK in 20549ms (Views: 24.7ms | ActiveRecord: 20511.9ms)
2016-08-22 07:34:13 [app] [I] Completed 200 OK in 21723ms (Views: 22.9ms | ActiveRecord: 21686.2ms)
2016-08-22 07:33:33 [app] [I] Completed 200 OK in 22245ms (Views: 32.7ms | ActiveRecord: 22170.0ms)
2016-08-22 07:32:53 [app] [I] Completed 200 OK in 22806ms (Views: 61.3ms | ActiveRecord: 22723.2ms)
2016-08-22 15:37:16 [app] [I] Completed 200 OK in 96733ms (Views: 61.7ms | ActiveRecord: 96401.8ms)
2016-08-22 15:51:49 [app] [I] Completed 200 OK in 103118ms (Views: 127.4ms | ActiveRecord: 102920.8ms)
2016-08-22 16:00:58 [app] [I] Completed 200 OK in 127748ms (Views: 172.9ms | ActiveRecord: 127109.4ms)
2016-08-22 15:19:03 [app] [I] Completed 200 OK in 177412ms (Views: 90.3ms | ActiveRecord: 177104.9ms)


These all appear to be related to /api/v2/hosts/auto_complete_search
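
To see which requests these were, the matching request lines can be pulled from the same log (a sketch, assuming the standard Rails "Started GET ..." request lines are present at this log level):

# grep 'Started GET "/api/v2/hosts/auto_complete_search' /var/log/foreman/production.log | head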