Description of problem:
After some experiments (lots of them failed for various reasons) on a big setup (10k clients), ruby on our Satellite started to consume lots of memory and to generate a continuous load of about 1.

Version-Release number of selected component (if applicable):
Satellite 6.2 @ RHEL 7.2

How reproducible:
Always on our setup.

Steps to Reproduce:
1. Did lots of things: registered and deleted thousands of systems, installed and used capsules, performed errata applications. Encountered lots of various failures.
2. Restart Satellite and Capsules.
3. Measure RSS memory consumed by ruby.
4. Content -> Errata -> select one and Apply (in our case it was applicable to almost 9k systems named like "ip-10-1-*-0container*...").
5. In the search bar enter: "( name !~ ip-10-1-21 AND name !~ ip-10-1-22 ) AND ( name ~ ip-10-1-2 OR name ~ ip-10-1-3 OR name ~ ip-10-1-4 ) ip-10-1-40-0container100.example.com" - note that this is not a valid search pattern.
6. Wait ~10 minutes and measure RSS memory consumed by ruby.

Actual results:

In step 3:
# ps --no-headers -eo rss,comm>a; for comm in $( sed 's/^\s*[0-9]\+\s*\(.*\)$/\1/' a | sort -u ); do size=$( grep "\s$comm" a | sed 's/^\s*\([0-9]\+\)\s*.*$/\1/' | paste -sd+ - | bc ); echo "$size $comm"; done | sort -n | tail
51840 puppet
67544 systemd-journal
77600 systemd
180036 qdrouterd
220800 postgres
609248 httpd
813584 java
1083124 celery
1115768 ruby
3784328 qpidd

In step 6:
# ps --no-headers -eo rss,comm>a; for comm in $( sed 's/^\s*[0-9]\+\s*\(.*\)$/\1/' a | sort -u ); do size=$( grep "\s$comm" a | sed 's/^\s*\([0-9]\+\)\s*.*$/\1/' | paste -sd+ - | bc ); echo "$size $comm"; done | sort -n | tail
51840 puppet
67716 systemd-journal
77772 systemd
180088 qdrouterd
441724 postgres
612740 httpd
815328 java
1089396 celery
3672056 ruby

Expected results:
Ruby memory consumption should not go up this much.

Additional info:
I have also seen:
# ps --no-headers -eo rss,comm>a; for comm in $( sed 's/^\s*[0-9]\+\s*\(.*\)$/\1/' a | sort -u ); do size=$( grep "\s$comm" a | sed 's/^\s*\([0-9]\+\)\s*.*$/\1/' | paste -sd+ - | bc ); echo "$size $comm"; done | sort -n | tail
51728 puppet
64840 systemd-journal
72904 systemd
177912 qdrouterd
365300 postgres
548068 httpd
799376 java
1016524 celery
4044140 qpidd
47191732 ruby
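For reference, the same per-command RSS aggregation can be written as a single awk pass (just a sketch equivalent to the sed/paste/bc pipeline above; it assumes command names contain no spaces, and sizes are KiB as reported by ps):

# ps --no-headers -eo rss,comm | awk '{ rss[$2] += $1 } END { for (c in rss) print rss[c], c }' | sort -n | tail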
Would it be possible to use `ps --no-headers -eo rss,cmd` as the input for the script you use for reporting the numbers, to see exactly which processes grow and whether it might be connected to https://bugzilla.redhat.com/show_bug.cgi?id=1362168 or https://bugzilla.redhat.com/show_bug.cgi?id=1368103?
Needed some more tweaks, but the result for a healthy Satellite which has done a lot of work (6k registered, 6k errata updates) since its last restart is:

# export IFS=$'\x0A'$'\x0D'; ps --no-headers -eo rss,cmd>a; for comm in $( sed 's/^\s*[0-9]\+\s*\(.*\)$/\1/' a | sort -u ); do size=$( comm="$( echo "$comm" | sed -e 's/\[/./g' -e 's/\]/./g' )"; grep "\s$comm" a | sed 's/^\s*\([0-9]\+\)\s*.*$/\1/' | paste -sd+ - | bc ); echo "$size $comm"; done | sort -n | tail
193080 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30
221504 /usr/sbin/httpd -DFOREGROUND
613168 Passenger RackApp: /usr/share/foreman
657288 /usr/bin/mongod --quiet -f /etc/mongod.conf run
693956 /usr/sbin/qdrouterd -c /etc/qpid-dispatch/qdrouterd.conf
762800 postgres: foreman foreman [local] idle
1008468 dynflow_executor
1147116 java -classpath /usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar -Dcatalina.base=/usr/share/tomcat -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/cache/tomcat/temp -Djava.util.logging.config.file=/usr/share/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager org.apache.catalina.startup.Bootstrap start
1700768 (wsgi:pulp) -DFOREGROUND
14866176 /usr/sbin/qpidd --config /etc/qpid/qpidd.conf

Once Pradeep hits the issue again, we will gather this again.
A slightly better version (after some later WTH moments) that restores the original IFS when done:

# OLD_IFS=$IFS; export IFS=$'\x0A'$'\x0D'; ps --no-headers -eo rss,cmd>a; for comm in $( sed 's/^\s*[0-9]\+\s*\(.*\)$/\1/' a | sort -u ); do size=$( comm="$( echo "$comm" | sed -e 's/\[/./g' -e 's/\]/./g' )"; grep "\s$comm" a | sed 's/^\s*\([0-9]\+\)\s*.*$/\1/' | paste -sd+ - | bc ); echo "$size $comm"; done | sort -n | tail; export IFS=$OLD_IFS
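An awk equivalent avoids the IFS juggling entirely (just a sketch: it strips the leading RSS column and sums per full command line):

# ps --no-headers -eo rss,cmd | awk '{ cmd = $0; sub(/^ *[0-9]+ +/, "", cmd); rss[cmd] += $1 } END { for (c in rss) print rss[c], c }' | sort -n | tail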
Putting needinfo back. Sorry for the mess.
Currently we do not have an environment to reproduce this :-/
I was not able to repro this either given the data set size, but it may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1380486. For example, 'grep Completed var/log/foreman/production.log | sort -k 9n' shows:

2016-08-22 07:31:51 [app] [I] Completed 200 OK in 10362ms (Views: 1879.4ms | ActiveRecord: 8400.6ms)
2016-08-22 07:39:14 [app] [I] Completed 200 OK in 10385ms (Views: 1604.2ms | ActiveRecord: 8735.7ms)
2016-08-22 07:30:39 [app] [I] Completed 200 OK in 11041ms (Views: 2756.5ms | ActiveRecord: 8199.0ms)
2016-08-22 07:38:02 [app] [I] Completed 200 OK in 11110ms (Views: 1640.5ms | ActiveRecord: 9412.1ms)
2016-08-22 07:40:27 [app] [I] Completed 200 OK in 11663ms (Views: 1920.0ms | ActiveRecord: 9692.8ms)
2016-08-22 07:23:18 [app] [I] Completed 200 OK in 11780ms (Views: 1572.2ms | ActiveRecord: 10116.9ms)
2016-08-22 07:26:59 [app] [I] Completed 200 OK in 11845ms (Views: 1682.3ms | ActiveRecord: 10103.4ms)
2016-08-22 07:20:52 [app] [I] Completed 200 OK in 11936ms (Views: 1884.0ms | ActiveRecord: 9993.4ms)
2016-08-22 07:28:12 [app] [I] Completed 200 OK in 11942ms (Views: 1736.0ms | ActiveRecord: 10145.6ms)
2016-08-22 07:19:39 [app] [I] Completed 200 OK in 12008ms (Views: 1855.3ms | ActiveRecord: 10101.8ms)
2016-08-22 07:35:36 [app] [I] Completed 200 OK in 12141ms (Views: 2036.3ms | ActiveRecord: 9959.6ms)
2016-08-22 07:29:27 [app] [I] Completed 200 OK in 12220ms (Views: 1728.0ms | ActiveRecord: 10405.7ms)
2016-08-22 07:22:05 [app] [I] Completed 200 OK in 12224ms (Views: 1779.6ms | ActiveRecord: 10388.5ms)
2016-08-22 07:25:46 [app] [I] Completed 200 OK in 12401ms (Views: 2183.6ms | ActiveRecord: 10143.7ms)
2016-08-22 07:24:32 [app] [I] Completed 200 OK in 12413ms (Views: 1824.8ms | ActiveRecord: 10528.5ms)
2016-08-22 07:34:22 [app] [I] Completed 200 OK in 12530ms (Views: 2366.7ms | ActiveRecord: 10050.6ms)
2016-08-22 07:36:50 [app] [I] Completed 200 OK in 12548ms (Views: 2123.5ms | ActiveRecord: 10362.2ms)
2016-08-22 07:15:09 [app] [I] Completed 200 OK in 12574ms (Views: 2203.0ms | ActiveRecord: 10315.4ms)
2016-08-22 07:33:09 [app] [I] Completed 200 OK in 16162ms (Views: 2529.6ms | ActiveRecord: 13558.0ms)
2016-08-22 07:30:27 [app] [I] Completed 200 OK in 17618ms (Views: 28.4ms | ActiveRecord: 17564.5ms)
2016-08-22 15:58:51 [app] [I] Completed 200 OK in 17619ms (Views: 23.1ms | ActiveRecord: 17582.7ms)
2016-08-22 07:18:25 [app] [I] Completed 200 OK in 18526ms (Views: 2000.5ms | ActiveRecord: 16459.7ms)
2016-08-22 07:36:38 [app] [I] Completed 200 OK in 20549ms (Views: 24.7ms | ActiveRecord: 20511.9ms)
2016-08-22 07:34:13 [app] [I] Completed 200 OK in 21723ms (Views: 22.9ms | ActiveRecord: 21686.2ms)
2016-08-22 07:33:33 [app] [I] Completed 200 OK in 22245ms (Views: 32.7ms | ActiveRecord: 22170.0ms)
2016-08-22 07:32:53 [app] [I] Completed 200 OK in 22806ms (Views: 61.3ms | ActiveRecord: 22723.2ms)
2016-08-22 15:37:16 [app] [I] Completed 200 OK in 96733ms (Views: 61.7ms | ActiveRecord: 96401.8ms)
2016-08-22 15:51:49 [app] [I] Completed 200 OK in 103118ms (Views: 127.4ms | ActiveRecord: 102920.8ms)
2016-08-22 16:00:58 [app] [I] Completed 200 OK in 127748ms (Views: 172.9ms | ActiveRecord: 127109.4ms)
2016-08-22 15:19:03 [app] [I] Completed 200 OK in 177412ms (Views: 90.3ms | ActiveRecord: 177104.9ms)

These all appear to be related to /api/v2/hosts/auto_complete_search.
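The Completed lines do not carry the request path themselves, so one rough way to attribute them is to pair each with the most recently seen Started line (a sketch only: it assumes Started lines of the form '... [I] Started GET "/path" ...' with the quoted path as the 7th field, and since concurrent Passenger workers interleave their log lines, the pairing is only approximate):

# awk '/\[I\] Started/ { path = $7 } /\[I\] Completed/ { print $9, path }' var/log/foreman/production.log | sort -n | tail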