Description of problem:
Clicking on templates or Hosts causes puma memory to grow until it crashes

Version-Release number of selected component (if applicable):
5.8.3.4

How reproducible:
Repeatedly

Steps to Reproduce:
1. Navigate to Compute -> Infrastructure -> Providers
2. Click on clusters
3. Click on hosts

Actual results:
puma crashes

Expected results:
puma doesn't crash

Additional info:
Found CPU spikes as well for both appliances, happening consistently. Examples:

appliance 01
  PID  PPID USER PR NI   VIRT    RES  SHR S  %CPU %MEM     TIME+ COMMAND
40248  1648 root 21  1 771116 355312 3744 S 100.0  1.4 109:25.09 puma 3.3.0 (tcp://127.0.0.1:5000) [MIQ: Web Server Worker]

appliance 02
  PID  PPID USER PR NI   VIRT    RES  SHR S  %CPU %MEM     TIME+ COMMAND
57374  1581 root 21  1 745544 363932 5012 S  99.6  1.5 109:14.80 puma 3.3.0 (tcp://127.0.0.1:5000) [MIQ: Web Server Worker]
The customer has pointed out the following: I have noticed that templates now have a relationship to every VM that has been provisioned from them. I think this is our issue when navigating to Compute -> Infrastructure -> Virtual Machines, then clicking on templates and searching for a template name. This action also causes puma memory to spike past 5 GB until the process dies. Unlike the issue when clicking on hosts, this hasn't been repeatable every time.
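The failure mode the customer describes (a template related to every VM ever provisioned from it, and the whole relation being materialized per request) can be sketched in plain Ruby. This is illustrative only and assumes nothing about the actual codebase; the real records come from ActiveRecord, but the memory behavior is the same: mapping the full relation builds one giant array, while paging only ever materializes one page.

```ruby
# Illustrative sketch: a template with a very large set of related VM records.
# Plain structs stand in for ActiveRecord models so the example is self-contained.
Vm = Struct.new(:id, :name)

all_vms = Array.new(125_000) { |i| Vm.new(i, "vm#{i}") }

# The failure mode: materialize every related row on each request.
def render_all(vms)
  vms.map(&:name) # one giant array; memory grows with the size of the relation
end

# Paging: only one page of rows is materialized per request.
def render_page(vms, page:, per_page: 20)
  vms.lazy.drop((page - 1) * per_page).first(per_page).map(&:name)
end

render_page(all_vms, page: 2).first # => "vm20"
```

In Rails itself the equivalent lever is limiting/paginating the query (`limit`/`offset`, or batched iteration) instead of walking the whole association.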
The offending controller and action is EmsClusterController#show:

appliance 02 logs/production.log:[----] I, [2018-11-07T11:10:57.997723 #18268:f4b0f0]  INFO -- : Started GET "/ems_cluster/show/12300000000115?display=hosts" for [REDACTED] at 2018-11-07 11:10:57 -0500
appliance 02 logs/production.log:[----] I, [2018-11-07T11:10:58.001356 #18268:f4b0f0]  INFO -- : Processing by EmsClusterController#show as HTML
appliance 02 logs/production.log:[----] I, [2018-11-07T11:10:58.001492 #18268:f4b0f0]  INFO -- :   Parameters: {"display"=>"hosts", "id"=>"12300000000115"}
appliance 02 logs/production.log:[----] I, [2018-11-07T12:19:34.912987 #34255:f1e154]  INFO -- : Started GET "/ems_cluster/show/12300000000117?display=hosts" for [REDACTED] at 2018-11-07 12:19:34 -0500
appliance 02 logs/production.log:[----] I, [2018-11-07T12:19:34.916055 #34255:f1e154]  INFO -- : Processing by EmsClusterController#show as HTML
appliance 02 logs/production.log:[----] I, [2018-11-07T12:19:34.916188 #34255:f1e154]  INFO -- :   Parameters: {"display"=>"hosts", "id"=>"1230000000117"}

These requests don't complete before the worker exceeds 3.2 GB and gets killed.
Hello Ryan,

Has the customer gotten back to you on whether upgrading from 5.8.3 to 5.8.5 fixes this issue? We have found some minor issues, but fixing them will only give modest gains. If the problem is not solved, I want to start looking for larger issues with this report.

Thanks,
Keenan
Just an FYI, I started taking a look at this on Friday, and will be continuing that again today.
Proposed fix: https://github.com/ManageIQ/manageiq-ui-classic/pull/5283
https://github.com/ManageIQ/manageiq-ui-classic/pull/5283 has been merged and should address this issue.
*** Bug 1679276 has been marked as a duplicate of this bug. ***
I don't know how to reproduce this. The seed script fails:

[root@dhcp-8-198-68 vmdb]# bin/rails r bz_1648412_replication_database_seed.rb
Creating Storage records...
EMS records created!
Creating Cluster records...
Creating Tag Records...
Creating Host records...
Traceback (most recent call last):
        12: from bin/rails:4:in `<main>'
        11: from bin/rails:4:in `require'
        10: from /opt/rh/cfme-gemset/gems/railties-5.1.7/lib/rails/commands.rb:16:in `<top (required)>'
         9: from /opt/rh/cfme-gemset/gems/railties-5.1.7/lib/rails/command.rb:44:in `invoke'
         8: from /opt/rh/cfme-gemset/gems/railties-5.1.7/lib/rails/command/base.rb:63:in `perform'
         7: from /opt/rh/cfme-gemset/gems/thor-0.20.3/lib/thor.rb:387:in `dispatch'
         6: from /opt/rh/cfme-gemset/gems/thor-0.20.3/lib/thor/invocation.rb:126:in `invoke_command'
         5: from /opt/rh/cfme-gemset/gems/thor-0.20.3/lib/thor/command.rb:27:in `run'
         4: from /opt/rh/cfme-gemset/gems/railties-5.1.7/lib/rails/commands/runner/runner_command.rb:34:in `perform'
         3: from /opt/rh/cfme-gemset/gems/railties-5.1.7/lib/rails/commands/runner/runner_command.rb:34:in `load'
         2: from bz_1648412_replication_database_seed.rb:119:in `<top (required)>'
         1: from bz_1648412_replication_database_seed.rb:119:in `times'
bz_1648412_replication_database_seed.rb:127:in `block in <top (required)>': undefined method `sub' for nil:NilClass (NoMethodError)
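The traceback bottoms out in `undefined method 'sub' for nil:NilClass`, i.e. line 127 of the seed script calls `sub` on a value that is nil for some records. I don't have that line in front of me, so the following is only a generic sketch of the failure and the usual guards (the `base_name` variable is hypothetical, not from the script):

```ruby
# Hypothetical stand-in for whatever attribute the seed script reads at line 127.
base_name = nil

# This is the failing shape:
#   base_name.sub("template", "host")   # => NoMethodError on nil

# Guard 1: safe navigation returns nil instead of raising.
host_name = base_name&.sub("template", "host")

# Guard 2: fall back to a default so downstream code always gets a String.
host_name ||= "host-unknown"
```

Whether nil is acceptable there (skip the record) or a symptom of missing seed data (fix the generator) depends on the script's intent.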
Jaroslav,

Thanks for the update (and sorry for the delay). I was able to reproduce your issue and just updated the replication script: https://gist.github.com/NickLaMuro/f36576f9fa73c2bcc0d1ef7b7d0adafb

There were a few oversights, but it should now be working.

-Nick
With Nick's benchmark tool, which I first had to convince to talk to localhost over https:

cfme-5.11.0.16-1.el8cf.x86_64:
...
Started POST "/vm_infra/report_data" for 127.0.0.1 at 2019-07-25 08:38:24 -0400
Processing by VmInfraController#report_data as HTML
  Parameters: {"page"=>"1", "ppsetting"=>"20"}
PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 1, in use: 1, waiting_in_queue: 0
Completed 200 OK in 3603ms (Views: 8.3ms | ActiveRecord: 324.7ms | Rows: 2028)
PostgreSQLAdapter#log_after_checkin, connection_pool: size: 5, connections: 1, in use: 0, waiting_in_queue: 0
Started POST "/vm_infra/report_data" for 127.0.0.1 at 2019-07-25 08:38:28 -0400
Processing by VmInfraController#report_data as HTML
  Parameters: {"page"=>"2", "ppsetting"=>"20"}
PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 1, in use: 1, waiting_in_queue: 0
Completed 200 OK in 1098ms (Views: 6.7ms | ActiveRecord: 253.6ms | Rows: 2027)
PostgreSQLAdapter#log_after_checkin, connection_pool: size: 5, connections: 1, in use: 0, waiting_in_queue: 0

cfme-5.10.1.2-2.el7cf.x86_64:
Started POST "/vm_infra/report_data" for 127.0.0.1 at 2019-07-25 08:31:30 -0400
Processing by VmInfraController#report_data as HTML
  Parameters: {"page"=>"1", "ppsetting"=>"20"}
PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 1, in use: 1, waiting_in_queue: 0
Completed 200 OK in 11385ms (Views: 8.8ms | ActiveRecord: 1945.0ms | Rows: 35328)
PostgreSQLAdapter#log_after_checkin, connection_pool: size: 5, connections: 1, in use: 0, waiting_in_queue: 0
Started POST "/vm_infra/report_data" for 127.0.0.1 at 2019-07-25 08:31:42 -0400
Processing by VmInfraController#report_data as HTML
  Parameters: {"page"=>"2", "ppsetting"=>"20"}
PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 1, in use: 1, waiting_in_queue: 0
Completed 200 OK in 45464ms (Views: 38.4ms | ActiveRecord: 5898.7ms | Rows: 125327)
PostgreSQLAdapter#log_after_checkin, connection_pool: size: 5, connections: 1, in use: 0, waiting_in_queue: 0

[root@dhcp-8-197-130 vmdb]# rpm -q cfme
...

I see something like a 45x better time for the POST request on 5.11. I think I can call this VERIFIED.
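For reference, the speedup can be checked directly from the timings in the two log excerpts (milliseconds for the "Completed 200 OK" lines, 5.10 vs 5.11):

```ruby
# Timings taken verbatim from the production.log excerpts above.
timings = {
  "page 1" => { old: 11_385, new: 3_603 },  # cfme-5.10 vs cfme-5.11, page 1
  "page 2" => { old: 45_464, new: 1_098 },  # cfme-5.10 vs cfme-5.11, page 2
}

timings.each do |page, t|
  puts format("%s: %.1fx faster", page, t[:old].to_f / t[:new])
end
# page 1: 3.2x faster
# page 2: 41.4x faster
```

So the page-2 request is roughly 41x faster, which is in the ballpark of the quoted ~45x; note the row counts also drop from 125327 to 2027, consistent with the request now being paginated.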