Bug 1648412
| Summary: | Clicking on templates or Hosts causes puma memory to grow until it crashes | | |
|---|---|---|---|
| Product: | Red Hat CloudForms Management Engine | Reporter: | Ryan Spagnola <rspagnol> |
| Component: | Performance | Assignee: | Nick LaMuro <nlamuro> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Jaroslav Henner <jhenner> |
| Severity: | high | Docs Contact: | Red Hat CloudForms Documentation <cloudforms-docs> |
| Priority: | high | | |
| Version: | 5.8.3 | CC: | abellott, dmetzger, gekis, jdeubel, jprause, jrafanie, kbrock, mshriver, nlamuro, obarenbo, rspagnol, simaishi, tuado |
| Target Milestone: | GA | Keywords: | Reopened, TestOnly, ZStream |
| Target Release: | 5.11.0 | | |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | 5.11.0.1 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1686043 1690610 (view as bug list) | Environment: | |
| Last Closed: | 2019-12-13 15:00:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | Bug |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | CFME Core | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1679295, 1686043, 1690610 | | |
| 
 
        
Description  Ryan Spagnola  2018-11-09 16:34:45 UTC

The customer has pointed out the following: "I have noticed that templates now have a relationship to every VM that has been provisioned from them. I think this is our issue when navigating to Compute -> Infrastructure -> Virtual Machines, then clicking on Templates and searching for a template name. This action also causes puma memory to spike past 5 GB until the process dies. This hasn't been reproducible every time, unlike the issue when clicking on hosts."

The offending controller and action is EmsClusterController#show:
appliance 02 logs/production.log:[----] I, [2018-11-07T11:10:57.997723 #18268:f4b0f0]  INFO -- : Started GET "/ems_cluster/show/12300000000115?display=hosts" for [REDACTED] at 2018-11-07 11:10:57 -0500
appliance 02 logs/production.log:[----] I, [2018-11-07T11:10:58.001356 #18268:f4b0f0]  INFO -- : Processing by EmsClusterController#show as HTML
appliance 02 logs/production.log:[----] I, [2018-11-07T11:10:58.001492 #18268:f4b0f0]  INFO -- :   Parameters: {"display"=>"hosts", "id"=>"12300000000115"}
appliance 02 logs/production.log:[----] I, [2018-11-07T12:19:34.912987 #34255:f1e154]  INFO -- : Started GET "/ems_cluster/show/12300000000117?display=hosts" for [REDACTED] at 2018-11-07 12:19:34 -0500
appliance 02 logs/production.log:[----] I, [2018-11-07T12:19:34.916055 #34255:f1e154]  INFO -- : Processing by EmsClusterController#show as HTML
appliance 02 logs/production.log:[----] I, [2018-11-07T12:19:34.916188 #34255:f1e154]  INFO -- :   Parameters: {"display"=>"hosts", "id"=>"1230000000117"}
These don't complete before the worker exceeds 3.2 GB and gets killed.
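
For context on how this class of view can exhaust a worker, here is a minimal sketch, assuming the growth comes from materializing an entire association in the puma process instead of letting the database page it; the model and method names are illustrative only, not the actual CFME controller code.

```ruby
# Illustrative sketch: assumes ActiveRecord models shaped roughly like the CFME
# schema (a template with a has_many :vms association). Not the code from
# EmsClusterController#show.

# Memory-hungry pattern: materialize every related record (plus whatever the
# view touches on each one) inside the worker before rendering. With tens of
# thousands of VMs this alone can push a puma worker past its memory threshold.
def related_vms_unbounded(template)
  template.vms.to_a
end

# Bounded alternative: let PostgreSQL do the paging so the worker only ever
# holds one page of rows at a time.
def related_vms_paged(template, page: 1, per_page: 20)
  template.vms.order(:id).offset((page - 1) * per_page).limit(per_page)
end
```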
Hello Ryan,

Has the customer gotten back on whether upgrading from 5.8.3 to 5.8.5 fixes this issue? We have been finding minor issues, but they will only give modest gains. If the problem is not solved, I want to start looking for larger issues with this report.

Thanks,
Keenan

Just an FYI, I started taking a look at this on Friday, and will be continuing that again today.

https://github.com/ManageIQ/manageiq-ui-classic/pull/5283 has been merged and should address this issue.

*** Bug 1679276 has been marked as a duplicate of this bug. ***

I don't know how to reproduce this. The seed script fails:

[root@dhcp-8-198-68 vmdb]# bin/rails r bz_1648412_replication_database_seed.rb
Creating Storage records...
EMS records created!
Creating Cluster records...
Creating Tag Records...
Creating Host records...
Traceback (most recent call last):
        12: from bin/rails:4:in `<main>'
        11: from bin/rails:4:in `require'
        10: from /opt/rh/cfme-gemset/gems/railties-5.1.7/lib/rails/commands.rb:16:in `<top (required)>'
         9: from /opt/rh/cfme-gemset/gems/railties-5.1.7/lib/rails/command.rb:44:in `invoke'
         8: from /opt/rh/cfme-gemset/gems/railties-5.1.7/lib/rails/command/base.rb:63:in `perform'
         7: from /opt/rh/cfme-gemset/gems/thor-0.20.3/lib/thor.rb:387:in `dispatch'
         6: from /opt/rh/cfme-gemset/gems/thor-0.20.3/lib/thor/invocation.rb:126:in `invoke_command'
         5: from /opt/rh/cfme-gemset/gems/thor-0.20.3/lib/thor/command.rb:27:in `run'
         4: from /opt/rh/cfme-gemset/gems/railties-5.1.7/lib/rails/commands/runner/runner_command.rb:34:in `perform'
         3: from /opt/rh/cfme-gemset/gems/railties-5.1.7/lib/rails/commands/runner/runner_command.rb:34:in `load'
         2: from bz_1648412_replication_database_seed.rb:119:in `<top (required)>'
         1: from bz_1648412_replication_database_seed.rb:119:in `times'
bz_1648412_replication_database_seed.rb:127:in `block in <top (required)>': undefined method `sub' for nil:NilClass (NoMethodError)

Jaroslav,

Thanks for the update (and sorry for the delay). I was able to reproduce your issue and just updated the replication script: https://gist.github.com/NickLaMuro/f36576f9fa73c2bcc0d1ef7b7d0adafb There were a few oversights, but it should now be working.

-Nick

With Nick's benchmark tool, which I first had to convince to talk to localhost over https:
cfme-5.11.0.16-1.el8cf.x86_64:
...
Started POST "/vm_infra/report_data" for 127.0.0.1 at 2019-07-25 08:38:24 -0400
Processing by VmInfraController#report_data as HTML
  Parameters: {"page"=>"1", "ppsetting"=>"20"}
PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 1, in use: 1, waiting_in_queue: 0
Completed 200 OK in 3603ms (Views: 8.3ms | ActiveRecord: 324.7ms | Rows: 2028)
PostgreSQLAdapter#log_after_checkin, connection_pool: size: 5, connections: 1, in use: 0, waiting_in_queue: 0
Started POST "/vm_infra/report_data" for 127.0.0.1 at 2019-07-25 08:38:28 -0400
Processing by VmInfraController#report_data as HTML
  Parameters: {"page"=>"2", "ppsetting"=>"20"}
PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 1, in use: 1, waiting_in_queue: 0
Completed 200 OK in 1098ms (Views: 6.7ms | ActiveRecord: 253.6ms | Rows: 2027)
PostgreSQLAdapter#log_after_checkin, connection_pool: size: 5, connections: 1, in use: 0, waiting_in_queue: 0
cfme-5.10.1.2-2.el7cf.x86_64:
Started POST "/vm_infra/report_data" for 127.0.0.1 at 2019-07-25 08:31:30 -0400
Processing by VmInfraController#report_data as HTML
  Parameters: {"page"=>"1", "ppsetting"=>"20"}
PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 1, in use: 1, waiting_in_queue: 0
Completed 200 OK in 11385ms (Views: 8.8ms | ActiveRecord: 1945.0ms | Rows: 35328)
PostgreSQLAdapter#log_after_checkin, connection_pool: size: 5, connections: 1, in use: 0, waiting_in_queue: 0
Started POST "/vm_infra/report_data" for 127.0.0.1 at 2019-07-25 08:31:42 -0400
Processing by VmInfraController#report_data as HTML
  Parameters: {"page"=>"2", "ppsetting"=>"20"}
PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 1, in use: 1, waiting_in_queue: 0
Completed 200 OK in 45464ms (Views: 38.4ms | ActiveRecord: 5898.7ms | Rows: 125327)
PostgreSQLAdapter#log_after_checkin, connection_pool: size: 5, connections: 1, in use: 0, waiting_in_queue: 0
[root@dhcp-8-197-130 vmdb]# rpm -q cfme
... I see something like a 45x better response time for the page-2 POST request. I think I can call this VERIFIED.
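
As a quick sanity check on that figure (not part of the original comment), the completion times quoted in the two log excerpts above work out to roughly a 3x improvement for page 1 and a 41x improvement for page 2:

```ruby
# Speedups implied by the "Completed 200 OK in ..." lines above (times in ms).
page1_510, page1_511 = 11_385.0, 3_603.0
page2_510, page2_511 = 45_464.0, 1_098.0

puts format("page 1: %.1fx faster", page1_510 / page1_511)  # => ~3.2x
puts format("page 2: %.1fx faster", page2_510 / page2_511)  # => ~41.4x
```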