Bug 1182777 - CFME 5.2.4.2 - UI never returns when attempting to sort 3200 datastores by freespace
Summary: CFME 5.2.4.2 - UI never returns when attempting to sort 3200 datastores by fr...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Performance
Version: 5.3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.6.0
Assignee: dmetzger
QA Contact: Alex Krzos
URL:
Whiteboard: ui:perf:datastore
Depends On:
Blocks: 1110527 1290184
Reported: 2015-01-15 21:41 UTC by Thomas Hennessy
Modified: 2019-11-14 06:35 UTC

Fixed In Version: 5.6.0.4
Doc Type: Bug Fix
Doc Text:
In previous versions of CloudForms, the user interface could not successfully sort thousands of datastores by amount of free space. This task failed because the database could not filter that volume of information, and consequently returned all of the objects to Ruby to process. Additionally, the MIQ_Report code passed/copied the set of objects multiple times during processing, which increased memory and CPU utilization. The code now uses SQL virtual column sorting for this task and succeeds.
Clone Of:
: 1290184 (view as bug list)
Environment:
Last Closed: 2016-06-29 14:51:50 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:


Attachments
full production log with rails debug trace active for the last UI sort attempt (3.27 MB, text/plain)
2015-01-15 21:41 UTC, Thomas Hennessy
gz of full evm.log from the appliance (1.68 MB, application/x-gzip)
2015-01-15 21:43 UTC, Thomas Hennessy
extract of production log lines associated with UI worker pid 11645 (270.19 KB, text/plain)
2015-01-15 21:45 UTC, Thomas Hennessy
evm.log extract of log lines associated with pid 11645 (9.97 KB, text/plain)
2015-01-15 21:46 UTC, Thomas Hennessy
ssl_access log from appliance (8.77 KB, text/plain)
2015-01-15 21:52 UTC, Thomas Hennessy
ssl_error log from /var/www/miq/vmdb/log/apache directory (3.58 KB, text/plain)
2015-01-15 21:53 UTC, Thomas Hennessy


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2016:1348 | 0 | normal | SHIPPED_LIVE | CFME 5.6.0 bug fixes and enhancement update | 2016-06-29 18:50:04 UTC

Description Thomas Hennessy 2015-01-15 21:41:17 UTC
Created attachment 980647 [details]
full production log with rails debug trace active for the last UI sort attempt

Description of problem: The customer has a regional database with about 3200 datastore instances, which he attempts to sort by free space. The UI never returns.


Version-Release number of selected component (if applicable): 5.2.4.2


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
The UI worker consumes a large amount of virtual memory (> 2 GB) and jumps to 99% CPU utilization (based on top data) for several minutes before Apache decides that the server is not going to respond.

Testing with VMDBs provided by other customers confirms that when a VMDB contains hundreds of datastore instances, a UI request to order by free space can take a very long time.

The bug being reported is that the request for a list of datastores sorted in some order supported by the UI worker never completes.

The customer has generated a failing example with the Rails debug trace active. Logs from this will be attached to this case.

Comment 1 Thomas Hennessy 2015-01-15 21:43:59 UTC
Created attachment 980648 [details]
gz of full evm.log from the appliance

Comment 2 Thomas Hennessy 2015-01-15 21:45:32 UTC
Created attachment 980649 [details]
extract of production log lines associated with UI worker pid 11645

Comment 3 Thomas Hennessy 2015-01-15 21:46:55 UTC
Created attachment 980650 [details]
evm.log extract of log lines associated with pid 11645

Comment 4 Thomas Hennessy 2015-01-15 21:52:19 UTC
Created attachment 980674 [details]
ssl_access log from appliance

Comment 6 Thomas Hennessy 2015-01-15 21:53:32 UTC
Created attachment 980675 [details]
ssl_error log from /var/www/miq/vmdb/log/apache directory

Comment 8 Dave Johnson 2015-04-17 18:24:01 UTC
Removing the blocker flag for 5.4, as we believe there is a workaround of going through report generation (or at least we hope so). In the meantime, it stays a high-priority issue to chase through bug fix days.

Tom, is that something you can confirm with the customer?

Comment 9 Dave Johnson 2015-04-17 18:25:03 UTC
Alex, is this something you can reproduce (again) for dev?

Comment 10 Alex Krzos 2015-04-27 12:23:36 UTC
Dave,

I turned on the original appliance I have that reproduces the problem.  I can update that appliance with new code or export its database to other appliances if needed to reproduce on different versions.

Comment 13 Keenan Brock 2015-11-17 02:27:26 UTC
This does look like it could be optimized if the count were performed in SQL.

explain analyze
  select storages.*,
         (
          select count(*)
          from vms
          where storage_id = storages.id
            and ((template = true and ems_id is not null) or host_id is not null)
         ) as unmanaged_vm_counts
  from storages
    order by unmanaged_vm_counts

We would probably want to add indexes to the vms and file_storages tables to make those subqueries cheaper to run.

add_index :vms, [:storage_id, :ems_id, :template, :host_id]

Most of the subqueries seem to share the same conditions on those columns, so adding a where clause to the index would greatly reduce the index size and speed up queries.

Comment 14 dmetzger 2015-12-10 14:47:47 UTC
The underlying issue is that the database cannot (or at least does not presently) perform the filter for us and simply returns all the objects, which leaves Ruby with a much larger set (everything) to process.

Currently the MIQ_Report code passes/copies the set of objects multiple times during processing, which increases memory and CPU utilization.

Testing indicates the current (5.5) code filters ~1,000 datastores per second on an essentially idle appliance with no memory constraint, thus page rendering time is approximately (# of objects / 1,000) seconds in the best case.
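
As a rough sketch of that estimate (the constant and method name below are illustrative only and are not taken from the CFME code), the best-case figure for the datastore count in this report could be worked out like this:

```ruby
# Hypothetical helper, not part of CFME: best-case filter time based on
# the ~1,000 datastores per second rate quoted above for 5.5 on an idle
# appliance with no memory constraint.
FILTER_RATE_PER_SECOND = 1_000.0

def best_case_filter_seconds(object_count, rate = FILTER_RATE_PER_SECOND)
  object_count / rate
end

# 3200 is the approximate datastore count from the original description.
puts best_case_filter_seconds(3_200)
```

This is only a lower bound; memory pressure or a busy appliance would presumably push the real figure higher.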

The following filters are not performed within the database, thus can be expected to experience slowness at scale:

- % Free Space
- Total Provisioned Space
- Total Hosts
- Managed/Registered VMs
- Managed/Unregistered VMs
- Unmanaged VMs
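
To illustrate the difference for one of these (% Free Space), here is a minimal sketch that contrasts sorting in Ruby with letting the database do the ordering. The Datastore struct, the method names, and the storages/free_space/total_space table and column names are assumptions made for the example, not the actual model or the eventual fix:

```ruby
# Hypothetical stand-in for a datastore row; the real model and its
# column names are not shown in this report.
Datastore = Struct.new(:name, :free_space, :total_space) do
  # Percent free computed in Ruby: every row must be loaded first.
  def pct_free
    return 0.0 if total_space.nil? || total_space.zero?
    (free_space.to_f / total_space * 100).round(2)
  end
end

# Ruby-side sort: what happens when the database cannot order the rows.
def sort_in_ruby(datastores)
  datastores.sort_by(&:pct_free)
end

# The same ordering expressed as SQL, so the database can sort and return
# only the requested page. Table and column names are assumed.
def pct_free_order_sql(limit:, offset: 0)
  <<~SQL
    SELECT storages.*
    FROM storages
    ORDER BY CASE WHEN total_space IS NULL OR total_space = 0 THEN 0
                  ELSE free_space * 100.0 / total_space END
    LIMIT #{Integer(limit)} OFFSET #{Integer(offset)}
  SQL
end

stores = [
  Datastore.new("ds-a", 50, 200),
  Datastore.new("ds-b", 10, 100),
  Datastore.new("ds-c", 0, 0)
]
puts sort_in_ruby(stores).map(&:name).inspect
puts pct_free_order_sql(limit: 20)
```

With the SQL form, only one page of rows would need to reach Ruby, rather than the full set being loaded and sorted in memory.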

Comment 17 CFME Bot 2016-04-19 20:30:42 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/9278071094c0bd1ca580f1dfdcf4623ff50c9214

commit 9278071094c0bd1ca580f1dfdcf4623ff50c9214
Author:     Keenan Brock <kbrock>
AuthorDate: Mon Apr 18 14:55:41 2016 -0400
Commit:     Keenan Brock <kbrock>
CommitDate: Tue Apr 19 14:52:32 2016 -0400

    implement v_pct_free_disk_space in hardware
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1182777

 app/models/hardware.rb             | 23 ++++++++++++++
 app/models/vm_or_template.rb       | 20 ++----------
 spec/models/hardware_spec.rb       | 62 ++++++++++++++++++++++++++++++++++++++
 spec/models/vm_or_template_spec.rb | 10 ++++++
 spec/support/arel_spec_helper.rb   | 11 +++++++
 5 files changed, 109 insertions(+), 17 deletions(-)

Comment 19 errata-xmlrpc 2016-06-29 14:51:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1348

