Bug 1182777

Summary:

CFME 5.2.4.2 - UI never returns when attempting to sort 3200 datastores by freespace

Product:

Red Hat CloudForms Management Engine

Reporter:

Thomas Hennessy <thenness>

Component:

Performance

Assignee:

dmetzger

Status:

CLOSED ERRATA

QA Contact:

Alex Krzos <akrzos>

Severity:

high

Priority:

high

Version:

5.3.0

CC:

akrzos, clasohm, cpelland, jdeubel, jfrey, jhardy, jocarter, jprause, kbrock, mfeifer, nlane, obarenbo, simaishi, thenness, xlecauch

Target Release:

5.6.0

Hardware:

x86_64

OS:

Linux

Whiteboard:

ui:perf:datastore

Fixed In Version:

5.6.0.4

Doc Type:

Bug Fix

Doc Text:

In previous versions of CloudForms, the user interface could not successfully sort thousands of datastores by amount of free space. This task failed because the database could not filter that volume of information, and consequently returned all of the objects to Ruby to process. Additionally, the MIQ_Report code passed/copied the set of objects multiple times during processing, which increased memory and CPU utilization. The code now uses SQL virtual column sorting for this task and succeeds.

Clones:

1290184 (view as bug list)

Last Closed:

2016-06-29 14:51:50 UTC

Type:

Bug

Bug Blocks:

1110527, 1290184

Attachments:

Description	Flags
full production log with rails debug trace active for the last UI sort attempt	none
gz of full evm.log from the appliance	none
extract of production log lines associated with UI worker pid 11645	none
evm.log extract of log lines associated with pid 11645	none
ssl_access log from appliance	none
ssl_error log from /var/www/miq/vmdb/log/apache directory	none

Description Thomas Hennessy 2015-01-15 21:41:17 UTC

Created attachment 980647
full production log with rails debug trace active for the last UI sort attempt

Description of problem: The customer has a regional database with about 3,200 datastore instances, which they attempt to sort by free space. The UI never returns.


Version-Release number of selected component (if applicable): 5.2.4.2


Additional info:
The UI worker consumes a large amount of virtual memory (> 2 GB) and jumps to 99% CPU utilization (according to top) for several minutes before Apache concludes that the server is not going to respond.

Testing with VMDBs provided by other customers confirms that when hundreds of datastore instances are in the VMDB, a UI request to order by free space can take a very long time.

The bug being reported is that the request for a list of datastores sorted in some order supported by the UI worker never completes.

The customer has reproduced the failure with Rails debug tracing active. Logs from this run will be attached to this case.

Comment 1 Thomas Hennessy 2015-01-15 21:43:59 UTC

Created attachment 980648
gz of full evm.log from the appliance

Comment 2 Thomas Hennessy 2015-01-15 21:45:32 UTC

Created attachment 980649
extract of production log lines associated with UI worker pid 11645

Comment 3 Thomas Hennessy 2015-01-15 21:46:55 UTC

Created attachment 980650
evm.log extract of log lines associated with pid 11645

Comment 4 Thomas Hennessy 2015-01-15 21:52:19 UTC

Created attachment 980674
ssl_access log from appliance

Comment 6 Thomas Hennessy 2015-01-15 21:53:32 UTC

Created attachment 980675
ssl_error log from /var/www/miq/vmdb/log/apache directory

Comment 8 Dave Johnson 2015-04-17 18:24:01 UTC

Removing the blocker flag for 5.4, as we believe there is a workaround through generating reports (or at least we hope so). In the meantime, it remains a high-priority issue to chase during bug fix days.

Tom, is that something you can confirm with the customer?

Comment 9 Dave Johnson 2015-04-17 18:25:03 UTC

Alex, is this something you can reproduce (again) for dev?

Comment 10 Alex Krzos 2015-04-27 12:23:36 UTC

Dave,

I turned on the original appliance I have that reproduces the problem.  I can update that appliance with new code or export its database to other appliances if needed to reproduce on different versions.

Comment 13 Keenan Brock 2015-11-17 02:27:26 UTC

It does look like this could be optimized if the count were performed in SQL.

explain analyze
  select storages.*,
         (
           select count(*)
           from vms
           where storage_id = storages.id
             and ((template = true and ems_id is not null) or host_id is not null)
         ) as unmanaged_vm_counts
  from storages
  order by unmanaged_vm_counts;
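For reference, the correlated subquery above counts, for each storages row, the vms rows that match the condition, and the outer query orders by that count. The plain-Ruby sketch below mirrors those semantics with in-memory Structs; Storage, Vm, and the helper names are hypothetical stand-ins, not ManageIQ classes. Doing this in Ruby means loading every row of both tables first, which is the work the query above would keep inside the database.

```ruby
# In-memory sketch of what the correlated subquery computes.
# Storage and Vm are hypothetical stand-ins for rows of the storages and vms tables.
Storage = Struct.new(:id, :name)
Vm      = Struct.new(:storage_id, :template, :ems_id, :host_id)

# Mirrors: (template = true AND ems_id IS NOT NULL) OR host_id IS NOT NULL
def counted_vm?(vm)
  (vm.template == true && !vm.ems_id.nil?) || !vm.host_id.nil?
end

# Mirrors: SELECT storages.*, (SELECT count(*) ...) AS unmanaged_vm_counts
#          FROM storages ORDER BY unmanaged_vm_counts
# Returns [storage, count] pairs; ties are broken by storage id for determinism.
def storages_ordered_by_vm_count(storages, vms)
  counts = vms.each_with_object(Hash.new(0)) do |vm, acc|
    acc[vm.storage_id] += 1 if counted_vm?(vm)
  end
  storages
    .map { |storage| [storage, counts[storage.id]] }
    .sort_by { |storage, count| [count, storage.id] }
end
```

SQL leaves the order of equal counts unspecified; the sketch adds the id tie-breaker only so its output is repeatable.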

We would probably want to add indexes to the vms and file_storages tables so that those subqueries run more efficiently.

add_index :vms, [:storage_id, :ems_id, :template, :host_id]

Most of these subqueries seem to share the same conditions, so adding a WHERE clause to the index (a partial index) would greatly reduce the index size and speed up queries.
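A partial index stores only the rows that satisfy its WHERE predicate. Assuming the shared condition is the one in the subquery earlier in this comment, the PostgreSQL DDL would look like the string built below. partial_index_ddl, VM_COUNT_PREDICATE, and the index name are hypothetical, and the helper does no quoting or escaping, so it is only suitable for trusted, hard-coded identifiers. (In a Rails migration, add_index accepts a :where option for the same purpose on PostgreSQL.)

```ruby
# Hypothetical helper: builds PostgreSQL DDL for a partial index.
# No quoting or escaping is done; use only with trusted, hard-coded identifiers.
def partial_index_ddl(table, columns, where:, name: nil)
  name ||= "index_#{table}_on_#{columns.join('_and_')}_partial"
  "CREATE INDEX #{name} ON #{table} (#{columns.join(', ')}) WHERE #{where}"
end

# Assumed predicate: the condition shared by the datastore VM-count subqueries.
VM_COUNT_PREDICATE =
  "(template = true AND ems_id IS NOT NULL) OR host_id IS NOT NULL".freeze

puts partial_index_ddl("vms", %w[storage_id], where: VM_COUNT_PREDICATE)
```

With the predicate moved into the WHERE clause, only storage_id needs to be an index column. Whether this beats the four-column index suggested above would need to be checked with EXPLAIN ANALYZE against real data.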

Comment 14 dmetzger 2015-12-10 14:47:47 UTC

The underlying issue is that the database cannot (or at least presently does not) perform the filtering for us and simply returns all the objects, which leaves Ruby with a much larger set (everything) to process.

Currently, the MIQ_Report code passes/copies the set of objects multiple times during processing, which increases memory and CPU utilization.

Testing indicates that the current (5.5) code filters ~1,000 datastores per second on an essentially idle appliance with no memory constraints; thus, in the best case, page rendering time is approximately (# of objects / 1,000) seconds.
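As a worked example of that best-case estimate, using the ~1,000 datastores/second rate measured above (the constant and method names are hypothetical):

```ruby
# Best-case page render estimate from the measured Ruby-side filter rate:
# ~1,000 datastores per second on an essentially idle appliance.
FILTER_RATE_PER_SECOND = 1_000

# Seconds spent filtering object_count objects at the given rate.
def best_case_render_seconds(object_count, rate = FILTER_RATE_PER_SECOND)
  object_count.fdiv(rate)
end

puts best_case_render_seconds(3_200) # the 3,200 datastores from the original report
```

That is roughly 3.2 seconds for the customer's 3,200 datastores. The original report describes multi-minute stalls with over 2 GB of virtual memory, so real conditions can be far worse than this idle-appliance best case.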

The following filters are not performed within the database and can therefore be expected to be slow at scale:

- % Free Space
- Total Provisioned Space
- Total Hosts
- Managed/Registered VMs
- Managed/Unregistered VMs
- Unmanaged VMs
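One way to move a column such as % Free Space into the database is to define it twice, once as a Ruby method for a loaded row and once as an SQL expression, so that sorting becomes an ORDER BY (with a LIMIT for the visible page) instead of a Ruby pass over every row. This is in the spirit of the SQL virtual column sorting the eventual fix used, but it is only a sketch: the storages.free_space and storages.total_space column names and all helper names are assumptions, not the shipped ManageIQ code.

```ruby
# Hypothetical % Free Space virtual column, defined in Ruby and in SQL.
# Assumes storages has numeric free_space and total_space columns.

# Ruby-side value for one loaded row; nil when either value is missing or total is zero.
def pct_free_space(free_space, total_space)
  return nil if free_space.nil? || total_space.nil? || total_space.zero?
  100.0 * free_space / total_space
end

# SQL counterpart; NULLIF turns a zero total into NULL instead of a division error.
PCT_FREE_SPACE_SQL =
  "100.0 * storages.free_space / NULLIF(storages.total_space, 0)".freeze

# Lets the database sort and page, so only the requested rows reach Ruby.
def order_storages_by_pct_free_sql(direction = "ASC", limit: 20)
  dir = direction.to_s.upcase == "DESC" ? "DESC" : "ASC"
  "SELECT storages.* FROM storages ORDER BY #{PCT_FREE_SPACE_SQL} #{dir} LIMIT #{Integer(limit)}"
end
```

Keeping both definitions side by side lets the UI show the Ruby value for a loaded row while the sort itself runs in the database.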

Comment 15 CFME Bot 2016-01-27 17:11:24 UTC

https://github.com/ManageIQ/manageiq/pull/6368

Comment 16 CFME Bot 2016-04-08 04:25:35 UTC

https://github.com/ManageIQ/manageiq/pull/7813

Comment 17 CFME Bot 2016-04-19 20:30:42 UTC

New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/9278071094c0bd1ca580f1dfdcf4623ff50c9214

commit 9278071094c0bd1ca580f1dfdcf4623ff50c9214
Author:     Keenan Brock <kbrock>
AuthorDate: Mon Apr 18 14:55:41 2016 -0400
Commit:     Keenan Brock <kbrock>
CommitDate: Tue Apr 19 14:52:32 2016 -0400

    implement v_pct_free_disk_space in hardware
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1182777

 app/models/hardware.rb             | 23 ++++++++++++++
 app/models/vm_or_template.rb       | 20 ++----------
 spec/models/hardware_spec.rb       | 62 ++++++++++++++++++++++++++++++++++++++
 spec/models/vm_or_template_spec.rb | 10 ++++++
 spec/support/arel_spec_helper.rb   | 11 +++++++
 5 files changed, 109 insertions(+), 17 deletions(-)

Comment 19 errata-xmlrpc 2016-06-29 14:51:50 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1348