Bug 1389058

Summary: CFME 3.0: UI attempting to present > 18k VMs instances spins for > 1 hour and then re-begins accepting UI requests without presenting requested view
Product: Red Hat CloudForms Management Engine Reporter: Thomas Hennessy <thenness>
Component: Performance Assignee: Nick LaMuro <nlamuro>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Dave Johnson <dajohnso>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 5.3.0 CC: benglish, dclarizi, hkataria, jdeubel, jhardy, jocarter, mpovolny, myoder, obarenbo, saali
Target Milestone: GA Flags: dajohnso: needinfo? (jhardy)
Target Release: cfme-future   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-01-11 17:01:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Thomas Hennessy 2016-10-26 18:17:17 UTC
Description of problem: A customer running CFME 3.0 in a global region has about 18k active VMs and about 15k retained archived VMs in the vms table. When he attempts to view the VMs, the explorer screen never returns any information.


Version-Release number of selected component (if applicable): 5.2.5.3


How reproducible:


Steps to Reproduce:
1. Instantiate a CFME 3.0 VM
2. Create 33k VM instances in the vms table
3. Attempt to view the VMs through the UI explorer

Actual results:
The UI panel goes away indefinitely. According to the logs, the spinner appears to be active for about 100 minutes, after which the UI worker appliance seems to become active again and to process new requests.

Expected results: Either an Apache timeout or a proxy error on the screen, or the UI worker terminating because its memory limit was exceeded. None of these things happens.


Additional info:
The customer has 3 UI Worker appliances behind a load balancer, so it is a bit tricky to determine which appliance will handle any one incident.  We have logs from two appliances that are contemporaneous with each other, and from a third that are from a few days before the other two.

I determined what I describe in the "Actual results" section by following several of the UI workers in both the evm.log and production.log, with the surprising results described in that section.  I cannot explain what I do see happening, nor why what I expect to see does not happen.

The logs and output of some analysis can be found in http://file.rdu.redhat.com/~thenness/SF-01722597
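
For anyone repeating the log-following described above, a minimal sketch of one way to find very long requests in production.log. It assumes the standard Rails completion wording ("Completed 200 OK in 123ms") appears in each line after any logger prefix; the function name slow_requests and the 60-minute default threshold are illustrative only, not part of CFME.

```python
import re

# Standard Rails request-completion wording, for example:
#   Completed 200 OK in 6000000ms (Views: 1.0ms | ActiveRecord: 2.0ms)
# Assumption: production.log on the appliance keeps this wording after any
# logger prefix. Adjust the pattern if the local format differs.
COMPLETED_RE = re.compile(r"Completed (\d{3}) .*? in (\d+(?:\.\d+)?)ms")


def slow_requests(lines, threshold_minutes=60.0):
    """Yield (status, minutes, line) for requests slower than the threshold.

    `lines` is any iterable of log lines, such as an open file object.
    """
    for line in lines:
        match = COMPLETED_RE.search(line)
        if match is None:
            continue  # not a completion line
        minutes = float(match.group(2)) / 60_000.0  # ms -> minutes
        if minutes > threshold_minutes:
            yield int(match.group(1)), minutes, line.rstrip("\n")
```

Passing an open log file to slow_requests lists only requests that eventually logged a completion. A request whose worker was killed or that never finished would leave no "Completed" line and would have to be found by other means, for example by looking for "Started" lines with no matching completion.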

Comment 2 Dave Johnson 2016-10-27 02:38:59 UTC
John, what is the priority here?  I believe some of the UI performance issues have already been resolved in newer releases (although we would need to confirm that for this particular issue).