Bug 1625966 - 502 Proxy Error - while checking the provider details on UI
Summary: 502 Proxy Error - while checking the provider details on UI
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: UI - OPS
Version: 5.9.4
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: GA
Target Release: 5.9.6
Assignee: dmetzger
QA Contact: Jad Haj Yahya
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-09-06 10:00 UTC by Avinash Kumar Dasoundhi
Modified: 2018-10-10 21:17 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-07 18:10:12 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:


Attachments
evm log (131.71 KB, application/x-gzip)
2018-09-06 10:00 UTC, Avinash Kumar Dasoundhi
production.log (15.17 KB, application/x-gzip)
2018-09-06 10:01 UTC, Avinash Kumar Dasoundhi
ssl error log (256 bytes, application/x-gzip)
2018-09-06 10:01 UTC, Avinash Kumar Dasoundhi
proxy error UI screenshot (139.46 KB, image/png)
2018-09-06 10:02 UTC, Avinash Kumar Dasoundhi
proxy error browser screenshot (43.41 KB, image/png)
2018-09-06 10:03 UTC, Avinash Kumar Dasoundhi
Log review summary (5.05 KB, text/plain)
2018-09-07 18:03 UTC, dmetzger
Max worker USS (23.50 KB, image/png)
2018-09-07 18:04 UTC, dmetzger
Runnable process count over time (31.99 KB, image/png)
2018-09-07 18:05 UTC, dmetzger

Description Avinash Kumar Dasoundhi 2018-09-06 10:00:06 UTC
Created attachment 1481248 [details]
evm log

Description of problem:
Getting a 502 Proxy Error when checking the details of the added provider.

The provider is VMware and has 10k VMs in it. After adding it and running Refresh Relationships and Power States on it, opening the provider details shows a popup containing a 502 Proxy Error. Accessing the URL directly from the browser shows the same proxy error (screenshots attached).

Version-Release number of selected component (if applicable):
CFME 5.9.4.7

How reproducible:
Every time

Steps to Reproduce:
1. Add the provider (in this case VMware, with 10k VMs in it).
2. Select the provider and run Refresh Relationships and Power States.
3. Click on the provider to see details such as hosts and VMs.

Actual results:
After clicking on the provider to check the host and VM counts, a popup appears in the UI reporting a 502 Proxy Error.

Expected results:
The UI should show the provider details, such as the number of hosts and VMs.

Additional info:
Attaching the evm.log, production.log, and apache/ssl* logs with the error screenshots.

Comment 2 Avinash Kumar Dasoundhi 2018-09-06 10:01:06 UTC
Created attachment 1481249 [details]
production.log

Comment 3 Avinash Kumar Dasoundhi 2018-09-06 10:01:40 UTC
Created attachment 1481250 [details]
ssl error log

Comment 4 Avinash Kumar Dasoundhi 2018-09-06 10:02:29 UTC
Created attachment 1481251 [details]
proxy error UI screenshot

Comment 5 Avinash Kumar Dasoundhi 2018-09-06 10:03:23 UTC
Created attachment 1481252 [details]
proxy error browser screenshot

Comment 9 dmetzger 2018-09-07 18:03:17 UTC
Created attachment 1481632 [details]
Log review summary

Comment 10 dmetzger 2018-09-07 18:04:24 UTC
Created attachment 1481633 [details]
Max worker USS

Comment 11 dmetzger 2018-09-07 18:05:04 UTC
Created attachment 1481634 [details]
Runnable process count over time

Comment 12 dmetzger 2018-09-07 18:05:46 UTC
Based on a review of the logs from the reference worker appliance, the underlying issue appears to be the workload configured for this appliance.

Based on log review (summary in analysis.txt) we can see:
- The Web Services (UI), Reporting and Metrics Collector workers were exceeding their configured maximum memory usage (max USS usage shown in max_workerP_uss.png) and being restarted.
- The Metrics Collectors were not able to keep up with the data, resulting in many misses.
- All worker appliances are in a single zone; 10K is a very large number of VMs for a single zone.
- The number of runnable processes (workers) over time (vmstat_runnable.png) was often twice the CPU count, resulting in a high system load average and causing high application latency.
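
The runnable-process observation can be checked roughly from captured vmstat output. The following is a minimal sketch, not the tooling used for this analysis: the function name runnable_overload, the factor of 2, the CPU count of 4 and the sample rows are all assumptions made up for illustration. It relies only on the first column of vmstat output (r) being the number of runnable processes.

```python
def runnable_overload(vmstat_lines, cpu_count, factor=2.0):
    """Count vmstat samples whose runnable count (column `r`) is at least
    `factor` times the CPU count. Returns (overloaded_samples, total_samples).

    Hypothetical helper for illustration only.
    """
    overloaded = 0
    total = 0
    for line in vmstat_lines:
        fields = line.split()
        # Skip blank lines and the two header lines; data rows start with
        # the integer `r` column.
        if not fields or not fields[0].isdigit():
            continue
        total += 1
        if int(fields[0]) >= factor * cpu_count:
            overloaded += 1
    return overloaded, total


# Made-up sample data in vmstat's default layout (not taken from the ticket).
sample = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 9  0      0 812340   2104 934112    0    0     5    12  310  640 71 20  9  0  0
 3  0      0 810112   2104 934200    0    0     0     8  280  600 40 10 50  0  0
 8  1      0 799004   2104 934512    0    0     0    20  350  700 80 15  5  0  0
""".splitlines()

overloaded, total = runnable_overload(sample, cpu_count=4)
print(f"{overloaded} of {total} samples had r >= 2x CPU count")
```

A high share of overloaded samples would be consistent with the load and latency described above, though it does not by itself identify which workers are responsible.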

Overall, the logs reflect an appliance running in a large environment that requires additional deployment configuration in order to run error-free.

Comment 13 dmetzger 2018-09-07 18:10:12 UTC
Closing, as triage of this ticket indicates the problem appears to be an environment/configuration issue.

Please see https://access.redhat.com/documentation/en-us/reference_architectures/2017/html/deploying_cloudforms_at_scale/index for guidance on configuring CFME at scale.

