Bug 1498878

Summary: Web UI is unreachable after performing adhoc metric in Openshift
Product: Red Hat CloudForms Management Engine
Reporter: Neha Chugh <nchugh>
Component: UI - OPS
Assignee: Joe Rafaniello <jrafanie>
Status: CLOSED DUPLICATE
QA Contact: Einat Pacifici <epacific>
Severity: high
Docs Contact:
Priority: high
Version: 5.8.0
CC: bazulay, cpelland, dclarizi, fsimonce, hkataria, jhardy, jrafanie, mhradil, mpovolny, nchugh, obarenbo, pmukhedk, sacpatil, yzamir
Target Milestone: GA
Target Release: 5.8.3
Hardware: All
OS: All
Whiteboard: container:ui
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-01-11 18:27:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: CFME Core
Target Upstream Version:
Embargoed:
Attachments:
Description: the error message I get when the metrics server is down
Flags: none
Description: Proxy issue video
Flags: none

Description Neha Chugh 2017-10-05 13:20:18 UTC
Description of problem:
The Web UI becomes unreachable after performing an ad hoc metric in OpenShift.

Version-Release number of selected component (if applicable):
Cloudforms 4.5

How reproducible:
Always


Steps to Reproduce:
1. Navigate to compute -> container -> provider -> Monitoring -> Adhoc metric
2. It will give 502 proxy error and Web UI is unreachable
3. After restarting the evm service, the Web UI is reachable.

Actual results:
The following error is returned after performing an ad hoc metric in OpenShift:

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request GET /ops/explorer.

Reason: Error reading from remote server

Expected results:
The ad hoc metrics page should load without a proxy error, and the Web UI should remain reachable.


Additional info:
The latest logs are available at the locations below:

[yank] complete - access files in /cases/01930822
    [browse] the files here: http://collab-shell.usersys.redhat.com/01930822/
    [images] are available here: http://collab-shell.usersys.redhat.com/01930822/x-image

Comment 4 Federico Simoncelli 2017-10-05 14:11:15 UTC
Yaacov can you look into this?

Comment 5 Yaacov Zamir 2017-10-15 11:24:31 UTC
> Steps to Reproduce:
> 1. Navigate to compute -> container -> provider -> Monitoring -> Adhoc metric
> 2. It will give 502 proxy error and Web UI is unreachable

I can't reproduce this :-( On my system I get the metrics page, and if the metrics server is down I get a regular error message.

Do you have, or can you prepare, a system I can log in to and see this happening?

Comment 6 Yaacov Zamir 2017-10-15 11:25:29 UTC
Created attachment 1338801 [details]
the error message I get when the metrics server is down

Comment 13 Yaacov Zamir 2017-10-18 11:02:53 UTC
Neha Chugh, hi, any news?

Comment 14 Neha Chugh 2017-10-18 11:53:03 UTC
Hello Yaacov,

The issue is not reproducible in all environments. Give me a day or so to reproduce it, and I will provide the environment details accordingly.

Regards,
Neha Chugh

Comment 16 Neha Chugh 2017-11-14 07:50:11 UTC
Hello Yaacov,

I am unable to reproduce the issue in any of the test environments. We are currently checking with the customer whether there is a network connectivity issue between Hawkular and CloudForms.

We are waiting for the customer's response and will update the BZ once we get the required inputs.

Regards,
Neha Chugh

Comment 19 Yaacov Zamir 2017-11-27 12:22:39 UTC
Created attachment 1359449 [details]
hawkular is not responsive (try to connect to hawkular fails)

Comment 22 Neha Chugh 2017-12-01 11:46:38 UTC
Created attachment 1361533 [details]
Proxy issue video

Comment 40 Joe Rafaniello 2017-12-13 22:17:36 UTC
Yes, if the UI worker is dying because it's exceeding memory thresholds, then I'd imagine you might see something similar to that.

I haven't had a chance to look at the logs, but is the evm_worker_memory_exceeded happening to the UI worker?

Comment 45 Joe Rafaniello 2017-12-15 21:35:28 UTC
Bug 1478434 was a clone of the above "two miq servers" bug and was released in 5.8.2.0. The customer logs indicate that they come from 5.8.1.5.

Neha, I notice you recreated this on your system with version 5.8.1.5. Can you recreate this issue in cfme 5.8.2.0+?

If you still hit this issue on 5.8.2.0, we'd have to investigate the timeouts and amazingly long requests highlighted in comment 43.

Comment 46 Neha Chugh 2017-12-18 04:59:48 UTC
Alright Joe, let me check on cfme 5.8.2.0 and I will come back with my findings.

Regards,
Neha Chugh