Bug 1498878

Summary:

Web UI is unreachable after performing adhoc metric in Openshift

Product:

Red Hat CloudForms Management Engine

Reporter:

Neha Chugh <nchugh>

Component:

UI - OPS

Assignee:

Joe Rafaniello <jrafanie>

Status:

CLOSED DUPLICATE

QA Contact:

Einat Pacifici <epacific>

Severity:

high

Priority:

high

Version:

5.8.0

CC:

bazulay, cpelland, dclarizi, fsimonce, hkataria, jhardy, jrafanie, mhradil, mpovolny, nchugh, obarenbo, pmukhedk, sacpatil, yzamir

Target Release:

5.8.3

Hardware:

All

OS:

All

Whiteboard:

container:ui

Last Closed:

2018-01-11 18:27:04 UTC

Type:

Bug

Cloudforms Team:

CFME Core

Attachments:

Description	Flags
the error message I get when the metrics server is down	none
Proxy issue video	none

Description Neha Chugh 2017-10-05 13:20:18 UTC

Description of problem:
Web UI is unreachable after performing adhoc metric in Openshift

Version-Release number of selected component (if applicable):
Cloudforms 4.5

How reproducible:
Always


Steps to Reproduce:
1. Navigate to compute -> container -> provider -> Monitoring -> Adhoc metric
2. It will give 502 proxy error and Web UI is unreachable
3. After restarting the evm service, the Web UI is reachable.
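
The symptom in step 2 can be checked from outside the appliance with a small probe. This is only a sketch: the appliance URL is a placeholder, the function names are made up for illustration, and an appliance with a self-signed certificate would be reported as unreachable unless certificate handling is added.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Return the HTTP status of a GET on url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server (or the proxy in front of it) answered with an error status.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, TLS failure, or timeout.
        return None


def describe(status):
    """Map a probe() result to a short, human-readable state."""
    if status is None:
        return "unreachable"
    if status == 502:
        return "proxy error"
    if 200 <= status < 400:
        return "reachable"
    return "other error"


if __name__ == "__main__":
    # Placeholder host name; replace with the real appliance address.
    print(describe(probe("https://cfme.example.com/ops/explorer")))
```

A "proxy error" result before restarting the evm service and "reachable" after it would match the behaviour described in steps 2 and 3.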

Actual results:
The following error is returned after performing an ad hoc metric in OpenShift:
The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request GET /ops/explorer.

Reason: Error reading from remote server

Expected results:
It should not throw any exception.


Additional info:
To get the latest logs, please use the commands below:

[yank] complete - access files in /cases/01930822
    [browse] the files here: http://collab-shell.usersys.redhat.com/01930822/
    [images] are available here: http://collab-shell.usersys.redhat.com/01930822/x-image

Comment 4 Federico Simoncelli 2017-10-05 14:11:15 UTC

Yaacov can you look into this?

Comment 5 Yaacov Zamir 2017-10-15 11:24:31 UTC

> Steps to Reproduce:
> 1. Navigate to compute -> container -> provider -> Monitoring -> Adhoc metric
> 2. It will give 502 proxy error and Web UI is unreachable

I can't reproduce this :-( On my system I get the metrics page; if the metrics server is down, I get a regular error message.

Do you have, or can you prepare, a system I can log in to and see this happening?

Comment 6 Yaacov Zamir 2017-10-15 11:25:29 UTC

Created attachment 1338801 [details]
the error message I get when the metrics server is down

Comment 13 Yaacov Zamir 2017-10-18 11:02:53 UTC

Neha Chugh, hi, any news?

Comment 14 Neha Chugh 2017-10-18 11:53:03 UTC

Hello Yaacov,

The issue is not reproducible in all environments. Give me a day or so to reproduce it, and I will provide the environment details accordingly.

Regards,
Neha Chugh

Comment 16 Neha Chugh 2017-11-14 07:50:11 UTC

Hello Yaacov,

I am unable to reproduce the issue in any of the test environments. We are currently checking with the customer whether there is a network connectivity issue between Hawkular and CloudForms.

We are waiting for the customer's response and will update the BZ once we get the required inputs.

Regards,
Neha Chugh

Comment 19 Yaacov Zamir 2017-11-27 12:22:39 UTC

Created attachment 1359449 [details]
hawkular is not responsive (try to connect to hawkular fails)

Comment 22 Neha Chugh 2017-12-01 11:46:38 UTC

Created attachment 1361533 [details]
Proxy issue video

Comment 40 Joe Rafaniello 2017-12-13 22:17:36 UTC

Yes, if the UI worker is dying because it's exceeding memory thresholds, then I'd imagine you might see something similar to that.  

I haven't had a chance to look at the logs but is the evm_worker_memory_exceeded happening to the UI worker?
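
One way to answer that question is to filter the appliance log for the event name. This is a minimal sketch under assumptions: the log layout is not shown in this report, so the filter only does plain substring matching, and "MiqUiWorker" is an assumed hint for how UI worker lines are identified. The log path in the usage line is also an assumption.

```python
def memory_exceeded_lines(lines, worker_hint="MiqUiWorker"):
    """Return lines that mention evm_worker_memory_exceeded and the worker hint.

    worker_hint is an assumed substring, not a confirmed log field.
    """
    marker = "evm_worker_memory_exceeded"
    return [
        line.rstrip("\n")
        for line in lines
        if marker in line and worker_hint in line
    ]


if __name__ == "__main__":
    # Assumed log location; adjust to where evm.log lives on the appliance.
    with open("/var/www/miq/vmdb/log/evm.log", errors="replace") as handle:
        for hit in memory_exceeded_lines(handle):
            print(hit)
```

If the UI worker lines use a different identifier, passing another `worker_hint` (or an empty string to match every worker) keeps the filter usable.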

Comment 45 Joe Rafaniello 2017-12-15 21:35:28 UTC

Bug 1478434 was a clone of the above "two miq servers" bug, and its fix was released in 5.8.2.0. The customer logs come from 5.8.1.5.

Neha, I notice you recreated this on your system with version 5.8.1.5. Can you recreate this issue in cfme 5.8.2.0+?

If you still hit this issue on 5.8.2.0, we'd have to investigate the timeouts and amazingly long requests highlighted in comment 43.
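
The version check implied here (is the appliance at 5.8.2.0 or later?) can be sketched as a numeric comparison of dotted version strings. The function names are illustrative only, and the sketch assumes versions made of purely numeric components, such as 5.8.1.5.

```python
def parse_version(text, width=4):
    """Split a dotted version string into integers, zero-padded to width parts."""
    parts = [int(piece) for piece in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def at_least(version, minimum="5.8.2.0"):
    """Return True if version is numerically equal to or newer than minimum."""
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    for candidate in ("5.8.1.5", "5.8.2.0", "5.8.3"):
        print(candidate, at_least(candidate))
```

Comparing integer tuples avoids the string-comparison trap where "5.8.10" would sort before "5.8.2".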

Comment 46 Neha Chugh 2017-12-18 04:59:48 UTC

Alright Joe, let me check on cfme 5.8.2.0 and I will come back with my findings.

Regards,
Neha Chugh