Bug 1646884

Summary: Dashboard goes unreachable if active mgr service stops working
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Uday kurundwade <ukurundw>
Component: Ceph-Dashboard
Assignee: Ernesto Puerta <epuertat>
Status: CLOSED NOTABUG
QA Contact: Ernesto Puerta <epuertat>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.2
CC: branto, ceph-eng-bugs, epuertat, ukurundw, vpoliset
Target Milestone: rc
Target Release: 3.*
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1647154 (view as bug list)
Environment:
Last Closed: 2018-11-22 12:56:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                         Flags
Screenshot of dashboard site        none
Screenshot of upstream dashboard    none

Description Uday kurundwade 2018-11-06 08:44:40 UTC
Created attachment 1502316 [details]
Screenshot of dashboard site

Description of problem:
Dashboard goes unreachable if active mgr service stops working

Version-Release number of selected component (if applicable):
ceph-common-12.2.8-23.el7cp.x86_64
ceph-ansible-3.2.0-0.1.beta9.el7cp.noarch
ceph-mgr-12.2.8-23.el7cp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install Ceph storage with a minimum of 3 mgr hosts.
2. Install "dashboard v2" by running the Ansible playbook.
3. Log in to the dashboard once dashboard v2 is installed.
4. Stop the mgr service of the active mgr daemon using a CLI command.
5. Try to log in to the dashboard again.

Actual results:
Site is unreachable

Expected results:
The dashboard should keep working by taking its information from another active mgr node.

Additional info:

Comment 3 Ernesto Puerta 2018-11-06 09:39:51 UTC
Unfortunately, the dashboard is embedded in each Ceph-mgr instance, so if a Ceph-mgr becomes unavailable, its dashboard instance becomes unavailable too.

As per the current Ceph-mgr and Dashboard architecture and design, the expected workflow when the active Ceph-mgr becomes unavailable is as follows: the user manually opens the Dashboard URL provided by the new active Ceph-mgr (this can be obtained with `ceph mgr services`).
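
A minimal sketch of how a script might look up that URL. It assumes `ceph mgr services` prints a JSON object mapping module names to URLs (with a `dashboard` key when the module is enabled) and that the `ceph` CLI is on the PATH with a usable keyring. The function names are hypothetical, not part of Ceph.

```python
import json
import subprocess


def dashboard_url_from_services(services_json):
    """Return the dashboard URL from `ceph mgr services` JSON output, or None."""
    services = json.loads(services_json)
    return services.get("dashboard")


def current_dashboard_url():
    """Ask the cluster for the dashboard URL served by the active mgr.

    Assumption: the default output of `ceph mgr services` is JSON.
    """
    out = subprocess.run(
        ["ceph", "mgr", "services"],
        check=True,           # raise if the CLI exits non-zero
        capture_output=True,  # requires Python 3.7+
        text=True,
    ).stdout
    return dashboard_url_from_services(out)
```

Running `current_dashboard_url()` again after a failover should return the URL of the newly active mgr, since the active mgr is the one that publishes its services.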

An optional workaround for this behaviour could be an intermediate L4-7 load balancer, DNS with multiple A records and round-robin balancing, or service discovery and availability detection (e.g. Consul).
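
As a rough illustration of the availability-detection idea (not a replacement for a real load balancer or Consul), a client-side check could walk a list of candidate dashboard URLs and pick the first one that answers. The function names and the idea of keeping a static URL list are assumptions made for this sketch; HTTPS endpoints with self-signed certificates would need extra TLS handling that is not shown.

```python
import urllib.error
import urllib.request


def http_probe(url, timeout=3.0):
    """Return True if something answers HTTP at `url` (any status code counts)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server responded, just not with a success status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, DNS failure, TLS error...


def first_reachable(urls, probe=http_probe):
    """Return the first URL in `urls` for which `probe` succeeds, else None."""
    for url in urls:
        if probe(url):
            return url
    return None
```

The `probe` parameter is injectable so the selection logic can be exercised without a running cluster.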

@Uday, is it OK to close this as expected behaviour and, if deemed necessary, open a separate RFE for improving this?

Comment 6 Uday kurundwade 2018-11-13 12:08:06 UTC
Additional info:

After the active mgr goes down, the standby mgr becomes active.

We are also able to access the upstream dashboard site at the new active mgr's dashboard URL without even logging in (no prompt; the dashboard main page is shown directly).
This behaviour is a little inconsistent: when the active mgr service stops working, the Dashboard becomes unreachable, whereas the upstream dashboard remains accessible.

Comment 7 Uday kurundwade 2018-11-13 12:13:51 UTC
Created attachment 1505206 [details]
Screenshot of upstream dashboard

Comment 8 Boris Ranto 2018-11-21 14:40:16 UTC
We already provide a redirect from an inactive mgr daemon to the active mgr daemon in the dashboard. However, if the mgr node (or the mgr daemon) goes down, the redirect can't work since nothing is being served on that node. If the mgr daemon is running, it should redirect you even if it is not the active mgr.
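
To make the two cases concrete, here is a small client-side sketch that follows whatever HTTP redirect a running mgr returns and reports the URL that finally served the page. It relies only on the Python standard library; the function name is hypothetical and the exact redirect status code used by the standby is not assumed.

```python
import urllib.request


def resolve_active_dashboard(url, open_url=urllib.request.urlopen, timeout=3.0):
    """Return the URL that finally answers after following redirects from `url`.

    If nothing is listening at `url` (mgr node or daemon down), the default
    opener raises an OSError subclass such as urllib.error.URLError, because
    there is no server left to issue the redirect.
    """
    with open_url(url, timeout=timeout) as response:
        return response.geturl()
```

Pointed at a running standby, this should return the active mgr's URL; pointed at a stopped daemon, it fails with a connection error rather than redirecting.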

I guess we could set up HAProxy on the Grafana node in the dashboard-ansible deployment tool to work around this issue. It is probably also the only solution that would actually make this work as requested in the original comment.

Comment 9 Ernesto Puerta 2018-11-22 12:56:56 UTC
Retested and it works.