1726739 – Enable ceph-dashboard by default

Summary: Enable ceph-dashboard by default

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	Ceph-Ansible
Sub Component:
Version:	4.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	4.0
Assignee:	Rishabh Dave
QA Contact:	Madhavi Kasturi
Docs Contact:
URL:	https://github.com/ceph/ceph-ansible/...
Whiteboard:
Depends On:
Blocks:

Reported:	2019-07-03 15:29 UTC by Ernesto Puerta
Modified:	2020-01-31 12:46 UTC
CC List:	14 users
Fixed In Version:	ceph-ansible-4.0.0-0.1.rc13.el8cp
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-01-31 12:46:20 UTC
Embargoed:
Dependent Products:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	ceph ceph-ansible issues 4269	'None'	closed	Enable ceph-dashboard by default	2021-01-01 13:47:27 UTC
Github	ceph ceph-ansible pull 4268	'None'	closed	dashboard: enable dashboard by default	2021-01-01 13:47:28 UTC
Red Hat Product Errata	RHBA-2020:0312	None	None	None	2020-01-31 12:46:56 UTC

Description Ernesto Puerta 2019-07-03 15:29:12 UTC

Description of problem:
'dashboard_enabled' is currently set to False. Starting with RHCS 4.0, the Dashboard should be installed by default.

Version-Release number of selected component (if applicable):
RHCS 4.0 beta #2

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Boris Ranto 2019-07-09 20:26:03 UTC

FYI: This is somewhat tricky. You need to specify a dedicated Ansible role (grafana-server) to get the full dashboard. While you could fall back to, e.g., the ceph-mgr node if grafana-server is not specified, I would not recommend doing so: the entire stack would take a performance hit, and especially on larger clusters this could ruin the dashboard experience.

Comment 2 Ernesto Puerta 2019-07-10 09:12:45 UTC

Thanks for warning us, Boris!

In some sense Grafana might be considered an optional add-on to the Ceph-Dashboard (it can properly run without Grafana). If it's easy to install Grafana later, and we don't like the colocated/fallback approach, we might just proceed with the installation without Grafana.

Comment 3 John Brier 2019-07-10 13:46:00 UTC

(In reply to Boris Ranto from comment #1)
> FYI: This is somewhat tricky. You need to specify a specific ansible role
> (grafana-server) to get the full dashboard. While you could fall back to
> e.g. ceph-mgr node if grafana-server is not specified, I would not recommend
> doing so. 

I initially installed Dashboard on RHCS 4 just by setting 'dashboard_enabled: True' in 'group_vars/all.yml'. It was installed on the monitor node, and I was able to log in and use it. I didn't realize it at the time, but I guess Grafana wasn't available because it wasn't installed, which seems to be what you mean when you say you don't get the "full" dashboard without using [grafana-server].

Later I defined a dedicated node under [grafana-server]. Then the dashboard didn't work because there was no ceph-mgr daemon on the dedicated node. I had to add the node under [mgrs] to get ceph-mgr on it so the dashboard could work.

After I did that, the dashboard worked, but the dashboard node was now running the active ceph-mgr daemon while ceph-mgr on the monitor node was running as a standby.

> The entire stack would suffer a performance impact. Especially for
> larger clusters, this could ruin the dashboard experience.

Based on my experience, this is more or less what is already happening.

This concerns me too: I am writing the Dashboard docs, and I don't know how to tell customers to configure RHCS and Dashboard so that the ceph-mgr daemon on the dashboard node doesn't compete for resources with the ceph-mgr daemons on the other nodes.

I don't know if it is possible, but ideally the ceph-mgr daemon on the dashboard node would only perform dashboard functions, i.e., it could never become the active ceph-mgr.

Comment 4 John Brier 2019-07-10 14:06:18 UTC

(In reply to John Brier from comment #3)

Another thing I'm seeing: when I go to http://ip-of-dashboard-node:8234, I get redirected to http://hostname-of-mon-node:8234. Very odd. It works because ceph-mgr on the monitor node is also serving port 8234. Is this expected?

Comment 5 Boris Ranto 2019-07-11 18:54:13 UTC

Hi John,

there are a couple of things happening here. You don't need to run ceph-mgr on the grafana-server node (if you really do, then it is a bug). The grafana-server node is only supposed to host Grafana (port 3000), Prometheus (port 9090), Alertmanager, and probably node exporter. Nothing should be running on port 8234 on the grafana-server node.

The new ceph-dashboard (port 8234; btw, it is changing to 8443 to conform to upstream) runs as a ceph-mgr module, i.e., it has to run on a mgr node at the moment. The ceph-dashboard then integrates the output of Grafana, which runs on another node.
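To make the intended layout concrete, here is a minimal Python sketch (not ceph-ansible code) that maps a hypothetical inventory to the services each host should run, using only the ports quoted in this thread. The host names, the `SERVICES` table, and both helper functions are assumptions for illustration; Alertmanager and node exporter are left out because no ports are given for them here.

```python
# Hypothetical inventory: Ansible group name -> hosts (host names are made up).
INVENTORY = {
    "mgrs": ["mon0"],
    "grafana-server": ["grafana0"],
}

# Services each group is expected to run, with the ports quoted in this bug.
# The dashboard is a ceph-mgr module (8234 here, moving to 8443 upstream).
# Alertmanager and node exporter are omitted: no ports are given for them.
SERVICES = {
    "mgrs": {"ceph-mgr (dashboard module)": 8234},
    "grafana-server": {"grafana": 3000, "prometheus": 9090},
}


def placement(inventory):
    """Return {host: {service: port}} for every host in the inventory."""
    result = {}
    for group, hosts in inventory.items():
        for host in hosts:
            # A host listed in several groups accumulates all their services.
            result.setdefault(host, {}).update(SERVICES.get(group, {}))
    return result


def mgr_on_grafana_node(inventory):
    """Hosts listed in both groups, i.e. running ceph-mgr next to Grafana."""
    mgrs = set(inventory.get("mgrs", []))
    grafana = set(inventory.get("grafana-server", []))
    return sorted(mgrs & grafana)
```

With the layout Boris describes, `placement(INVENTORY)["grafana0"]` contains only Grafana and Prometheus, and `mgr_on_grafana_node(INVENTORY)` is empty. The workaround from comment #3 (adding the Grafana node to `[mgrs]` as well) would show up as `["grafana0"]`.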

The redirect you are seeing is expected. There is only ever one active ceph-mgr daemon; the other ceph-mgr daemons just run a simple redirect to the active ceph-mgr node. None of this has anything to do with the rest of the stack (Grafana, Prometheus, ...).
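The redirect behavior can be illustrated with a toy model in Python. This is not the ceph-mgr or dashboard implementation: the `MgrDaemon` class, the helper names, and the 303 status code are assumptions made for the sketch. Exactly one daemon is active and serves the UI, and any standby answers with the URL of the active daemon. This matches what comment #4 observed when hitting the dashboard node while the monitor node held the active mgr.

```python
from dataclasses import dataclass


@dataclass
class MgrDaemon:
    host: str
    active: bool
    port: int = 8234  # port used in this thread; upstream is moving to 8443


def active_mgr(daemons):
    """There is only ever one active ceph-mgr; return it (toy model)."""
    active = [d for d in daemons if d.active]
    if len(active) != 1:
        raise RuntimeError(f"expected exactly one active mgr, found {len(active)}")
    return active[0]


def handle_request(daemon, daemons, path="/"):
    """Return (status, body or redirect target) for a request hitting `daemon`.

    The active mgr serves the dashboard; a standby only redirects.
    The 303 status code is an assumption for illustration.
    """
    if daemon.active:
        return 200, "dashboard UI"
    target = active_mgr(daemons)
    return 303, f"http://{target.host}:{target.port}{path}"


mgrs = [MgrDaemon("mon-node", active=True), MgrDaemon("dashboard-node", active=False)]
print(handle_request(mgrs[1], mgrs))  # (303, 'http://mon-node:8234/')
```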

All in all, it could be OK to run the rest of the stack (Grafana + Prometheus + ...) on the ceph-mgr node, at least for smaller clusters (a couple of nodes). I suppose we could have this fallback in ceph-ansible, but we should definitely document that we recommend using a separate grafana-server node.
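The fallback Boris mentions could be expressed as a simple selection rule. Below is a hedged Python sketch, not ceph-ansible's actual logic: `monitoring_host` and its warning text are hypothetical. It prefers a dedicated grafana-server host and falls back to the first mgr host only with an explicit warning, keeping the recommendation for a separate node visible.

```python
import warnings


def monitoring_host(inventory):
    """Pick the host for Grafana/Prometheus (hypothetical helper).

    Prefer a dedicated grafana-server host; otherwise fall back to the
    first mgr host and warn, since colocation may hurt larger clusters.
    """
    dedicated = inventory.get("grafana-server") or []
    if dedicated:
        return dedicated[0]
    mgrs = inventory.get("mgrs") or []
    if not mgrs:
        raise ValueError("no grafana-server or mgrs hosts defined")
    warnings.warn(
        "colocating Grafana/Prometheus with ceph-mgr; "
        "a separate grafana-server node is recommended",
        stacklevel=2,
    )
    return mgrs[0]
```

The warning mirrors the "document that we recommend a separate node" point. Comment #8 shows the merged change went the stricter way for downstream, aborting instead of falling back.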

btw: The Ansible code for dashboard deployment is still being modified (quite heavily), so bugs are still to be expected here.

Regards,
Boris

Comment 8 Rishabh Dave 2019-07-25 10:33:24 UTC

Raised a PR for this: https://github.com/ceph/ceph-ansible/pull/4268. The PR sets `dashboard_enabled` to `True` in group_vars/rhcs.yml.sample and adds a task to the ceph-validate role that aborts execution if the deployment is happening downstream and no Grafana server is being deployed. If something more or different needs to be done, please let me know.
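The condition that the new ceph-validate task enforces can be modeled in a few lines of Python. This sketch is not the PR's Ansible task. In particular, treating `ceph_repository == "rhcs"` as the marker for a downstream deployment is an assumption, and `dashboard_validation_errors` is a hypothetical name. A non-empty result corresponds to the playbook aborting.

```python
def dashboard_validation_errors(config, inventory):
    """Return reasons the run should abort (empty list means OK).

    config: group_vars-style dict; inventory: group name -> hosts.
    Assumption: ceph_repository == "rhcs" marks a downstream deployment.
    """
    errors = []
    downstream = config.get("ceph_repository") == "rhcs"
    dashboard_enabled = config.get("dashboard_enabled", False)
    grafana_hosts = inventory.get("grafana-server") or []
    if downstream and dashboard_enabled and not grafana_hosts:
        errors.append(
            "dashboard_enabled is true but the [grafana-server] group is empty"
        )
    return errors
```

For example, a downstream config with `dashboard_enabled: True` and only `[mgrs]` defined yields one error, while adding a host under `[grafana-server]` clears it.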

Comment 17 errata-xmlrpc 2020-01-31 12:46:20 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0312
