Description of problem: 'dashboard_enabled' is currently set to False. Starting with RHCS 4.0, the Dashboard should be installed by default.

Version-Release number of selected component (if applicable): RHCS 4.0 beta #2
FYI: This is somewhat tricky. You need to specify a specific ansible role (grafana-server) to get the full dashboard. While you could fall back to e.g. ceph-mgr node if grafana-server is not specified, I would not recommend doing so. The entire stack would suffer a performance impact. Especially for larger clusters, this could ruin the dashboard experience.
Thanks for warning us, Boris! In some sense Grafana might be considered an optional add-on to the Ceph-Dashboard (it can properly run without Grafana). If it's easy to install Grafana later, and we don't like the colocated/fallback approach, we might just proceed with the installation without Grafana.
(In reply to Boris Ranto from comment #1)
> FYI: This is somewhat tricky. You need to specify a specific ansible role
> (grafana-server) to get the full dashboard. While you could fall back to
> e.g. ceph-mgr node if grafana-server is not specified, I would not recommend
> doing so.

I initially installed Dashboard on RHCS 4 by just setting 'dashboard_enabled: True' in 'group_vars/all.yml'. It was installed on the monitor node and I was able to log in and do things. I didn't realize it at the time, but I guess Grafana wasn't available because it wasn't installed, which seems to be what you're implying when you say you don't get the "full" dashboard without using [grafana-server].

Later I defined a dedicated node under [grafana-server]. Then the dashboard didn't work because there was no ceph-mgr daemon on the dedicated node. I had to add the node under [mgrs] to get ceph-mgr on that node so the dashboard could work.

After I did that, the dashboard worked, but the dashboard node was running the active ceph-mgr daemon while ceph-mgr on the monitor node was running as standby.

> The entire stack would suffer a performance impact. Especially for
> larger clusters, this could ruin the dashboard experience.

Based on my experience, this is kind of what is already happening.

This is a concern I have too, as I am writing the docs on the dashboard and I don't know how to tell the customer to configure RHCS and Dashboard so they won't have a resource usage conflict between the ceph-mgr daemon running on the dashboard node and the ceph-mgr daemons running on other nodes.

I don't know if it is possible, but it might be ideal to have the ceph-mgr daemon on the dashboard node perform only dashboard functions, i.e. it could not become the active ceph-mgr.
(In reply to John Brier from comment #3)

Another thing I'm seeing is when I go to http://ip-of-dashboard-node:8234 I get redirected to http://hostname-of-mon-node:8234... Very odd. And it works because ceph-mgr on the monitor node is also serving 8234. Is this expected?
Hi John,

there are a couple of things happening here.

You don't need to run ceph-mgr on the grafana-server node (if you really do, then it is a bug). The grafana-server node is only supposed to host grafana (port 3000), prometheus (port 9090), alert manager and probably node exporter. There should not be anything running on port 8234 on the grafana-server node.

The new ceph-dashboard (port 8234, btw: it is changing to 8443 to conform to upstream) runs as a ceph-mgr module (i.e. it has to run on the mgr node at the moment). This ceph-dashboard then integrates the grafana output running on another node.

The fact that you are being redirected is to be expected. There is always only one active ceph-mgr daemon, and the other ceph-mgr daemons only run a simple redirect to the active ceph-mgr node. None of this has anything to do with the rest of the stack (grafana, prometheus, ...).

All in all, it could be OK even if we ran the rest of the stack (grafana+prom+...) on the ceph-mgr node, at least for smaller clusters (a couple of nodes), and I suppose we could have this fallback in ceph-ansible, but we should definitely document that we recommend using a separate grafana-server node.

btw: The ansible code for dashboard deployment is still being modified (and quite heavily), so bugs are still to be expected here.

Regards,
Boris
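To make the redirect behaviour described above concrete, here is a minimal Python sketch of it. It is only a model of the explanation in this comment, not ceph-mgr code: the `Mgr` class, the `resolve_dashboard_url` function and the host names are hypothetical, and the port is the 8234 value mentioned in this thread.

```python
from dataclasses import dataclass


@dataclass
class Mgr:
    """Hypothetical stand-in for one ceph-mgr daemon in the cluster."""
    host: str
    active: bool


def resolve_dashboard_url(mgrs, requested_host, port=8234, scheme="http"):
    """Return the URL that ends up serving the dashboard.

    Models the behaviour described in this thread: exactly one ceph-mgr is
    active, and every standby ceph-mgr simply redirects to the active one.
    """
    if requested_host not in {m.host for m in mgrs}:
        # No ceph-mgr on that host, so nothing listens on the dashboard port.
        raise LookupError(f"no ceph-mgr running on {requested_host}")
    active = [m for m in mgrs if m.active]
    if len(active) != 1:
        raise ValueError("expected exactly one active ceph-mgr")
    # Whether the request hit the active or a standby daemon, the dashboard
    # is ultimately served by the active one.
    return f"{scheme}://{active[0].host}:{port}/"


# Assumed example layout: the monitor node holds the active mgr.
mgrs = [Mgr("mon-node", active=True), Mgr("dashboard-node", active=False)]
print(resolve_dashboard_url(mgrs, "dashboard-node"))  # http://mon-node:8234/
```

In this model, a request to a standby node lands on the active mgr's host, which matches the redirect John observed.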
Raised a PR for this: https://github.com/ceph/ceph-ansible/pull/4268. The PR sets `dashboard_enabled` to `True` in group_vars/rhcs.yml.sample and adds a task to the ceph-validate role that aborts execution if the deployment is happening downstream and a Grafana server isn't being deployed. If something more or different needs to be done, please let me know.
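For readers who want the gist of the validation without opening the PR, the check can be sketched roughly as below. This is a Python illustration of the rule as described in this comment, not the actual Ansible task from the PR: the function name, its parameters and the inventory dictionary layout are assumptions made for the example.

```python
def validate_dashboard_inventory(groups, dashboard_enabled=True, downstream=True):
    """Abort if a downstream deployment enables the dashboard without Grafana.

    `groups` is a hypothetical mapping of inventory group name to host list,
    e.g. {"mgrs": ["mon-node"], "grafana-server": ["grafana-node"]}.
    """
    if not (dashboard_enabled and downstream):
        # The rule only applies to downstream deployments with the dashboard on.
        return
    if not groups.get("grafana-server"):
        raise ValueError(
            "dashboard_enabled is True but no host is defined under "
            "[grafana-server]; define one or disable the dashboard"
        )


# Assumed example inventory with a dedicated grafana-server node: passes silently.
validate_dashboard_inventory(
    {"mgrs": ["mon-node"], "grafana-server": ["grafana-node"]}
)
```

An inventory that enables the dashboard but leaves [grafana-server] empty or undefined would raise here, which corresponds to the playbook aborting.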
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0312