Bug 1650397 - [RFE] Dashboard to display MTU setting per node under network details
Summary: [RFE] Dashboard to display MTU setting per node under network details
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Dashboard
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 5.0
Assignee: Aashish sharma
QA Contact: Sunil Angadi
Docs Contact: Ranjini M N
URL:
Whiteboard:
Depends On:
Blocks: 1832807 1959686
Reported: 2018-11-16 05:23 UTC by Mike Hackett
Modified: 2021-08-30 08:23 UTC
CC List: 12 users

Fixed In Version: ceph-16.1.0-486.el8cp
Doc Type: Enhancement
Doc Text:
.The Prometheus Alertmanager rule triggers an alert for different MTU settings on the {storage-product} Dashboard
Previously, mismatched MTU settings, a well-known cause of networking issues, had to be identified and managed using the command-line interface. With this release, when one node or a minority of nodes has an MTU setting that differs from that of the majority, an alert is triggered on the {storage-product} Dashboard. The user can either mute the alert or fix the mismatched MTU settings. See the link:{dashboard-guide}#management-of-alerts-on-the-ceph-dashboard[_Management of Alerts on the Ceph dashboard_] section in the _{storage-product} Dashboard Guide_ for more information.
Clone Of:
Environment:
Last Closed: 2021-08-30 08:22:53 UTC
Embargoed:


Attachments
Dashboard Hosts Grafana (45.82 KB, image/png)
2018-12-14 10:48 UTC, Ernesto Puerta


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-medic issues 7 0 None open checks: system MTU values should be consistent across the cluster nodes 2021-02-16 08:56:17 UTC
Github ceph ceph pull 38764 0 None closed mgr/dashboard: trigger alert if some nodes have a MTU different than the median value 2021-02-16 08:56:18 UTC
Github prometheus/node_exporter/blob/3ddc82c2d8d11eec53ed5faa8db969a1bb81f8bb/collector/netclass_linux.go#L134 0 None None None 2020-10-05 17:04:15 UTC
Red Hat Issue Tracker RHCEPH-1114 0 None None None 2021-08-30 00:11:14 UTC
Red Hat Issue Tracker RHCSDASH-225 0 None None None 2021-08-30 00:11:11 UTC
Red Hat Product Errata RHBA-2021:3294 0 None None None 2021-08-30 08:23:38 UTC

Description Mike Hackett 2018-11-16 05:23:00 UTC
Description of problem:
Since the Ceph Dashboard can display the network speeds of NICs in the network pane for a node, it should also display the configured MTU. This will help verify that all nodes are configured with the same MTU size (1500 or jumbo frames, for example).
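
For illustration only, the per-NIC MTU that this request wants surfaced can be read on a Linux node from sysfs, where each interface exposes an mtu file under /sys/class/net. The sketch below is not part of the dashboard; the function name read_mtus and the sysfs_root parameter are hypothetical and exist only to keep the example self-contained.

```python
from pathlib import Path


def read_mtus(sysfs_root="/sys/class/net"):
    """Return {interface_name: mtu} for every interface found under sysfs_root.

    Assumes the Linux sysfs layout, where each interface directory
    contains a plain-text file named "mtu".
    """
    mtus = {}
    for iface in sorted(Path(sysfs_root).iterdir()):
        mtu_file = iface / "mtu"
        # Skip entries that are not interfaces (for example bonding_masters).
        if mtu_file.is_file():
            mtus[iface.name] = int(mtu_file.read_text().strip())
    return mtus
```

Calling read_mtus() on a node returns a dictionary keyed by interface name, which could then be compared across nodes to confirm they share the same MTU.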

Version-Release number of selected component (if applicable):
4.0

Comment 3 Ernesto Puerta 2018-12-14 10:48:43 UTC
Created attachment 1514326
Dashboard Hosts Grafana

Comment 4 Ernesto Puerta 2018-12-14 10:53:58 UTC
Mike, the RHCS 4.0 Dashboard integrates several Cephmetrics charts, but as you can see in the attached picture, there's no comparable placeholder for detailed network stats.

Another question: given that this is mostly a static setting, does it make sense to display it in a Grafana dashboard, or can it be moved somewhere else?

@Federico?

Comment 7 Federico Lucifredi 2018-12-17 22:46:58 UTC
I agree with the idea of putting the MTU value in the host's info page.

Targeting 4.1 for delivery.

Comment 10 Ernesto Puerta 2020-02-18 08:33:42 UTC
@Mike, when reviewing this for 4.1, I think this request does not fit into the dashboard concerns. Let me explain:
- The goal of this is to let the user know about a sub-optimal/wrong setting and 'call for action' on that setting.
- MTU is not highly valuable data to display: it's not a changing setting, and once fixed it is unlikely to change again.

Based on the above, I'd suggest moving this to ceph-medic. In fact, there's an upstream RFE for this (https://github.com/ceph/ceph-medic/issues/7).

Comment 11 Yaniv Kaul 2020-02-19 17:05:58 UTC
(In reply to Ernesto Puerta from comment #10)
> @Mike, when reviewing this for 4.1, I think this request does not fit into
> the dashboard concerns. Let me explain:
> - The goal of this is to let the user know about a sub-optimal/wrong setting
> and 'call for action' on that setting.
> - MTU is not highly valuable data to display: it's not a changing setting,
> and once fixed it is unlikely to change again.
> 
> Based on the above, I'd suggest to move this to ceph-medic. In fact there's
> an upstream RFE for this (https://github.com/ceph/ceph-medic/issues/7).

So why is this on 4.1? Please handle:
- Close
- Move to 5.x
- Fix.

Comment 12 Ernesto Puerta 2020-02-19 17:17:01 UTC
@Yaniv, Federico targeted this specifically at 4.1 (https://bugzilla.redhat.com/show_bug.cgi?id=1650397#c7). That's why I asked him (and Mike & Paul) for their thoughts.

Comment 13 Yaniv Kaul 2020-02-26 16:14:28 UTC
(In reply to Ernesto Puerta from comment #12)
> @Yaniv, Federico targeted this specifically at 4.1

That was a year+ ago...

> (https://bugzilla.redhat.com/show_bug.cgi?id=1650397#c7). That's why I asked
> him (and Mike & Paul) for their thoughts.

BTW, a different idea would be to launch the host's Cockpit - which has all this data + configuration for this host.

Anyway, I'm moving it to 5.0. We can always bring it back.

Comment 14 Mike Hackett 2020-02-26 16:47:07 UTC
@Yaniv @Ernesto

No issues with the 5.0 plan. I think ceph-medic, and even an Insights rule, would cover the supportability aspect I was looking to address.

Comment 15 Ernesto Puerta 2020-02-26 17:18:00 UTC
Thanks @Yaniv (and @Mike for the clarification)!

Comment 18 Yaniv Kaul 2020-09-23 15:10:20 UTC
I think the intention of this feature was lost in all the details. The idea is to alert the user if a specific network is configured with a custom MTU and a host is misconfigured.
If the MTU is already exposed by node exporter, a nice chunk of the work is done. We now need some integration work: understanding which NIC is on which network and what the expected MTU is.
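
As a rough sketch of the integration described above, the check could first pick, for each host, the NIC whose address lies on the network of interest, and then flag hosts whose MTU differs from the median. This follows the approach named in the linked upstream pull request title (alert when some nodes have an MTU different from the median), but it is not the dashboard's actual implementation, which is a Prometheus alert rule. The input layout and the names nic_on_network and mtu_outliers are hypothetical.

```python
import ipaddress
from statistics import median_low


def nic_on_network(nics, cidr):
    """Return the first NIC whose address falls inside cidr, or None."""
    network = ipaddress.ip_network(cidr)
    for nic in nics:
        if ipaddress.ip_address(nic["address"]) in network:
            return nic
    return None


def mtu_outliers(hosts, cidr):
    """Map host -> (nic_name, mtu) for hosts whose MTU on cidr differs from the median.

    hosts is an assumed structure: {host: [{"name", "address", "mtu"}, ...]}.
    Hosts with no NIC on the network are ignored.
    """
    selected = {}
    for host, nics in hosts.items():
        nic = nic_on_network(nics, cidr)
        if nic is not None:
            selected[host] = nic
    if not selected:
        return {}
    # median_low always returns a value that occurs in the data, so an even
    # split between 1500 and 9000 never yields an averaged, nonexistent MTU.
    expected = median_low(nic["mtu"] for nic in selected.values())
    return {
        host: (nic["name"], nic["mtu"])
        for host, nic in selected.items()
        if nic["mtu"] != expected
    }
```

With three hosts on 10.0.0.0/24, two at MTU 9000 and one at 1500, mtu_outliers reports only the host at 1500. On an exact tie the lower MTU is treated as expected, so a real rule would need a policy for that case.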

Comment 21 Yaniv Kaul 2020-12-02 16:18:44 UTC
Once completed in RHCS 5, please evaluate if we can backport it to 4.

Comment 31 errata-xmlrpc 2021-08-30 08:22:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3294

