Bug 1650397

Summary: [RFE] Dashboard to display MTU setting per node under network details
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Mike Hackett <mhackett>
Component: Ceph-Dashboard
Assignee: Aashish sharma <aasharma>
Status: CLOSED ERRATA
QA Contact: Sunil Angadi <sangadi>
Severity: medium
Docs Contact: Ranjini M N <rmandyam>
Priority: high
Version: 4.0
CC: bniver, ceph-eng-bugs, epuertat, flucifre, gmeno, kdreyer, mkasturi, pcuzner, rmandyam, sangadi, vereddy, vumrao
Target Milestone: ---
Keywords: FutureFeature
Target Release: 5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-16.1.0-486.el8cp
Doc Type: Enhancement
Doc Text:
.The Prometheus Alertmanager rule triggers an alert for different MTU settings on the {storage-product} Dashboard
Previously, a mismatch in MTU settings, which is a well-known cause of networking issues, had to be identified and managed using the command-line interface. With this release, when a node, or a minority of nodes, has an MTU setting that differs from the majority of nodes, an alert is triggered on the {storage-product} Dashboard. The user can either mute the alert or fix the mismatched MTU settings. See the link:{dashboard-guide}#management-of-alerts-on-the-ceph-dashboard[_Management of Alerts on the Ceph dashboard_] section in the _{storage-product} Dashboard Guide_ for more information.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-08-30 08:22:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1832807, 1959686
Attachments:
Dashboard Hosts Grafana (Flags: none)

Description Mike Hackett 2018-11-16 05:23:00 UTC
Description of problem:
Since the Ceph Dashboard can display the network speeds of NICs in a node's network pane, it should also display the configured MTU. This would help verify that all nodes are configured with the same MTU size (for example, 1500 or jumbo frames).

Version-Release number of selected component (if applicable):
4.0

Comment 3 Ernesto Puerta 2018-12-14 10:48:43 UTC
Created attachment 1514326
Dashboard Hosts Grafana

Comment 4 Ernesto Puerta 2018-12-14 10:53:58 UTC
Mike, the RHCS 4.0 Dashboard integrates several Cephmetrics charts, but as you can see in the attached picture, there's no comparable placeholder for detailed network stats.

Another question, given that this is mostly a static setting: does it make sense to display it in a Grafana dashboard, or could it be moved somewhere else?

@Federico?

Comment 7 Federico Lucifredi 2018-12-17 22:46:58 UTC
I agree with the idea of putting the MTU value in the host's info page.

Targeting 4.1 for delivery.

Comment 10 Ernesto Puerta 2020-02-18 08:33:42 UTC
@Mike, when reviewing this for 4.1, I concluded that this request does not fit within the dashboard's scope. Let me explain:
- The goal here is to let the user know about a sub-optimal/wrong setting and 'call for action' on that setting.
- MTU is not highly valuable data to display: it's not a changing setting, and once fixed it is unlikely to change again.

Based on the above, I'd suggest moving this to ceph-medic. In fact, there's an upstream RFE for this (https://github.com/ceph/ceph-medic/issues/7).

Comment 11 Yaniv Kaul 2020-02-19 17:05:58 UTC
(In reply to Ernesto Puerta from comment #10)
> @Mike, when reviewing this for 4.1, I concluded that this request does not
> fit within the dashboard's scope. Let me explain:
> - The goal here is to let the user know about a sub-optimal/wrong setting
> and 'call for action' on that setting.
> - MTU is not highly valuable data to display: it's not a changing setting,
> and once fixed it is unlikely to change again.
> 
> Based on the above, I'd suggest moving this to ceph-medic. In fact, there's
> an upstream RFE for this (https://github.com/ceph/ceph-medic/issues/7).

So why is this on 4.1? Please do one of the following:
- Close it
- Move it to 5.x
- Fix it

Comment 12 Ernesto Puerta 2020-02-19 17:17:01 UTC
@Yaniv, Federico targeted this specifically at 4.1 (https://bugzilla.redhat.com/show_bug.cgi?id=1650397#c7). That's why I asked him (and Mike & Paul) for their thoughts.

Comment 13 Yaniv Kaul 2020-02-26 16:14:28 UTC
(In reply to Ernesto Puerta from comment #12)
> @Yaniv, Federico targeted this specifically at 4.1

That was a year+ ago...

> (https://bugzilla.redhat.com/show_bug.cgi?id=1650397#c7). That's why I asked
> him (and Mike & Paul) for their thoughts.

BTW, another option would be to launch the host's Cockpit, which has all this data plus the configuration for the host.

Anyway, I'm moving it to 5.0. We can always bring it back.

Comment 14 Mike Hackett 2020-02-26 16:47:07 UTC
@Yaniv @Ernesto

No issues with the 5.0 plan. I think ceph-medic, or even an Insights rule, would cover the supportability aspect I was looking to address.

Comment 15 Ernesto Puerta 2020-02-26 17:18:00 UTC
Thanks @Yaniv (and @Mike for the clarification)!

Comment 18 Yaniv Kaul 2020-09-23 15:10:20 UTC
I think the intention of this feature got lost in all the details. The idea is to alert the user if a specific network is configured with a custom MTU and a host is misconfigured.
If the MTU is already exposed by node exporter, a nice chunk of the work is done. We now need some integration work: understanding which NIC is under which network and what the expected MTU is.
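A minimal Python sketch of the majority-based check discussed here, assuming per-host MTU readings have already been collected (for example, from node_exporter's node_network_mtu_bytes metric; that source is an assumption, not something stated in this bug). As a simplifying assumption, the interface name stands in for "which network", and the most common MTU on that interface is treated as the expected value. The function name find_mtu_mismatches and the sample hosts are hypothetical; this is not the Alertmanager rule that shipped in RHCS 5.0.

```python
from collections import Counter, defaultdict


def find_mtu_mismatches(samples):
    """Return {device: [(host, mtu), ...]} for hosts whose MTU differs
    from the most common MTU seen on the same device name.

    samples maps (host, device) -> MTU in bytes. Grouping by device name
    is a hypothetical stand-in for mapping each NIC to its network.
    """
    by_device = defaultdict(list)
    for (host, device), mtu in samples.items():
        by_device[device].append((host, mtu))

    mismatches = {}
    for device, entries in by_device.items():
        counts = Counter(mtu for _, mtu in entries)
        # The most common MTU is treated as the expected value; on a tie,
        # Counter.most_common returns the value counted first.
        expected, _ = counts.most_common(1)[0]
        outliers = sorted((host, mtu) for host, mtu in entries if mtu != expected)
        if outliers:
            mismatches[device] = outliers
    return mismatches


if __name__ == "__main__":
    # Hypothetical readings: node3 is left at 1500 on a jumbo-frame network.
    readings = {
        ("node1", "eth1"): 9000,
        ("node2", "eth1"): 9000,
        ("node3", "eth1"): 1500,
        ("node1", "eth0"): 1500,
        ("node2", "eth0"): 1500,
        ("node3", "eth0"): 1500,
    }
    for device, outliers in find_mtu_mismatches(readings).items():
        for host, mtu in outliers:
            print(f"ALERT: {host}/{device} MTU {mtu} differs from the majority")
```

With an even split (for example, two hosts at 9000 and two at 1500) there is no real majority, and this sketch simply picks whichever value was seen first; a production rule would need an explicit policy for that case.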

Comment 21 Yaniv Kaul 2020-12-02 16:18:44 UTC
Once completed in RHCS 5, please evaluate if we can backport it to 4.

Comment 31 errata-xmlrpc 2021-08-30 08:22:53 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3294