Bug 1455711 - [RFE] OSD: Add heartbeat message for Jumbo Frames(MTU 9000)
Summary: [RFE] OSD: Add heartbeat message for Jumbo Frames(MTU 9000)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 1.3.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 3.0
Assignee: Josh Durgin
QA Contact: Manohar Murthy
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks: 1494421
 
Reported: 2017-05-25 22:27 UTC by Vikhyat Umrao
Modified: 2020-07-16 09:40 UTC (History)
CC List: 13 users

Fixed In Version: RHEL: ceph-12.1.4-1.el7cp Ubuntu: ceph_12.1.4-2redhat1xenial
Doc Type: Bug Fix
Doc Text:
.A heartbeat message for jumbo frames has been added
Previously, if a network used jumbo frames and the maximum transmission unit (MTU) was not configured consistently on every device in the network path, problems such as slow requests and stuck peering and backfilling occurred. In addition, the OSD logs did not include any heartbeat timeout messages, because heartbeat packets are smaller than 1500 bytes and therefore still passed through devices limited to the standard MTU. This update adds a heartbeat message for jumbo frames.
Clone Of:
: 1461581 (view as bug list)
Environment:
Last Closed: 2017-12-05 23:33:43 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 20087 | 0 | None | None | None | 2017-05-25 22:28:39 UTC
Red Hat Product Errata | RHBA-2017:3387 | 0 | normal | SHIPPED_LIVE | Red Hat Ceph Storage 3.0 bug fix and enhancement update | 2017-12-06 03:03:45 UTC

Description Vikhyat Umrao 2017-05-25 22:27:12 UTC
Description of problem:
[RFE] OSD: Add heartbeat message for Jumbo Frames(MTU 9000)
http://tracker.ceph.com/issues/20087

- When jumbo frames are enabled on the cluster network, all interconnecting network gear must also have jumbo frames enabled. If any device in the path is misconfigured for jumbo frames, we see many issues, such as stuck peering, slow requests, and backfilling that does not progress.

- The problem is that no heartbeat timeout messages appear in the OSD logs, because the heartbeat message packet size is below 1500 bytes.

- We checked the communication issue with the command below:

~~~
# ping -W 2 -I <interface> -M do -s <pkt size> <IP address>
~~~
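
For the `<pkt size>` argument, the useful value is one that fills the whole MTU. A minimal sketch, assuming IPv4 without IP options (20-byte IP header plus 8-byte ICMP header); `icmp_payload_for_mtu` is a hypothetical helper name, not part of Ceph or ping:

```shell
# Hypothetical helper: largest ICMP echo payload that fits in a single
# frame for a given MTU. Assumes IPv4 with no IP options.
icmp_payload_for_mtu() {
    mtu=$1
    # Subtract 20 bytes (IPv4 header) + 8 bytes (ICMP header).
    echo $((mtu - 28))
}

icmp_payload_for_mtu 9000   # prints 8972
icmp_payload_for_mtu 1500   # prints 1472
```

With `-M do` (do not fragment), a ping with `-s 8972` should fail across a hop limited to a 1500-byte MTU, while `-s 1472` should still succeed. That matches the symptom described above: small heartbeat packets get through even though full-size frames do not.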


Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 1.3.2

Comment 9 Manohar Murthy 2017-11-08 10:41:38 UTC
Hi Vikhyat,

Can you please provide steps to recreate this bug, and verification steps too?

Thanks,
Manohar

Comment 10 Michael J. Kidd 2017-11-08 22:14:49 UTC
Manohar, reproduction steps are as follows:

* Configure OSD and MON nodes to use jumbo frames (typically a 9000-byte MTU)
* Configure interconnecting switch gear to *NOT* allow jumbo frames (typically configured for a 1500-byte MTU)
* Start MON and OSD processes
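
The node-side part of these steps can be sketched with iproute2. Assumptions: `eth1` is a placeholder for the cluster-network interface, and `set_mtu_cmd` is a hypothetical helper that only prints the command so it can be reviewed before being run as root. The switch-side setting is vendor-specific and is not shown.

```shell
# Hypothetical helper: print (do not run) the iproute2 command that
# sets the MTU of an interface.
set_mtu_cmd() {
    printf 'ip link set dev %s mtu %s\n' "$1" "$2"
}

# eth1 is a placeholder; review the output, then run it as root on each
# OSD and MON node.
set_mtu_cmd eth1 9000
```

Afterwards, `ip link show dev eth1` reports the MTU in effect on the node.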

Comment 16 errata-xmlrpc 2017-12-05 23:33:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387

