Bug 1309886

Summary: Diagnostic Enhancement: check MTU size is valid across the OpenShift cluster
Product: OpenShift Container Platform Reporter: Matt Woodson <mwoodson>
Component: Networking Assignee: Eric Paris <eparis>
Status: CLOSED DEFERRED QA Contact: Meng Bo <bmeng>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.1.0 CC: agrimm, aos-bugs, bbennett, ccoleman
Target Milestone: --- Keywords: UpcomingRelease
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-12 19:23:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1303130    

Description Matt Woodson 2016-02-18 21:46:19 UTC
Description of problem:

When installing clusters we have found inconsistent MTU sizes between interfaces.  eth0 would have an MTU of 9000 while tun0 would have an MTU of 1500.  This has caused problems with pods being deployed.  It is also a very hard bug to diagnose, taking hours to understand and pinpoint.

It would be helpful if OpenShift could alert the administrator that there are MTU differences and/or offer to help correct the MTU configuration.  This could help prevent the problems that would otherwise arise.
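
As a rough illustration of the kind of alert being requested, a minimal sketch in Python follows. It assumes a Linux node where each interface's MTU can be read from /sys/class/net/<iface>/mtu; the function names are hypothetical and are not part of any OpenShift tool. It only reports that the named interfaces differ and does not decide which value is correct.

```python
from pathlib import Path


def read_local_mtus(sysfs_root="/sys/class/net"):
    """Return {interface_name: mtu} for interfaces exposed under sysfs (Linux only)."""
    mtus = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return mtus
    for iface in sorted(root.iterdir()):
        try:
            mtus[iface.name] = int((iface / "mtu").read_text().strip())
        except (OSError, ValueError):
            continue  # skip entries without a readable MTU value
    return mtus


def mtu_difference_warning(mtus, interfaces=("eth0", "tun0")):
    """Return a warning string if the named interfaces have different MTUs, else None."""
    present = {name: mtus[name] for name in interfaces if name in mtus}
    if len(set(present.values())) <= 1:
        return None
    details = ", ".join(f"{name}={mtu}" for name, mtu in present.items())
    return f"MTU mismatch: {details}"


if __name__ == "__main__":
    warning = mtu_difference_warning(read_local_mtus())
    if warning:
        print(warning)
```

With the values from this report, mtu_difference_warning({"eth0": 9000, "tun0": 1500}) returns "MTU mismatch: eth0=9000, tun0=1500".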



Version-Release number of selected component (if applicable):

atomic-openshift-master-3.1.1.6-2.git.10.15b47fc.el7aos.x86_64
atomic-openshift-node-3.1.1.6-2.git.10.15b47fc.el7aos.x86_64

Comment 1 Clayton Coleman 2016-02-18 21:51:22 UTC
At a minimum we could try a TLS connection and suggest MTU misconfiguration as an option.  But MTU checking is even better.

Comment 2 Ben Bennett 2016-04-12 19:23:32 UTC
This was fixed in the Ansible installer, which now detects the interface MTU and assigns the SDN MTU appropriately.
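
The adjustment described here could be sketched as below. This is not the installer's code: the 50-byte allowance assumes a VXLAN overlay over IPv4 (14 bytes outer Ethernet, 20 IP, 8 UDP, 8 VXLAN), which this bug does not state, and the function name is hypothetical.

```python
# Assumed encapsulation overhead for VXLAN over IPv4; not taken from this bug.
VXLAN_OVERHEAD_BYTES = 50


def sdn_mtu_for(interface_mtu, overhead=VXLAN_OVERHEAD_BYTES):
    """Derive an overlay (SDN) MTU that leaves room for encapsulation headers."""
    if interface_mtu <= overhead:
        raise ValueError(f"interface MTU {interface_mtu} is too small for {overhead} bytes of overhead")
    return interface_mtu - overhead
```

Under that assumption, an interface MTU of 9000 gives an SDN MTU of 8950, and 1500 gives 1450.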

Further work should be done in the network diagnostics tool to help catch this case. We already gather the right data across the cluster, but should flag the discrepancy.
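
A minimal sketch of what flagging the discrepancy could look like, assuming the gathered data can be represented as a mapping of node name to {interface: MTU}. That data shape, the node names, and the function name are hypothetical and do not describe the diagnostics tool's actual format.

```python
from collections import defaultdict


def find_mtu_discrepancies(cluster_mtus):
    """Return {interface: {mtu: [nodes]}} for interfaces whose MTU differs between nodes."""
    by_iface = defaultdict(lambda: defaultdict(list))
    for node, ifaces in sorted(cluster_mtus.items()):
        for iface, mtu in ifaces.items():
            by_iface[iface][mtu].append(node)
    # Keep only interfaces that show more than one distinct MTU across the cluster.
    return {
        iface: dict(groups)
        for iface, groups in by_iface.items()
        if len(groups) > 1
    }


if __name__ == "__main__":
    # Hypothetical sample data for two nodes.
    sample = {
        "node1": {"eth0": 9000, "tun0": 8950},
        "node2": {"eth0": 9000, "tun0": 1450},
    }
    for iface, groups in find_mtu_discrepancies(sample).items():
        print(f"{iface}: inconsistent MTU across nodes: {groups}")
```

Grouping nodes by MTU value, rather than only reporting that a mismatch exists, shows the administrator which nodes are the outliers.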

Closing this because the work is tracked in https://trello.com/c/HtaFZbiR/68-13-network-diagnostics-utility-supportability