Bug 1568017

Summary: incorrect mtu in node-config.yaml on GCE
Product: OpenShift Container Platform Reporter: Weihua Meng <wmeng>
Component: Installer Assignee: Russell Teague <rteague>
Status: CLOSED ERRATA QA Contact: Weihua Meng <wmeng>
Severity: high Docs Contact:
Priority: high    
Version: 3.10.0 CC: aos-bugs, bmeng, hongli, jokerman, mmccomas, wmeng
Target Milestone: ---   
Target Release: 3.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The default MTU value was not being determined from the local host interfaces.
Consequence: The default value was set to a hard-coded one.
Fix: The SDN MTU now defaults to a value based on the local host interfaces.
Result: The default SDN MTU is set as expected for the host.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-07-30 19:13:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Weihua Meng 2018-04-16 15:22:07 UTC
Description of problem:
incorrect mtu in node-config.yaml on GCE

Version-Release number of the following components:
openshift-ansible-3.10.0-0.21.0.git.0.0b1d180.el7.noarch.rpm

How reproducible:
Always

Steps to Reproduce:
1. Set up an OCP 3.10 cluster on GCE.
2. Compare the MTU of eth0 with the mtu value in /etc/origin/node/node-config.yaml.

Actual results:
# ip link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 42:01:0a:f0:00:05 brd ff:ff:ff:ff:ff:ff

# grep mtu /etc/origin/node/node-config.yaml 
  mtu: 8951


Expected results:
# grep mtu /etc/origin/node/node-config.yaml 
  mtu: 1410

Additional info:
When a packet that is larger than the MTU size is transmitted over HTTP, the physical network router is able to break the packet into multiple packets to transmit the data. However, when a packet that is larger than the MTU size is transmitted over HTTPS, the router is forced to drop the packet.

To fix this issue, adjust the MTU size within the /etc/origin/node/node-config.yaml to 50 bytes smaller than the MTU size being used by the OpenShift SDN Ethernet device.

https://docs.openshift.com/container-platform/3.9/day_two_guide/environment_health_checks.html#day-two-guide-verifying_mtu
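
For illustration only, the 50-byte rule above can be sketched in Python. This is not part of openshift-ansible; the function names and the SDN_OVERHEAD constant are hypothetical, and the sample line is the eth0 output from the actual results above.

```python
import re

# Assumption: 50 bytes of headroom, taken from the guidance quoted above.
SDN_OVERHEAD = 50


def interface_mtu(ip_link_line: str) -> int:
    """Extract the MTU from one header line of `ip link show` output."""
    match = re.search(r"\bmtu (\d+)\b", ip_link_line)
    if match is None:
        raise ValueError("no mtu field found in line")
    return int(match.group(1))


def expected_sdn_mtu(host_mtu: int, overhead: int = SDN_OVERHEAD) -> int:
    """Return the SDN MTU that leaves `overhead` bytes below the host MTU."""
    return host_mtu - overhead


# Header line for eth0 as reported on the GCE host.
eth0 = ("2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq "
        "state UP mode DEFAULT group default qlen 1000")

print(expected_sdn_mtu(interface_mtu(eth0)))  # 1460 - 50 = 1410
```

With the GCE host MTU of 1460 this gives 1410, the value listed under expected results, whereas the configured 8951 is larger than the host interface MTU.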

Comment 1 Scott Dodson 2018-04-16 19:37:11 UTC
Can you provide a complete list of interfaces on the host?

Comment 2 Weihua Meng 2018-04-16 22:52:27 UTC
[root@qe-wmeng21-master-etcd-1 ~]# grep -r mtu /etc/origin/node/
/etc/origin/node/bootstrap-node-config.yaml:  mtu: 8951
/etc/origin/node/node-config.yaml:  mtu: 8951

[root@qe-wmeng21-master-etcd-1 ~]# ip link 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 42:01:0a:f0:00:04 brd ff:ff:ff:ff:ff:ff
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:46:1e:fa:f9 brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether e6:99:c0:61:9f:fc brd ff:ff:ff:ff:ff:ff
5: br0: <BROADCAST,MULTICAST> mtu 8951 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 8a:c8:a8:4f:40:4a brd ff:ff:ff:ff:ff:ff
6: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether be:2a:56:f1:c5:ea brd ff:ff:ff:ff:ff:ff
7: tun0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ca:ec:45:c9:09:e4 brd ff:ff:ff:ff:ff:ff
9: vethf87cb036@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether 1a:3b:d4:76:6e:5f brd ff:ff:ff:ff:ff:ff link-netnsid 1
10: veth7a30e967@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether fe:26:f6:9a:a2:30 brd ff:ff:ff:ff:ff:ff link-netnsid 0
11: veth79ab9764@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether 4e:68:30:0f:46:e8 brd ff:ff:ff:ff:ff:ff link-netnsid 2
12: vethe2f6f3a6@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether 8a:6f:48:c8:f7:95 brd ff:ff:ff:ff:ff:ff link-netnsid 3
[root@qe-wmeng21-master-etcd-1 ~]#

Comment 3 Scott Dodson 2018-04-19 13:30:14 UTC
Need to change roles/openshift_node_group/defaults/main.yml to use openshift.node.sdn_mtu as default rather than 8951

Comment 5 Russell Teague 2018-04-20 12:24:31 UTC
openshift-ansible-3.10.0-0.24.0.git.0.fd7f37c.el7

Comment 6 Weihua Meng 2018-04-21 06:47:01 UTC
Fixed.
openshift-ansible-3.10.0-0.26.0.git.0.dbc127c.el7.noarch

# grep mtu /etc/origin/node/node-config.yaml 
  mtu: 1410


  Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
       CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
            Kernel: Linux 3.10.0-862.el7.x86_64

Comment 8 errata-xmlrpc 2018-07-30 19:13:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816