Bug 1891887 - OCP on Azure while configuring MTU values results in install failure
Summary: OCP on Azure while configuring MTU values results in install failure
Keywords:
Status: CLOSED DUPLICATE of bug 1877570
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Dan Winship
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-27 15:31 UTC by Shubhag Saxena
Modified: 2024-03-25 16:50 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-04 14:10:38 UTC
Target Upstream Version:
Embargoed:



Description Shubhag Saxena 2020-10-27 15:31:13 UTC
Description of problem:
Installing an OCP cluster on Azure with a customized MTU value fails. The steps followed are those in the documentation:
[1] https://docs.openshift.com/container-platform/4.5/installing/installing_azure/installing-azure-network-customizations.html#modifying-nwoperator-config-startup_installing-azure-network-customizations

How reproducible:
Whenever the MTU is set to 3950 in cluster-network-03-config.yml


Steps to Reproduce:
1. Create cluster-network-03-config.yml. See example config below.
2. Configure MTU to 3950 (50 smaller than 4000 on Azure)
3. Start installation

Invoked the create cluster command:
./openshift-install create cluster --dir=env/azure-test-cluster --log-level=debug

After some time, the following error was encountered:
DEBUG Gathering master failed systemd unit status ...
DEBUG Gathering master journals ...
DEBUG Gathering master containers ...
DEBUG Waiting for logs ...
DEBUG Log bundle written to /var/home/core/log-bundle-20201021014157.tar.gz
INFO Bootstrap gather logs captured here "/root/hrishi/env/azure-test-cluster/log-bundle-20201021014157.tar.gz"
FATAL Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed

~~~~~~~~
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    type: OpenShiftSDN
    openshiftSDNConfig:
      mode: NetworkPolicy
      mtu: 3950
...
~~~~~~~~~

Actual results:
Cluster creation fails.

Expected results:
The MTU value of the cluster and nodes on the Azure cloud provider can be changed at installation.

Additional Information:
The Azure documentation does not recommend changing the MTU, but the customer asked Azure support about this and received the following reply: "we are in the process of allowing frame sizes up to 4,000 bytes on our network. While this isn’t officially supported today, we are in an experimental phase now. You are welcome to try frame sizes up to 4,000 bytes on Azure Virtual Networks."

So we need to know why the installation is failing for OCP.

Comment 3 Ben Bennett 2020-10-28 14:12:46 UTC
Setting the target release to the current development branch so we can investigate this.  We will consider a backport once the problem, and solution, is understood.

Comment 4 Ben Bennett 2020-10-28 14:16:42 UTC
Are the nodes able to exchange packets of 4000 bytes?

I'd try:
  ping -s $((4000 - 28)) -D node2 -c 1

From the command line of one of the nodes.

Then you can play with that 4000 in the line above to see when you get the pings to work.  For instance, does 1500 work?  Does 1501?
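
The 28 in the command above is the 20-byte IPv4 header plus the 8-byte ICMP header. A minimal sketch of the same probe as shell functions follows. It assumes iputils ping on Linux, where "-M do" is the option that prohibits fragmentation (there, "-D" prints timestamps instead). The function names are made up for illustration, and "node2" is the placeholder host name from the command above.

```shell
#!/bin/sh
# ICMP payload size that exactly fills a packet of the given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
icmp_payload() {
    echo $(( $1 - 28 ))
}

# Send one ping of that size with fragmentation prohibited.
# Assumes iputils ping (Linux): -M do sets "don't fragment", -W is the reply timeout.
# Returns 0 if a reply arrived, non-zero otherwise.
probe_mtu() {
    host=$1
    mtu=$2
    ping -M do -c 1 -W 2 -s "$(icmp_payload "$mtu")" "$host" >/dev/null 2>&1
}

# Usage, from the command line of one of the nodes:
#   for mtu in 1500 1501 3950 4000; do
#       if probe_mtu node2 "$mtu"; then echo "$mtu ok"; else echo "$mtu failed"; fi
#   done
```

The largest size that still gets a reply indicates the MTU the path between the two nodes can actually carry.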

Comment 6 Dan Winship 2020-11-04 14:10:38 UTC
Setting the mtu option in the network config only adjusts the MTU of the pod-to-pod tunnel. You need to set the MTU on the "eth0" (or whatever) interfaces on each node. OCP does not provide any simple way to do this because normally the cloud will configure node MTUs correctly for you (eg by returning the correct MTU in the DHCP response).
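
To illustrate why raising only the tunnel MTU cannot work, here is a small sketch of the arithmetic. It assumes the 50-byte VXLAN overhead that OpenShiftSDN uses (the same 50 the reporter subtracted from 4000) and an interface named "eth0"; the function names are hypothetical.

```shell
#!/bin/sh
# VXLAN encapsulation overhead of the OpenShiftSDN tunnel, in bytes (assumed value).
VXLAN_OVERHEAD=50

# Largest tunnel MTU that a node interface with the given MTU can carry.
max_tunnel_mtu() {
    echo $(( $1 - VXLAN_OVERHEAD ))
}

# Succeeds only if the requested tunnel MTU fits inside the node MTU.
tunnel_mtu_fits() {
    node_mtu=$1
    tunnel_mtu=$2
    [ "$tunnel_mtu" -le "$(max_tunnel_mtu "$node_mtu")" ]
}

# Usage on a node ("eth0" is an assumed interface name):
#   node_mtu=$(cat /sys/class/net/eth0/mtu)
#   tunnel_mtu_fits "$node_mtu" 3950 || echo "tunnel MTU 3950 does not fit node MTU $node_mtu"
```

If the node interface is still at 1500, a tunnel MTU of 3950 fails this check, while a node MTU of 4000 passes it.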

So, the customer will need to create MachineConfigs to override the default MTU on each node. https://bugzilla.redhat.com/show_bug.cgi?id=1877570#c31 shows an example of what one customer is doing on vSphere. You should be able to adapt the approach there to work on Azure.

Note that once they do this, they don't need to create the cluster-network-03-config.yml manifest, because OCP will automatically set the tunnel MTU correctly based on the node MTU.
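
As a rough sketch of what such a MachineConfig could look like, the shell function below renders one that installs a NetworkManager dispatcher script setting the interface MTU. This is not necessarily the approach used in the linked comment. The function name, the object name, the script path, the interface "eth0" and the MTU value 4000 are all assumptions for illustration. The Ignition version shown assumes OCP 4.6 or later; 4.5 is understood to use Ignition spec 2.2.0, which also expects a "filesystem: root" entry per file.

```shell
#!/bin/sh
# Render a MachineConfig that sets the MTU of one node interface at boot.
# Arguments: role (master|worker), interface name, MTU. All values are illustrative.
render_mtu_machineconfig() {
    role=$1
    iface=$2
    mtu=$3

    # NetworkManager dispatcher script: $1 is the interface, $2 the action.
    script=$(printf '#!/bin/sh\nif [ "$1" = "%s" ] && [ "$2" = "up" ]; then\n    ip link set dev "%s" mtu %s\nfi\n' "$iface" "$iface" "$mtu")

    # Ignition takes file contents as a data URL; base64 avoids escaping issues.
    encoded=$(printf '%s\n' "$script" | base64 | tr -d '\n')

    cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: ${role}
  name: 99-${role}-node-mtu
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
        - path: /etc/NetworkManager/dispatcher.d/30-node-mtu
          mode: 493
          overwrite: true
          contents:
            source: data:text/plain;charset=utf-8;base64,${encoded}
EOF
}

# Usage, after "openshift-install create manifests --dir=env/azure-test-cluster":
#   render_mtu_machineconfig master eth0 4000 > env/azure-test-cluster/openshift/99-master-node-mtu.yaml
#   render_mtu_machineconfig worker eth0 4000 > env/azure-test-cluster/openshift/99-worker-node-mtu.yaml
```

The mode 493 is 0755 written in decimal, so the dispatcher script is executable. One object per role is rendered because a MachineConfig applies to a single machine config pool.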

Eventually, when the higher MTU feature on Azure becomes officially supported, Azure's DHCP servers will presumably start indicating that higher MTU to nodes, and then the MachineConfig approach would not be necessary.

*** This bug has been marked as a duplicate of bug 1877570 ***

