Bug 1451342

Summary: configure guest MTU based on underlying network
Product: [oVirt] ovirt-engine
Component: BLL.Network
Version: ---
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Target Milestone: ovirt-4.2.5
Target Release: ---
Reporter: Dominik Holler <dholler>
Assignee: Dominik Holler <dholler>
QA Contact: Michael Burman <mburman>
Docs Contact:
CC: bugs, danken, dholler, mburman, michal.skrivanek, mkalfon, myakove, ylavi
Flags: rule-engine: ovirt-4.2+
       ylavi: exception+
       mburman: testing_plan_complete?
       mburman: testing_ack+
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Adds the ability to manage the MTU of VM networks in a centralized way, extending oVirt's existing ability to manage the MTU of host networks.
Reason: This enables the use of large MTUs ("jumbo frames") for OVN networks, improving their network throughput.
Result: The MTU of the network is propagated all the way down to the guest in the VM.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-07-31 15:25:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1412234, 1452756, 1590327
Bug Blocks: 1510336

Description Dominik Holler 2017-05-16 12:45:46 UTC
Description of problem:

It is not possible to transfer a large amount of data between two VMs connected by a logical network provided by ovirt-provider-ovn if the two VMs are located on different hosts.

Version-Release number of selected component (if applicable):
ovirt-provider-ovn-driver-1.0-6.el7.centos.noarch
openvswitch-ovn-central-2.7.0-1.el7.centos.x86_64
openvswitch-ovn-host-2.7.0-1.el7.centos.x86_64
openvswitch-ovn-common-2.7.0-1.el7.centos.x86_64
openvswitch-2.7.0-1.el7.centos.x86_64
python-openvswitch-2.7.0-1.el7.centos.noarch


How reproducible:


Steps to Reproduce:
1. Create VMs in oVirt on two different hosts
2. Create a logical network via ovirt-provider-ovn
3. Connect the two VMs via the logical network
4. Ensure that the setup is correct by pinging between the VMs
5. Transfer a large amount of data between the two VMs by:
   a) waiting for data on the first VM:
        nc -l 9999 > /dev/null
   b) creating random data on the second VM:
        dd if=/dev/urandom of=data bs=4k count=256k
   c) sending the data from the second VM to the first:
        time nc $IP_OF_FIRST_VM 9999 < data
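
If the transfer hangs, a quick way to check whether the path MTU is the culprit (a sketch; 1372 and 1472 are payload sizes that, plus 28 bytes of IP/ICMP headers, correspond to MTUs of 1400 and 1500) is to ping with the don't-fragment bit set:

        # passes if the effective path MTU is at least 1400
        ping -M do -s 1372 -c 3 $IP_OF_FIRST_VM
        # silently dropped if the effective path MTU is below 1500
        ping -M do -s 1472 -c 3 $IP_OF_FIRST_VM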

Actual results:

nc does not succeed.

Expected results:

nc succeeds when the two VMs are on different hosts, in the same way as when both VMs are on the same host.


Additional info:

Comment 1 Dan Kenigsberg 2017-05-16 13:10:16 UTC
Most likely, this is a problem with the underlying OVN, not ovirt-provider-ovn.
Dominik, would you provide more information on how nc "does not succeed"?

Mor, can you reproduce this, attaching openvswitch logs, possibly running it in debug mode?

Comment 2 Dominik Holler 2017-05-16 14:22:28 UTC
> Dominik, would you provide more information on how nc "does not succeed"?

Actual results:

nc is blocking.

Comment 3 Mor 2017-05-17 07:40:17 UTC
(In reply to Dan Kenigsberg from comment #1)
> Most likely, this is a problem with the underlying OVN, not
> ovirt-provider-ovn.
> Dominik, would you provide more information on how nc "does not succeed"?
> 
> Mor, can you reproduce this, attaching openvswitch logs, possibly running it
> in debug mode?

I can confirm that it is reproducible in my environment: with the default MTU of 1500, the transfer test (I used iperf) did not even start. When I set the MTU on the interface to 1400, the test worked as expected, with a throughput of ~840 Mbps.
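
For reference, the workaround used here is lowering the MTU on the guest's interface (a sketch; eth0 is an example interface name):

        ip link set dev eth0 mtu 1400
        ip link show dev eth0    # verify the new MTU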

I also tested it on an OVN network without a subnet, and the issue is relevant there as well.

I see that we plan to fix it on the subnet entity, but maybe we should think more generally, and also provide documentation for this issue? We need to support different environments running on various network configurations that could affect the MTU value.

P.S.: In the automation we test a packet size of 1300 over the tunnel, and I can quickly adjust that to higher values. When I started testing OVN 2.6, I remember testing higher values successfully.

Comment 4 Dan Kenigsberg 2017-05-17 07:52:26 UTC
Shouldn't OVN fragment the guest packets in such a case?

Comment 5 Dan Kenigsberg 2017-05-23 15:45:30 UTC
(In reply to Dan Kenigsberg from comment #4)
> Shouldn't OVN fragment the guest packets in such a case?

According to Lance, OVS does not do so by default.

We'd better propagate the MTU from the tunnel to the vNIC via libvirt's new <mtu> element.
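
For illustration, the element goes into the interface definition of the domain XML (a sketch; the network name is an example, and the value assumes a 1500-byte host MTU minus 58 bytes of Geneve overhead):

        <interface type='network'>
          <source network='ovn-net'/>
          <model type='virtio'/>
          <mtu size='1442'/>
        </interface>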

Comment 6 Dominik Holler 2017-05-23 15:50:13 UTC
The functionality of libvirt's new <mtu> element depends on bugs #1412234 and #1452756.

Comment 7 Dan Kenigsberg 2017-07-12 11:28:26 UTC
We no longer use the too-big default MTU, since a smaller one is advertised by OVN DHCP. Let us keep this bug open in order to set a suitable MTU per interface, based on the underlying network of that interface.
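
For reference, the MTU advertised by OVN DHCP comes from the mtu key in the subnet's DHCP options; a minimal sketch, assuming ovn-nbctl's dhcp-options commands and example values ($DHCP_OPTS being the UUID of the subnet's DHCP_Options row):

        ovn-nbctl dhcp-options-set-options $DHCP_OPTS \
            server_id=10.0.0.1 server_mac=00:00:00:00:00:01 \
            lease_time=3600 mtu=1442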

Comment 8 Michal Skrivanek 2018-04-17 07:45:39 UTC
Please re-target if you require a machine type update in 4.2. 4.2 GA was released with i440fx-7.3.0, and that is how it needs to stay until 4.3.

Comment 9 Dominik Holler 2018-06-12 11:03:23 UTC
During 4.2.z, the new functionality would be effective only for users manually choosing machineType >= 7.4.0 for the VM, or setting it as the default in their engine-config by

engine-config --set "ClusterEmulatedMachines=pc-i440fx-rhel7.4.0,pc-i440fx-2.9,pseries-rhel7.5.0,s390-ccw-virtio-2.6" --cver=4.2

The related code changes, which do not change the default machine type in 4.2, should be ready in 4.2.5.
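
To inspect the value currently in effect (a sketch, assuming engine-config's --get option):

        engine-config --get ClusterEmulatedMachines --cver=4.2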

Comment 10 RHV bug bot 2018-07-02 15:34:07 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Open patch attached]

For more info please contact: infra

Comment 11 Michael Burman 2018-07-05 14:47:52 UTC
Tested flows:

1) libvirt flow - native networks - PASS
MTU is passed to the XML and to the guest's interface:
 <mtu size='9000'/>

Ping with a custom MTU works.

2) libvirt flow - physnet networks + OVN in OVS cluster - BLOCKED BZ 1598461
OVS switch type doesn't support custom MTU

3) libvirt flow - auto define - BLOCKED BZ 1598461
OVS switch type doesn't support custom MTU

4) OVN dhcp flow - PASS
DHCP MTU is passed to the XML and to the guest's interface.
Ping with a custom MTU works.
- In order to ping VMs on different hosts, we need to set the tunnel network (ovirtmgmt by default) with a higher MTU as well:
MTU of OVN network + 58 bytes Geneve overhead = MTU of underlying network
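
A worked instance of that relation (numbers are illustrative): an OVN network with MTU 8942 requires a tunnel network MTU of 8942 + 58 = 9000; conversely, over a standard 1500-byte ovirtmgmt, the OVN network MTU can be at most 1500 - 58 = 1442.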

5) Hotplug flow - PASS

Not sure if this can be verified at the moment as OVS and custom MTU doesn't work.

Comment 12 Dan Kenigsberg 2018-07-05 20:15:08 UTC
(In reply to Michael Burman from comment #11)

> 
> 3) libvirt flow - auto define - BLOCKED BZ 1598461
> OVS switch type doesn't support custom MTU

Thanks for filing it.

> 
> Not sure if this can be verified at the moment as OVS and custom MTU doesn't
> work.

Since OVS switchType is still under TechPreview, and MTU feature is dearly required by production-ready OVN, I believe we should accept the feature in its partial state.

Comment 13 Michael Burman 2018-07-08 07:38:54 UTC
(In reply to Dan Kenigsberg from comment #12)
> (In reply to Michael Burman from comment #11)
> 
> > 
> > 3) libvirt flow - auto define - BLOCKED BZ 1598461
> > OVS switch type doesn't support custom MTU
> 
> Thanks for filing it.
> 
> > 
> > Not sure if this can be verified at the moment as OVS and custom MTU doesn't
> > work.
> 
> Since OVS switchType is still under TechPreview, and MTU feature is dearly
> required by production-ready OVN, I believe we should accept the feature in
> its partial state.

Fine with me. Based on comments 11 and 12, moving this to verified.
Verified on - rhvm-4.2.5.1_SNAPSHOT-71.g54dde01.0.scratch.master.el7ev.noarch

Comment 14 Dan Kenigsberg 2018-07-16 11:39:15 UTC
Dominik, should we formally document this feature now? If so, how?

Comment 15 Dominik Holler 2018-07-16 12:06:40 UTC
Good idea. In the Administration Guide, the sections "6.1.7. Logical Network General Settings Explained" and "6.1.2. Creating a New Logical Network in a Data Center or Cluster" would require an update.

Comment 16 Sandro Bonazzola 2018-07-31 15:25:08 UTC
This bugzilla is included in the oVirt 4.2.5 release, published on July 30th 2018.

Since the problem described in this bug report should be
resolved in the oVirt 4.2.5 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.