Bug 1414901

Summary: Can't set MTU value for VLAN interface in RHEL 7.3
Product: Red Hat Enterprise Linux 7
Reporter: Akhil John <ajohn>
Component: NetworkManager
Assignee: Thomas Haller <thaller>
Status: CLOSED ERRATA
QA Contact: Desktop QE <desktop-qa-list>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.3
CC: aloughla, atragler, bgalvani, desktop-qa-list, fgiudici, lrintel, mabrown, prpatel, rkhan, rkharwar, sababu, sukulkar, thaller, tpelka, vbenes
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-10 13:19:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Akhil John 2017-01-19 17:06:00 UTC
Description of problem:
Can't set MTU value for VLAN interface in RHEL 7.3
Version-Release number of selected component (if applicable):
RHEL 7.3
NetworkManager-1.4.0-12.el7.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. Create VLAN with MTU value 9000 for any network interface.
# cat etc/sysconfig/network-scripts/ifcfg-em10 
VLAN=yes
TYPE=Vlan
DEVICE=em10
PHYSDEV=em1
VLAN_ID=50
REORDER_HDR=0
BOOTPROTO=none
IPADDR=192.168.10.1
PREFIX=24
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=eth0
UUID=421b310a-5154-ta4a-b294-10741d46e0d1
ONBOOT=yes
MTU=9000

# cat etc/sysconfig/network-scripts/ifcfg-em1 
TYPE=Ethernet
BOOTPROTO=dhcp
PEERDNS=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=em1
UUID=85bd5823-dc78-1c9b-9746-fdf61d24f104
DEVICE=em1
ONBOOT=no


2. # reboot


Actual results:
# ip a
em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether b8:2a:72:54:23:77 brd ff:ff:ff:ff:ff:ff
em10@em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether b8:2a:92:14:43:07 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.1/24 brd 192.168.10.254 scope global em10
       valid_lft forever preferred_lft forever
    inet6 fe80::ba2a:72ff:fe54:2377/64 scope link 
       valid_lft forever preferred_lft forever


Expected results:

# ip a
em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000
    link/ether b8:2a:72:54:23:77 brd ff:ff:ff:ff:ff:ff
em10@em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP qlen 1000
    link/ether b8:2a:92:14:43:07 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.1/24 brd 192.168.10.254 scope global em10
       valid_lft forever preferred_lft forever
    inet6 fe80::ba2a:72ff:fe54:2377/64 scope link 
       valid_lft forever preferred_lft forever

Additional info:
The issue is seen only with VLAN.

# nmcli con show em10 | grep mtu
802-3-ethernet.mtu:       9000

But "ip link" (see "Actual results" above) still shows the default MTU value of 1500.

Comment 1 Thomas Haller 2017-01-19 17:21:54 UTC
This looks like a duplicate of bug 1414186.

Please reopen if you disagree. Thanks

*** This bug has been marked as a duplicate of bug 1414186 ***

Comment 2 Thomas Haller 2017-08-15 14:50:05 UTC
Hm, re-reading this bug, it doesn't seem a dupe of bug 1414186.

Bug 1414186 is about the MTU that gets configured when the VLAN connection profile leaves it unspecified (correct behavior with the fix: the VLAN inherits the MTU from the parent).

Comment 0 sets the MTU on the VLAN connection to 9000. However, the connection of the parent device leaves the MTU at 1500.
This seems like a configuration error, because the MTU of a VLAN is limited by the MTU of its parent device.
I don't think that NetworkManager should try to work around this -- meaning: I think NM should not automatically try to increase the parent's MTU, because the parent might itself be a master interface (bridge/bond), and it's unclear how to automatically choose the correct MTU in more complex scenarios. It's up to the user to get it right.
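
For illustration, a parent profile that allows the VLAN's MTU of 9000 might look like the excerpt below. This is only a sketch: it takes the ifcfg-em1 file from comment 0 and adds a single MTU line. That jumbo frames are actually supported on the network behind em1 is an assumption.

```
# /etc/sysconfig/network-scripts/ifcfg-em1 (excerpt, other keys as in comment 0)
TYPE=Ethernet
NAME=em1
DEVICE=em1
# Added for illustration: the parent must allow at least the VLAN's MTU.
MTU=9000
```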


However, there are issues:

1) Possibly NM should report the error more clearly when it fails to set the MTU. At the very least, there should be a clear failure message in the logfile. Maybe activation of the connection should fail altogether instead of silently proceeding with the wrong MTU.


2) Then there is a more serious bug here. Even if the parent connection has the MTU correctly set as well, NetworkManager does not wait for the parent's MTU to be set. So the VLAN interface may activate first, and at that point setting its MTU fails. Later, when the parent interface reaches its layer 2 configuration, the parent's MTU gets set, but setting the MTU is not retried for the VLAN interface.
Either the VLAN interface should retry setting the MTU after the parent increases it (this is possibly simpler), or the VLAN interface should wait (block) until the parent interface is ready (seems more correct, but more complicated).
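
The ordering problem and the "retry" option can be illustrated with a small toy model in shell. This is not NetworkManager code: set_vlan_mtu, simulate, and the pending variable are names invented for this sketch, and the kernel rule is reduced to a single comparison.

```shell
#!/bin/sh
# Toy model of the activation-order problem; not NetworkManager code.

# Mimic the kernel rule: a VLAN's MTU may not exceed its parent's.
set_vlan_mtu() {
    [ "$1" -le "$parent_mtu" ] || return 1
    vlan_mtu=$1
}

# Activate the VLAN before the parent; $1 is "retry" or "noretry".
simulate() {
    parent_mtu=1500
    vlan_mtu=1500
    pending=

    # The VLAN activates first: the request fails while the parent is at 1500.
    set_vlan_mtu 9000 || pending=9000

    # Later the parent reaches its layer 2 configuration and gets MTU 9000.
    parent_mtu=9000

    # Retry variant: re-apply the pending MTU once the parent has grown.
    if [ "$1" = "retry" ] && [ -n "$pending" ]; then
        set_vlan_mtu "$pending" && pending=
    fi

    echo "parent=$parent_mtu vlan=$vlan_mtu"
}

simulate noretry   # prints "parent=9000 vlan=1500"
simulate retry     # prints "parent=9000 vlan=9000"
```

The "block until the parent is ready" option would instead delay the first set_vlan_mtu call until after the parent's MTU has been applied, which avoids the failed attempt but requires tracking the parent's activation state.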


Reopening.

Comment 9 Thomas Haller 2017-10-19 05:51:15 UTC
please review th/device-mtu-rh1414901

Comment 10 Beniamino Galvani 2017-10-19 13:11:55 UTC
(In reply to Thomas Haller from comment #9)
> please review th/device-mtu-rh1414901

LGTM

Comment 15 Vladimir Benes 2017-12-05 23:32:54 UTC
The configuration in comment #0 is considered invalid, as stated above. The fix in this bug corrects the situation where the master device is connected to a slow DHCP server and the VLAN is activated before it.
In that situation, NM 1.8 stays in an incorrect state, and the master device and the VLAN are never both activated with the desired MTU.
See https://github.com/NetworkManager/NetworkManager-ci/blob/master/nmcli/features/vlan.feature#L428 for more details about verification.

Comment 23 errata-xmlrpc 2018-04-10 13:19:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0778

Comment 24 Sangam 2018-04-19 17:29:14 UTC
Hello Team,

Can we open this bugzilla again? It looks like we still have the issue.

I have upgraded the system to the said kernel and NM version, and it looks like the issue still exists, but in a different way.

Environment:- 
kernel-3.10.0-862.el7.x86_64
NetworkManager-1.10.2-13.el7.x86_64

Issue :-

I tried to set up the topology below with MTU 9000, but after a reboot the MTU changes to 1500 for the VLAN and the bridge device:

  NIC  -->  TEAM  --> VLAN --> Bridge 


# nmcli connection add type team con-name team0 ifname team0 ipv4.method disabled ipv6.method ignore mtu 9000 config '{ "runner": {"name":"lacp", "fast_rate":true }}'
# nmcli connection add type team-slave con-name team0-ens3 ifname ens3 mtu 9000 master team0
# nmcli connection add type team-slave con-name team0-ens8 ifname ens8 mtu 9000 master team0
# nmcli connection add type vlan con-name team0.10 ifname team0.10 id 10 dev team0 ipv4.method disabled ipv6.method ignore connection.master team0-bridge connection.slave-type bridge
# nmcli connection add type bridge con-name team0-bridge ifname team0-bridge ipv4.method manual ipv6.method ignore ipv4.addresses "11.0.0.1/24"


# ip a | grep -i team
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master team0 state UP group default qlen 1000
3: ens8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master team0 state UP group default qlen 1000
4: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
5: team0.10@team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue master team0-bridge state UP group default qlen 1000
6: team0-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    inet 11.0.0.1/24 brd 11.0.0.255 scope global noprefixroute team0-bridge


Reboot

# reboot

# ip a | grep -i team
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master team0 state UP group default qlen 1000
3: ens8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master team0 state UP group default qlen 1000
4: team0-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 11.0.0.1/24 brd 11.0.0.255 scope global noprefixroute team0-bridge
5: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
6: team0.10@team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master team0-bridge state UP group default qlen 1000


But the strange thing here is that if I set "ONBOOT=no" for the bridge device, the MTU is set to 9000 properly and the bridge device also comes up.

# nmcli connection modify team0-bridge connection.autoconnect no

# reboot

# ip a | grep -i team
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master team0 state UP group default qlen 1000
3: ens8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master team0 state UP group default qlen 1000
4: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
5: team0.10@team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue master team0-bridge state UP group default qlen 1000
6: team0-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    inet 11.0.0.1/24 brd 11.0.0.255 scope global noprefixroute team0-bridge

Comment 25 Thomas Haller 2018-04-20 05:58:19 UTC
Hi Sangam,

Let's not re-open closed bugs.

First of all, it's a bit unlikely that the underlying issue is the same -- even if the symptoms look similar.

Even if this were an incarnation of exactly the same issue, it would still warrant a new bugzilla entry to track it separately. However, in the new bug, it could be helpful to add a comment like "this looks related to bug XYZ".

Also, this bug was already investigated, ~fixed~, tested and closed. If you experience issues that make you think something is wrong, we need fresh debugging information. Which means: please attach a full level=TRACE logfile. Note the hints about logging at https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf.
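
As a rough sketch of how such trace logging could be enabled persistently, the helper below writes a configuration drop-in with a [logging] section. The function name write_trace_conf and the file name 95-trace-logging.conf are made up for this example; level and domains are keys documented for NetworkManager.conf. That a drop-in under /etc/NetworkManager/conf.d is the right place on the affected system is an assumption.

```shell
#!/bin/sh
# Write a NetworkManager drop-in that enables TRACE logging for all domains.
# $1 is the target directory (defaults to the usual conf.d location).
write_trace_conf() {
    conf_dir="${1:-/etc/NetworkManager/conf.d}"
    cat > "$conf_dir/95-trace-logging.conf" <<'EOF'
[logging]
level=TRACE
domains=ALL
EOF
}
```

After calling write_trace_conf as root and rebooting, the log for the current boot can be collected with "journalctl -u NetworkManager -b". At this log level journald rate limiting may drop messages, so the hints in the linked file are worth checking first.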

And lastly, this bug is already quite messy (discussing various aspects). Let's focus on your issue on a new bug.

Thank you!!


In general, the issue is that the kernel does not allow the MTU of a VLAN to be larger than its parent's. Likewise for bridges, etc. That's why, when NM activates all these layers, it may initially be unable to configure the desired MTU. It must be smart enough to retry later, when it becomes possible. Setting connection.autoconnect=no changes the order of activation, so it might work by accident. We need full logfiles to understand where this goes wrong.
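
To see quickly which layers ended up without the desired MTU after a boot, a small filter over one-line "ip" output can help. The sketch below is only an illustration: mtu_mismatches is a hypothetical helper name, and it assumes input lines in the format shown in comment 24 (index, name, flags, then "mtu <value>").

```shell
#!/bin/sh
# Print "name mtu" for every interface whose MTU differs from the expected
# value. Reads lines in "ip -o link show" format on stdin.
mtu_mismatches() {
    awk -v want="$1" '
        {
            name = $2
            sub(/:$/, "", name)      # drop the trailing colon
            sub(/@.*$/, "", name)    # drop the "@parent" suffix of VLANs
            for (i = 3; i < NF; i++) {
                if ($i == "mtu") {
                    if ($(i + 1) != want)
                        print name, $(i + 1)
                    break
                }
            }
        }'
}
```

For example, "ip -o link show | mtu_mismatches 9000" lists every interface that is not at MTU 9000, including unrelated ones such as lo, so the output still needs to be read against the intended topology.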

Comment 26 Sangam 2018-04-20 15:06:40 UTC
Hello Thomas,

Thanks for your reply.

I partially agree with you. I am happy to open a new bugzilla, but the issue as well as the setup remain the same.

We are following the same steps to create the setup below, for which this bugzilla was initially raised

  NIC  -->  TEAM  --> VLAN --> Bridge 

and the issue is still there.

However, our other observation is that setting "connection.autoconnect=no" on the bridge device results in the MTU being set properly.

I think this is a result of the fix provided through this bugzilla and should be considered under the same bugzilla.

In the previous version of NetworkManager, even if I set "connection.autoconnect=no" on the bridge device, the MTU doesn't get set to 9000.

I hope this will be considered. If required, I will gather debug logs and attach them to this bug.