Bug 1325752

Summary: Vlan devices do not inherit the bonding device MAC address when bonding driver is reloaded
Product: Red Hat Enterprise Linux 7
Reporter: Jonathan Maxwell <jmaxwell>
Component: NetworkManager
Assignee: Beniamino Galvani <bgalvani>
Status: CLOSED ERRATA
QA Contact: Desktop QE <desktop-qa-list>
Severity: urgent
Priority: medium
Version: 7.2
CC: atragler, bgalvani, dcbw, jmaxwell, lrintel, npatil, rkhan, thaller, vbenes, villapla
Target Milestone: rc
Hardware: All
OS: Linux
Fixed In Version: NetworkManager-1.2.0-1.el7
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-11-03 19:08:53 UTC

Attachments:
- The debug logs of NetworkManager
- [PATCH] device/vlan: update hw address also during prepare phase

Description Jonathan Maxwell 2016-04-11 06:05:03 UTC
Description of problem:

This is similar to:

https://bugzilla.redhat.com/show_bug.cgi?id=1281617

which resolved this problem across a reboot. However, after the reboot, if the following is executed:

# nmcli net off; modprobe -r bonding; nmcli net on 

Then the vlan device on top of the bond gets a random MAC address again.

Version-Release number of selected component (if applicable):

RHEL 7.2 (with NetworkManager with bz 1281617 fix)

# uname -r
3.10.0-327.el7.x86_64

# rpm -qa|grep NetworkM
NetworkManager-1.0.6-27.el7.x86_64

How reproducible:

Always. First add a bond device with a slave as follows:

# nmcli connection add type bond con-name bond0 ifname bond0 mode active-backup
# nmcli connection add type bond-slave con-name em3 ifname em3 master bond0

Now add the vlan device:

# nmcli connection add type vlan con-name bond0.692 ifname bond0.692 dev bond0 id 692

All looks good, i.e. the bond has the slave's MAC address, and the vlan device has the same MAC address as the slave and the bonding device.

4: em3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 44:a8:42:32:8e:38 brd ff:ff:ff:ff:ff:ff
5: em4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 44:a8:42:32:8e:38 brd ff:ff:ff:ff:ff:ff
19: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    link/ether 44:a8:42:32:8e:38 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::46a8:42ff:fe32:8e38/64 scope link 
       valid_lft forever preferred_lft forever
20: bond0.692@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    link/ether 44:a8:42:32:8e:38 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::46a8:42ff:fe32:8e38/64 scope link 
       valid_lft forever preferred_lft forever

But execute:

# nmcli net off; modprobe -r bonding; nmcli net on

4: em3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 44:a8:42:32:8e:38 brd ff:ff:ff:ff:ff:ff
5: em4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 44:a8:42:32:8e:38 brd ff:ff:ff:ff:ff:ff
25: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    link/ether 44:a8:42:32:8e:38 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::46a8:42ff:fe32:8e38/64 scope link 
       valid_lft forever preferred_lft forever
26: bond0.692@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    link/ether f2:54:09:e5:b8:be brd ff:ff:ff:ff:ff:ff
    inet6 fe80::f054:9ff:fee5:b8be/64 scope link 
       valid_lft forever preferred_lft forever
# 
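The mismatch in the output above can be checked mechanically. Below is a minimal sketch (a hypothetical helper script, not part of this report) that compares a vlan device's MAC address against its parent's; on a live system the addresses would be read from sysfs, as shown in the comments. The MAC values used here are taken from the `ip link` output above.

```shell
#!/bin/sh
# Hypothetical check: does a vlan device share its parent's MAC address?
# On a live system the addresses would come from sysfs, e.g.:
#   parent_mac=$(cat /sys/class/net/bond0/address)
#   vlan_mac=$(cat /sys/class/net/bond0.692/address)

# Compare two MAC addresses case-insensitively.
macs_match() {
    [ "$(printf '%s' "$1" | tr 'A-F' 'a-f')" = \
      "$(printf '%s' "$2" | tr 'A-F' 'a-f')" ]
}

# MAC addresses taken from the reproducer output above.
parent_mac=44:a8:42:32:8e:38
vlan_mac=f2:54:09:e5:b8:be

if macs_match "$parent_mac" "$vlan_mac"; then
    echo "OK: vlan inherited $parent_mac"
else
    echo "BUG: vlan has $vlan_mac, parent has $parent_mac"
fi
```

With the addresses from the failing run this prints the "BUG" branch; after the fix, both interfaces report the same address and the check passes.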

Actual results:

The vlan device has a random MAC address.
 
Expected results:

It should have the same MAC address as the parent bonding device and active slave.

Additional info:

I have opened this bug in response to:

https://bugzilla.redhat.com/show_bug.cgi?id=1281617

which was made a duplicate of bug 1264322. But even with that fix we still have this problem.

Comment 2 Beniamino Galvani 2016-04-11 15:49:20 UTC
I had no luck reproducing this on either virtual or physical machines. Could you please provide logs with debugging enabled (level=DEBUG in the [logging] section of /etc/NetworkManager/NetworkManager.conf; a restart of the service is required)? Thanks!

Comment 3 Sibu Thomas Mathew 2016-04-11 15:56:43 UTC
Hi Beniamino,

The issue isn't reproducible on virtual machines. I suspect it has something to do with the Dell machine, because that is what the customer is using. I was able to reproduce the issue on dell-r430-24.gsslab.rdu2.redhat.com

Comment 4 Rashid Khan 2016-04-11 16:50:37 UTC
Hi Sibu,
Can we please get the logs as requested in Comment 2 above, from the machine where it is reproducible?

Thanks

Comment 5 Sibu Thomas Mathew 2016-04-11 19:19:07 UTC
Created attachment 1146095 [details]
The debug logs of NetworkManager

Comment 6 Sibu Thomas Mathew 2016-04-11 19:23:13 UTC
Hi Rashid/Beniamino,

Please find attached the debug logs of NetworkManager from /var/log/messages.

The reproducer starts with the following log: "Starting Producer for Bugzilla 1325752"

The following is the sequence of steps executed:

# nmcli net off; modprobe -r bonding; nmcli net on

# nmcli connection show

# nmcli connection add type vlan con-name bond0.692 ifname bond0.692 dev bond0 id 692

# nmcli net off; modprobe -r bonding; nmcli net on

# nmcli connection show

Comment 10 Beniamino Galvani 2016-04-12 13:11:02 UTC
Created attachment 1146445 [details]
[PATCH] device/vlan: update hw address also during prepare phase

Proposed fix.

Comment 11 Dan Williams 2016-04-13 15:27:12 UTC
LGTM

Comment 12 Thomas Haller 2016-04-13 16:22:53 UTC
lgtm too

Comment 15 Sibu Thomas Mathew 2016-04-27 11:06:59 UTC
Hi Beniamino,

Can a test package be shared so that it can be passed on to the customer to verify whether it fixes the issue?

Comment 17 Vladimir Benes 2016-06-07 14:49:44 UTC
This is very much hardware-related. Tested on dell-r430-2.

with NM-1.0.6:
11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 44:a8:42:32:8e:35 brd ff:ff:ff:ff:ff:ff
    inet 10.10.183.217/21 brd 10.10.183.255 scope global dynamic bond0
       valid_lft 43195sec preferred_lft 43195sec
    inet6 2620:52:0:ab0:46a8:42ff:fe32:8e35/64 scope global noprefixroute dynamic 
       valid_lft 2591994sec preferred_lft 604794sec
    inet6 fe80::46a8:42ff:fe32:8e35/64 scope link 
       valid_lft forever preferred_lft forever
12: bond0.692@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 22:7c:70:59:73:f2 brd ff:ff:ff:ff:ff:ff
                 ^^^^^^^^^^^


several attempts with NM-1.2.0:
14: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 44:a8:42:32:8e:35 brd ff:ff:ff:ff:ff:ff
    inet 10.10.183.217/21 brd 10.10.183.255 scope global dynamic bond0
       valid_lft 43154sec preferred_lft 43154sec
    inet6 2620:52:0:ab0:46a8:42ff:fe32:8e35/64 scope global noprefixroute dynamic 
       valid_lft 2591954sec preferred_lft 604754sec
    inet6 fe80::46a8:42ff:fe32:8e35/64 scope link 
       valid_lft forever preferred_lft forever
16: bond0.692@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 44:a8:42:32:8e:35 brd ff:ff:ff:ff:ff:ff

Comment 18 Beniamino Galvani 2016-07-17 08:10:32 UTC
*** Bug 1357026 has been marked as a duplicate of this bug. ***

Comment 19 Juanjo Villaplana 2016-07-18 07:09:24 UTC
Hi Beniamino,

I come from bz#1357026 and I'm very interested in getting this bug fixed ASAP. Any chance of getting access to the updated NetworkManager RPM (or SRC)?

Comment 20 Juanjo Villaplana 2016-09-15 07:16:37 UTC
Installed NetworkManager-1.4.0-0.5.beta1.el7.x86_64.rpm from 7.3 Beta-1 on a 7.2 test box and the bug is FIXED (in our case in a team interface).

Comment 22 errata-xmlrpc 2016-11-03 19:08:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2581.html