Bug 1388286 - Incorrect MAC address set on em1 after interface renaming
Summary: Incorrect MAC address set on em1 after interface renaming
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: NetworkManager
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: 7.4
Assignee: Thomas Haller
QA Contact: Desktop QE
Docs Contact: Mirek Jahoda
URL:
Whiteboard:
Duplicates: 1384187 1400764
Depends On:
Blocks: 1384256 1446211 1354032 1356451 1402514
Reported: 2016-10-25 01:43 UTC by Bob Fournier
Modified: 2018-02-06 16:53 UTC (History)

Fixed In Version: NetworkManager-1.4.0-14.el7
Doc Type: Bug Fix
Doc Text:
Previously, when the udev helper utility was renaming a networking interface, NetworkManager occasionally confused interfaces due to a race condition. Consequently, NetworkManager assigned wrong MAC addresses. With this update, NetworkManager does not access an interface before udev finishes the renaming, and this problem no longer occurs.
Clone Of:
Clones: 1402514
Environment:
Last Closed: 2017-08-01 09:19:37 UTC
Target Upstream Version:


Attachments (Terms of Use)
ramdisk journal (302.75 KB, text/x-vhdl)
2016-10-25 01:43 UTC, Bob Fournier
log from 2nd failure (303.77 KB, text/x-vhdl)
2016-10-25 13:15 UTC, Bob Fournier


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:2299 normal SHIPPED_LIVE Moderate: NetworkManager and libnl3 security, bug fix and enhancement update 2017-08-01 12:40:28 UTC

Description Bob Fournier 2016-10-25 01:43:23 UTC
Created attachment 1213646 [details]
ramdisk journal

Description of problem:

This was seen on an OpenStack ramdisk image using NetworkManager (version 1.4.0-10.el7).

Please reassign if this should be under a different component.   

This is a system with 4 (PCI express) embedded network interfaces and 2 PCI 
network cards each with 2 ports.
The MAC addresses on all the interfaces are as follows:

eth0 - d4:ae:52:89:6e:31
eth1 - d4:ae:52:89:6e:32
eth2 - d4:ae:52:89:6e:33
eth3 - d4:ae:52:89:6e:34
Network card 1 port 1 - a0:36:9f:32:71:48
Network card 1 port 2 - a0:36:9f:32:71:4a
Network card 2 port 1 - a0:36:9f:14:49:90
Network card 2 port 2 - a0:36:9f:14:49:92

After NetworkManager runs, the MAC address assigned to em1 is incorrect: it ends up with the MAC address of p4p1, as can be seen in the output of "ip a":
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether a0:36:9f:14:49:90 brd ff:ff:ff:ff:ff:ff
    inet 192.168.120.159/24 brd 192.168.120.255 scope global dynamic em1
       valid_lft 95sec preferred_lft 95sec
    inet6 fe80::a236:9fff:fe14:4990/64 scope link 
       valid_lft forever preferred_lft forever
3: em2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether d4:ae:52:89:6e:32 brd ff:ff:ff:ff:ff:ff
4: em3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether d4:ae:52:89:6e:33 brd ff:ff:ff:ff:ff:ff
5: em4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether d4:ae:52:89:6e:34 brd ff:ff:ff:ff:ff:ff
6: p6p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether a0:36:9f:32:71:48 brd ff:ff:ff:ff:ff:ff
7: p6p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether a0:36:9f:32:71:4a brd ff:ff:ff:ff:ff:ff
8: p4p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether a0:36:9f:14:49:90 brd ff:ff:ff:ff:ff:ff
9: p4p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether a0:36:9f:14:49:92 brd ff:ff:ff:ff:ff:ff
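
A quick way to spot this symptom in captured "ip a" output is to group interfaces by their link/ether address. The following is a minimal sketch, not part of the original report; the function name `find_shared_macs` and the shortened sample are illustrative assumptions.

```python
import re
from collections import defaultdict


def find_shared_macs(ip_output):
    """Return {mac: [ifnames]} for every MAC reported on more than one interface."""
    by_mac = defaultdict(list)
    current = None
    for line in ip_output.splitlines():
        # Header lines look like "2: em1: <BROADCAST,...> mtu 1500 ..."
        header = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if header:
            current = header.group(1)
            continue
        # Address lines look like "    link/ether a0:36:9f:14:49:90 brd ..."
        link = re.match(r"^\s+link/ether\s+([0-9a-f:]{17})", line, re.IGNORECASE)
        if link and current:
            by_mac[link.group(1).lower()].append(current)
    return {mac: names for mac, names in by_mac.items() if len(names) > 1}


# Shortened sample taken from the output above.
sample = """\
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether a0:36:9f:14:49:90 brd ff:ff:ff:ff:ff:ff
3: em2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether d4:ae:52:89:6e:32 brd ff:ff:ff:ff:ff:ff
8: p4p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether a0:36:9f:14:49:90 brd ff:ff:ff:ff:ff:ff
"""

shared = find_shared_macs(sample)
print(shared)  # {'a0:36:9f:14:49:90': ['em1', 'p4p1']}
```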

What appears to be happening is that both em1 (Devices/1) and p4p1 (Devices/7) are assigned the interim interface name eth0:
Oct 13 16:33:44 localhost.localdomain NetworkManager[426]: <info>  [1476390824.1340] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/1)
Oct 13 16:33:44 localhost.localdomain NetworkManager[426]: <info>  [1476390824.1367] manager: (eth1): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
Oct 13 16:33:44 localhost.localdomain NetworkManager[426]: <info>  [1476390824.1386] manager: (eth2): new Ethernet device (/org/freedesktop/NetworkManager/Devices/3)
Oct 13 16:33:44 localhost.localdomain NetworkManager[426]: <info>  [1476390824.1402] manager: (eth3): new Ethernet device (/org/freedesktop/NetworkManager/Devices/4)
Oct 13 16:33:44 localhost.localdomain NetworkManager[426]: <info>  [1476390824.1416] manager: (eth4): new Ethernet device (/org/freedesktop/NetworkManager/Devices/5)
Oct 13 16:33:44 localhost.localdomain NetworkManager[426]: <info>  [1476390824.2779] manager: (eth5): new Ethernet device (/org/freedesktop/NetworkManager/Devices/6)
Oct 13 16:33:44 localhost.localdomain NetworkManager[426]: <info>  [1476390824.2801] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/7)
Oct 13 16:33:44 localhost.localdomain NetworkManager[426]: <info>  [1476390824.2820] manager: (eth3): new Ethernet device (/org/freedesktop/NetworkManager/Devices/8)
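
The reuse of an interim name can be picked out of a journal mechanically. The sketch below is an illustration only, assuming the "new Ethernet device" message format shown above; the helper name `interim_name_collisions` is hypothetical.

```python
import re
from collections import defaultdict

# Matches e.g. "manager: (eth0): new Ethernet device (/org/.../Devices/1)"
NEW_DEVICE = re.compile(r"manager: \((\S+)\): new Ethernet device \((\S+)\)")


def interim_name_collisions(log_text):
    """Return {interim_name: [device paths]} for names claimed by several devices."""
    paths = defaultdict(list)
    for name, path in NEW_DEVICE.findall(log_text):
        paths[name].append(path)
    return {name: p for name, p in paths.items() if len(p) > 1}


# Message bodies copied from the log lines above (timestamps trimmed).
log = """\
manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/1)
manager: (eth1): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
manager: (eth3): new Ethernet device (/org/freedesktop/NetworkManager/Devices/4)
manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/7)
manager: (eth3): new Ethernet device (/org/freedesktop/NetworkManager/Devices/8)
"""

collisions = interim_name_collisions(log)
for name, device_paths in collisions.items():
    print(name, device_paths)
```

On this excerpt both eth0 and eth3 are reported twice, which matches the two interim names that are later reused for p4p1 and p4p2.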

In more detail, the embedded interfaces are read first and renamed:
Oct 13 16:33:44 localhost.localdomain NetworkManager[426]: <info>  [1476390824.2782] device (eth0): interface index 2 renamed iface from 'eth0' to 'em1'
Oct 13 16:33:44 localhost.localdomain NetworkManager[426]: <info>  [1476390824.2803] device (eth3): interface index 5 renamed iface from 'eth3' to 'em4'
Oct 13 16:33:44 localhost.localdomain NetworkManager[426]: <info>  [1476390824.2823] device (eth2): interface index 4 renamed iface from 'eth2' to 'em3'
Oct 13 16:33:44 localhost.localdomain NetworkManager[426]: <info>  [1476390824.4389] device (eth1): interface index 3 renamed iface from 'eth1' to 'em2'

Then it is almost as if, once eth0 is no longer in use, the name eth0 is reused for Devices/7 (p4p1):
Oct 13 16:33:45 localhost.localdomain NetworkManager[426]: <info>  [1476390825.3173] device (eth0): interface index 8 renamed iface from 'eth0' to 'p4p1'
Oct 13 16:33:46 localhost.localdomain NetworkManager[426]: <info>  [1476390826.8992] device (eth4): interface index 6 renamed iface from 'eth4' to 'p6p1'
Oct 13 16:33:46 localhost.localdomain NetworkManager[426]: <info>  [1476390826.9033] device (eth5): interface index 7 renamed iface from 'eth5' to 'p6p2'
Oct 13 16:33:46 localhost.localdomain NetworkManager[426]: <info>  [1476390826.9071] device (eth3): interface index 9 renamed iface from 'eth3' to 'p4p2'
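
The rename messages carry the interface index, which stays fixed while the name changes. As a rough sketch (the function name is made up for illustration, and the message format is assumed to match the lines above), grouping renames by their old name shows which interim names were held by more than one ifindex:

```python
import re
from collections import defaultdict

RENAME = re.compile(
    r"interface index (\d+) renamed iface from '([^']+)' to '([^']+)'"
)


def renames_by_interim_name(log_text):
    """Map each interim name to the (ifindex, final name) pairs that used it."""
    result = defaultdict(list)
    for ifindex, old, new in RENAME.findall(log_text):
        result[old].append((int(ifindex), new))
    return dict(result)


# Message bodies copied from the log lines above (timestamps trimmed).
log = """\
device (eth0): interface index 2 renamed iface from 'eth0' to 'em1'
device (eth3): interface index 5 renamed iface from 'eth3' to 'em4'
device (eth0): interface index 8 renamed iface from 'eth0' to 'p4p1'
device (eth3): interface index 9 renamed iface from 'eth3' to 'p4p2'
"""

renames = renames_by_interim_name(log)
reused = {old: uses for old, uses in renames.items() if len(uses) > 1}
print(reused["eth0"])  # [(2, 'em1'), (8, 'p4p1')]
```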

This causes set-hw-addr to be called on em1 only (no other interfaces are affected) and em1 to have the incorrect MAC address:
Oct 13 16:33:48 localhost.localdomain NetworkManager[426]: <info>  [1476390828.2222] device (em1): set-hw-addr: set-cloned MAC address to A0:36:9F:14:49:90 (permanent)


Version-Release number of selected component (if applicable):

Linux version 3.10.0-510.el7.x86_64 (mockbuild@x86-034.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) 

NetworkManager (version 1.4.0-10.el7)

How reproducible:

Cannot be reproduced consistently; it seems to depend on the order in which links go up and down.


Steps to Reproduce:
1. Run ramdisk image and verify wrong MAC address assigned to em1

Actual results:
em1 has an incorrect MAC address - a0:36:9f:14:49:90, the same as p4p1

Expected results:
em1 would have the correct MAC address - d4:ae:52:89:6e:31

Additional info:

Comment 2 Thomas Haller 2016-10-25 07:32:14 UTC
Could you enable debug logging of NetworkManager (level=TRACE) and provide a logfile? Configure the loglevel in /etc/NetworkManager/NetworkManager.conf and reproduce the issue.


I think the problem is that the permanent MAC address is read via ethtool, for which we specify the interface by name, not by ifindex.

That might be somewhat avoided by ensuring we don't read the permanent MAC address before UDEV has done its business. Of course, the interface could still be renamed later on, so the race still exists.

Comment 3 Bob Fournier 2016-10-25 13:15:22 UTC
Created attachment 1213900 [details]
log from 2nd failure

Comment 4 Bob Fournier 2016-10-25 13:17:27 UTC
Thomas - thanks for looking at this.  I will get logs with the updated log level but it may take a few days.  This problem has only been seen in Dell's configuration and, as it's a ramdisk image, we don't have the ability to change the NetworkManager.conf without rebuilding a new image for them and having them redeploy.  In addition, the Dell contact (Chris Dearborn) is out this week.

For reference, this is the Openstack bug from Dell that describes the original issue - https://bugzilla.redhat.com/show_bug.cgi?id=1384187

As described in that bug, this problem is intermittent and appears to be timing related.  It worked fine on 4 of the 6 nodes; on the other node that failed, it was the MAC of p6p2 instead of p4p1 that was used for em1.  I've included the journal of this 2nd case in case it's useful to see the differences.

I would also be interested to know of any suggested workarounds. For example, I assume that disabling interface renaming would resolve this, e.g. via the kernel flags 'net.ifnames=0' and 'biosdevname=0'.  That, however, could lead to different OpenStack problems, as the interface names between deployed images may be different.

Comment 5 Thomas Haller 2016-10-26 13:22:31 UTC
Hi,

if it's too cumbersome, don't bother with the logfiles. I think the issue is clear. Thanks.



The problem is that NetworkManager caches the permanent MAC address and only attempts to read it once.


NM gets the permanent MAC address via ethtool ioctl, which has the interface name as argument. Thus, there is a race of using the wrong name. The real fix for this would be to use a different kernel API that uses the ifindex (which currently does not exist).

The race is especially likely before UDEV has finished renaming the device. Later on, the device usually doesn't get renamed, and it is much less likely that NetworkManager hits the race.


Part of the problem is that NM already tries to read the permanent MAC address before UDEV is done renaming the interface.
Another reason why that is bad (not related to this bug) is that, for software devices without a permanent MAC address (veth), NM takes the current MAC address and caches it (using it as a pseudo-permanent address). The user would be advised to configure the initial MAC address of the device via UDEV, so NM should not read the MAC address before UDEV has completed.



Due to the race you end up with NetworkManager thinking that a certain device has the wrong permanent MAC address. Workarounds are:

(1) don't rename the interfaces :)

(2) avoid NetworkManager using the wrong MAC address:

(2a) NetworkManager uses it for example for configurations in /etc/NetworkManager/NetworkManager.conf (see Device List Format in `man NetworkManager.conf`):

  [keyfile]
  unmanaged-devices=mac:A0:36:9F:14:49:90

Use the interface name there instead.

(2b) There is the per-connection setting ethernet.cloned-mac-address (see `nmcli connection show $NAME`). Don't set it to "permanent", because then NetworkManager will try to configure the (wrong) permanent address on the device. Set it either to an explicit MAC address or "preserve" to not change it.

  nmcli connection modify $NAME ethernet.cloned-mac-address preserve

Note that the property has the value "" by default, which means to fall back to a globally configured default value. You can configure that in a file like /etc/NetworkManager/conf.d/90-cloned-mac-address-preserve.conf

  [connection-cloned-mac-address-preserve]
  ethernet.cloned-mac-address=preserve

Followed by `killall -SIGHUP NetworkManager`.
See "Connection section" in `man NetworkManager.conf` for details.
This will then take effect for all connections that have their per-connection value unset.
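
The fallback described in (2b) can be summarized in a few lines. This is a simplified model rather than NetworkManager code: the function names are hypothetical, only the "preserve", "permanent", and explicit-address values mentioned above are modeled, and the built-in default of "permanent" passed in below is an assumption chosen to match the "(permanent)" log line in the description.

```python
def effective_cloned_mac(per_connection, global_default, builtin_default):
    """Resolve the setting: per-connection value, else global default, else built-in."""
    if per_connection:
        return per_connection
    if global_default:
        return global_default
    return builtin_default


def mac_to_apply(setting, cached_permanent_mac):
    """Return the MAC that would be written to the device, or None to leave it alone."""
    if setting == "preserve":
        return None
    if setting == "permanent":
        return cached_permanent_mac
    return setting  # treated as an explicit MAC address


# Permanent address that was cached for em1 after the race (wrong value).
cached = "A0:36:9F:14:49:90"

# No per-connection value and no global default: the wrong address is applied.
without_workaround = mac_to_apply(effective_cloned_mac("", "", "permanent"), cached)
print(without_workaround)  # A0:36:9F:14:49:90

# Global default set to "preserve" via conf.d: the device MAC is left untouched.
with_workaround = mac_to_apply(effective_cloned_mac("", "preserve", "permanent"), cached)
print(with_workaround)     # None
```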






There is an upstream branch, th/preserve-fake-perm-hwaddr-bgo772880, on review to delay reading the permanent MAC address until UDEV is done.

Comment 7 Thomas Haller 2016-11-03 11:35:37 UTC
also backported to upstream nm-1-4 branch:
https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=221851436b3bc5c3b43f574269adee9ca052a50d

Comment 8 Bob Fournier 2016-11-07 21:42:53 UTC
Would it be possible to get a 7.3-z patch for this?

Comment 9 Karl Hastings 2016-11-08 14:58:19 UTC
We need qa and pm acks before we can propose for z-stream.

Comment 10 Tomas Pelka 2016-11-08 18:02:41 UTC
(In reply to Karl Hastings from comment #9)
> We need qa and pm acks before we can propose for z-stream.

Rerouting to Vlad.

Comment 11 Thomas Haller 2016-11-10 10:21:12 UTC
Bob, any chance to test scratch-build https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12071007 ?

Comment 12 Bob Fournier 2016-11-10 22:04:20 UTC
Thomas - we've tested the patch on two different systems that were exhibiting the issue earlier and have verified that it works with this patch.

These are the rpms used:
- NetworkManager-1.4.0-13.test.rh1388286.01.el7_3.x86_64.rpm
- NetworkManager-libnm-1.4.0-13.test.rh1388286.01.el7_3.x86_64.rpm
- NetworkManager-tui-1.4.0-13.test.rh1388286.01.el7_3.x86_64.rpm
- NetworkManager-config-server-1.4.0-13.test.rh1388286.01.el7_3.x86_64.rpm
- NetworkManager-team-1.4.0-13.test.rh1388286.01.el7_3.x86_64.rpm

Comment 13 Bob Fournier 2016-11-14 20:17:49 UTC
*** Bug 1384187 has been marked as a duplicate of this bug. ***

Comment 22 Bob Fournier 2016-12-19 19:07:04 UTC
*** Bug 1400764 has been marked as a duplicate of this bug. ***

Comment 28 Sai Sindhur Malleni 2017-03-07 15:19:33 UTC
I am seeing this in latest OSP10. Do we know if we have a fix yet?

Comment 29 Vladimir Benes 2017-03-07 15:36:21 UTC
(In reply to Sai Sindhur Malleni from comment #28)
> I am seeing this in latest OSP10. Do we know if we have a fix yet?

Update should be out in 7.3.3 z stream. Not sure how this affects OSP10. This bug tracks 7.4.

Comment 31 Bob Fournier 2017-06-29 22:01:01 UTC
I tested with RHEL 7.4 and the following NetworkManager rpms:
NetworkManager-tui-1.8.0-9.el7.x86_64
NetworkManager-config-server-1.8.0-9.el7.noarch
NetworkManager-team-1.8.0-9.el7.x86_64
NetworkManager-libnm-1.8.0-9.el7.x86_64
NetworkManager-1.8.0-9.el7.x86_64

With the agent.ramdisk using these rpms, I ran introspection multiple times. I did not see any problems with interface renaming causing incorrect mac addresses to be set.

Comment 32 Vladimir Benes 2017-06-30 09:42:20 UTC
Marking as verified according to previous comment.

Comment 33 errata-xmlrpc 2017-08-01 09:19:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2299

