Bug 1661574

Summary: [Azure][RHEL8.0]NetworkManager - dhclient interfaces configuration at boot time
Product: Red Hat Enterprise Linux 8
Reporter: Chris <v-chvale>
Component: NetworkManager
Assignee: Francesco Giudici <fgiudici>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Yuhui Jiang <yujiang>
Severity: high
Priority: high
Docs Contact:
Version: 8.0
CC: ailan, atragler, bgalvani, eterrell, fgiudici, huzhao, jopoulso, leiwang, lrintel, ribarry, rkhan, sukulkar, thaller, v-adsuho, v-chvale, vkuznets, wshi, xiaofwan, xuli, yacao, yujiang, yuxisun
Target Milestone: rc
Target Release: 8.0
Hardware: All
OS: Windows
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-13 09:05:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1689408, 1701002    
Attachments:
  journalctl nm output (flags: none)
  RHEL 7.5 journalctl output (flags: none)

Description Chris 2018-12-21 16:12:04 UTC
Description of problem:
In Azure, when creating a VM with multiple SR-IOV NICs, only eth0 gets configured and has an IP.
All the other interfaces remain unconfigured.


Version-Release number of selected component (if applicable):
8.0 Beta image from Azure gallery
8.0 Snapshot 2

How reproducible:
100%

Steps to Reproduce:
1. Create an Azure VM with RHEL 8.0
2. Assign multiple NICs to VM
3. Check interfaces status after VM is up

Actual results:
Only eth0 is configured.
If we do a simple "dhclient -r", all interfaces will get an IP.

Expected results:
All interfaces should get an IP from DHCP by default.
Compared to a RHEL 7.5 provision, we observe that during boot no dhclient processes are spawned by NM.

Additional info:

RHEL 8.0
Redirecting to /bin/systemctl status NetworkManager.service
● NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2018-12-21 08:15:14 EST; 1h 53min ago
     Docs: man:NetworkManager(8)
 Main PID: 1379 (NetworkManager)
    Tasks: 3 (limit: 104856)
   Memory: 5.3M
   CGroup: /system.slice/NetworkManager.service
           └─1379 /usr/sbin/NetworkManager --no-daemon

RHEL 7.5
[root@-rhel-75 ~]# service NetworkManager status
Redirecting to /bin/systemctl status NetworkManager.service
● NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2018-12-21 12:26:29 UTC; 2h 42min ago
     Docs: man:NetworkManager(8)
 Main PID: 1399 (NetworkManager)
   CGroup: /system.slice/NetworkManager.service
           ├─1399 /usr/sbin/NetworkManager --no-daemon
           ├─1430 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eth1.pid -lf /var/lib...
           ├─1439 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eth2.pid -lf /var/lib...
           ├─1442 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eth3.pid -lf /var/lib...
           └─1656 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eth0.pid -lf /var/lib...

Also, compared to RHEL 7.5, /var/lib/NetworkManager
doesn't contain any dhclient-eth*.conf files.

Comment 1 Beniamino Galvani 2018-12-21 18:10:37 UTC
NM starts DHCP at boot on an interface when there is a connection profile (see the 'nmcli connection' output) that matches the interface and has autoconnect enabled.

If there is no connection profile for the interface, a new default one is created and activated at boot, unless the NetworkManager-config-server package is installed (which tells NetworkManager that this is a server installation and it should only connect devices that the sysadmin explicitly configured).
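For reference, the two conditions described above can be checked with commands along these lines (illustrative; assumes the NetworkManager CLI tools are installed):

```shell
# Show each profile's autoconnect setting and the device it is bound to;
# NM only starts DHCP at boot on devices matched by an autoconnect profile.
nmcli -f NAME,UUID,AUTOCONNECT,DEVICE connection show

# If this package is present, NM will NOT create default profiles for
# otherwise-unconfigured devices (server installation behavior).
rpm -q NetworkManager-config-server
```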

Can you please paste the output of the following commands on RHEL 8 (and if possible also on RHEL 7.5)?

 NetworkManager --print-config
 nmcli
 nmcli connection

Comment 2 Adrian Suhov 2018-12-21 18:15:28 UTC
RHEL 8.0
[root@adsuho-rhel80-ss2 NetworkManager]# NetworkManager --print-config
# NetworkManager configuration: /etc/NetworkManager/NetworkManager.conf

[main]
# plugins=ifcfg-rh,ibft
# rc-manager=symlink
# auth-polkit=true
# dhcp=internal

[logging]
# backend=journal
# audit=false
[root@adsuho-rhel80-ss2 NetworkManager]# nmcli
eth0: connected to eth0
        "The Linux Foundation Microsoft Hyper-V"
        ethernet (hv_netvsc), 00:0D:3A:F7:17:81, hw, mtu 1500
        ip4 default
        inet4 10.0.0.4/24
        route4 168.63.129.16/32
        route4 169.254.169.254/32
        route4 0.0.0.0/0
        route4 10.0.0.0/24
        inet6 fe80::4157:3037:e866:4a36/64
        route6 fe80::/64
        route6 ff00::/8

enP35692s2: disconnected
        "Mellanox MT27500/MT27520"
        1 connection available
        ethernet (mlx4_core), 00:0D:3A:F7:E4:FA, hw, mtu 1500

enP43104s1: disconnected
        "Mellanox MT27500/MT27520"
        1 connection available
        ethernet (mlx4_core), 00:0D:3A:F7:17:81, hw, mtu 1500

enP45325s3: disconnected
        "Mellanox MT27500/MT27520"
        1 connection available
        ethernet (mlx4_core), 00:0D:3A:F7:EF:30, hw, mtu 1500

enP47077s4: disconnected
        "Mellanox MT27500/MT27520"
        1 connection available
        ethernet (mlx4_core), 00:0D:3A:F7:E3:89, hw, mtu 1500

eth1: disconnected
        "The Linux Foundation Microsoft Hyper-V"
        1 connection available
        ethernet (hv_netvsc), 00:0D:3A:F7:E4:FA, hw, mtu 1500

eth2: disconnected
        "The Linux Foundation Microsoft Hyper-V"
        1 connection available
        ethernet (hv_netvsc), 00:0D:3A:F7:EF:30, hw, mtu 1500

eth3: disconnected
        "The Linux Foundation Microsoft Hyper-V"
        1 connection available
        ethernet (hv_netvsc), 00:0D:3A:F7:E3:89, hw, mtu 1500

lo: unmanaged
        "lo"
        loopback (unknown), 00:00:00:00:00:00, sw, mtu 65536

DNS configuration:
        servers: 168.63.129.16
        domains: whroxpbiwhpe5olujcmdqii5ta.xx.internal.cloudapp.net
        interface: eth0

Use "nmcli device show" to get complete information about known devices and
"nmcli connection show" to get an overview on active connection profiles.

Consult nmcli(1) and nmcli-examples(5) manual pages for complete usage details.
[root@adsuho-rhel80-ss2 NetworkManager]# nmcli connection
NAME                UUID                                  TYPE      DEVICE
eth0                bb0f7236-e4f4-4be4-b8bd-906ae262536e  ethernet  eth0
Wired connection 1  637a4069-3e87-3432-bf80-49e68337a66b  ethernet  --
Wired connection 2  8d404be0-0a1e-36a5-9057-dbc23ecddb56  ethernet  --
Wired connection 3  fed3f3a5-e852-3941-af4a-d4a1f8e5475d  ethernet  --
Wired connection 4  a5bf2445-8b07-32b3-9241-f150266a72a0  ethernet  --

-----------------------------------------------------------------------------------------------
RHEL 7.5
[root@adsuho-rhel-75 NetworkManager]# NetworkManager --print-config
# NetworkManager configuration: /etc/NetworkManager/NetworkManager.conf (lib: 10-slaves-order.conf)

[main]
# plugins=ifcfg-rh,ibft
# rc-manager=file
# auth-polkit=true
# dhcp=dhclient
slaves-order=index

[logging]
# backend=syslog
# audit=false
[root@adsuho-rhel-75 NetworkManager]# nmcli
eth0: connected to System eth0
        "eth0"
        ethernet (hv_netvsc), 00:0D:3A:F9:BB:0E, hw, mtu 1500
        ip4 default
        inet4 10.0.0.4/24
        route4 168.63.129.16/32
        route4 169.254.169.254/32
        route4 0.0.0.0/0
        route4 10.0.0.0/24
        inet6 fe80::20d:3aff:fef9:bb0e/64
        route6 ff00::/8
        route6 fe80::/64

eth1: connected to Wired connection 1
        "eth1"
        ethernet (hv_netvsc), 00:0D:3A:F9:B3:2E, hw, mtu 1500
        inet4 10.0.1.4/24
        route4 10.0.1.0/24
        inet6 fe80::b654:de53:7a83:73b0/64
        route6 fe80::/64
        route6 fe80::/64
        route6 ff00::/8

eth2: connected to Wired connection 2
        "eth2"
        ethernet (hv_netvsc), 00:0D:3A:F9:BB:B4, hw, mtu 1500
        inet4 10.0.2.4/24
        route4 10.0.2.0/24
        inet6 fe80::134f:cf46:ac29:fdd6/64
        route6 fe80::/64
        route6 fe80::/64
        route6 ff00::/8

eth3: connected to Wired connection 3
        "eth3"
        ethernet (hv_netvsc), 00:0D:3A:F9:BF:A8, hw, mtu 1500
        inet4 10.0.3.4/24
        route4 10.0.3.0/24
        inet6 fe80::709f:c288:de6b:8b54/64
        route6 fe80::/64
        route6 fe80::/64
        route6 ff00::/8

enP1p0s2: unmanaged
        "Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]"
        ethernet (mlx4_core), 00:0D:3A:F9:BB:0E, hw, mtu 1500

enP2p0s2: unmanaged
        "Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]"
        ethernet (mlx4_core), 00:0D:3A:F9:B3:2E, hw, mtu 1500

enP3p0s2: unmanaged
        "Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]"
        ethernet (mlx4_core), 00:0D:3A:F9:BB:B4, hw, mtu 1500

enP4p0s2: unmanaged
        "Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]"
        ethernet (mlx4_core), 00:0D:3A:F9:BF:A8, hw, mtu 1500

lo: unmanaged
        "lo"
        loopback (unknown), 00:00:00:00:00:00, sw, mtu 65536

DNS configuration:
        servers: 168.63.129.16
        domains: a02fp2bmvsyubb21hslzbl2evh.xx.internal.cloudapp.net
        interface: eth0

        servers: 168.63.129.16
        domains: a02fp2bmvsyubb21hslzbl2evh.xx.internal.cloudapp.net
        interface: eth3

        servers: 168.63.129.16
        domains: a02fp2bmvsyubb21hslzbl2evh.xx.internal.cloudapp.net
        interface: eth2

        servers: 168.63.129.16
        domains: a02fp2bmvsyubb21hslzbl2evh.xx.internal.cloudapp.net
        interface: eth1

Use "nmcli device show" to get complete information about known devices and
"nmcli connection show" to get an overview on active connection profiles.

Consult nmcli(1) and nmcli-examples(5) manual pages for complete usage details.
[root@adsuho-rhel-75 NetworkManager]# nmcli connection
NAME                UUID                                  TYPE      DEVICE
System eth0         5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03  ethernet  eth0
Wired connection 1  972849d4-9685-3f9a-b652-243d322ab0ad  ethernet  eth1
Wired connection 2  0641abe1-664e-36c2-8ec0-158e8f370e68  ethernet  eth2
Wired connection 3  ca20c9fc-d22d-3e72-be3e-f682fc128f89  ethernet  eth3

Comment 3 Beniamino Galvani 2018-12-21 18:26:24 UTC
Thanks, I don't see anything wrong in configuration. Could you also please set 'level=DEBUG' in the [logging] section of /etc/NetworkManager/NetworkManager.conf on RHEL8, reboot the system and then attach the output of 'journalctl -u NetworkManager -b' after the system has booted?

Comment 4 Adrian Suhov 2018-12-21 19:04:59 UTC
Ok, this is a big one, so here it is: https://pastebin.com/5ybdN4fX

Comment 5 Chris 2018-12-22 08:36:35 UTC
Created attachment 1516209 [details]
journalctl nm output

Comment 6 Beniamino Galvani 2018-12-22 17:40:15 UTC
I have some observations about the logs.

There are 4 Hyper-V ethernet interfaces (eth*) and another 4 Mellanox
ethernet interfaces (enP*) that have the same MAC addresses as the first
ones. Also, the Mellanox interfaces are reported by the kernel as slaves
of the first ones:

 2: eth0 <DOWN;broadcast,multicast> mtu 1500 arp 1 ethernet? not-init addrgenmode eui64 addr 00:0D:3A:F7:17:81 driver hv_netvsc rx:0,0 tx:0,0
 3: eth1 <DOWN;broadcast,multicast> mtu 1500 arp 1 ethernet? not-init addrgenmode eui64 addr 00:0D:3A:F7:E4:FA driver hv_netvsc rx:0,0 tx:0,0
 4: eth2 <DOWN;broadcast,multicast> mtu 1500 arp 1 ethernet? not-init addrgenmode eui64 addr 00:0D:3A:F7:EF:30 driver hv_netvsc rx:0,0 tx:0,0
 5: eth3 <DOWN;broadcast,multicast> mtu 1500 arp 1 ethernet? not-init addrgenmode eui64 addr 00:0D:3A:F7:E3:89 driver hv_netvsc rx:0,0 tx:0,0
 6: enP43104s1 <DOWN;broadcast,multicast,slave> mtu 1500 master 2 arp 1 ethernet? not-init addrgenmode eui64 addr 00:0D:3A:F7:17:81 driver mlx4_en rx:0,0 tx:0,0
 7: enP35692s2 <DOWN;broadcast,multicast,slave> mtu 1500 master 3 arp 1 ethernet? not-init addrgenmode eui64 addr 00:0D:3A:F7:E4:FA driver mlx4_en rx:0,0 tx:0,0
 8: enP45325s3 <DOWN;broadcast,multicast,slave> mtu 1500 master 4 arp 1 ethernet? not-init addrgenmode eui64 addr 00:0D:3A:F7:EF:30 driver mlx4_en rx:0,0 tx:0,0
 9: enP47077s4 <DOWN;broadcast,multicast,slave> mtu 1500 master 5 arp 1 ethernet? not-init addrgenmode eui64 addr 00:0D:3A:F7:E3:89 driver mlx4_en rx:0,0 tx:0,0

Can you please describe the purpose of all these interfaces,
why some ethernet interfaces are slaves of others with the same
MAC, and whether you expect NM to automatically start DHCP on all or
only a subset of them?

I see that on RHEL7 the Mellanox cards are considered unmanaged by
NetworkManager, while they are actively managed on RHEL8. The
duplicated MACs of managed interfaces trigger a bug in NetworkManager
and so the activation of DHCP fails for some interfaces (eth*).

Do you have any special configuration (udev rules, NetworkManager
config) to make NM ignore enP* interfaces on RHEL7? Would it be
possible to capture a trace log of boot on RHEL7 (as explained in
comment 3 but with level=TRACE instead of level=DEBUG) to better
understand why those devices are unmanaged there?

Thanks!

Comment 7 Adrian Suhov 2018-12-24 09:28:21 UTC
Created attachment 1516487 [details]
RHEL 7.5 journalctl output

I'm attaching journalctl output for RHEL 7.5. I put the trace flag in the config file but I see that debug output is also present in the logs.

As for the questions from the previous comment:
 - This is how SR-IOV is handled in Linux Azure VMs (and in Hyper-V generally speaking). Two interfaces appear in the guest VM for each Azure network interface added to the VM. The Transparent VF feature makes sure that the VF (enP* device) is used primarily, but the IP is given to the eth* device. This way, if something goes wrong with the VF, the connection falls back to the synthetic interface rather than the broken virtual function.
 - The paired synthetic interface and virtual function will always have the same MAC in an Azure/Hyper-V environment.
 - We do not have any special config for the RHEL 7.5 VM. Both VMs, 8.0 and 7.5, are deployed in Azure without any extra config. RHEL 7.5 (and previous versions) can assign an IP to the extra network interfaces, and RHEL 8.0 can't. From my previous experience/testing, this is the first time I'm seeing this issue on Azure.
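The synthetic/VF pairing described above can be inspected from the kernel's point of view with commands like these (illustrative; the kernel exposes the pairing as a "master" relationship, visible in the earlier log excerpt):

```shell
# List enslaved interfaces: each VF (enP*, mlx4) should appear with
# "master ethN", and the pair shares one MAC address.
ip -o link show | grep -w master

# Or, for one synthetic interface, list the VF(s) enslaved to it
# (eth0 here is just an example device name):
ip link show master eth0
```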

Comment 8 Yuhui Jiang 2018-12-25 09:25:45 UTC
Hi, 
Microsoft adds a udev rule in the RHEL 7.x on-demand image, so that multiple SR-IOV NICs can all get IP addresses from the DHCP server.
The rule's path is "/etc/udev/rules.d/68-azure-sriov-nm-unmanaged.rules".
# cat /etc/udev/rules.d/68-azure-sriov-nm-unmanaged.rules
SUBSYSTEM=="net", DRIVERS=="hv_pci", ACTION=="add", ENV{NM_UNMANAGED}="1"

If you add this rule in RHEL 8, the issue will be resolved.
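A sketch of installing the rule on a running system without a reboot (commands are illustrative and require root; a reboot achieves the same effect):

```shell
# Write the rule Microsoft ships in the RHEL 7 on-demand image.
cat > /etc/udev/rules.d/68-azure-sriov-nm-unmanaged.rules <<'EOF'
SUBSYSTEM=="net", DRIVERS=="hv_pci", ACTION=="add", ENV{NM_UNMANAGED}="1"
EOF

# Re-read the rules and replay "add" events for network devices so
# NetworkManager picks up the NM_UNMANAGED hint on existing interfaces.
udevadm control --reload-rules
udevadm trigger --subsystem-match=net --action=add
```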

Comment 9 Yuhui Jiang 2018-12-25 09:31:15 UTC
This rule will make Mellanox cards unmanaged by NetworkManager.

Comment 10 Adrian Suhov 2018-12-27 10:43:27 UTC
I added the rule and now all the interfaces have IPs assigned. Thanks for the suggestion!

Comment 11 Chris 2019-01-03 12:27:05 UTC
Azure image RHEL 8-BETA 8.0.2018112803 does not have that rule by default.
Is that expected/normal for beta images and to be included only in GA?

Comment 14 Rick Barry 2019-01-04 13:48:02 UTC
Josh P., please take a look at comment 8. It looks like the Microsoft on-demand image to be created for RHEL 8 will need to include a udev rule as was done for RHEL 7.

Comment 15 Chris 2019-01-15 09:05:54 UTC
Revisiting this with Snapshot 3: on a custom-created image I added the mentioned udev rule, but the interfaces were still not automatically configured.

From my understanding, the rule prevents NM from controlling the interfaces; however, what I see is that there is no network/networking service that handles the interfaces instead.
Has that been deprecated in RHEL 8.0?

Otherwise, if we run a dhclient command manually, the interfaces do get configured. Or is this expected to be done manually?

Comment 16 Thomas Haller 2019-01-15 09:27:47 UTC
"network.service" is the legacy networking service (ifcfg files in /etc/sysconfig/network-scripts).

In RHEL7, this was part of "initscripts" package (and installed and enabled by default, alongside NetworkManager)
In RHEL8, "initscripts" package was split into "initscripts" and "network-scripts". The former is still installed by default, but the latter (which contains the "network.service") is no longer installed by default.

"network.service" is deprecated on RHEL8, but of course, you can still install it (network-scripts.rpm) and use it the same manner as on RHEL-7.

> Or is this expected to be done manually?

Well, if you tell the thing that configures the network (NetworkManager) not to configure it, you have to ensure it's configured to your needs some other way (possibly by installing "network-scripts.rpm" and configuring the legacy "network.service"). You may do that.
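A minimal sketch of that alternative, assuming the deprecated legacy path is acceptable (package, service, and file names as described above; the ifcfg values and the interface name are illustrative, not taken from this bug):

```shell
# Install and enable the legacy network service (deprecated on RHEL 8).
yum install -y network-scripts
systemctl enable network.service

# Example ifcfg file for an interface NetworkManager was told to ignore;
# NM_CONTROLLED=no keeps NetworkManager away, the legacy service runs DHCP.
cat > /etc/sysconfig/network-scripts/ifcfg-eth1 <<'EOF'
DEVICE=eth1
BOOTPROTO=dhcp
ONBOOT=yes
NM_CONTROLLED=no
EOF
```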

Comment 17 Beniamino Galvani 2019-01-15 09:43:47 UTC
(In reply to Chris from comment #15)
> Revisiting this with Snapshot 3, on a custom created image I've added the
> mentioned udev rule, however the interfaces were still not automatically
> configured.
> 
> From my understanding, this disables NM from controlling the interfaces,
> however what I see is that there is no network/networking service that will
> handle the interfaces instead.
> Has that been deprecated in RHEL 8.0?

As stated by Adrian in comment 7, only eth* interfaces should get an IP address, while enP* ones should not because they are used for the transparent failover mechanism. The udev rule makes NM ignore enP* interfaces.

Which interfaces are you referring to? Please attach the output of 'ip a; nmcli d; nmcli c'. Thanks.

Comment 18 Francesco Giudici 2019-05-27 11:51:38 UTC
Hi Chris,
  can you please clarify the issue in #c15:
- are the ethX interfaces the ones that fail getting configured?
- can you please provide the information requested by Beniamino in #c17 (output of 'ip a; nmcli d; nmcli c')?

Thanks

Francesco

Comment 19 Francesco Giudici 2019-05-31 08:31:03 UTC
Brief recap of this issue's status:
- A pre-release RHEL 8 image installed on a VM with multiple SR-IOV NICs on Azure left all interfaces but eth0 unconfigured. The RHEL 7 image works as expected.
- This can be traced to two different issues:
(1) Red Hat ships a particular build for Azure which contains a udev rule, shown in #c8. This rule was missing in the pre-release RHEL 8 image used on Azure.
(2) The network.service is no longer active by default on RHEL 8: the only network management service enabled by default is NetworkManager. To configure interfaces marked as not managed by NetworkManager by the rule above, either install the deprecated network-scripts package or manage those interfaces by other means, as explained in #c16.

Adrian reported a successful result after manually applying the udev rule (#c10).
Chris reported an unsuccessful test after manually applying the udev rule (#c15). It seems the issue triggered there is the one in point (2).

From a NetworkManager perspective I cannot see anything that needs to be done here.
To resolve point (1), the udev rule should be added to the Azure RHEL image.
The resolution of point (2) looks like a configuration issue: the interfaces excluded from NetworkManager should be managed by other means.

Chris, Adrian, can you check whether the udev rule is now present in the current RHEL 8 image for Azure?
Is the issue now resolved? If not, can you please provide the info requested in #c17?

Thanks

Francesco

Comment 20 Francesco Giudici 2019-06-13 09:05:20 UTC
I guess the issue has been fixed by adding the udev rule to the Azure RHEL image and dealing properly with the devices not managed by NetworkManager.
I cannot see any action needed on the NetworkManager side.
However, I don't have any updated information to confirm this guess, so I will close as INSUFFICIENT_DATA.

If more action is needed here please reopen and provide the requested information.

Thanks

F.G.

Comment 23 Red Hat Bugzilla 2023-09-14 04:44:11 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.