This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.
Bug 1990786 - [Azure][RHEL-9]68-azure-sriov-nm-unmanaged.rules cannot stop NetworkManager-wait-online.service checking SRIOV interface
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: systemd
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: beta
Assignee: Michal Sekletar
QA Contact: Frantisek Sumsal
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-08-06 08:51 UTC by Yuxin Sun
Modified: 2024-01-20 04:25 UTC (History)
16 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-09-21 11:14:27 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker   RHEL-5880 0 None Migrated None 2023-09-21 11:12:06 UTC
Red Hat Issue Tracker RHELPLAN-92626 0 None None None 2021-08-06 08:52:40 UTC

Description Yuxin Sun 2021-08-06 08:51:11 UTC
Description of problem:
In Azure, if SR-IOV is enabled, a udev rule is needed to make NetworkManager ignore the SR-IOV interface:
cat /etc/udev/rules.d/68-azure-sriov-nm-unmanaged.rules:
# Accelerated Networking on Azure exposes a new SRIOV interface to the VM.
# This interface is transparently bonded to the synthetic interface,
# so NetworkManager should just ignore any SRIOV interfaces.
SUBSYSTEM=="net", DRIVERS=="hv_pci", ACTION=="add", ENV{NM_UNMANAGED}="1"

This rule works in RHEL-8. However, it doesn't work in RHEL-9. The NetworkManager-wait-online.service still fails:

[root@wala90719sriov08050847-vm1 ~]# systemd-analyze blame
1min 27ms NetworkManager-wait-online.service

[root@wala90719sriov08050847-vm1 ~]# systemctl status NetworkManager-wait-online
× NetworkManager-wait-online.service - Network Manager Wait Online
     Loaded: loaded (/usr/lib/systemd/system/NetworkManager-wait-online.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Thu 2021-08-05 06:31:13 EDT; 13s ago
       Docs: man:nm-online(1)
    Process: 767 ExecStart=/usr/bin/nm-online -s -q (code=exited, status=1/FAILURE)
   Main PID: 767 (code=exited, status=1/FAILURE)
        CPU: 48ms

Aug 05 06:30:13 wala90719sriov08050847-vm1 systemd[1]: Starting Network Manager Wait Online...
Aug 05 06:31:13 wala90719sriov08050847-vm1 systemd[1]: NetworkManager-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Aug 05 06:31:13 wala90719sriov08050847-vm1 systemd[1]: NetworkManager-wait-online.service: Failed with result 'exit-code'.
Aug 05 06:31:13 wala90719sriov08050847-vm1 systemd[1]: Failed to start Network Manager Wait Online.

Version-Release number of selected components (if applicable):

RHEL Version:
RHEL-9.0(5.14.0-0.rc3.29.el9.x86_64)
NetworkManager-1.32.2-1.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot up a RHEL-9 VM on Azure with a SR-IOV card
2. Add a rule: /etc/udev/rules.d/68-azure-sriov-nm-unmanaged.rules (content is in the description)
3. systemctl status NetworkManager-wait-online

Actual results:
The service fails and adds ~1min30s to every boot.

Expected results:
Service runs successfully.

Additional info:
This rule is manually added.

Comment 1 Thomas Haller 2021-08-06 09:15:21 UTC
when reporting a bug against NetworkManager, it's almost always necessary to provide full level=TRACE logs.

See https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/main/contrib/fedora/rpm/NetworkManager.conf#L27 for how to enable trace logging.

As this also involves NetworkManager-dispatcher.service, don't do `journalctl -u NetworkManager`, but please provide the full journal (in general it is a good idea to collect all the logs that might be necessary up front).

Comment 3 Thomas Haller 2021-08-09 11:15:02 UTC
Are we talking about eth2, or a different interface?

What does `udevadm info -q property -x /sys/class/net/eth2` print on that machine?



Also,

> 1. Boot up a RHEL-9 VM on Azure with a SR-IOV card
> 2. Add a rule: /etc/udev/rules.d/68-azure-sriov-nm-unmanaged.rules (content is in the description)
> 3. systemctl status NetworkManager-wait-online

A rule only matters for *new* interfaces.
So if you boot a machine, all interfaces get detected. If you then drop a udev rule, that doesn't take effect (until you delete and recreate the interface, for example by `rmmod && modprobe`).

From the description it's not clear to me when you added the rule. Does it work after reboot?

Comment 4 Yuxin Sun 2021-08-09 15:09:14 UTC
# udevadm info -q property -x /sys/class/net/eth2
DEVPATH='/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/f674117a-9425-4330-a45c-122dd3d2040e/pci9425:00/9425:00:02.0/net/eth2'
INTERFACE='eth2'
IFINDEX='4'
SUBSYSTEM='net'
ID_RENAMING='1'

When we enable SR-IOV on a NIC on Azure, two NICs show up inside the VM like this, and we need to ignore the slave one (eth2 in this VM):
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:22:48:22:67:97 brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.4/24 brd 10.0.1.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::222:48ff:fe22:6797/64 scope link 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master eth1 state UP group default qlen 1000
    link/ether 00:22:48:22:67:97 brd ff:ff:ff:ff:ff:ff
    altname enP37925p0s2
    altname enP37925s2
    inet6 fe80::7a1a:a127:3e21:bc33/64 scope link tentative noprefixroute 
       valid_lft forever preferred_lft forever

If I add the rule and reboot the VM, the issue still exists.
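
The slave (VF) interface in the `ip addr` output above can be picked out by its SLAVE flag and its `master` attribute. The following is a minimal Python sketch, assuming the output has been captured as text; the helper name `find_slave_links` and the sample constant are hypothetical and only illustrate the idea.

```python
import re

# Header lines copied (trimmed) from the `ip addr` output in this comment.
IP_ADDR_SAMPLE = """\
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:22:48:22:67:97 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master eth1 state UP group default qlen 1000
    link/ether 00:22:48:22:67:97 brd ff:ff:ff:ff:ff:ff
"""

# Matches "<index>: <name>: <FLAGS> rest-of-line".
HEADER = re.compile(r"^\d+: (?P<name>[^:@]+)[^:]*: <(?P<flags>[^>]*)>(?P<rest>.*)$")


def find_slave_links(ip_addr_text):
    """Return {slave_name: master_name} for every link flagged SLAVE (hypothetical helper)."""
    slaves = {}
    for line in ip_addr_text.splitlines():
        match = HEADER.match(line)
        if not match:
            continue  # address lines and other indented lines are skipped
        if "SLAVE" in match.group("flags").split(","):
            master = re.search(r"\bmaster (\S+)", match.group("rest"))
            slaves[match.group("name")] = master.group(1) if master else None
    return slaves


print(find_slave_links(IP_ADDR_SAMPLE))
```

On this sample the helper reports eth2 with master eth1, which is the interface NetworkManager is expected to leave alone.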

Comment 5 Thomas Haller 2021-08-09 15:25:12 UTC
(In reply to Yuxin Sun from comment #4)
> # udevadm info -q property -x /sys/class/net/eth2
> DEVPATH='/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/
> f674117a-9425-4330-a45c-122dd3d2040e/pci9425:00/9425:00:02.0/net/eth2'
> INTERFACE='eth2'
> IFINDEX='4'
> SUBSYSTEM='net'
> ID_RENAMING='1'

It seems the udev rule didn't work properly, would you agree?

/etc/udev/rules.d/68-azure-sriov-nm-unmanaged.rules is not part of NetworkManager package.

Comment 6 Thomas Haller 2021-08-09 15:32:09 UTC
ID_NET_DRIVER= is also missing.

In the log, we see:

Aug 09 02:55:34 wala90719sriov08050847-vm1 systemd-udevd[675]: eth2: Failed to rename network interface 4 from 'eth2' to 'eth1': File exists
Aug 09 02:55:34 wala90719sriov08050847-vm1 systemd-udevd[675]: eth2: Failed to process device, ignoring: File exists
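
The missing properties can also be checked mechanically. The sketch below is only an illustration in Python, assuming the `udevadm info -q property -x` output from comment 4 has been saved as text; `parse_udev_properties` and `missing_properties` are hypothetical helper names, and the list of expected keys is an assumption based on this discussion.

```python
# Properties reported for eth2 on RHEL-9 in comment 4 (DEVPATH line omitted for brevity).
UDEV_PROPS_RHEL9 = """\
INTERFACE='eth2'
IFINDEX='4'
SUBSYSTEM='net'
ID_RENAMING='1'
"""


def parse_udev_properties(text):
    """Parse KEY='value' lines into a dict (hypothetical helper)."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # not a KEY=value line
        props[key] = value.strip("'")
    return props


def missing_properties(props, expected=("NM_UNMANAGED", "ID_NET_DRIVER")):
    """Return the expected keys that are absent from the parsed properties."""
    return [key for key in expected if key not in props]


print(missing_properties(parse_udev_properties(UDEV_PROPS_RHEL9)))
```

Both NM_UNMANAGED and ID_NET_DRIVER are reported as missing for this sample, while ID_RENAMING is still set.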

Comment 7 Yuxin Sun 2021-08-09 15:41:48 UTC
(In reply to Thomas Haller from comment #6)
> ID_NET_DRIVER= is also missing.
> 
> In the log, we see:
> 
> Aug 09 02:55:34 wala90719sriov08050847-vm1 systemd-udevd[675]: eth2: Failed
> to rename network interface 4 from 'eth2' to 'eth1': File exists
> Aug 09 02:55:34 wala90719sriov08050847-vm1 systemd-udevd[675]: eth2: Failed
> to process device, ignoring: File exists

About this message, it looks like this issue: BZ#1962421

Comment 8 Yuxin Sun 2021-08-09 15:44:17 UTC
(In reply to Thomas Haller from comment #5)
> (In reply to Yuxin Sun from comment #4)
> > # udevadm info -q property -x /sys/class/net/eth2
> > DEVPATH='/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/
> > f674117a-9425-4330-a45c-122dd3d2040e/pci9425:00/9425:00:02.0/net/eth2'
> > INTERFACE='eth2'
> > IFINDEX='4'
> > SUBSYSTEM='net'
> > ID_RENAMING='1'
> 
> It seem the udev rule didn't properly work, would you agree?
> 
> /etc/udev/rules.d/68-azure-sriov-nm-unmanaged.rules is not part of
> NetworkManager package.

Yes. This rule is manually added into Azure image. And it works in RHEL-8.5. I'm not sure if I need to change anything when I move it into RHEL-9.

Comment 9 Yuxin Sun 2021-08-09 15:52:49 UTC
This is the udevadm info in RHEL-8.5:

# udevadm info -q property -x /sys/class/net/eth2
DEVPATH='/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/a608826c-384d-4360-8457-8cbfca92b013/pci384d:00/384d:00:02.0/net/eth2'
ID_BUS='pci'
ID_MODEL_FROM_DATABASE='MT27710 Family [ConnectX-4 Lx Virtual Function]'
ID_MODEL_ID='0x1016'
ID_NET_DRIVER='mlx5_core'
ID_NET_LINK_FILE='/usr/lib/systemd/network/99-default.link'
ID_NET_NAME_MAC='enx000d3a9ebc96'
ID_NET_NAME_PATH='enP14413p0s2'
ID_NET_NAME_SLOT='enP14413s2'
ID_NET_NAMING_SCHEME='rhel-8.0'
ID_OUI_FROM_DATABASE='Microsoft Corp.'
ID_PATH='acpi-VMBUS:01-pci-384d:00:02.0'
ID_PATH_TAG='acpi-VMBUS_01-pci-384d_00_02_0'
ID_PCI_CLASS_FROM_DATABASE='Network controller'
ID_PCI_SUBCLASS_FROM_DATABASE='Ethernet controller'
ID_VENDOR_FROM_DATABASE='Mellanox Technologies'
ID_VENDOR_ID='0x15b3'
IFINDEX='4'
INTERFACE='eth2'
SUBSYSTEM='net'
SYSTEMD_ALIAS='/sys/subsystem/net/devices/eth1'
TAGS=':systemd:'
USEC_INITIALIZED='8694981'

Comment 10 Thomas Haller 2021-08-09 15:54:32 UTC
reassigning to udev/systemd for investigation.

Comment 11 David Tardon 2021-08-18 08:00:14 UTC
My first guess is that the device driver sends "bind"/"unbind" uevents now. Please replace ACTION=="add" by ACTION!="remove" in the rule and try again.
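
To illustrate the difference between the two match expressions, here is a small Python sketch. It is a simplified model, not udev code: it assumes the usual set of kernel uevent actions and compares strings exactly, whereas real udev rules also accept glob patterns. The name `matching_actions` is hypothetical.

```python
# Kernel uevent actions that a udev rule can see in ACTION.
UEVENT_ACTIONS = ("add", "remove", "change", "move", "online", "offline", "bind", "unbind")


def matching_actions(operator, value):
    """Return the actions matched by ACTION<operator>"<value>" (exact comparison only)."""
    if operator == "==":
        return [action for action in UEVENT_ACTIONS if action == value]
    if operator == "!=":
        return [action for action in UEVENT_ACTIONS if action != value]
    raise ValueError("unsupported operator: " + operator)


print(matching_actions("==", "add"))     # only "add" events
print(matching_actions("!=", "remove"))  # everything except "remove", including "bind"
```

With ACTION!="remove", the rule would also fire on "bind" and "change" events, so a property set only on "add" would not be lost if a later event re-processes the device.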

Comment 13 Yuxin Sun 2022-01-11 09:48:09 UTC
Hi Michael,

This issue still exists in RHEL-9 with systemd-249-9.el9.x86_64 and NetworkManager-1.36.0-0.1.el9.x86_64. It makes the VM take over 1 min to boot:

[root@wala9sriov12010304-vm1 ~]# systemd-analyze
Startup finished in 1.773s (kernel) + 4.222s (initrd) + 1min 5.452s (userspace) = 1min 11.449s 
multi-user.target reached after 1min 4.322s in userspace
[root@wala9sriov12010304-vm1 ~]# systemd-analyze blame
1min 83ms NetworkManager-wait-online.service

Could you please help to have a look again? Thanks!


Br,
Yuxin Sun

Comment 20 Yu Watanabe 2022-02-24 16:51:20 UTC
> Aug 09 02:55:34 wala90719sriov08050847-vm1 systemd-udevd[675]: eth2: Failed
> to rename network interface 4 from 'eth2' to 'eth1': File exists

Renaming ethX to ethY (or to any name used by the kernel, e.g. wlanZ) is not supported, at least in upstream systemd, because we cannot rename the interface without causing races.

Comment 21 David Tardon 2022-02-25 08:59:24 UTC
(In reply to Yu Watanabe from comment #20)
> > Aug 09 02:55:34 wala90719sriov08050847-vm1 systemd-udevd[675]: eth2: Failed
> > to rename network interface 4 from 'eth2' to 'eth1': File exists
> 
> Renaming ethX to ethY (or any names used by kernel, e.g. wlanZ) is not
> supported, at least on systemd upstream. As we cannot rename the interface
> without causing races.

Yes, but that was a consequence, not the cause, wasn't it? That renaming was attempted only because the interface didn't have any ID_NET_NAME_ properties (comment 4):

> # udevadm info -q property -x /sys/class/net/eth2
> DEVPATH='/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/f674117a-9425-4330-a45c-122dd3d2040e/pci9425:00/9425:00:02.0/net/eth2'
> INTERFACE='eth2'
> IFINDEX='4'
> SUBSYSTEM='net'
> ID_RENAMING='1'

Comment 24 Michal Sekletar 2022-08-01 17:06:42 UTC
Yuxin, are you trying to rename network interfaces using eth* names? 

It seems so from udevadm info output that you have provided in comment https://bugzilla.redhat.com/show_bug.cgi?id=1990786#c19. If that is the case then we need to work on that part of the test first because this is not supported. Btw, if you want to have predictable and simple interface names which don't use "eth" prefix feel free to use prefixdevname (e.g. put "net.ifnames.prefix=net" on kernel command line to get all interfaces named like net0, net1, ...). 

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_networking/consistent-network-interface-device-naming_configuring-and-managing-networking#proc_customizing-the-prefix-of-ethernet-interfacesconsistent-network-interface-device-naming

Comment 29 RHEL Program Management 2023-02-06 07:27:48 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 30 Dexuan Cui 2023-05-15 16:44:33 UTC
I'm re-opening the bug with more technical details.

Comment 32 Yu Watanabe 2023-05-15 20:34:03 UTC
This is caused by wrong NAME= assignment by 60-net.rules.

From the log https://bugzilla.redhat.com/show_bug.cgi?id=1990786#c15,
> eth1: /usr/lib/udev/rules.d/60-net.rules:1 NAME 'eth0'
You already see the warning about that (though, interface names here are different):
> systemd-udevd[675]: eth2: Failed to rename network interface 4 from 'eth2' to 'eth1': File exists
> systemd-udevd[675]: eth2: Failed to process device, ignoring: File exists

So, I guess that the VM is booted with spurious kernel command line e.g. ifname=, or has misconfigured ifcfg file.
Please provide the kernel command line /proc/cmdline, and all relevant ifcfg files.

Note, the upstream commit https://github.com/systemd/systemd/commit/210033847c340c43dd6835520f21f8b23ba29579 makes udevd handle misconfiguration more gracefully.

Comment 33 Dexuan Cui 2023-05-15 23:34:17 UTC
(Looks like comment 15 is hidden from me)

@Yu Watanabe: I think your analysis is correct.

An RHEL 9 VM running on legacy Azure clusters always has a delay of about 1.5 minutes every time it boots:
[  *** ] A start job is running for Network …nager Wait Online (28s / no limit)
(It looks like NetworkManager (NM) is trying to get an IP address for the Mellanox VF interface eth1, and this always times out)
[root@decui-RHEL9-latest-0424-2023 ~]# nmcli device
DEVICE  TYPE      STATE                                  CONNECTION         
eth0    ethernet  connected                              System eth0        
eth1    ethernet  connecting (getting IP configuration)  Wired connection 1 
lo      loopback  unmanaged  
(If I wait for several minutes, the STATE of eth1 will become "disconnected")
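
A device stuck in the "connecting" state can be spotted in the table above with a few lines of Python. This is only a sketch: it assumes the default tabular `nmcli device` output has been captured as text and that columns are separated by at least two spaces; `devices_still_connecting` is a hypothetical helper name.

```python
import re

# Table copied from the `nmcli device` output above (trailing spaces removed).
NMCLI_SAMPLE = """\
DEVICE  TYPE      STATE                                  CONNECTION
eth0    ethernet  connected                              System eth0
eth1    ethernet  connecting (getting IP configuration)  Wired connection 1
lo      loopback  unmanaged
"""


def devices_still_connecting(nmcli_text):
    """Return the names of devices whose STATE column starts with 'connecting'."""
    stuck = []
    for line in nmcli_text.splitlines()[1:]:  # skip the header row
        columns = re.split(r"\s{2,}", line.strip())
        if len(columns) >= 3 and columns[2].startswith("connecting"):
            stuck.append(columns[0])
    return stuck


print(devices_still_connecting(NMCLI_SAMPLE))
```

For this sample only eth1, the VF interface, is reported.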

Note: "udevadm info /sys/class/net/eth1" is missing some properties for eth1:
[root@decui-RHEL9-latest-0424-2023 ~]# udevadm info /sys/class/net/eth1
P: /devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:00/VMBUS:00/55484382-a12d-4162-a3dd-e50a69ddbc9a/pcia12d:00/a12d:00:02.0/net/eth1
L: 0
E: DEVPATH=/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:00/VMBUS:00/55484382-a12d-4162-a3dd-e50a69ddbc9a/pcia12d:00/a12d:00:02.0/net/eth1
E: INTERFACE=eth1
E: IFINDEX=3
E: SUBSYSTEM=net
E: ID_RENAMING=1

I do see the NIC renaming failure:
May 15 23:03:52 decui-RHEL9-latest-0424-2023 systemd[1]: Starting Network Manager...
May 15 23:03:52 decui-RHEL9-latest-0424-2023 systemd-udevd[689]: Using default interface naming scheme 'rhel-9.1'.
May 15 23:03:52 decui-RHEL9-latest-0424-2023 systemd-udevd[689]: eth1: Failed to rename network interface 3 from 'eth1' to 'eth0': File exists
May 15 23:03:52 decui-RHEL9-latest-0424-2023 systemd-udevd[689]: eth1: Failed to process device, ignoring: File exists


I do have a kernel parameter "net.ifnames=0":
[root@decui-RHEL9-latest-0424-2023 ~]# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-162.22.2.el9_1.x86_64 root=/dev/mapper/rootvg-rootlv ro loglevel=3 crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M console=tty1 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 earlyprintk=ttyS0 net.ifnames=0

but removing this param makes no difference, i.e. I still see the 1.5-minute delay, though the VF name changes from eth1 to enP41261s1:
[root@decui-RHEL9-latest-0424-2023 ~]# cat /proc/cmdline 
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-162.22.2.el9_1.x86_64 root=/dev/mapper/rootvg-rootlv ro loglevel=3 crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M console=tty1 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 earlyprintk=ttyS0
[root@decui-RHEL9-latest-0424-2023 ~]# nmcli device
DEVICE      TYPE      STATE                                  CONNECTION         
eth0        ethernet  connected                              System eth0        
enP41261s1  ethernet  connecting (getting IP configuration)  Wired connection 1 
lo          loopback  unmanaged                              --                 
[root@decui-RHEL9-latest-0424-2023 ~]# udevadm info /sys/class/net/enP41261s1 
P: /devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:00/VMBUS:00/55484382-a12d-4162-a3dd-e50a69ddbc9a/pcia12d:00/a12d:00:02.0/net/enP41261s1
L: 0
E: DEVPATH=/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:00/VMBUS:00/55484382-a12d-4162-a3dd-e50a69ddbc9a/pcia12d:00/a12d:00:02.0/net/enP41261s1
E: INTERFACE=enP41261s1
E: IFINDEX=3
E: SUBSYSTEM=net
E: ID_RENAMING=1



If I run "mv /usr/lib/udev/rules.d/60-net.rules ~/ -iv; reboot", I no longer see the 1.5-minute delay and the VM boots up quickly:
[root@decui-RHEL9-latest-0424-2023 ~]# nmcli device
DEVICE  TYPE      STATE      CONNECTION  
eth0    ethernet  connected  System eth0 
eth1    ethernet  unmanaged  --          
lo      loopback  unmanaged  --  

[root@decui-RHEL9-latest-0424-2023 ~]# udevadm info /sys/class/net/eth1
P: /devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:00/VMBUS:00/55484382-a12d-4162-a3dd-e50a69ddbc9a/pcia12d:00/a12d:00:02.0/net/eth1
L: 0
E: DEVPATH=/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:00/VMBUS:00/55484382-a12d-4162-a3dd-e50a69ddbc9a/pcia12d:00/a12d:00:02.0/net/eth1
E: INTERFACE=eth1
E: IFINDEX=3
E: SUBSYSTEM=net
E: USEC_INITIALIZED=4596455
E: NM_UNMANAGED=1
E: ID_NET_NAMING_SCHEME=rhel-9.1
E: ID_NET_NAME_MAC=enx000d3a557c47
E: ID_OUI_FROM_DATABASE=Microsoft Corp.
E: ID_NET_NAME_PATH=enP41261p0s2
E: ID_NET_NAME_SLOT=enP41261s1
E: ID_BUS=pci
E: ID_VENDOR_ID=0x15b3
E: ID_MODEL_ID=0x1016
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_MODEL_FROM_DATABASE=MT27710 Family [ConnectX-4 Lx Virtual Function]
E: ID_PATH=acpi-VMBUS:00-pci-a12d:00:02.0
E: ID_PATH_TAG=acpi-VMBUS_00-pci-a12d_00_02_0
E: ID_NET_DRIVER=mlx5_core
E: ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
E: ID_NET_NAME=eth1
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/eth1
E: TAGS=:systemd:
E: CURRENT_TAGS=:systemd:

Comment 34 Yu Watanabe 2023-05-16 00:01:16 UTC
> I do have a kernel parameter "net.ifnames=0"

That's not relevant here. net.ifnames= is for controlling network interface renaming through .link file.

In this case, the interface renaming is requested by 60-net.rules. The rule provides new interface name based on ifname= kernel command line option and ifcfg- files, IIRC.

Anyway, this does not sound like a systemd issue, but rather an issue in the initscripts package or a user misconfiguration.
Please check /etc/sysconfig/network-scripts/ifcfg-ethX files.

Comment 35 Dexuan Cui 2023-05-16 00:16:24 UTC
[root@decui-RHEL9-latest-0424-2023 ~]# nmcli device
DEVICE  TYPE      STATE                                  CONNECTION
eth0    ethernet  connected                              System eth0
eth1    ethernet  connecting (getting IP configuration)  Wired connection 1
lo      loopback  unmanaged                              --

[root@decui-RHEL9-latest-0424-2023 ~]# ifconfig -a
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.23.0.4  netmask 255.255.255.0  broadcast 172.23.0.255
        inet6 fe80::20d:3aff:fe55:7c47  prefixlen 64  scopeid 0x20<link>
        ether 00:0d:3a:55:7c:47  txqueuelen 1000  (Ethernet)
        RX packets 612  bytes 327467 (319.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 566  bytes 208962 (204.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        inet6 fe80::bc1b:855:49d0:3568  prefixlen 64  scopeid 0x20<link>
        ether 00:0d:3a:55:7c:47  txqueuelen 1000  (Ethernet)
        RX packets 88  bytes 20726 (20.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 663  bytes 219654 (214.5 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@decui-RHEL9-latest-0424-2023 ~]# find /etc | grep eth
/etc/ethertypes
/etc/sysconfig/network-scripts/ifcfg-eth0
/etc/modprobe.d/l2tp_eth-blacklist.conf
/etc/NetworkManager/system-connections/eth0.nmconnection

[root@decui-RHEL9-latest-0424-2023 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Created by cloud-init on instance boot automatically, do not edit.
#
AUTOCONNECT_PRIORITY=999
BOOTPROTO=dhcp
DEVICE=eth0
HWADDR=00:0d:3a:55:7c:47
ONBOOT=yes
TYPE=Ethernet
USERCTL=no

Comment 36 Dexuan Cui 2023-05-16 00:19:02 UTC
Hi Yu, thanks for quick reply! So the issue is that /etc/udev/rules.d/68-azure-sriov-nm-unmanaged.rules is skipped for the VF interface because /usr/lib/udev/rules.d/60-net.rules fails for the VF interface.

This is the 60-net.rules:
[root@decui-RHEL9-latest-0424-2023 ~]# cat /usr/lib/udev/rules.d/60-net.rules
ACTION=="add", SUBSYSTEM=="net", DRIVERS=="?*", ATTR{type}=="1", PROGRAM="/lib/udev/rename_device", RESULT=="?*", NAME="$result"

Why would 60-net.rules try to rename the VF interface from eth1 (or enP41261s1) to eth0?

How can we get this bug fixed? Can we just remove /usr/lib/udev/rules.d/60-net.rules?

When an RHEL 9 VM runs on newer Azure clusters, due to this bug, NetworkManager gets the same IP address for both the Hyper-V synthetic interface eth0 and the Mellanox VF interface eth1; due to the timing, if eth1 (the VF interface) is in the default routing entry, the VM's network is broken, i.e. the user is unable to SSH to the VM... This is affecting *all* RHEL 9 users on newer Azure clusters, so hopefully we can get this fixed ASAP. 

Please let me know if any more logs are needed.

Comment 37 Yu Watanabe 2023-05-16 00:31:50 UTC
> [root@decui-RHEL9-latest-0424-2023 ~]# ifconfig -a
> eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>         ether 00:0d:3a:55:7c:47  txqueuelen 1000  (Ethernet)
> 
> eth1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
>         ether 00:0d:3a:55:7c:47  txqueuelen 1000  (Ethernet)

Hey! MAC addresses of eth0 and eth1 are the same!!!
Thus, ifcfg-eth0 is applied to both eth0 and eth1, and 60-net.rules tries to assign the name eth0 to eth1...
Please check your VM config, and make sure the Ethernet devices do not have the same MAC address.
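
The effect of the shared MAC address can be sketched as follows. This is a simplified Python model of a lookup keyed on HWADDR, not the actual logic of /lib/udev/rename_device; the function name `names_requested_by_ifcfg` and the data layout are assumptions made for illustration, with values taken from comment 35.

```python
# DEVICE/HWADDR pair from ifcfg-eth0 and the MAC addresses shown by ifconfig in comment 35.
IFCFG_ENTRIES = [{"DEVICE": "eth0", "HWADDR": "00:0d:3a:55:7c:47"}]
LINKS = {"eth0": "00:0d:3a:55:7c:47", "eth1": "00:0d:3a:55:7c:47"}


def names_requested_by_ifcfg(ifcfg_entries, links):
    """Map each link to the DEVICE of the first ifcfg entry whose HWADDR equals its MAC."""
    requested = {}
    for link_name, mac in links.items():
        for entry in ifcfg_entries:
            if entry["HWADDR"].lower() == mac.lower():
                requested[link_name] = entry["DEVICE"]
                break
    return requested


print(names_requested_by_ifcfg(IFCFG_ENTRIES, LINKS))
```

Because both links carry the same MAC, both are mapped to the name eth0, which is the conflicting rename seen in the journal.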

Comment 38 Yu Watanabe 2023-05-16 00:34:27 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1990786#c4

This case has the same issue...

Comment 39 Dexuan Cui 2023-05-16 01:00:03 UTC
(In reply to Yu Watanabe from comment #37)
> > [root@decui-RHEL9-latest-0424-2023 ~]# ifconfig -a
> > eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
> >         ether 00:0d:3a:55:7c:47  txqueuelen 1000  (Ethernet)
> > 
> > eth1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
> >         ether 00:0d:3a:55:7c:47  txqueuelen 1000  (Ethernet)
> 
> Hey! MAC addresses of eth0 and eth1 are the same!!!
Yes. This is how Accelerated Networking works in Linux and FreeBSD VMs on Azure:
"The synthetic and VF interfaces both have the same MAC address" (please refer to 
https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-how-it-works )

> Thus, ifcfg-eth0 is applied to both eth0 and eth1, and 60-net.rules try to
> assign eth0 to eth1...
I see. How does RHEL 8 handle this issue? We know RHEL 8 doesn't have this bug.

> Please check your VM config, and make not the Ethernet devices have the same
> MAC address.
Unluckily I think we have to handle this implementation of NIC SR-IOV on Azure/Hyper-V, which will continue to exist for many years...
For RHEL 8, we devised /etc/udev/rules.d/68-azure-sriov-nm-unmanaged.rules to handle this, but the same udev file doesn't work in RHEL 9 -- I think we need to figure out a solution for RHEL 9. Is the rule 60-net.rules really needed? I guess RHEL 8 doesn't have this rule? BTW, my Ubuntu 20.0 VM doesn't have this rule.

BTW, do we really need the kernel parameter "net.ifnames=0" in an RHEL 9 VM on Azure? If we don't need /usr/lib/udev/rules.d/60-net.rules, it looks like we also don't need the kernel parameter "net.ifnames=0" (I referred to https://access.redhat.com/solutions/2435891)

Comment 40 Yu Watanabe 2023-05-16 01:43:33 UTC
(In reply to Dexuan Cui from comment #39)
> (In reply to Yu Watanabe from comment #37)
> > > [root@decui-RHEL9-latest-0424-2023 ~]# ifconfig -a
> > > eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
> > >         ether 00:0d:3a:55:7c:47  txqueuelen 1000  (Ethernet)
> > > 
> > > eth1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
> > >         ether 00:0d:3a:55:7c:47  txqueuelen 1000  (Ethernet)
> > 
> > Hey! MAC addresses of eth0 and eth1 are the same!!!
> Yes. This is How Accelerated Networking works in Linux and FreeBSD VMs on
> Azure:
> "The synthetic and VF interfaces both have the same MAC address" (please
> refer to 
> https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-
> networking-how-it-works )

Interesting.

> > Thus, ifcfg-eth0 is applied to both eth0 and eth1, and 60-net.rules try to
> > assign eth0 to eth1...
> I see. How does RHEL 8 handle this issue? We know RHEL 8 doesn't have this
> bug.
> 
> > Please check your VM config, and make not the Ethernet devices have the same
> > MAC address.
> Unluckily I think we have to handle this implementation of NIC SR-IOV on
> Azure/Hyper-V, which will continue to exist for many years...
> For RHEL 8, we devised /etc/udev/rules.d/68-azure-sriov-nm-unmanaged.rules
> to handle this, but the same udev file doesn't work in RHEL 9 -- I think we
> need to figure out a solution for RHEL 9. Is the rule 60-net.rules really
> needed? I guess RHEL 8 doesn't have this rule? BTW, my Ubuntu 20.0 VM
> doesn't have this rule.

I am not familiar with Azure/Hyper-V nor cloud-init. I have no idea why ifcfg-eth0 is created like this.
What I can say here is that the issue is caused by the ifcfg file and 60-net.rules, and a workaround for the issue is removing ifcfg-eth0 or 60-net.rules.

The background story of why 68-azure-sriov-nm-unmanaged.rules is not applied to the interface is the following:
1. 60-net.rules assigns an unexpected, conflicting interface name to NAME=,
2. 68-azure-sriov-nm-unmanaged.rules queues the NM_UNMANAGED property assignment for NetworkManager,
3. systemd-udevd tries to rename eth1 -> eth0, and of course that fails,
4. because the rename failed, the entire processing of the uevent for eth1 fails, and the queued NM_UNMANAGED property assignment is discarded,
5. hence, NetworkManager tries to manage eth1, and gets stuck.
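
The five steps above can be sketched as a toy model in Python. This is not how systemd-udevd is implemented; it only mirrors the described sequence, and every name in it (`process_uevent`, `rule_60_net`, `rule_68_unmanaged`) is hypothetical.

```python
def rule_60_net(link):
    """Stands in for 60-net.rules requesting the conflicting name eth0."""
    return {"NAME": "eth0"}


def rule_68_unmanaged(link):
    """Stands in for 68-azure-sriov-nm-unmanaged.rules queuing NM_UNMANAGED=1."""
    return {"ENV": {"NM_UNMANAGED": "1"}}


def process_uevent(link, existing_names, rules):
    """Apply rules in order; a failed rename discards every queued property."""
    queued_props = {}
    new_name = None
    for rule in rules:
        result = rule(link)
        new_name = result.get("NAME", new_name)
        queued_props.update(result.get("ENV", {}))
    if new_name and new_name != link and new_name in existing_names:
        # The rename fails ("File exists"), so the whole event is dropped.
        return {"name": link, "properties": {}, "error": "File exists"}
    return {"name": new_name or link, "properties": queued_props, "error": None}


names = {"eth0", "eth1"}
print(process_uevent("eth1", names, [rule_60_net, rule_68_unmanaged]))
print(process_uevent("eth1", names, [rule_68_unmanaged]))
```

The first call models the failing case, where NM_UNMANAGED is lost. The second models the setup with 60-net.rules removed, where the property survives, which matches the `udevadm info` output in comment 33.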

> BTW, do we really need the kernel parameter "net.ifnames=0" in an RHEL 9 VM
> on Azure? If we don't need /usr/lib/udev/rules.d/60-net.rules, it looks like
> we also don't need the kernel parameter "net.ifnames=0" (I referred to
> https://access.redhat.com/solutions/2435891)

As I said above, net.ifnames=0 means disabling network interface renaming based on .link files.
As you can see in the result of 'udevadm info', 99-default.link is assigned to the interface.
The file 99-default.link contains renaming-related settings. So, if you do not want the network interfaces to be renamed, you still need to set net.ifnames=0.

Note, there are at least two mechanisms for renaming network interfaces: one is done by 60-net.rules and ifcfg- files, the other by 80-net-setup-link.rules and .link files. The latter can be disabled with net.ifnames=0.
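
As a rough illustration of which options belong to which mechanism, the Python sketch below inspects a kernel command line. It assumes, as stated in this thread, that net.ifnames=0 only disables .link-based renaming and that ifname= is consumed by the 60-net.rules path; the function name `renaming_options` is hypothetical, and the sample is the command line from comment 33.

```python
# Kernel command line from comment 33.
CMDLINE = (
    "BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-162.22.2.el9_1.x86_64 "
    "root=/dev/mapper/rootvg-rootlv ro loglevel=3 "
    "crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M console=tty1 "
    "console=ttyS0,115200n8 earlyprintk=ttyS0,115200 earlyprintk=ttyS0 net.ifnames=0"
)


def renaming_options(cmdline):
    """Report renaming-related options found on a kernel command line."""
    tokens = cmdline.split()
    return {
        # Disables renaming via 80-net-setup-link.rules and .link files.
        "link_renaming_disabled": "net.ifnames=0" in tokens,
        # Explicit rename requests that feed the 60-net.rules path.
        "ifname_requests": [token for token in tokens if token.startswith("ifname=")],
    }


print(renaming_options(CMDLINE))
```

For this command line, .link-based renaming is disabled and no ifname= request is present, so any rename request from 60-net.rules has to come from the ifcfg files.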

=================
Michal, please assign this to cloud-init or initscript. This is not an issue in systemd.

Comment 41 Yuxin Sun 2023-05-16 10:20:14 UTC
Hi Dexuan and Yu,

I did a simple test with the RHEL-9.0 image in the Azure Marketplace (the VM has 1 NIC with accelerated networking enabled):
1. With both 60-net.rules and net.ifnames=0: 1min 18.636s
2. Only remove 60-net.rules: 22.272s
3. Only remove net.ifnames=0: 1min 18.636s
4. Remove both 60-net.rules and net.ifnames=0: 1min 18.802s

So it looks like only removing 60-net.rules resolves this issue. I'll do more testing tomorrow to confirm this result. Thanks!
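
The four timings can be compared numerically with a short Python sketch. It assumes the durations use the `systemd-analyze` style shown in this thread ("1min 18.636s", "22.272s"); the names `to_seconds`, `fastest_configuration` and the labels in `TIMINGS` are hypothetical.

```python
import re

# Boot times as reported in this comment, keyed by configuration.
TIMINGS = {
    "both present": "1min 18.636s",
    "60-net.rules removed": "22.272s",
    "net.ifnames=0 removed": "1min 18.636s",
    "both removed": "1min 18.802s",
}

UNIT_SECONDS = {"min": 60.0, "s": 1.0, "ms": 0.001}


def to_seconds(duration):
    """Convert a duration such as '1min 18.636s' or '27ms' to seconds."""
    total = 0.0
    for value, unit in re.findall(r"([\d.]+)(min|ms|s)", duration):
        total += float(value) * UNIT_SECONDS[unit]
    return total


def fastest_configuration(timings):
    """Return the label of the configuration with the shortest boot time."""
    return min(timings, key=lambda label: to_seconds(timings[label]))


print(fastest_configuration(TIMINGS))
```

On these preliminary numbers, the configuration with only 60-net.rules removed is the fastest.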

Comment 42 Dexuan Cui 2023-05-16 17:51:57 UTC
(In reply to Yuxin Sun from comment #41)
> ...
> 4. Remove both 60-net.rules and net.ifnames=0: 1min 18.802s
Can you please double check this? The time for this configuration is only 24.678s for me, not > 1 min.

[root@decui-RHEL9-latest-0424-2023 ~]# ll  /usr/lib/udev/rules.d/60-net.rules  
ls: cannot access '/usr/lib/udev/rules.d/60-net.rules': No such file or directory

[root@decui-RHEL9-latest-0424-2023 ~]# cat /proc/cmdline 
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-162.22.2.el9_1.x86_64 root=/dev/mapper/rootvg-rootlv ro loglevel=3 crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M console=tty1 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 earlyprintk=ttyS0

[root@decui-RHEL9-latest-0424-2023 ~]# nmcli device 
DEVICE      TYPE      STATE      CONNECTION  
eth0        ethernet  connected  System eth0 
enP41261s1  ethernet  unmanaged  --          
lo          loopback  unmanaged  --          

[root@decui-RHEL9-latest-0424-2023 ~]# systemd-analyze
Startup finished in 436ms (firmware) + 13.024s (loader) + 858ms (kernel) + 3.028s (initrd) + 7.330s (userspace) = 24.678s 
multi-user.target reached after 4.742s in userspace

Comment 43 Yuxin Sun 2023-05-17 06:53:33 UTC
(In reply to Dexuan Cui from comment #42)

Hi Dexuan,

Hmm, so weird... I used another image (RedHat:RHEL:9_1:9.1.2023041316) and got the same result:

[root@wala91ondsriov05160619-vm1 ~]# systemd-analyze
Startup finished in 958ms (kernel) + 5.696s (initrd) + 1min 10.278s (userspace) = 1min 16.933s 
multi-user.target reached after 1min 6.283s in userspace
[root@wala91ondsriov05160619-vm1 ~]# systemd-analyze blame
1min 70ms NetworkManager-wait-online.service

[root@wala91ondsriov05160619-vm1 ~]# cat /proc/cmdline 
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-162.18.1.el9_1.x86_64 root=/dev/mapper/rootvg-rootlv ro loglevel=3 crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M console=tty1 console=ttyS0,115200n8 earlyprintk=ttyS0,115200

[root@wala91ondsriov05160619-vm1 ~]# ll /usr/lib/udev/rules.d/60-net.rules 
ls: cannot access '/usr/lib/udev/rules.d/60-net.rules': No such file or directory

[root@wala91ondsriov05160619-vm1 ~]# ll /etc/udev/rules.d/
total 4
-rw-r--r--. 1 root root 279 Mar 23 03:45 68-azure-sriov-nm-unmanaged.rules

[root@wala91ondsriov05160619-vm1 ~]# nmcli device
DEVICE    TYPE      STATE                                  CONNECTION         
eth0      ethernet  connected                              System eth0        
enP805s1  ethernet  connecting (getting IP configuration)  Wired connection 1 
lo        loopback  unmanaged 

[root@wala91ondsriov05160619-vm1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0 
# Created by cloud-init on instance boot automatically, do not edit.
#
AUTOCONNECT_PRIORITY=999
BOOTPROTO=dhcp
DEVICE=eth0
HWADDR=00:0d:3a:1f:b8:6e
ONBOOT=yes
TYPE=Ethernet
USERCTL=no

Here are the relevant logs from /var/log/messages:
May 17 03:37:28 wala91ondsriov05160619-vm1 NetworkManager[904]: <info>  [1684294648.0925] policy: auto-activating connection 'Wired connection 1' (bf94a5ab-d137-3364-9b37-5cb3b491233c)
May 17 03:37:28 wala91ondsriov05160619-vm1 NetworkManager[904]: <info>  [1684294648.0931] device (enP805s1): Activation: starting connection 'Wired connection 1' (bf94a5ab-d137-3364-9b37-5cb3b491233c)
May 17 03:37:28 wala91ondsriov05160619-vm1 NetworkManager[904]: <info>  [1684294648.0933] device (enP805s1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
May 17 03:37:28 wala91ondsriov05160619-vm1 NetworkManager[904]: <info>  [1684294648.0938] device (enP805s1): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
May 17 03:37:28 wala91ondsriov05160619-vm1 NetworkManager[904]: <info>  [1684294648.1093] device (enP805s1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
May 17 03:37:28 wala91ondsriov05160619-vm1 NetworkManager[904]: <info>  [1684294648.1096] dhcp4 (enP805s1): activation: beginning transaction (timeout in 45 seconds)
May 17 03:37:28 wala91ondsriov05160619-vm1 NetworkManager[904]: <info>  [1684294648.1110] dhcp4 (enP805s1): dhclient started with pid 1831
May 17 03:37:28 wala91ondsriov05160619-vm1 dhclient[1831]: DHCPDISCOVER on enP805s1 to 255.255.255.255 port 67 interval 7 (xid=0x1f5e400e)
May 17 03:37:35 wala91ondsriov05160619-vm1 dhclient[1831]: DHCPDISCOVER on enP805s1 to 255.255.255.255 port 67 interval 8 (xid=0x1f5e400e)
May 17 03:37:43 wala91ondsriov05160619-vm1 dhclient[1831]: DHCPDISCOVER on enP805s1 to 255.255.255.255 port 67 interval 10 (xid=0x1f5e400e)
May 17 03:37:53 wala91ondsriov05160619-vm1 dhclient[1831]: DHCPDISCOVER on enP805s1 to 255.255.255.255 port 67 interval 8 (xid=0x1f5e400e)
May 17 03:38:01 wala91ondsriov05160619-vm1 dhclient[1831]: DHCPDISCOVER on enP805s1 to 255.255.255.255 port 67 interval 12 (xid=0x1f5e400e)
May 17 03:38:13 wala91ondsriov05160619-vm1 dhclient[1831]: DHCPDISCOVER on enP805s1 to 255.255.255.255 port 67 interval 16 (xid=0x1f5e400e)
May 17 03:38:28 wala91ondsriov05160619-vm1 NetworkManager[904]: <info>  [1684294708.1272] device (enP805s1): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
May 17 03:38:28 wala91ondsriov05160619-vm1 NetworkManager[904]: <warn>  [1684294708.1277] device (enP805s1): Activation: failed for connection 'Wired connection 1'
May 17 03:38:28 wala91ondsriov05160619-vm1 NetworkManager[904]: <info>  [1684294708.1279] device (enP805s1): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')
May 17 03:38:28 wala91ondsriov05160619-vm1 NetworkManager[904]: <info>  [1684294708.1361] dhcp4 (enP805s1): canceled DHCP transaction, DHCP client pid 1831
May 17 03:38:28 wala91ondsriov05160619-vm1 NetworkManager[904]: <info>  [1684294708.1361] dhcp4 (enP805s1): activation: beginning transaction (timeout in 45 seconds)
May 17 03:38:28 wala91ondsriov05160619-vm1 NetworkManager[904]: <info>  [1684294708.1361] dhcp4 (enP805s1): state changed no lease
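
The nmcli output above shows the VF stuck in "connecting (getting IP configuration)" while NetworkManager-wait-online.service waits on it. A minimal sketch for listing such devices follows. The function name stuck_devices is hypothetical, and the sketch assumes the colon-separated DEVICE:STATE lines that nmcli prints in terse mode:

```shell
# Hypothetical helper: read "DEVICE:STATE" lines on stdin and print the
# names of devices whose state starts with "connecting".
stuck_devices() {
    awk -F: '$2 ~ /^connecting/ { print $1 }'
}

# Possible usage on a live system (assumes nmcli terse output):
#   nmcli -t -f DEVICE,STATE device | stuck_devices
```

If the udev rule worked as intended, the VF should be reported as "unmanaged" and this would print nothing for it.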

Comment 44 Dexuan Cui 2023-05-17 16:10:15 UTC
(In reply to Yuxin Sun from comment #43)
> (In reply to Dexuan Cui from comment #42)
> 
> Hi Dexuan,
> 
> Hmm, so weird... I used another image (RedHat:RHEL:9_1:9.1.2023041316) and got
> the same result:
> 
> [root@wala91ondsriov05160619-vm1 ~]# systemd-analyze
> Startup finished in 958ms (kernel) + 5.696s (initrd) + 1min 10.278s
> (userspace) = 1min 16.933s 
> multi-user.target reached after 1min 6.283s in userspace
> [root@wala91ondsriov05160619-vm1 ~]# systemd-analyze blame
> 1min 70ms NetworkManager-wait-online.service

So you do still see the 1-min delay, with NetworkManager trying to get an IP address for the VF interface, after you "Remove both 60-net.rules and net.ifnames=0". This is strange, as I don't see the delay with the same configuration.
 
Can you run "udevadm info /sys/class/net/enP805s1"? I guess your NIC enP805s1 doesn't have this property:
E: NM_UNMANAGED=1
If so, something else is preventing 68-azure-sriov-nm-unmanaged.rules from working (can you please check/share /var/log/messages for any errors?), and I suppose Yu Watanabe should have some thoughts.


@Yuxin: After you "Remove both 60-net.rules and net.ifnames=0", it would be great if you can update the initrd ("dracut -f; reboot") and see if you still see the 1-min delay.
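
As a rough sketch of the property check requested above: the helper below reads "udevadm info" output on stdin and reports whether the NM_UNMANAGED=1 property is present. The function name has_nm_unmanaged is hypothetical, and the interface name in the usage comment is just the one from this bug:

```shell
# Hypothetical helper: read "udevadm info" output on stdin and report
# whether the device carries the NM_UNMANAGED=1 property that
# 68-azure-sriov-nm-unmanaged.rules is expected to set on the VF.
has_nm_unmanaged() {
    if grep -q '^E: NM_UNMANAGED=1$'; then
        echo "NM_UNMANAGED=1 present"
    else
        echo "NM_UNMANAGED=1 missing"
        return 1
    fi
}

# Possible usage on a live system:
#   udevadm info /sys/class/net/enP805s1 | has_nm_unmanaged
```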

Comment 45 Yuxin Sun 2023-05-18 10:01:04 UTC
Hi Dexuan,

Yes, there's no "NM_UNMANAGED=1" property.
[root@wala91ondsriov05160619-vm1 ~]# udevadm info /sys/class/net/enP53815s1
P: /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/57ac3861-d237-4196-b649-846fdf87846a/pcid237:00/d237:00:02.0/net/enP53815s1
L: 0
E: DEVPATH=/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/57ac3861-d237-4196-b649-846fdf87846a/pcid237:00/d237:00:02.0/net/enP53815s1
E: INTERFACE=enP53815s1
E: IFINDEX=3
E: SUBSYSTEM=net
E: USEC_INITIALIZED=8438290
E: ID_NET_NAMING_SCHEME=rhel-9.1
E: ID_NET_NAME_MAC=enx000d3a1fb86e
E: ID_OUI_FROM_DATABASE=Microsoft Corp.
E: ID_NET_NAME_PATH=enP53815p0s2
E: ID_NET_NAME_SLOT=enP53815s1
E: ID_BUS=pci
E: ID_VENDOR_ID=0x15b3
E: ID_MODEL_ID=0x1016
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_MODEL_FROM_DATABASE=MT27710 Family [ConnectX-4 Lx Virtual Function]
E: ID_PATH=acpi-VMBUS:01-pci-d237:00:02.0
E: ID_PATH_TAG=acpi-VMBUS_01-pci-d237_00_02_0
E: ID_NET_DRIVER=mlx5_core
E: ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
E: ID_NET_NAME=enP53815s1
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/enP53815s1
E: TAGS=:systemd:
E: CURRENT_TAGS=:systemd:

And after updating the initrd ("dracut -f; reboot"), the boot time becomes very short:
[root@wala91ondsriov05160619-vm1 ~]# systemd-analyze 
Startup finished in 951ms (kernel) + 5.704s (initrd) + 13.114s (userspace) = 19.770s 
multi-user.target reached after 7.294s in userspace

Thanks!

Comment 46 Dexuan Cui 2023-05-18 15:24:26 UTC
(In reply to Yuxin Sun from comment #45)
> ...
> 
> And after updating the initrd ("dracut -f; reboot"), the boot time becomes very short:
Looks like somehow your 60-net.rules was included into the initrd, so you still saw the 1-min delay after you removed /usr/lib/udev/rules.d/60-net.rules. After you updated your initrd, the delay went away.

So, it looks like the workaround is to remove /usr/lib/udev/rules.d/60-net.rules and update initrd ("dracut -f; reboot")? This way, we won't see the "File exists" error and the 1-min delay will go away:
eth1: Failed to rename network interface 3 from 'eth1' to 'eth0': File exists

Ideally I'd like to also remove the kernel parameter "net.ifnames=0" so that we see the enPXXXXXsX-style VF name rather than eth1.
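
To confirm whether a stale copy of 60-net.rules is still packed into the initrd, one could scan the initrd file listing. This is only a sketch: the function name initrd_has_60_net_rules is hypothetical, and it assumes that "lsinitrd" (from dracut) with no arguments lists the contents of the image for the running kernel:

```shell
# Hypothetical helper: read an initrd file listing on stdin and succeed
# if any entry ends in udev/rules.d/60-net.rules.
initrd_has_60_net_rules() {
    grep -q 'udev/rules\.d/60-net\.rules$'
}

# Possible usage on a live system (assumes lsinitrd is installed):
#   if lsinitrd | initrd_has_60_net_rules; then
#       echo "60-net.rules is still in the initrd; rebuild with: dracut -f"
#   fi
```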

Comment 47 Dexuan Cui 2023-05-18 15:27:45 UTC
Can Red Hat please clarify:

1. Is /usr/lib/udev/rules.d/60-net.rules really needed in RHEL 9.x, especially in an RHEL 9.x VM running on Azure? 

2. Is the kernel parameter "net.ifnames=0" really needed for an RHEL 9.x VM on Azure?

3. Is it safe to remove the file /usr/lib/udev/rules.d/60-net.rules and the kernel parameter "net.ifnames=0" in the RHEL 9.x Azure Marketplace images?

Comment 48 Yuxin Sun 2023-06-15 08:57:42 UTC
(In reply to Dexuan Cui from comment #46)
> (In reply to Yuxin Sun from comment #45)
> > ...
> > 
> > And after updating the initrd ("dracut -f; reboot"), the boot time becomes very short:
> Looks like somehow your 60-net.rules was included into the initrd, so you
> still saw the 1-min delay after you removed
> /usr/lib/udev/rules.d/60-net.rules. After you updated your initrd, the delay
> went away.
> 
> So, it looks like the workaround is to remove
> /usr/lib/udev/rules.d/60-net.rules and update initrd ("dracut -f; reboot")?
> This way, we won't see the "File exists" error and the 1-min delay will go
> away:
> eth1: Failed to rename network interface 3 from 'eth1' to 'eth0': File exists
> 
> Ideally I'd like to also remove the kernel parameter "net.ifnames=0" so that
> we see the enPXXXXXsX-style VF name rather than eth1.

Hi Dexuan,

About net.ifnames=0, here is a KCS: https://access.redhat.com/solutions/2435891
It says that it is NOT safe to set net.ifnames=0 in RHEL7, RHEL8 and RHEL9. Red Hat strongly recommends that the new RHEL7, RHEL8 and RHEL9 naming conventions be used.

It is safe to set net.ifnames=0 only under a few specific circumstances:
* If the alternate naming scheme, such as biosdevname, is enabled and able to identify the needed interface properties. biosdevname is enabled by default only on systems running RHEL7+ on Dell hardware. In all other cases it must be enabled by setting biosdevname=1 and ensuring the biosdevname package is installed. Non-Dell hardware may not provide the necessary information needed for biosdevname to work.
* If the system only has a single interface and will never have more than a single interface.
* KVM guests (libvirt only, not RHV or OpenStack) exclusively using virtio-net type interfaces can safely set net.ifnames=0.
* If the system is configured to use unique, non ethX style names using either udev rules or relying on the functionality of the udev properties in the /usr/lib/udev/rules.d/60-net.rules rule file.
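
For reference, a small sketch of how one might check whether a given kernel command line sets net.ifnames=0. The function name cmdline_has_ifnames0 is hypothetical; it only inspects the string passed to it and does not change anything:

```shell
# Hypothetical helper: succeed if the kernel command line given as $1
# contains net.ifnames=0 as a whole, space-separated word.
cmdline_has_ifnames0() {
    case " $1 " in
        *" net.ifnames=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage on a live system:
#   if cmdline_has_ifnames0 "$(cat /proc/cmdline)"; then
#       echo "net.ifnames=0 is set"
#   fi
```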

Comment 49 RHEL Program Management 2023-09-21 11:09:13 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 50 RHEL Program Management 2023-09-21 11:14:27 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.

Comment 51 Red Hat Bugzilla 2024-01-20 04:25:11 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

