Bug 1720157 - [Azure] Add an udev rule to make multiple SR-IOV NICs both can get ip addresses
Summary: [Azure] Add an udev rule to make multiple SR-IOV NICs both can get ip addresses
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: Rick Barry
QA Contact: Yuxin Sun
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-06-13 09:34 UTC by Yuhui Jiang
Modified: 2021-05-27 17:52 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-10 07:28:30 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Description Yuhui Jiang 2019-06-13 09:34:18 UTC
This bug is used for tracking the Azure image build. In RHEL 8, due to the lack of a udev rule, multiple SR-IOV NICs can't get IP addresses, except eth0 (bz#1661574). Below is the content of this rule:

# cat /etc/udev/rules.d/68-azure-sriov-nm-unmanaged.rules
SUBSYSTEM=="net", DRIVERS=="hv_pci", ACTION=="add", ENV{NM_UNMANAGED}="1"
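
As a rough sketch of how an image build might apply the rule above, the helper below writes it to a rules directory and prints the resulting path. The function name `install_sriov_rule` and its optional directory argument are assumptions for illustration; they are not part of the Azure build process described in this bug.

```shell
# Hypothetical helper: write the rule from the description into a udev
# rules directory (default /etc/udev/rules.d) and print the file path.
install_sriov_rule() {
    rules_dir="${1:-/etc/udev/rules.d}"
    rule_file="$rules_dir/68-azure-sriov-nm-unmanaged.rules"
    printf '%s\n' \
        'SUBSYSTEM=="net", DRIVERS=="hv_pci", ACTION=="add", ENV{NM_UNMANAGED}="1"' \
        > "$rule_file"
    printf '%s\n' "$rule_file"
}

# On a real system (as root) the rules would then be reloaded and the
# "add" events replayed for network devices:
#   install_sriov_rule
#   udevadm control --reload
#   udevadm trigger --subsystem-match=net --action=add
```

Taking the directory as an argument keeps the helper usable against a scratch directory or an image root before it is used on a live system.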

Comment 2 Rick Barry 2019-06-13 12:15:44 UTC
Hi Josh,

The Microsoft Azure build team needs to add a udev rule to their RHEL-8 image builds to
allow multiple SR-IOV NICs to obtain IP addresses. Only eth0 gets configured correctly at
the moment. This was confirmed by QE using the "RedHat:RHEL:8:8.0.2019050711" image.

Can you pass this along to the Azure build team?

Comment 3 Alfred Sin 2019-08-13 21:26:22 UTC
Hey Rick - a couple questions on this for you.

1. What is the exact udev rule we should add? 
2. Does this need to be added into RHEL 7 images as well?

Comment 4 Rick Barry 2019-08-14 19:52:22 UTC
(In reply to Alfred Sin from comment #3)
> Hey Rick - a couple questions on this for you.
> 
> 1. What is the exact udev rule we should add? 
> 2. Does this need to be added into RHEL 7 images as well?

Hi Alfred, Yuhui provided the udev rule in the description
(https://bugzilla.redhat.com/show_bug.cgi?id=1720157#c0).

Yuhui, does this apply to RHEL 7 as well?

Comment 5 Alfred Sin 2019-08-14 21:22:03 UTC
(In reply to Rick Barry from comment #4)
> (In reply to Alfred Sin from comment #3)
> > Hey Rick - a couple questions on this for you.
> > 
> > 1. What is the exact udev rule we should add? 
> > 2. Does this need to be added into RHEL 7 images as well?
> 
> Hi Alfred, Yuhui provided the udev rule in the description
> (https://bugzilla.redhat.com/show_bug.cgi?id=1720157#c0).
> 
> Yuhui, does this apply to RHEL 7 as well?

I see - my bad. I did a quick check in a RHEL 7 VM and thought I had checked in RHEL 8. I just double-checked in a RHEL 8.0 VM and the rule is indeed not in there.

We can add it in during our image build process.

Comment 6 Lubomir Rintel 2019-08-15 16:28:30 UTC
Hello.

Bug #1661574 got closed because of the lack of communication. Beniamino asked perfectly reasonable questions, but didn't get a response.

This is a bit concerning. The udev rule in question seems very obviously wrong -- why would disabling a particular interface make any other interface get an IP address? Upon a closer look from Beniamino it became apparent that it's because there are multiple interfaces with the same MAC address. NetworkManager attempts to generate a connection profile for devices that don't have a matching one, and because in RHEL 8.0 the MAC address is used in the generated profile, one of the devices "wins" at random. This was changed in RHEL 8.1 for unrelated reasons. [1]

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/merge_requests/137/

Therefore, what happens in RHEL 8.1 is that *all* devices attempt to connect. I suppose that it's still not what you want, the ep* devices on hv_pci still don't get connected. In order to solve that we need to understand why. It's far from clear to us that completely disallowing network devices on hv_pci would be a good idea.

A side note: this didn't affect eth0 because you ship a configuration file /etc/sysconfig/network-scripts/ifcfg-eth0 that sticks to eth0 and thus the automatism doesn't kick in. Why would you do this? It's just inconsistent and unnecessary. Also, there's a /etc/sysconfig/network-scripts/ifcfg-ens3 file, but the Azure installations don't even have ens3. Why?
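
To check for the duplicate MAC addresses described in this comment, something like the following could be used. It is only a sketch: `duplicate_macs` is a hypothetical helper that reads `interface MAC` pairs on standard input and prints each MAC address that is shared by more than one interface.

```shell
# Hypothetical helper: read "<interface> <mac>" pairs on stdin and print
# every MAC address used by more than one interface, with the interfaces
# listed in input order.
duplicate_macs() {
    awk '{
        if ($2 in names) { names[$2] = names[$2] " " $1 } else { names[$2] = $1 }
        count[$2]++
    }
    END {
        for (mac in count)
            if (count[mac] > 1) print mac ": " names[mac]
    }' | sort
}

# On a live system the pairs could be collected from sysfs:
#   for dev in /sys/class/net/*; do
#       printf '%s %s\n' "${dev##*/}" "$(cat "$dev/address")"
#   done | duplicate_macs
```

An empty result would suggest that no two interfaces share a MAC address, in which case the MAC-based profile matching described above should not be ambiguous.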

Comment 7 Haiyang Zhang 2019-08-27 04:33:28 UTC
The following udev rule exists on Azure image of 7.x:
cat /etc/udev/rules.d/68-azure-sriov-nm-unmanaged.rules

# Accelerated Networking on Azure exposes a new SRIOV interface to the VM.
# This interface is transparently bonded to the synthetic interface,
# so NetworkManager should just ignore any SRIOV interfaces.
SUBSYSTEM=="net", DRIVERS=="hv_pci", ACTION=="add", ENV{NM_UNMANAGED}="1"

The same rule needs to be added to the Azure image of RHEL 8.
With this udev rule we are not disabling the VF NIC (from hv_pci). On Hyper-V or Azure hosts, VF NICs have the same MAC as their matching synthetic NICs, by design. A VF NIC is bonded to its matching synthetic NIC automatically as a slave NIC, so VF NICs don't need an IP address and shouldn't be managed by NetworkManager.

I was able to reproduce the issue with multiple VF NICs on RHEL 8 -- only eth0 has an IP address. After adding that udev rule, the problem is solved.
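
One way to confirm that the rule took effect is to look at the udev properties of the VF interface. The sketch below assumes the output of `udevadm info --query=property` is piped in; `nm_unmanaged_from_properties` is a hypothetical helper name, and `IFACE` in the usage comment is a placeholder for the VF interface name.

```shell
# Hypothetical helper: read KEY=VALUE udev properties on stdin and report
# whether NM_UNMANAGED is set to 1.
nm_unmanaged_from_properties() {
    if grep -x 'NM_UNMANAGED=1' >/dev/null; then
        echo unmanaged
    else
        echo managed
    fi
}

# Usage on a live system:
#   udevadm info --query=property "/sys/class/net/$IFACE" | nm_unmanaged_from_properties
```

The helper only inspects the udev database; `nmcli device status` shows what NetworkManager itself decided for each device.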

Comment 8 Lubomir Rintel 2019-08-27 07:35:02 UTC
(In reply to Haiyang Zhang from comment #7)
> With this udev rule -- We are not disabling the VF NIC (from hv_pci). On
> Hyper-V or Azure hosts, VF NICs have the same MAC as their matching
> synthetic NICs – by design.

I'm just curious -- is the design documented anywhere?

> And a VF NIC is bonded to its matching synthetic
> NIC automatically as a slave NIC. So VF NICs don’t need IP address, and
> shouldn’t be managed by “Network manager”.

What I'm interested in is solving this in a way that's not going to need any Azure-specific secret sauce.

I'm wondering if just blacklisting all NICs on a Hyper-V PCI bus is not overkill. If the rule was applied outside of Azure, I suspect it would affect things like PCI passthrough on Virtual PC.

Comment 9 Lubomir Rintel 2019-08-27 07:44:04 UTC
(In reply to Lubomir Rintel from comment #8)
> What I'm interested in is solving this in a way that's not going to need any
> Azure-specific secret sauce.

Perhaps not; there doesn't seem to be anything particularly specific to Azure in the sysfs attributes:

[root@az2 lkundrak]# udevadm info -a /sys/class/net/enP40072s1  

Udevadm info starts with the device specified by the devpath and then
walks up the chain of parent devices. It prints for every device
found, all possible attributes in the udev rules key format.
A rule to match, can be composed by the attributes of the device
and the attributes from one single parent device.

  looking at device '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/291fe2b6-0a3f-4450-9c88-3b494f14be71/pci9c88:00/9c88:00:02.0/net/enP40072s1':
    KERNEL=="enP40072s1"
    SUBSYSTEM=="net"
    DRIVER==""
    ATTR{addr_assign_type}=="0"
    ATTR{addr_len}=="6"
    ATTR{address}=="00:0d:3a:55:2c:9e"
    ATTR{broadcast}=="ff:ff:ff:ff:ff:ff"
    ATTR{carrier}=="1"
    ATTR{carrier_changes}=="1"
    ATTR{carrier_down_count}=="0"
    ATTR{carrier_up_count}=="1"
    ATTR{dev_id}=="0x0"
    ATTR{dev_port}=="0"
    ATTR{dormant}=="0"
    ATTR{duplex}=="full"
    ATTR{flags}=="0x1803"
    ATTR{gro_flush_timeout}=="0"
    ATTR{ifalias}==""
    ATTR{ifindex}=="4"
    ATTR{iflink}=="4"
    ATTR{link_mode}=="0"
    ATTR{mtu}=="1500"
    ATTR{name_assign_type}=="4"
    ATTR{netdev_group}=="0"
    ATTR{operstate}=="up"
    ATTR{proto_down}=="0"
    ATTR{speed}=="40000"
    ATTR{tx_queue_len}=="1000"
    ATTR{type}=="1"

  looking at parent device '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/291fe2b6-0a3f-4450-9c88-3b494f14be71/pci9c88:00/9c88:00:02.0':
    KERNELS=="9c88:00:02.0"
    SUBSYSTEMS=="pci"
    DRIVERS=="mlx4_core"
    ATTRS{ari_enabled}=="0"
    ATTRS{broken_parity_status}=="0"
    ATTRS{class}=="0x020000"
    ATTRS{consistent_dma_mask_bits}=="64"
    ATTRS{current_link_speed}=="Unknown speed"
    ATTRS{current_link_width}=="0"
    ATTRS{d3cold_allowed}=="1"
    ATTRS{device}=="0x1004"
    ATTRS{dma_mask_bits}=="64"
    ATTRS{driver_override}=="(null)"
    ATTRS{enable}=="1"
    ATTRS{irq}=="0"
    ATTRS{local_cpulist}=="0-1"
    ATTRS{local_cpus}=="00000000,00000000,00000000,00000003"
    ATTRS{max_link_speed}=="8 GT/s"
    ATTRS{max_link_width}=="8"
    ATTRS{mlx4_port1}=="eth"
    ATTRS{mlx4_port1_mtu}=="-1"
    ATTRS{msi_bus}=="1"
    ATTRS{numa_node}=="0"
    ATTRS{revision}=="0x00"
    ATTRS{subsystem_device}=="0x61b0"
    ATTRS{subsystem_vendor}=="0x15b3"
    ATTRS{vendor}=="0x15b3"

  looking at parent device '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/291fe2b6-0a3f-4450-9c88-3b494f14be71/pci9c88:00':
    KERNELS=="pci9c88:00"
    SUBSYSTEMS==""
    DRIVERS==""

  looking at parent device '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/291fe2b6-0a3f-4450-9c88-3b494f14be71':
    KERNELS=="291fe2b6-0a3f-4450-9c88-3b494f14be71"
    SUBSYSTEMS=="vmbus"
    DRIVERS=="hv_pci"
    ATTRS{channel_vp_mapping}=="20:0"
    ATTRS{class_id}=="{44c4f61d-4444-4400-9d52-802e27ede19f}"
    ATTRS{client_monitor_conn_id}=="0"
    ATTRS{client_monitor_latency}=="0"
    ATTRS{client_monitor_pending}=="1985940810"
    ATTRS{device}=="0x5"
    ATTRS{device_id}=="{291fe2b6-0a3f-4450-9c88-3b494f14be71}"
    ATTRS{driver_override}=="(null)"
    ATTRS{id}=="20"
    ATTRS{in_intr_mask}=="0"
    ATTRS{in_read_bytes_avail}=="0"
    ATTRS{in_read_index}=="1016"
    ATTRS{in_write_bytes_avail}=="12288"
    ATTRS{in_write_index}=="1016"
    ATTRS{monitor_id}=="255"
    ATTRS{out_intr_mask}=="0"
    ATTRS{out_read_bytes_avail}=="0"
    ATTRS{out_read_index}=="1136"
    ATTRS{out_write_bytes_avail}=="12288"
    ATTRS{out_write_index}=="1136"
    ATTRS{server_monitor_conn_id}=="0"
    ATTRS{server_monitor_latency}=="0"
    ATTRS{server_monitor_pending}=="1985940810"
    ATTRS{state}=="3"
    ATTRS{vendor}=="0x1414"

  looking at parent device '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01':
    KERNELS=="VMBUS:01"
    SUBSYSTEMS=="acpi"
    DRIVERS=="vmbus"
    ATTRS{hid}=="VMBUS"
    ATTRS{path}=="\_SB_.PCI0.SBRG.VMB8"
    ATTRS{power_state}=="D0"
    ATTRS{status}=="15"
    ATTRS{uid}=="0"

  looking at parent device '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07':
    KERNELS=="device:07"
    SUBSYSTEMS=="acpi"
    DRIVERS==""
    ATTRS{adr}=="0x00070000"
    ATTRS{path}=="\_SB_.PCI0.SBRG"

  looking at parent device '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00':
    KERNELS=="PNP0A03:00"
    SUBSYSTEMS=="acpi"
    DRIVERS==""
    ATTRS{adr}=="0x00000000"
    ATTRS{hid}=="PNP0A03"
    ATTRS{path}=="\_SB_.PCI0"
    ATTRS{uid}=="0"

  looking at parent device '/devices/LNXSYSTM:00/LNXSYBUS:00':
    KERNELS=="LNXSYBUS:00"
    SUBSYSTEMS=="acpi"
    DRIVERS==""
    ATTRS{hid}=="LNXSYBUS"
    ATTRS{path}=="\_SB_"

  looking at parent device '/devices/LNXSYSTM:00':
    KERNELS=="LNXSYSTM:00"
    SUBSYSTEMS=="acpi"
    DRIVERS==""
    ATTRS{hid}=="LNXSYSTM"
    ATTRS{path}=="\"

[root@az2 lkundrak]# 

I guess an Azure-specific tweak is indeed the way to go. The other udev rules seem to be shipped by the WALinuxAgent package. Let's see if we can get this one added the same way:

https://github.com/Azure/WALinuxAgent/pull/1622
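
The parent walk in the udevadm output above is what lets a `DRIVERS=="hv_pci"` match apply to the VF: the interface's direct parent is bound to `mlx4_core`, and `hv_pci` only shows up further up the chain. A minimal sketch of the same walk over sysfs follows; `list_parent_drivers` and `has_parent_driver` are hypothetical helper names, and the sketch relies only on the `driver` symlinks that sysfs places in bound device directories.

```shell
# Hypothetical helper: starting at a device directory, walk up towards "/"
# and print the driver bound to each ancestor (nearest first).
list_parent_drivers() {
    dir=$(cd "$1" && pwd -P) || return 1
    while [ "$dir" != "/" ]; do
        if [ -L "$dir/driver" ]; then
            basename "$(readlink "$dir/driver")"
        fi
        dir=$(dirname "$dir")
    done
}

# Hypothetical helper: succeed if the device or any ancestor is bound to
# the named driver.
has_parent_driver() {
    list_parent_drivers "$1" | grep -x "$2" >/dev/null
}

# Usage on a live system, with the interface from the output above:
#   has_parent_driver /sys/class/net/enP40072s1 hv_pci && echo "on hv_pci"
```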

Comment 10 Lubomir Rintel 2019-09-04 08:12:43 UTC
I'm wondering if anyone at Microsoft can get WALinuxAgent maintainers to review the pull requests in their queue?

Comment 11 Alfred Sin 2019-09-04 15:50:13 UTC
Let me bug them. Email auto-replies tell me that a few of them are on vacation, so maybe that's why it's taking a little while...

Comment 12 Rick Barry 2019-09-11 17:52:50 UTC
We discussed this BZ at our monthly MSFT-RH call. It seems that there are internal Microsoft discussions about which method they prefer to resolve this (adding a udev rule or doing this dynamically as in Lubomir's upstream proposal).

Comment 13 Lubomir Rintel 2019-11-15 09:56:53 UTC
(In reply to Rick Barry from comment #12)
> We discussed this BZ at our monthly MSFT-RH call. It seems that there are
> internal Microsoft discussions about which method they prefer to resolve
> this (adding a udev rule or doing this dynamically as in Lubomir's upstream
> proposal.

What are you even talking about.

My upstream proposal is also to just add a udev rule. The only difference is that I'm adding it in the place that actually makes at least some sense.

I'm not sure what you discussed with Microsoft, but my repeated attempts to get them to respond in any useful manner about this via e-mail or GitHub have all failed. I am not happy about this, but I am unable to do anything about this at least until Microsoft seriously reconsiders their approach to collaboration. When that happens, please feel free to reopen the bug.

Comment 14 Rick Barry 2019-11-15 13:46:59 UTC
Hi Michael,

What was the final decision regarding how Microsoft was planning to resolve this bug? 

Lubomir Rintel submitted an upstream proposal a couple of months ago to add a udev rule in WALinuxAgent to resolve this, but apparently his pull requests were not reviewed/accepted. I don't have the details, but perhaps someone from the WALinuxAgent team can respond to Lubomir.

In any event, do you know if this issue has been resolved?

Comment 15 Lubomir Rintel 2019-11-15 17:35:24 UTC
(In reply to Lubomir Rintel from comment #13)
Rick & Michael,

It's been brought to my attention by my colleagues that the tone of my comment was far from appropriate. Re-reading it I wish I had not written it. I do apologize for it.

The message that I wanted to get across is that unless the communication with the partner around this issue improves, there isn't anything we can do about the issue other than shipping a downstream patch. We care about doing the right thing here, that is, involving the WALinuxAgent upstream. That is because we care about the upstream opinion, but also because that will fix the issue for other Linux images on Azure that are not necessarily running RHEL.

Thanks
Lubo

Comment 21 Michael Kelley 2019-12-05 16:53:31 UTC
I pinged the waagent team on this issue a couple of weeks ago, and since then the discussion has been active in the waagent GitHub pull request that Lubomir Rintel originally made. See https://github.com/Azure/WALinuxAgent/pull/1622.

I'm marking the "needs info" request as completed while the discussion is ongoing.

Comment 22 Rick Barry 2019-12-06 17:23:57 UTC
(In reply to Michael Kelley from comment #21)
> I ping'ed the waagent team on this issue a couple of weeks ago, and since
> then the discussion has been active in the waagent GitHub pull request that
> Lubomir Rintel originally made. See
> https://github.com/Azure/WALinuxAgent/pull/1622.
> 
> I'm marking the "needs info" request as completed while the discussion is
> ongoing.

Thanks, Michael, I appreciate your help to get that discussion kick-started again.

Comment 23 Thomas Haller 2020-07-09 20:30:19 UTC
Hi Rick,

what should we do about this bug? I feel the NetworkManager devel team doesn't have anything to fix here (or do we??)

I'd propose to close this bug as NOTABUG. Alternatively, if you use this bug for tracking purposes, can we assign it to a different component?

Thanks.

Comment 24 Alfred Sin 2020-07-10 01:02:32 UTC
Hi Thomas - we have added the udev rule to our Azure RHEL image builds for RHEL 8.x, and all our RHEL 8.x images should contain it. We can probably close this now (and I do apologize for all the churn in the PR).

Comment 25 Thomas Haller 2020-07-10 07:28:30 UTC
Thanks!!

closing thus, according to comment 23 and comment 24.

