Bug 2075383 - The vlan tag setting does not work in the <interface type='direct'> xml
Summary: The vlan tag setting does not work in the <interface type='direct'> xml
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Michal Privoznik
QA Contact: yalzhang@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-14 04:20 UTC by Yanghang Liu
Modified: 2022-11-15 10:39 UTC (History)

Fixed In Version: libvirt-8.5.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-15 10:04:06 UTC
Type: Bug
Target Upstream Version: 8.4.0
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker LIBVIRTAT-12933 0 None None None 2022-07-28 01:55:02 UTC
Red Hat Issue Tracker RHELPLAN-118926 0 None None None 2022-04-14 04:30:41 UTC
Red Hat Product Errata RHSA-2022:8003 0 None None None 2022-11-15 10:04:26 UTC

Description Yanghang Liu 2022-04-14 04:20:48 UTC
Description of problem:
The vlan tag does not work in the  <interface type='direct'> xml


Version-Release number of selected component (if applicable):
5.14.0-78.el9.x86_64
libvirt-8.2.0-1.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. create a VF from a PF
  # echo 1 > /sys/bus/pci/devices/0000\:41\:00.1/sriov_numvfs

2. check the PF/VF info
  # lshw -c network -businfo
  Bus info          Device       Class          Description
  =========================================================
  pci@0000:41:00.1  enp65s0f1    network        Ethernet Controller XXV710 for 25GbE SFP28
  pci@0000:41:0a.0  enp65s0f1v0  network        Ethernet Virtual Function 700 Series

  # ip -d link show enp65s0f1
  9: enp65s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 3c:fd:fe:b5:eb:41 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9702 addrgenmode none numtxqueues 32 numrxqueues 32 gso_max_size 65536 gso_max_segs 65535 portid 3cfdfeb5eb41 
  parentbus pci parentdev 0000:41:00.1 
    vf 0     link/ether f6:54:76:37:75:7c brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off



3. start a vm with the following xml
  ...
  <interface type='direct'>
      <mac address='f6:54:76:37:75:7c'/>
      <source dev='enp65s0f1v0' mode='passthrough'/>
      <vlan>
        <tag id='100'/>
      </vlan>
      <model type='virtio'/>
      <driver name='vhost'/>
    </interface>


4. check if this vlan setting works 

  (4.1) # ip -d link show
  489: macvtap4@enp65s0f1v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 500
    link/ether f6:54:76:37:75:7c brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9702 
    macvtap mode passthru bcqueuelen 1000 usedbcqueuelen 1000 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
  ...
  9: enp65s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 3c:fd:fe:b5:eb:41 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9702 addrgenmode none numtxqueues 32 numrxqueues 32 gso_max_size 65536 gso_max_segs 65535 portid 3cfdfeb5eb41 
  parentbus pci parentdev 0000:41:00.1 
    vf 0     link/ether f6:54:76:37:75:7c brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off  <--- There should be "vlan 100" if the vlan setting works  

  (4.2) The interface tagged with vlan 100 in the vm can still get an ip address from the DHCP server, which should not happen if the vlan setting works



Actual results:
  The vlan tag setting does not work in the  <interface type='direct'> xml

Expected results:
  (1) The output of the following cmd shows "vlan 100" for the vf

  # ip -d link show enp65s0f1
  9: enp65s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 3c:fd:fe:b5:eb:41 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9702 addrgenmode none numtxqueues 32 numrxqueues 32 gso_max_size 65536 gso_max_segs 65535 portid 3cfdfeb5eb41 
  parentbus pci parentdev 0000:41:00.1 
    vf 0     link/ether f6:54:76:37:75:7c brd ff:ff:ff:ff:ff:ff, *vlan 100*, spoof checking on, link-state auto, trust off


  (2) The interface tagged with vlan 100 in the vm cannot get an ip address from the DHCP server
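
For reference, the check for "vlan 100" on the VF line can be scripted. The sketch below is not part of the original report: vf_vlan is a hypothetical helper name, and it assumes the VF line format shown in the `ip -d link show` output above (whitespace-separated fields, with the VLAN ID followed by a comma).

```shell
# vf_vlan VF_INDEX
# Reads `ip -d link show <PF>` output on stdin and prints the VLAN ID that
# the PF reports for the given VF. Prints nothing if the VF line carries no
# "vlan" field, which is the symptom described in this report.
# (Hypothetical helper; assumes the line format shown in the report.)
vf_vlan() {
    awk -v vf="$1" '
        $1 == "vf" && $2 == vf {
            for (i = 3; i < NF; i++) {
                if ($i == "vlan") {
                    id = $(i + 1)
                    sub(/,$/, "", id)   # drop the trailing comma
                    print id
                    exit
                }
            }
        }'
}

# Example call (PF name taken from the report):
#   ip -d link show enp65s0f1 | vf_vlan 0
```

An empty result while the guest is running with <tag id='100'/> would correspond to the failing case; "100" would correspond to the expected one.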



Additional info:
The problem can not be reproduced in libvirt-8.0.0-7.el9_0.x86_64

Comment 1 yalzhang@redhat.com 2022-04-14 11:33:08 UTC
I have one system with the same libvirt version and an Intel X520, and I cannot reproduce the issue there.
On another system with the same libvirt version and an Intel X710, the issue does reproduce.

Comment 2 Jaroslav Suchanek 2022-04-14 15:06:16 UTC
Laine, anything for libvirt here, or would you suspect underlying kernel driver instead?

Comment 3 Laine Stump 2022-04-14 17:24:17 UTC
The MAC address and VLAN ID of VFs were previously set in a single operation. Some changes between 8.0.0 and 8.1.0 split this so that the MAC address and VLAN ID are set separately (because some new Nvidia "Smart NICs" don't support setting of VLAN via the SRIOV PF):

commit 86fc0c25768326abcfebbdd17dbe1074d145f652
Author: Dmitrii Shcherbakov <dmitrii.shcherbakov>
Date:   Tue Feb 1 11:28:51 2022 +0300

    Set VF MAC and VLAN ID in two different operations
    
commit 73961771a1cfec3c0f43caec9d117d2fbcc7af39
Author: Dmitrii Shcherbakov <dmitrii.shcherbakov>
Date:   Tue Feb 1 11:28:52 2022 +0300

    Allow VF vlanid to be passed as a pointer
    
commit 09cdd16a9bf73bc1f75fe774216c71f9ebc78c88
Author: Dmitrii Shcherbakov <dmitrii.shcherbakov>
Date:   Tue Feb 1 11:28:53 2022 +0300

    Ignore EPERM on implicit clearing of VF VLAN ID
    


It's possible something in that change is triggering a strange behavior in the driver used for the X710 (which is i40e, right?) vs. the driver used for X520 (ixgbe?). (as a further datapoint, I tried libvirt 8.2.0 on my ancient 82576 card (igb) and it does properly set the vlan tag.)

Can you try turning on 1:util.netdev in logging before starting the guest? Do this for both the system with X520 (working) and the system with X710 (not working) so we can compare the results; it may provide some useful information for whoever is next on the triage trail.
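
A minimal sketch of the logging setup requested above. It is an assumption that the modular virtqemud daemon is in use and reads /etc/libvirt/virtqemud.conf (with the monolithic daemon the file would be libvirtd.conf), and the log file path is chosen here only for illustration. netdev_debug_conf is a hypothetical helper that just prints the two settings; log_filters and log_outputs are the standard libvirt daemon logging options.

```shell
# netdev_debug_conf [LOG_FILE]
# Prints daemon config lines that enable debug-level (1) logging for the
# util.netdev source and send debug output to a file.
# (Hypothetical helper; the default log path is an assumption.)
netdev_debug_conf() {
    log_file=${1:-/var/log/libvirt/netdev-debug.log}
    printf 'log_filters="1:util.netdev"\n'
    printf 'log_outputs="1:file:%s"\n' "$log_file"
}

# Example use as root (config path and service name are assumptions):
#   netdev_debug_conf >> /etc/libvirt/virtqemud.conf
#   systemctl restart virtqemud
```

Collecting the resulting log on both the X520 and the X710 host after starting the guest would give the two files to compare.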

Comment 6 yalzhang@redhat.com 2022-04-15 08:21:43 UTC
On the X710 host, while repeatedly starting and destroying the vm, I got an error like "Cannot read module EEPROM memory" in dmesg.
Not every start-destroy cycle produced this error, and I'm not sure whether it helps. There is no such error with the X520 card.

[73973.974489] device ens1f0v1 left promiscuous mode
[73974.233416] iavf 0000:3b:02.1: Leaving promiscuous mode
[73974.238789] i40e 0000:3b:00.0: Unprivileged VF 1 is attempting to configure promiscuous mode
[74154.901548] i40e 0000:3b:00.1 ens1f1: Cannot read module EEPROM memory. No module connected.
[74245.891055] device ens1f0v1 entered promiscuous mode
[74246.622035] iavf 0000:3b:02.1: Entering promiscuous mode
[74246.627370] iavf 0000:3b:02.1: Entering multicast promiscuous mode
[74246.633670] i40e 0000:3b:00.0: Unprivileged VF 1 is attempting to configure promiscuous mode

Comment 7 Michal Privoznik 2022-04-21 09:32:31 UTC
I think I might have found the problem. Although, I'm not currently on a machine with SRIOV. Anyway, this is the commit that I suspect has caused the problem:

https://gitlab.com/libvirt/libvirt/-/commit/73961771a1cfec3c0f43caec9d117d2fbcc7af39

and this hunk in particular is problematic:

@@ -2344,7 +2355,7 @@ virNetDevSetNetConfig(const char *linkdev, int vf,
         }
     }
 
-    if (adminMAC || vlanTag >= 0) {
+    if (adminMAC) {
         /* Set vlanTag and admin MAC using an RTM_SETLINK request sent to
          * PFdevname+VF#, if mac != NULL this will set the "admin MAC" via
          * the PF, *not* the actual VF MAC - the admin MAC only takes


because virNetDevSetVfConfig(), which is responsible for setting the vlan tag, is called later in this block. With this change, however, the block may be skipped. Let me see if I can cook a scratch build with the obvious fix.
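
To illustrate the reasoning, here is a toy model of the two conditions from the hunk. It is not libvirt code: the function names are made up, an empty string stands in for a NULL adminMAC, and -1 stands in for "no vlan tag". It assumes that adminMAC is unset in the failing configuration, which this comment suspects but does not confirm.

```shell
# Toy model only; names and value conventions are illustrative assumptions.
# $1 = adminMAC ("" models NULL), $2 = vlanTag (-1 models "no tag").

# Condition before the change: (adminMAC || vlanTag >= 0)
enters_block_before() { [ -n "$1" ] || [ "$2" -ge 0 ]; }

# Condition after the change: (adminMAC)
enters_block_after() { [ -n "$1" ]; }

# report CONDITION ADMIN_MAC VLAN_TAG
# Says whether the block that calls virNetDevSetVfConfig() would be reached.
report() {
    if "$1" "$2" "$3"; then
        echo "block entered"
    else
        echo "block skipped"
    fi
}

# Example calls:
#   report enters_block_before "" 100
#   report enters_block_after  "" 100
```

With no admin MAC and a tag of 100, the first call reports that the block is entered and the second that it is skipped, which would leave the vlan tag unset.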

Comment 10 Michal Privoznik 2022-04-22 12:17:04 UTC
I've polished the fix from comment 7 and posted it here:

https://listman.redhat.com/archives/libvir-list/2022-April/230309.html

Comment 12 Michal Privoznik 2022-05-05 11:25:00 UTC
Merged upstream as:

b399f2c000 virnetdev: Fix regression in setting VLAN tag
7899a11523 virNetDevSetVfMac: Fix error message on invalid args

v8.3.0-11-gb399f2c000
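
The last line is in `git describe` format: the nearest tag, the number of commits since that tag, and the abbreviated commit hash prefixed with "g". A small sketch for splitting such a string follows; parse_describe is a hypothetical helper name and not part of any tool mentioned here.

```shell
# parse_describe STRING
# Splits `git describe` output of the form <tag>-<count>-g<hash> into
# "tag count hash". Prints nothing if the string does not match that form.
# (Hypothetical helper for illustration.)
parse_describe() {
    printf '%s\n' "$1" |
        sed -n 's/^\(.*\)-\([0-9][0-9]*\)-g\([0-9a-f][0-9a-f]*\)$/\1 \2 \3/p'
}

# Example call:
#   parse_describe v8.3.0-11-gb399f2c000
```

For the string above this yields the tag v8.3.0, a count of 11, and the hash b399f2c000, meaning the fix landed 11 commits after the v8.3.0 tag.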

Comment 13 yalzhang@redhat.com 2022-06-02 10:22:36 UTC
Hi Michal, there is a regression bug in this build, please help to check it.

# rpm -q libvirt 
libvirt-8.4.0-1.el9.x86_64

# virsh attach-interface test network default --model virtio 
error: Failed to attach interface
error: internal error: unable to execute QEMU command 'netdev_add': File descriptor named '(null)' has not been found

check the libvirtd log:
2022-06-02 10:14:51.557+0000: 231196: info : qemuMonitorSend:887 : QEMU_MONITOR_SEND_MSG: mon=0x7f6af40862f0 msg={"execute":"netdev_add","arguments":{"type":"tap","fd":"(null)","vhost":true,"vhostfd":"(null)","id":"hostnet1"},"id":"libvirt-409"}
 fd=-1

It is also reported at https://gitlab.com/libvirt/libvirt/-/issues/318

Comment 14 Peter Krempa 2022-06-02 10:26:44 UTC
That regression was caused by my refactors to FD handling, not by this bug. I've assigned the upstream issue to me.

If you want a BZ to track it, please file a new one as it's not related to this one.

Comment 15 yalzhang@redhat.com 2022-06-02 11:34:44 UTC
(In reply to Peter Krempa from comment #14)
> That regression was caused by my refactors to FD handling, not by this bug.
> I've assigned the upstream issue to me.
> 
> If you want a BZ to track it, please file a new one as it's not related to
> this one.

Thank you for your quick reply. I have filed bug 2092856 to track this issue.

Comment 18 yalzhang@redhat.com 2022-06-13 04:10:03 UTC
Hi Michal, I have tested the scenario in comment 0 with x710 and 82599ES, and it works as expected. But when I tested some negative scenarios, something came up that needs to be confirmed. Could you please help to check it? Thank you!

Test on libvirt-8.4.0-1.el9.x86_64

1. Prepare a vm with the interface setting below, which is not supported:
# virsh dumpxml test | grep /interface -B9
    <interface type='direct'>
      <mac address='52:54:00:68:09:14'/>
      <source dev='enp59s0f0v0' mode='passthrough'/>
      <vlan trunk='yes'>
        <tag id='42'/>
        <tag id='123' nativeMode='untagged'/>
      </vlan>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </interface>

2. try to start vm several times:
# virsh start test 
error: Failed to start domain 'test'
error: unsupported configuration: vlan trunking is not supported by SR-IOV network devices

# virsh start test 
error: Disconnected from qemu:///system due to end of file
error: Failed to start domain 'test'
error: End of file while reading data: Input/output error

# virsh list 
error: failed to connect to the hypervisor
error: no call waiting for reply with prog 536903814 vers 1 serial 8

# virsh list 
error: failed to connect to the hypervisor
error: internal error: client socket is closed

# virsh list 
 Id   Name   State
--------------------

and there is a call trace for virtqemud

Comment 21 Michal Privoznik 2022-06-13 13:24:59 UTC
(In reply to yalzhang from comment #18)
> Hi Michal, I have tested the scenario in comment 0 with x710 and 82599ES,
> and it works as expected. But when I tested some negative scenarios,
> something came up that needs to be confirmed. Could you please help to
> check it? Thank you!

Yeah, this is a regression that was introduced in 8.4.0. It's not strictly related to SRIOV, but it's the easiest to reproduce. I've posted patches here:

https://listman.redhat.com/archives/libvir-list/2022-June/232422.html

I think we can use this bug to reiterate the patches. Let me move back to ASSIGNED.

Comment 24 Michal Privoznik 2022-07-15 12:41:47 UTC
Merged upstream as:

67e4fed61c qemuBuildInterfaceConnect: Initialize @tapfd array
74ba5b5401 virNetDevSaveNetConfig: Pass mode to virFileWriteStr()

v8.4.0-203-g67e4fed61c

Comment 25 yalzhang@redhat.com 2022-07-18 07:26:31 UTC
Test on libvirt-8.5.0-2.el9.x86_64, the result is as expected.

1. Start vm with direct + passthrough with vlan id:
1) prepare vm with below interface
# virsh dumpxml rhel | grep /interface -B8
    <interface type='direct'>
      <mac address='52:54:00:6f:88:c5'/>
      <source dev='enp59s0f0v0' mode='passthrough'/>
      <vlan>
        <tag id='100'/>
      </vlan>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </interface>

2) start vm and check the vlan id is added on to the vf:
# virsh start rhel 
Domain 'rhel' started

# ip l
20: enp59s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether f8:f2:1e:21:ce:a0 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 06:b3:85:2d:6d:64 brd ff:ff:ff:ff:ff:ff, vlan 100, spoof checking on, link-state auto, trust off

3) destroy the vm, check the vlan id is cleared:
# virsh destroy rhel
Domain 'rhel' destroyed

# ip l show enp59s0f0
20: enp59s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether f8:f2:1e:21:ce:a0 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 06:b3:85:2d:6d:64 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off

4) for hostdev type:
# virsh start rhel
# virsh dumpxml rhel | grep /interface -B11
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:b7:b1:02'/>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x3b' slot='0x02' function='0x0'/>
      </source>
      <vlan>
        <tag id='40'/>
      </vlan>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
# ip l show enp59s0f0
20: enp59s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether f8:f2:1e:21:ce:a0 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 52:54:00:b7:b1:02 brd ff:ff:ff:ff:ff:ff, vlan 40, spoof checking on, link-state auto, trust off

The issue in comment 18 can no longer be reproduced:
Edit the vm to use an interface with a vlan trunk, which is not supported:
<interface type='direct'>
      <mac address='52:54:00:68:09:14'/>
      <source dev='enp59s0f0v0' mode='passthrough'/>
      <vlan trunk='yes'>
        <tag id='42'/>
        <tag id='123' nativeMode='untagged'/>
      </vlan>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

Try to start it several times, the result is as expected:
# virsh start rhel
error: Failed to start domain 'rhel'
error: unsupported configuration: vlan trunking is not supported by SR-IOV network devices

# virsh start rhel
error: Failed to start domain 'rhel'
error: unsupported configuration: vlan trunking is not supported by SR-IOV network devices

# virsh list 
 Id   Name   State
--------------------

Comment 27 errata-xmlrpc 2022-11-15 10:04:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: libvirt security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8003

