Bug 2075383
Summary: | The vlan tag setting does not work in the <interface type='direct'> xml | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Yanghang Liu <yanghliu> |
Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
libvirt sub component: | Networking | QA Contact: | yalzhang <yalzhang> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | medium | CC: | chayang, dzheng, jdenemar, jsuchane, laine, lmen, lvivier, mprivozn, pkrempa, virt-bugs, virt-maint, yalzhang, yicui |
Version: | 9.0 | Keywords: | Triaged, Upstream |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | libvirt-8.5.0-1.el9 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2022-11-15 10:04:06 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | 8.4.0 |
Embargoed: |
Description
Yanghang Liu
2022-04-14 04:20:48 UTC
I have one system with the same libvirt version and an Intel X520, but cannot reproduce the issue there. On another system with the same libvirt version and an Intel X710, the issue does reproduce.

Laine, anything for libvirt here, or would you suspect the underlying kernel driver instead?

The MAC address and VLAN ID of VFs were previously set in a single operation. There were some changes between 8.0.0 and 8.1.0 that split the setting of MAC address and VLAN ID to happen separately (because some new Nvidia "Smart NICs" don't support setting of VLAN via the SRIOV PF):

    commit 86fc0c25768326abcfebbdd17dbe1074d145f652
    Author: Dmitrii Shcherbakov <dmitrii.shcherbakov>
    Date:   Tue Feb 1 11:28:51 2022 +0300

        Set VF MAC and VLAN ID in two different operations

    commit 73961771a1cfec3c0f43caec9d117d2fbcc7af39
    Author: Dmitrii Shcherbakov <dmitrii.shcherbakov>
    Date:   Tue Feb 1 11:28:52 2022 +0300

        Allow VF vlanid to be passed as a pointer

    commit 09cdd16a9bf73bc1f75fe774216c71f9ebc78c88
    Author: Dmitrii Shcherbakov <dmitrii.shcherbakov>
    Date:   Tue Feb 1 11:28:53 2022 +0300

        Ignore EPERM on implicit clearing of VF VLAN ID

It's possible something in that change is triggering a strange behavior in the driver used for the X710 (which is i40e, right?) vs. the driver used for X520 (ixgbe?). (As a further datapoint, I tried libvirt 8.2.0 on my ancient 82576 card (igb) and it does properly set the vlan tag.)

Can you try turning on 1:util.netdev in logging before starting the guest? Do this for both the system with X520 (working) and the system with X710 (not working) so we can compare the results; it may provide some useful information for whoever is next on the triage trail.

On the X710 host, while repeatedly starting and destroying the VM, I got an error like "Cannot read module EEPROM memory" in dmesg. Not every start-destroy cycle produced this error. I'm not sure if it helps. There is no such error with the X520 card.
    [73973.974489] device ens1f0v1 left promiscuous mode
    [73974.233416] iavf 0000:3b:02.1: Leaving promiscuous mode
    [73974.238789] i40e 0000:3b:00.0: Unprivileged VF 1 is attempting to configure promiscuous mode
    [74154.901548] i40e 0000:3b:00.1 ens1f1: Cannot read module EEPROM memory. No module connected.
    [74245.891055] device ens1f0v1 entered promiscuous mode
    [74246.622035] iavf 0000:3b:02.1: Entering promiscuous mode
    [74246.627370] iavf 0000:3b:02.1: Entering multicast promiscuous mode
    [74246.633670] i40e 0000:3b:00.0: Unprivileged VF 1 is attempting to configure promiscuous mode

I think I might have found the problem, although I'm not currently on a machine with SRIOV. Anyway, this is the commit that I suspect has caused the problem:

https://gitlab.com/libvirt/libvirt/-/commit/73961771a1cfec3c0f43caec9d117d2fbcc7af39

and this hunk in particular is problematic:

    @@ -2344,7 +2355,7 @@ virNetDevSetNetConfig(const char *linkdev, int vf,
             }
         }

    -    if (adminMAC || vlanTag >= 0) {
    +    if (adminMAC) {
             /* Set vlanTag and admin MAC using an RTM_SETLINK request sent to
              * PFdevname+VF#, if mac != NULL this will set the "admin MAC" via
              * the PF, *not* the actual VF MAC - the admin MAC only takes

because later in this block virNetDevSetVfConfig() is called, which is responsible for setting the vlan tag. With this change, the block is skipped whenever adminMAC is not set, even if a vlan tag was requested. Let me see if I can cook a scratch build with the obvious fix.

I've polished the fix from comment 7 and posted it here:

https://listman.redhat.com/archives/libvir-list/2022-April/230309.html

Merged upstream as:

    b399f2c000 virnetdev: Fix regression in setting VLAN tag
    7899a11523 virNetDevSetVfMac: Fix error message on invalid args

v8.3.0-11-gb399f2c000

Hi Michal, there is a regression bug in this build, please help to check it.
    # rpm -q libvirt
    libvirt-8.4.0-1.el9.x86_64
    # virsh attach-interface test network default --model virtio
    error: Failed to attach interface
    error: internal error: unable to execute QEMU command 'netdev_add': File descriptor named '(null)' has not been found

Check the libvirtd log:

    2022-06-02 10:14:51.557+0000: 231196: info : qemuMonitorSend:887 : QEMU_MONITOR_SEND_MSG: mon=0x7f6af40862f0 msg={"execute":"netdev_add","arguments":{"type":"tap","fd":"(null)","vhost":true,"vhostfd":"(null)","id":"hostnet1"},"id":"libvirt-409"} fd=-1

It is also reported at https://gitlab.com/libvirt/libvirt/-/issues/318

That regression was caused by my refactors to FD handling, not by this bug. I've assigned the upstream issue to me.

If you want a BZ to track it, please file a new one as it's not related to this one.

(In reply to Peter Krempa from comment #14)
> That regression was caused by my refactors to FD handling, not by this bug.
> I've assigned the upstream issue to me.
>
> If you want a BZ to track it, please file a new one as it's not related to
> this one.

Thank you for your quick reply. I have filed bug 2092856 to track this issue.

Hi Michal, I have tested the scenario in comment 0 with X710 and 82599ES, and it works as expected. But when I test some negative scenarios, there is something that needs confirming. Could you please help to check it? Thank you!

Test on libvirt-8.4.0-1.el9.x86_64

1. Prepare a VM with the interface setting below, which is not supported:

    # virsh dumpxml test | grep /interface -B9
    <interface type='direct'>
      <mac address='52:54:00:68:09:14'/>
      <source dev='enp59s0f0v0' mode='passthrough'/>
      <vlan trunk='yes'>
        <tag id='42'/>
        <tag id='123' nativeMode='untagged'/>
      </vlan>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </interface>

2. Try to start the VM several times:

    # virsh start test
    error: Failed to start domain 'test'
    error: unsupported configuration: vlan trunking is not supported by SR-IOV network devices

    # virsh start test
    error: Disconnected from qemu:///system due to end of file
    error: Failed to start domain 'test'
    error: End of file while reading data: Input/output error

    # virsh list
    error: failed to connect to the hypervisor
    error: no call waiting for reply with prog 536903814 vers 1 serial 8

    # virsh list
    error: failed to connect to the hypervisor
    error: internal error: client socket is closed

    # virsh list
     Id   Name   State
    --------------------

and there is a call trace for virtqemud.

(In reply to yalzhang from comment #18)
> Hi Michal, I have tested the scenario in comment 0 with X710 and 82599ES, and
> it works as expected. But when I test some negative scenarios, there is
> something that needs confirming. Could you please help to check it? Thank you!

Yeah, this is a regression that was introduced in 8.4.0. It's not strictly related to SRIOV, but that's the easiest way to reproduce it. I've posted patches here:

https://listman.redhat.com/archives/libvir-list/2022-June/232422.html

I think we can use this bug to reiterate the patches. Let me move back to ASSIGNED.

Merged upstream as:

    67e4fed61c qemuBuildInterfaceConnect: Initialize @tapfd array
    74ba5b5401 virNetDevSaveNetConfig: Pass mode to virFileWriteStr()

v8.4.0-203-g67e4fed61c

Test on libvirt-8.5.0-2.el9.x86_64, the result is as expected.

1. Start a VM with direct + passthrough with a vlan id:

1) Prepare a VM with the interface below:

    # virsh dumpxml rhel | grep /interface -B8
    <interface type='direct'>
      <mac address='52:54:00:6f:88:c5'/>
      <source dev='enp59s0f0v0' mode='passthrough'/>
      <vlan>
        <tag id='100'/>
      </vlan>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </interface>

2) Start the VM and check that the vlan id is set on the VF:

    # virsh start rhel
    Domain 'rhel' started

    # ip l
    20: enp59s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
        link/ether f8:f2:1e:21:ce:a0 brd ff:ff:ff:ff:ff:ff
        vf 0     link/ether 06:b3:85:2d:6d:64 brd ff:ff:ff:ff:ff:ff, vlan 100, spoof checking on, link-state auto, trust off

3) Destroy the VM and check that the vlan id is cleared:

    # virsh destroy rhel
    Domain 'rhel' destroyed

    # ip l show enp59s0f0
    20: enp59s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
        link/ether f8:f2:1e:21:ce:a0 brd ff:ff:ff:ff:ff:ff
        vf 0     link/ether 06:b3:85:2d:6d:64 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off

4) For the hostdev type:

    # virsh start rhel
    # virsh dumpxml rhel | grep /interface -B11
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:b7:b1:02'/>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x3b' slot='0x02' function='0x0'/>
      </source>
      <vlan>
        <tag id='40'/>
      </vlan>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

    # ip l show enp59s0f0
    20: enp59s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
        link/ether f8:f2:1e:21:ce:a0 brd ff:ff:ff:ff:ff:ff
        vf 0     link/ether 52:54:00:b7:b1:02 brd ff:ff:ff:ff:ff:ff, vlan 40, spoof checking on, link-state auto, trust off

The issue in comment 18 can no longer be reproduced. Edit the VM so that its interface uses vlan trunking, which is not supported:
    <interface type='direct'>
      <mac address='52:54:00:68:09:14'/>
      <source dev='enp59s0f0v0' mode='passthrough'/>
      <vlan trunk='yes'>
        <tag id='42'/>
        <tag id='123' nativeMode='untagged'/>
      </vlan>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

Try to start it several times, the result is as expected:

    # virsh start rhel
    error: Failed to start domain 'rhel'
    error: unsupported configuration: vlan trunking is not supported by SR-IOV network devices

    # virsh start rhel
    error: Failed to start domain 'rhel'
    error: unsupported configuration: vlan trunking is not supported by SR-IOV network devices

    # virsh list
     Id   Name   State
    --------------------

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Low: libvirt security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8003 |
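
The verification in this report checks the "vlan N" field on the "vf 0" line of the ip l output by eye. As a rough sketch only, that check could be scripted along the following lines. It assumes the iproute2 output format quoted in this report; the helper name vf_vlans and the two sample constants are hypothetical and are not part of libvirt or iproute2.

```python
import re
from typing import Dict, Optional

# A VF line in `ip link show <pf>` output starts with "vf <index>".
_VF_RE = re.compile(r"^\s*vf\s+(\d+)\b")
# The "vlan <id>" field is only printed when a VLAN tag is set on the VF.
_VLAN_RE = re.compile(r"\bvlan\s+(\d+)\b")


def vf_vlans(ip_link_output: str) -> Dict[int, Optional[int]]:
    """Map each VF index to its VLAN id, or None when no tag is shown.

    Hypothetical helper: it relies on the textual layout quoted in this
    bug report and may need adjusting for other iproute2 versions.
    """
    result: Dict[int, Optional[int]] = {}
    for line in ip_link_output.splitlines():
        vf_match = _VF_RE.match(line)
        if vf_match is None:
            continue  # not a VF line (interface header, PF link/ether, ...)
        vlan_match = _VLAN_RE.search(line)
        result[int(vf_match.group(1))] = (
            int(vlan_match.group(1)) if vlan_match else None
        )
    return result


# Sample output copied from the verification steps in this report.
STARTED = """\
20: enp59s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether f8:f2:1e:21:ce:a0 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 06:b3:85:2d:6d:64 brd ff:ff:ff:ff:ff:ff, vlan 100, spoof checking on, link-state auto, trust off
"""

DESTROYED = """\
20: enp59s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether f8:f2:1e:21:ce:a0 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 06:b3:85:2d:6d:64 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
"""

if __name__ == "__main__":
    print(vf_vlans(STARTED))    # VF 0 tagged with VLAN 100 while the guest runs
    print(vf_vlans(DESTROYED))  # VF 0 untagged after the guest is destroyed
```

In practice the input would come from running ip l show on the PF (enp59s0f0 in this report) after virsh start and again after virsh destroy; with the regression present, the first call would be expected to report no VLAN for the VF.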