Bug 1066825

Field | Value
---|---
Summary | failed to start guest with `<interface type='hostdev'>` type VF
Product | Red Hat Enterprise Linux 7
Component | kernel
Version | 7.0
Status | CLOSED CURRENTRELEASE
Severity | high
Priority | high
Reporter | Xuesong Zhang (xuzhang)
Assignee | Virtualization Maintenance (virt-maint)
QA Contact | William Gomeringer (wgomerin)
CC | acathrow, agospoda, alex.williamson, chayang, dyuan, hhuang, honzhang, jiahu, juzhang, michen, mstowe, mzhan, nupur.priya, qcai, qzhang, tgraf, xuhan, xuzhang, ypei
Target Milestone | rc
Target Release | ---
Keywords | Reopened, TestBlocker
Hardware | Unspecified
OS | Unspecified
Fixed In Version | kernel-3.10.0-105.el7
Doc Type | Bug Fix
Cloned as | 1069548 (view as bug list)
Bug Blocks | 1067873, 1069548, 1073810
Last Closed | 2014-06-13 13:17:14 UTC
Type | Bug
Description (Xuesong Zhang, 2014-02-19 07:43:53 UTC)
I changed only the libvirt version, to libvirt-1.1.1-17.el7.x86_64; the other packages were kept at the same versions as in the bug description. The issue still occurs. Sorry, it seems this is not a libvirt bug but an issue in some other component.

Package versions:

    libvirt-1.1.1-17.el7.x86_64   (the version that was working well before)
    qemu-kvm-rhev-1.5.3-48.el7.x86_64
    libnl-1.1.4-3.el7.x86_64
    kernel-3.10.0-89.el7.x86_64

I also bumped into this issue on an I350:

    [root@ibm-x3650m4-01 sriov_test]# uname -r
    3.10.0-89.el7.x86_64
    [root@ibm-x3650m4-01 sriov_test]# lspci | grep -i i350
    06:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
    06:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
    06:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
    06:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
    07:10.0 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
    07:10.1 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
    07:10.2 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
    07:10.3 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
    07:10.4 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
    07:10.5 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
    07:10.6 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
    07:10.7 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
    [root@ibm-x3650m4-01 sriov_test]# cat igb_vf.xml
    <interface type='hostdev' managed='yes'>
      <mac address='00:13:93:23:32:61'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x07' slot='0x10' function='0x5'/>
      </source>
    </interface>
    [root@ibm-x3650m4-01 sriov_test]# virsh attach-device vm1 igb_vf.xml
    error: Failed to attach device from igb_vf.xml
    error: internal error: couldn't find IFLA_VF_INFO for VF 1 in netlink response

This blocks igb driver SR-IOV feature testing, so setting TestBlocker.

---

(In reply to Zhang Xuesong from comment #1)
> I change the libvirt version to libvirt-1.1.1-17.el7.x86_64 only ...

The most likely candidate is the kernel: it appears that the only rebuilds of libnl have been part of the mass auto-rebuilds, and qemu isn't involved at all with the netlink commands that fetch the VFINFO (it isn't even running yet at that point). Please boot with an older kernel and see if the problem goes away. If so, move this BZ to kernel.

BTW, in RHEL 7 it is the "libnl3" package that libvirt uses, not "libnl" (which is deprecated and really should be removed, but can't be, because I believe some packages are still using it).

---

Fixed in libnl3-3.2.21-5.el7

*** This bug has been marked as a duplicate of bug 1069548 ***

---

I'm actually not so sure that this is the same libnl problem (unfortunately I'm not in the room with my RHEL 7 machine right now, and it is currently running Fedora, so I can't test to verify until tomorrow morning). Here's why:

1) The "ip" command doesn't use libnl; it calls netlink directly.

2) With kernel-3.10.0-50 on my test machine, the output of "ip link show" for one of my PFs is:

    11: enp4s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
        link/ether a0:36:9f:24:c2:53 brd ff:ff:ff:ff:ff:ff
        vf 0 MAC e2:b0:f2:8e:eb:53, spoof checking on, link-state auto
        vf 1 MAC ea:7f:82:72:1b:6b, spoof checking on, link-state auto
        vf 2 MAC 16:06:3f:0b:3a:f7, spoof checking on, link-state auto
        vf 3 MAC ee:7b:e4:ca:52:4d, spoof checking on, link-state auto
        vf 4 MAC 2a:d1:1a:ea:9c:86, spoof checking on, link-state auto
        vf 5 MAC ba:e6:99:95:07:42, spoof checking on, link-state auto
        vf 6 MAC aa:1c:22:24:65:7f, spoof checking on, link-state auto

3) When I boot with everything else identical, but using kernel-3.10.0-95, I get this:

    4: enp4s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
        link/ether a0:36:9f:24:c2:53 brd ff:ff:ff:ff:ff:ff
        vf 0 MAC a6:7e:03:f4:60:c7, spoof checking on, link-state auto

4) In both cases, "ip link show" gives output for all 7 VFs, but with the newer kernel there is no VF info in the PF's entry. (I have "options igb max_vfs=7" in /etc/modprobe.d/local.conf.)

Since iproute2 doesn't use libnl, and it is also not showing all the VFs, I think there may be a completely different problem in the kernel itself. I will check again tomorrow after upgrading libnl3.

---

Hi Laine, from the test result it seems you are right; the cause of this bug is not the same as BZ 104626 (the libnl3 issue). After updating the libnl3 package to the fixed version, libnl3-3.2.21-5.el7.x86_64, this issue is still there; note that I also updated all the other packages, including the kernel. I notice you mentioned the "ip" command: is this an iproute component issue, or a kernel one?
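The iproute-vs-kernel question can be narrowed down by scripting the check Laine ran by hand: count the "vf N MAC ..." rows that the kernel's netlink response (and hence "ip link show") reports for the PF. A minimal sketch, not part of the original report (the function name and sample input are illustrative):

```python
import re

# Each "vf N MAC xx:xx:..." row in "ip link show" corresponds to one
# IFLA_VF_INFO entry in the kernel's netlink response for the PF.
# (ip prints MAC addresses in lowercase hex.)
VF_ROW = re.compile(r"^\s*vf (\d+) MAC ([0-9a-f]{2}(?::[0-9a-f]{2}){5})")

def parse_vf_rows(ip_link_output):
    """Map VF index -> MAC for every VF row the kernel reported."""
    vfs = {}
    for line in ip_link_output.splitlines():
        m = VF_ROW.match(line)
        if m:
            vfs[int(m.group(1))] = m.group(2)
    return vfs
```

On the good kernel this returns seven entries for the PF shown above; on kernel-3.10.0-95 it returns only the entry for vf 0, which is exactly the condition that makes libvirt fail with "couldn't find IFLA_VF_INFO for VF 1".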
    # which ip
    /usr/sbin/ip
    # rpm -qf /usr/sbin/ip
    iproute-3.10.0-13.el7.x86_64

Package versions:

    libvirt-1.1.1-25.el7.x86_64
    qemu-kvm-rhev-1.5.3-50.el7.x86_64
    kernel-3.10.0-97.el7.x86_64
    libnl3-3.2.21-5.el7.x86_64
    iproute-3.10.0-13.el7.x86_64

Test results:

1. Failed to hot-plug a VF to the guest. Note: the VF is not the 1st generated VF.

    # virsh attach-device b inter-hostdev-vfio.xml
    error: Failed to attach device from inter-hostdev-vfio.xml
    error: internal error: couldn't find IFLA_VF_INFO for VF 1 in netlink response

2. Failed to hot-plug a VF to the guest. Note: the VF is the 1st generated VF.

    # virsh attach-device b inter-hostdev-vfio.xml
    error: Failed to attach device from inter-hostdev-vfio.xml
    error: Path '/dev/vfio/22' is not accessible: No such file or directory

---

(In reply to Zhang Xuesong from comment #8)
> this bug reason is not the same with BZ 104626 (libnl3 issue)

Sorry, there is a typo there: that should be BZ 1040626.

---

(In reply to Zhang Xuesong from comment #8)
> After update the libnl3 package to the fixed version libnl3-3.2.21-5.el7.x86_64, this issue is still there ...

I've also verified that the problem still remains (both with libvirt and with "ip link show") after updating to the latest libnl3 build.

> I notice you mentioned the "ip" command, is it the iproute component issue? or kernel component?

Since the behavior is common to both iproute and libvirt+libnl3, the problem is in the kernel, which is the only piece in common.

> 1. failed to hot-plug one vf to the guest ... couldn't find IFLA_VF_INFO for VF 1 in netlink response

This is because the IFLA_VF_INFO response contains only the first VF (vf 0). Note that when you list the /sys/bus/pci/devices/$PF directory, you will still see all of the VFs listed as "virtfnN". I also tried writing "0" to sriov_numvfs and then writing "7": no error was indicated, and the virtfnN links disappeared and reappeared, but the VFs were still missing from the "ip link show" output of the PF.

In light of all this, I am re-opening this bug and assigning it to kernel. (It still has blocker? and the "TestBlocker" tag.)

> 2. failed to hot-plug vf to guest ... error: Path '/dev/vfio/22' is not accessible: No such file or directory

In this case you are attempting to pass through vf0, which *is* in the IFLA_VF_INFO output, so that part is succeeding, but you're hitting a separate problem later on. This error message is one that would previously show up as a side effect of some other error, for example if vfio wasn't properly loaded, but I thought all those bugs had been fixed. Can you look in your log files for this same message, and see if libvirt logged some other message just prior? (FYI, I didn't see this error when I started a domain with vf0 attached.)

---

(In reply to Laine Stump from comment #10)
> In this case, you are attempting to pass through vf0 ... you're hitting a separate problem later on.

I suspect this problem is because the XML from the original post does not include managed='yes', and the device is therefore not attached to vfio-pci; without managed='yes', the user needs to explicitly run "virsh nodedev-detach" on the device. Bisecting kernel rpms with the "ip link show" test, the problem was introduced in kernel 86.
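The managed='yes' attribute is easy to omit when writing the interface XML by hand; generating the element programmatically avoids that. A minimal sketch using Python's standard xml.etree (the helper name is made up; the MAC and PCI address defaults are the sample values used elsewhere in this report):

```python
import xml.etree.ElementTree as ET

def hostdev_interface_xml(mac, bus, slot, function, domain=0):
    """Build an <interface type='hostdev' managed='yes'> element for an
    SR-IOV VF, with managed='yes' always set so libvirt handles the
    vfio-pci detach/reattach itself."""
    iface = ET.Element("interface", type="hostdev", managed="yes")
    ET.SubElement(iface, "mac", address=mac)
    source = ET.SubElement(iface, "source")
    ET.SubElement(source, "address", type="pci",
                  domain="0x%04x" % domain, bus="0x%02x" % bus,
                  slot="0x%02x" % slot, function="0x%x" % function)
    return ET.tostring(iface, encoding="unicode")
```

Writing the result to a file and feeding it to "virsh attach-device" is then the same workflow as in the tests above.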
I'll bisect the changes added for that kernel.

---

(In reply to Alex Williamson from comment #11)
> I suspect this problem is because the xml from the original post does not included managed=yes ...

Sorry, I didn't add "managed='yes'" in the attached XML. After adding it, the first generated VF works well with both the hot-plug and the PCI passthrough method.

---

*** Bug 1069548 has been marked as a duplicate of this bug. ***

Patch(es) available on kernel-3.10.0-105.el7

---

Tested with the latest build; this bug is fixed.

Package versions:

    libvirt-1.1.1-26.el7.x86_64
    qemu-kvm-rhev-1.5.3-52.el7.x86_64
    kernel-3.10.0-105.el7.x86_64

Scenario 1: assign a VF to a guest.

1. Find an SR-IOV host and check the "ip link show" output for one PF:

    # ip link show
    ......
    15: ens1f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
        link/ether 00:1b:21:39:8b:19 brd ff:ff:ff:ff:ff:ff
        vf 0 MAC d2:97:8f:88:9d:a3, spoof checking on, link-state auto
        vf 1 MAC 52:54:00:a5:e7:f6, spoof checking on, link-state auto
        vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    ......

2. Assign one VF to the guest (note: not the 1st generated VF) by adding the following XML to the shut-off guest:

    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:a5:e7:f6'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x3'/>
      </source>
    </interface>

3. Start the guest:

    # virsh start b
    Domain b started

4. Check the guest dumpxml and make sure the interface is there:

    # virsh dumpxml b | grep hostdev -A5
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:a5:e7:f6'/>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x3'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>

Scenario 2: hot-plug a VF to a guest.

1. Prepare a running guest.

2. Prepare a VF XML like the following:

    # cat inter-hostdev-vfio.xml
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:a5:e7:f6'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x3'/>
      </source>
    </interface>

3. Hot-plug the VF into the running guest:

    # virsh attach-device b inter-hostdev-vfio.xml
    Device attached successfully

4. Check the guest XML and make sure the VF is there:

    # virsh dumpxml b | grep hostdev -A5
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:a5:e7:f6'/>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x3'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>

Scenario 3: assign a VF from a hostdev network, and hot-plug a VF from a hostdev network.

1. Prepare a hostdev network like the following:

    # virsh net-list
     Name                 State      Autostart     Persistent
    ----------------------------------------------------------
     default              active     yes           yes
     hostnet              active     no            yes

    # virsh net-dumpxml hostnet
    <network>
      <name>hostnet</name>
      <uuid>c64f5418-4287-4152-83af-e944215fb824</uuid>
      <forward mode='hostdev' managed='yes'>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x1'/>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x2'/>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x3'/>
      </forward>
    </network>

2. Add the following XML to the shut-off guest:

    <interface type='network'>
      <source network='hostnet'/>
    </interface>

3. Start the guest:

    # virsh start b
    Domain b started

4. Check the guest dumpxml and make sure the interface is there:

    # virsh dumpxml b | grep interface -A5
    <interface type='network'>
      <mac address='52:54:00:89:9a:ec'/>
      <source network='hostnet'/>
      <model type='rtl8139'/>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>

5. Hot-plug an interface from the hostdev network into the running guest:

    # virsh attach-interface b network hostnet
    Interface attached successfully

6. Check the guest XML and make sure another interface from the hostdev network is attached:
    [root@sriov1 xuzhang]# virsh dumpxml b | grep interface -A5
    <interface type='network'>
      <mac address='52:54:00:89:9a:ec'/>
      <source network='hostnet'/>
      <model type='rtl8139'/>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:b6:45:10'/>
      <source network='hostnet'/>
      <model type='rtl8139'/>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </interface>

---

Tested with these components:

    kernel-3.10.0-110.el7.x86_64
    qemu-kvm-rhev-1.5.3-53.el7.x86_64

Steps:

1. Generate VFs, and then check them:

    # modprobe igb max_vfs=7
    # ip link show
    ...
    7: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
        link/ether 00:1b:21:39:8b:18 brd ff:ff:ff:ff:ff:ff
        vf 0 MAC 5e:d2:1c:1c:2b:3a, spoof checking on, link-state auto
        vf 1 MAC 6a:09:bf:33:83:00, spoof checking on, link-state auto
        vf 2 MAC 7a:16:0a:a6:9c:e3, spoof checking on, link-state auto
        vf 3 MAC ba:8d:c2:f2:17:53, spoof checking on, link-state auto
        vf 4 MAC 22:d4:42:5e:f3:63, spoof checking on, link-state auto
        vf 5 MAC 5a:84:13:6e:a7:56, spoof checking on, link-state auto
        vf 6 MAC de:3c:c0:42:15:e9, spoof checking on, link-state auto
    ...
    22: ens1f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
        link/ether 00:1b:21:39:8b:19 brd ff:ff:ff:ff:ff:ff
        vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
        vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto

2. Bind one VF to vfio-pci:

    # modprobe vfio
    # modprobe vfio-pci
    # echo "0000:03:10.6" > /sys/bus/pci/drivers/igbvf/unbind
    # echo "8086 10ca" > /sys/bus/pci/drivers/vfio-pci/new_id

Scenario 1: boot a guest with an assigned VF.

    # /usr/libexec/qemu-kvm ... \
        -device vfio-pci,host=03:10.6,id=vf0

Scenario 2: hot-plug an assigned VF into the guest.

    (qmp) {"execute": "device_add", "arguments": {"driver": "vfio-pci", "host": "03:10.6", "id": "vf0"}}

Results: the assigned VF works properly in both scenarios above.

---

This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.

---

Still facing the same issue with the following kernel version:

    Name        : kernel
    Version     : 3.10.0
    Release     : 327.el7
    Architecture: x86_64
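For anyone re-testing this on a newer kernel, the mismatch described earlier in this thread (sysfs still lists every virtfnN link while the netlink response covers only some VFs) can be scripted. This sketch shows only the comparison step; in real use, sysfs_vfs would come from globbing /sys/bus/pci/devices/$PF/virtfn* and netlink_vfs from the "vf N" rows of "ip link show" for the PF (the function name and inputs are illustrative):

```python
def missing_vf_indices(sysfs_vfs, netlink_vfs):
    """VF indices that sysfs knows about but the kernel's netlink
    (IFLA_VF_INFO) response omits; a non-empty result means the PF is
    showing the symptom from this bug."""
    return sorted(set(sysfs_vfs) - set(netlink_vfs))

# On the broken kernels, sysfs showed virtfn0..virtfn6 while netlink
# reported only vf 0, so this check would flag VFs 1 through 6.
```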