Bug 1261708
Summary: | Guest gets paused after unplugging a PCI device | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Dan Zheng <dzheng> |
Component: | libvirt | Assignee: | Andrea Bolognani <abologna> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.2 | CC: | abologna, dgibson, dyuan, dzheng, gklein, gsun, hannsj_uhl, jsuchane, lmiksik, mzhan, rbalakri, zhwang |
Target Milestone: | rc | Keywords: | Reopened, TestOnly |
Target Release: | --- | ||
Hardware: | ppc64le | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2015-11-19 06:54:17 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1259556 | ||
Bug Blocks: | 1201513, 1277183, 1277184 |
Description
Dan Zheng
2015-09-10 02:48:35 UTC
Retest will be executed after 1259556 is on QA.

Can you post the output of

# virsh nodedev-dumpxml pci_0002_01_00_0

please?

Run command with below packages:

libvirt-daemon-1.2.17-9.el7.ppc64le
qemu-kvm-rhev-2.3.0-24.el7.ppc64le
kernel-3.10.0-316.el7.ppc64le

# virsh nodedev-dumpxml pci_0002_01_00_0
<device>
  <name>pci_0002_01_00_0</name>
  <path>/sys/devices/pci0002:00/0002:00:00.0/0002:01:00.0</path>
  <parent>pci_0002_00_00_0</parent>
  <driver>
    <name>be2net</name>
  </driver>
  <capability type='pci'>
    <domain>2</domain>
    <bus>1</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0xe220'>OneConnect NIC (Lancer)</product>
    <vendor id='0x10df'>Emulex Corporation</vendor>
    <iommuGroup number='1'>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x0'/>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x1'/>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x2'/>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x3'/>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x4'/>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x5'/>
    </iommuGroup>
    <numa node='0'/>
    <pci-express>
      <link validity='cap' port='0' speed='8' width='8'/>
      <link validity='sta' speed='8' width='8'/>
    </pci-express>
  </capability>
</device>

Thanks.

Is any of the ports assigned to the host?
Or is it using a different Ethernet card altogether?

(In reply to Andrea Bolognani from comment #5)
> Thanks.
>
> Is any of the ports assigned to the host?
> Or is it using a different Ethernet card altogether?

Andrea,

Before starting the guest, I ran nodedev-detach on all the PCI devices in this IOMMU group to detach them from the host, then started the guest. After that, I detached one of them from the guest.

That host has two Ethernet cards, and I used one of them. The 6 PCI devices in that iommuGroup all belong to one card.

Does this answer your questions?

# lspci
...
0002:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
0002:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
0002:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
0002:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
0002:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)
0002:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)
...
0003:09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)

*************************************************************

Today I tested again, and the same error occurred.

Host is installed with the snapshot 2 tree RHEL-7.2-20150917.0 Server.

libvirt-1.2.17-9.el7.ppc64le
qemu-kvm-rhev-2.3.0-23.el7.ppc64le (replaces qemu-kvm-rhev-2.3.0-24.el7.ppc64le due to bug 1264845)
kernel-3.10.0-316.el7.ppc64le

Guest:
kernel-3.10.0-316.el7.ppc64le

Host only has one Ethernet card.

# lspci
...
0003:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0003:09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)

<device>
  <name>pci_0003_09_00_0</name>
  <path>/sys/devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.0/0003:09:00.0</path>
  <parent>pci_0003_02_09_0</parent>
  <driver>
    <name>vfio-pci</name>
  </driver>
  <capability type='pci'>
    <domain>3</domain>
    <bus>9</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x1657'>NetXtreme BCM5719 Gigabit Ethernet PCIe</product>
    <vendor id='0x14e4'>Broadcom Corporation</vendor>
    <iommuGroup number='1'>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x0'/>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x1'/>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x2'/>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x3'/>
    </iommuGroup>
    <numa node='0'/>
    <pci-express>
      <link validity='cap' port='0' speed='2.5' width='4'/>
      <link validity='sta' speed='2.5' width='4'/>
    </pci-express>
  </capability>
</device>

Start the guest with the four PCI devices below, which are all detached from the host automatically because they use managed='yes'.
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0003' bus='0x09' slot='0x00' function='0x0'/>
  </source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0003' bus='0x09' slot='0x00' function='0x1'/>
  </source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0003' bus='0x09' slot='0x00' function='0x2'/>
  </source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0003' bus='0x09' slot='0x00' function='0x3'/>
  </source>
</hostdev>

Guest is running. Log in to the guest.

# lspci
00:06.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
00:07.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
00:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
00:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)

On the host, detaching PCI device 0003:09:00.2 from the guest succeeds.

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0003' bus='0x09' slot='0x00' function='0x2'/>
  </source>
</hostdev>

Checking dumpxml, the XML has already been updated to remove this PCI device, but the guest is paused with the same error messages as before. I also got the error below on the host.

# lspci
pcilib: Cannot open /sys/bus/pci/devices/0003:09:00.3/config
lspci: Unable to read the standard configuration space header of device 0003:09:00.3
...
0003:09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
----Below missing---
0003:09:00.3
...
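
The IOMMU group membership shown in the `virsh nodedev-dumpxml` output above determines which functions must leave the host together before any of them can be assigned to a guest. As a rough illustration only, the following Python sketch extracts that list from a node device XML document using the standard library. The function name `iommu_group_members` and the trimmed sample XML are assumptions made for this example; they are not part of libvirt or of the original report.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the node device XML quoted in this report (illustrative input).
NODEDEV_XML = """
<device>
  <name>pci_0003_09_00_0</name>
  <capability type='pci'>
    <iommuGroup number='1'>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x0'/>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x1'/>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x2'/>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x3'/>
    </iommuGroup>
  </capability>
</device>
"""


def iommu_group_members(nodedev_xml):
    """Return (group number, list of 'dddd:bb:ss.f' addresses).

    Hypothetical helper: it only parses text in the format printed by
    `virsh nodedev-dumpxml`; it does not talk to libvirt.
    """
    root = ET.fromstring(nodedev_xml.strip())
    group = root.find("./capability[@type='pci']/iommuGroup")
    if group is None:
        raise ValueError("no iommuGroup element in node device XML")

    members = []
    for addr in group.findall("address"):
        # Attribute values are hex strings such as '0x0003'.
        domain = int(addr.get("domain"), 16)
        bus = int(addr.get("bus"), 16)
        slot = int(addr.get("slot"), 16)
        function = int(addr.get("function"), 16)
        members.append(f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}")
    return int(group.get("number")), members


if __name__ == "__main__":
    number, members = iommu_group_members(NODEDEV_XML)
    print(f"IOMMU group {number}:")
    for member in members:
        print(f"  {member}")
```

The addresses are printed in the same `domain:bus:slot.function` form that `lspci` uses on the host, which makes it easier to compare the group against the devices listed there.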
Yes, that's exactly the information I was looking for.

I just wanted to make sure that there's no obvious reason why the setup you're using wouldn't work, and that doesn't seem to be the case.

I'm now confident the issues you're facing will go away as soon as Bug 1259556 has been fixed.

Thanks for your help.

Product Management has reviewed and declined this request. You may appeal this decision by reopening this request.

Test with packages below:

libvirt-1.2.17-13.el7.ppc64le
qemu-kvm-rhev-2.3.0-29.el7.ppc64le
kernel-3.10.0-322.el7.ppc64le

Guest kernel:
kernel-3.10.0-322.el7.ppc64le

1. Detach device pci_0003_09_00_0 from the host.

# virsh nodedev-dumpxml pci_0003_09_00_0
<device>
  <name>pci_0003_09_00_0</name>
  <path>/sys/devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.0/0003:09:00.0</path>
  <parent>pci_0003_02_09_0</parent>
  <driver>
    <name>tg3</name>
  </driver>
  <capability type='pci'>
    <domain>3</domain>
    <bus>9</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x1657'>NetXtreme BCM5719 Gigabit Ethernet PCIe</product>
    <vendor id='0x14e4'>Broadcom Corporation</vendor>
    <iommuGroup number='1'>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x0'/>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x1'/>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x2'/>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x3'/>
    </iommuGroup>
    <numa node='0'/>
    <pci-express>
      <link validity='cap' port='0' speed='2.5' width='4'/>
      <link validity='sta' speed='2.5' width='4'/>
    </pci-express>
  </capability>
</device>

# virsh nodedev-detach pci_0003_09_00_0
...Successful.

# virsh nodedev-reset pci_0003_09_00_0
...Successful.

2. Start the guest with 3 host PCI devices. The guest is running.
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0003' bus='0x09' slot='0x00' function='0x1'/>
  </source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0003' bus='0x09' slot='0x00' function='0x2'/>
  </source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0003' bus='0x09' slot='0x00' function='0x3'/>
  </source>
</hostdev>

3. Check that the PCI devices are displayed in the guest: yes, they are.

4. Detach/attach a PCI device from/to the guest.

unplug.xml:
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0003' bus='0x09' slot='0x00' function='0x1'/>
  </source>
</hostdev>

# virsh detach-device virt-tests-vm1 unplug.xml ---> using pci_0003_09_00_1
Successful.

# virsh attach-device virt-tests-vm1 unplug.xml ---> using pci_0003_09_00_0
Successful.

5. Check dumpxml of the guest: it does get updated.

6. Check lspci within the guest: it does get updated.

7. Repeat steps 4 - 6 with other PCI devices in the same IOMMU group, like pci_0003_09_00_3 and pci_0003_09_00_2. This works as expected, except for the unexpected guest crash and reboot, which is tracked by bug 1270636. The guest no longer gets paused.

So I mark this as verified, as the original issue does not happen any more.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html
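
Step 4 of the verification above swaps the `function` value in unplug.xml to exercise different devices of the same IOMMU group. A minimal Python sketch of generating such a `<hostdev>` fragment from an `lspci`-style address follows. The helper name `hostdev_xml` is hypothetical, and the fixed `vfio` driver and `managed` default simply mirror the XML quoted in this report rather than any requirement of libvirt.

```python
import re
import xml.etree.ElementTree as ET

# Matches addresses in the 'dddd:bb:ss.f' form printed by lspci on the host.
PCI_ADDR = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def hostdev_xml(pci_address, managed=True):
    """Build a <hostdev> fragment like the unplug.xml used in this report.

    Hypothetical helper for illustration; it only produces text and does
    not call virsh or libvirt.
    """
    match = PCI_ADDR.match(pci_address)
    if match is None:
        raise ValueError(f"not a PCI address: {pci_address!r}")
    domain, bus, slot, function = (int(part, 16) for part in match.groups())

    hostdev = ET.Element(
        "hostdev",
        {"mode": "subsystem", "type": "pci",
         "managed": "yes" if managed else "no"},
    )
    ET.SubElement(hostdev, "driver", {"name": "vfio"})
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        {
            "domain": f"0x{domain:04x}",
            "bus": f"0x{bus:02x}",
            "slot": f"0x{slot:02x}",
            "function": f"0x{function:x}",
        },
    )
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    # Write one fragment per function that was passed to the guest in step 2.
    for address in ("0003:09:00.1", "0003:09:00.2", "0003:09:00.3"):
        print(hostdev_xml(address))
```

The output could be saved to a file and passed to `virsh detach-device` or `virsh attach-device` in the same way as unplug.xml in step 4.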