Bug 1261708 - Guest gets paused after unplugging a PCI device
Summary: Guest gets paused after unplugging a PCI device
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.2
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Andrea Bolognani
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 1259556
Blocks: RHEV3.6PPC 1277183 1277184
Reported: 2015-09-10 02:48 UTC by Dan Zheng
Modified: 2016-02-21 11:14 UTC (History)
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-19 06:54:17 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2015:2202
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: libvirt bug fix and enhancement update
Last Updated: 2015-11-19 08:17:58 UTC

Description Dan Zheng 2015-09-10 02:48:35 UTC
The guest gets paused after a PCI device is unplugged, and an unrecoverable error is detected.

Packages: 
  kernel-3.10.0-313.el7.ppc64le
  qemu-kvm-rhev-2.3.0-22.el7.ppc64le
  libvirt-daemon-1.2.17-7.el7.ppc64le
Guest: kernel-3.10.0-313.el7.ppc64le


Steps:
0. Prepare the environment on host
# modprobe vfio
# modprobe vfio_spapr_eeh
# modprobe vfio_iommu_spapr_tce
# modprobe vfio_pci

1. Start a guest with 3 PCI devices passed through to it.

# virsh start dzhengvm2
Domain dzhengvm2 started

# virsh dumpxml dzhengvm2|grep hostdev -A5
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0002' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0002' bus='0x01' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0002' bus='0x01' slot='0x00' function='0x3'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </hostdev>
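
To compare the devices configured for a guest against what the host reports, the host-side addresses can be pulled out of the domain XML. The following is a minimal sketch using only the Python standard library; the function name and the trimmed sample XML are illustrative assumptions, not part of libvirt.

```python
import xml.etree.ElementTree as ET


def hostdev_pci_addresses(domain_xml):
    """Return the host-side PCI address of every PCI <hostdev> in a domain XML."""
    root = ET.fromstring(domain_xml)
    addresses = []
    for hostdev in root.iter("hostdev"):
        if hostdev.get("type") != "pci":
            continue
        # The host address lives under <source>; the sibling <address type='pci'>
        # directly under <hostdev> is the guest-side slot and is ignored here.
        addr = hostdev.find("./source/address")
        if addr is None:
            continue
        # Attributes are hex strings such as '0x0002'; int(..., 16) accepts the prefix.
        domain, bus, slot, function = (
            int(addr.get(key), 16) for key in ("domain", "bus", "slot", "function")
        )
        addresses.append(f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}")
    return addresses


# Trimmed, hypothetical stand-in for the output of `virsh dumpxml <domain>`.
SAMPLE_DOMAIN_XML = """<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0002' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0002' bus='0x01' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </hostdev>
  </devices>
</domain>"""

print(hostdev_pci_addresses(SAMPLE_DOMAIN_XML))
```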


2. Check within the guest: three PCI devices are displayed.
[root@localhost ~]# lspci
...
00:08.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
00:09.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
00:0a.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)



3. Unplug one PCI device pci_0002_01_00_1
unplugPF.xml:
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
     <address domain='0x0002' bus='0x01' slot='0x00' function='0x3'/>
  </source>
</hostdev>

# virsh detach-device dzhengvm2 unplugPF.xml
Device detached successfully
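
The detach XML can also be generated from the node device name, which keeps the name and the address in the file in agreement. This is a rough sketch; the helper name is hypothetical, and it assumes node device names follow the pci_DDDD_BB_SS_F pattern seen in this report.

```python
import re

# Node device names in this report look like pci_0002_01_00_3
# (domain, bus, slot, function), all in hexadecimal.
NODEDEV_NAME = re.compile(
    r"pci_([0-9a-fA-F]{4})_([0-9a-fA-F]{2})_([0-9a-fA-F]{2})_([0-7])"
)


def hostdev_xml_from_nodedev(name):
    """Build a <hostdev> fragment for detach-device/attach-device from a nodedev name."""
    match = NODEDEV_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a PCI node device name: {name!r}")
    domain, bus, slot, function = match.groups()
    return (
        "<hostdev mode='subsystem' type='pci' managed='yes'>\n"
        "  <source>\n"
        f"    <address domain='0x{domain}' bus='0x{bus}' "
        f"slot='0x{slot}' function='0x{function}'/>\n"
        "  </source>\n"
        "</hostdev>\n"
    )


print(hostdev_xml_from_nodedev("pci_0002_01_00_3"))
```

The returned text can be written to a file and passed to `virsh detach-device` as in the step above.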

4. Check guest state
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 21    d1                             paused
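
For scripted retests, the state column of this table can be checked programmatically. A small sketch, assuming the three-column layout shown above; the function names are illustrative only.

```python
def parse_virsh_list(output):
    """Parse the table printed by `virsh list --all` into (id, name, state) tuples."""
    rows = []
    for line in output.splitlines():
        # Split into at most three fields so states like "shut off" stay intact.
        fields = line.split(None, 2)
        # Skip blank lines, the dashed separator, and the header row.
        if len(fields) < 3 or fields[0] == "Id":
            continue
        rows.append((fields[0], fields[1], fields[2].strip()))
    return rows


def paused_domains(output):
    """Return the names of all domains reported as paused."""
    return [name for _, name, state in parse_virsh_list(output) if state == "paused"]


SAMPLE_LIST = """ Id    Name                           State
----------------------------------------------------
 21    d1                             paused
"""

print(paused_domains(SAMPLE_LIST))
```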

5. Check qemu log

2015-09-06T08:19:44.390589Z qemu-kvm: vfio_err_notifier_handler(0002:01:00.1) Unrecoverable error detected.  Please collect any data possible and then kill the guest
2015-09-06T08:19:44.413427Z qemu-kvm: vfio_err_notifier_handler(0002:01:00.0) Unrecoverable error detected.  Please collect any data possible and then kill the guest
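
To see which host functions triggered the error handler, the qemu log can be filtered for these messages. The sketch below assumes the line format shown above (timestamp, then "qemu-kvm:", then the handler name with the PCI address in parentheses); the function name is hypothetical.

```python
import re

# Matches e.g. "vfio_err_notifier_handler(0002:01:00.1)" and captures the address.
VFIO_ERR = re.compile(
    r"vfio_err_notifier_handler\("
    r"([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\)"
)


def vfio_error_devices(log_text):
    """Return (timestamp, pci_address) for each vfio unrecoverable-error line."""
    hits = []
    for line in log_text.splitlines():
        match = VFIO_ERR.search(line)
        if match:
            # The timestamp is the first whitespace-delimited token on the line.
            timestamp = line.split(None, 1)[0]
            hits.append((timestamp, match.group(1)))
    return hits


SAMPLE_LOG = (
    "2015-09-06T08:19:44.390589Z qemu-kvm: vfio_err_notifier_handler(0002:01:00.1) "
    "Unrecoverable error detected.  Please collect any data possible and then kill the guest\n"
    "2015-09-06T08:19:44.413427Z qemu-kvm: vfio_err_notifier_handler(0002:01:00.0) "
    "Unrecoverable error detected.  Please collect any data possible and then kill the guest\n"
)

for timestamp, device in vfio_error_devices(SAMPLE_LOG):
    print(timestamp, device)
```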

Additional information:
This is separated from bug 1245004 and this bug also blocks the reproduction of bug 1245004.

Comment 1 Dan Zheng 2015-09-10 02:50:40 UTC
A retest will be executed after bug 1259556 is on QA.

Comment 3 Andrea Bolognani 2015-09-21 09:04:04 UTC
Can you post the output of

  # virsh nodedev-dumpxml pci_0002_01_00_0

please?

Comment 4 Dan Zheng 2015-09-21 09:56:27 UTC
Ran the command with the packages below:

libvirt-daemon-1.2.17-9.el7.ppc64le
qemu-kvm-rhev-2.3.0-24.el7.ppc64le
kernel-3.10.0-316.el7.ppc64le


#  virsh nodedev-dumpxml pci_0002_01_00_0
<device>
  <name>pci_0002_01_00_0</name>
  <path>/sys/devices/pci0002:00/0002:00:00.0/0002:01:00.0</path>
  <parent>pci_0002_00_00_0</parent>
  <driver>
    <name>be2net</name>
  </driver>
  <capability type='pci'>
    <domain>2</domain>
    <bus>1</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0xe220'>OneConnect NIC (Lancer)</product>
    <vendor id='0x10df'>Emulex Corporation</vendor>
    <iommuGroup number='1'>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x0'/>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x1'/>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x2'/>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x3'/>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x4'/>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x5'/>
    </iommuGroup>
    <numa node='0'/>
    <pci-express>
      <link validity='cap' port='0' speed='8' width='8'/>
      <link validity='sta' speed='8' width='8'/>
    </pci-express>
  </capability>
</device>
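
The output above shows that all six functions of this card share IOMMU group 1. The group membership can be extracted from the nodedev XML with a short script; this is a sketch under the assumption that the XML has the structure shown above, and the function name and trimmed sample are illustrative.

```python
import xml.etree.ElementTree as ET


def iommu_group_members(nodedev_xml):
    """Return (group_number, [pci addresses]) from `virsh nodedev-dumpxml` output."""
    root = ET.fromstring(nodedev_xml)
    group = root.find("./capability[@type='pci']/iommuGroup")
    if group is None:
        return None, []
    members = []
    for addr in group.findall("address"):
        # Attribute values are hex strings with a 0x prefix.
        domain, bus, slot, function = (
            int(addr.get(key), 16) for key in ("domain", "bus", "slot", "function")
        )
        members.append(f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}")
    return int(group.get("number")), members


# Trimmed copy of the nodedev XML from this comment.
SAMPLE_NODEDEV_XML = """<device>
  <name>pci_0002_01_00_0</name>
  <capability type='pci'>
    <iommuGroup number='1'>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x0'/>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x1'/>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x2'/>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x3'/>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x4'/>
      <address domain='0x0002' bus='0x01' slot='0x00' function='0x5'/>
    </iommuGroup>
  </capability>
</device>"""

number, members = iommu_group_members(SAMPLE_NODEDEV_XML)
print(number, members)
```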

Comment 5 Andrea Bolognani 2015-09-21 10:33:54 UTC
Thanks.

Is any of the ports assigned to the host?
Or is it using a different Ethernet card altogether?

Comment 6 Dan Zheng 2015-09-23 10:30:51 UTC
(In reply to Andrea Bolognani from comment #5)
> Thanks.
> 
> Is any of the ports assigned to the host?
> Or is it using a different Ethernet card altogether?

Andrea,
Before starting the guest, I ran nodedev-detach on all the PCI devices in this IOMMU group to detach them from the host, and then started the guest. After that, I detached one of them.

That host has two Ethernet cards, and I used one of them. All 6 PCI devices in that iommuGroup belong to a single card.

Does the above answer your questions?

# lspci
...
0002:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
0002:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
0002:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
0002:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
0002:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)
0002:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)

...
0003:09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)

*************************************************************
Today I tested again, and the same error occurred.
The host is installed with the snapshot 2 tree RHEL-7.2-20150917.0 Server.
libvirt-1.2.17-9.el7.ppc64le
qemu-kvm-rhev-2.3.0-23.el7.ppc64le (replaces qemu-kvm-rhev-2.3.0-24.el7.ppc64le due to bug 1264845)
kernel-3.10.0-316.el7.ppc64le
Guest: kernel-3.10.0-316.el7.ppc64le


The host has only one Ethernet card.
# lspci
...
0003:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0003:09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)



<device>
  <name>pci_0003_09_00_0</name>
  <path>/sys/devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.0/0003:09:00.0</path>
  <parent>pci_0003_02_09_0</parent>
  <driver>
    <name>vfio-pci</name>
  </driver>
  <capability type='pci'>
    <domain>3</domain>
    <bus>9</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x1657'>NetXtreme BCM5719 Gigabit Ethernet PCIe</product>
    <vendor id='0x14e4'>Broadcom Corporation</vendor>
    <iommuGroup number='1'>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x0'/>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x1'/>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x2'/>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x3'/>
    </iommuGroup>
    <numa node='0'/>
    <pci-express>
      <link validity='cap' port='0' speed='2.5' width='4'/>
      <link validity='sta' speed='2.5' width='4'/>
    </pci-express>
  </capability>
</device>

Start the guest with the four PCI devices below, which will all be detached from the host automatically because managed='yes'.

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0003' bus='0x09' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0003' bus='0x09' slot='0x00' function='0x1'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0003' bus='0x09' slot='0x00' function='0x2'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0003' bus='0x09' slot='0x00' function='0x3'/>
      </source>
    </hostdev>

The guest is running. Log in to the guest.
# lspci
00:06.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
00:07.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
00:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
00:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)


On the host, detach PCI device 0003:09:00.2 successfully.
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
     <address domain='0x0003' bus='0x09' slot='0x00' function='0x2'/>
  </source>
</hostdev>

Checking dumpxml shows that the XML has already been updated to remove this PCI device, but the guest is paused with the same error messages as before.
I also got the error below on the host.
# lspci
pcilib: Cannot open /sys/bus/pci/devices/0003:09:00.3/config
lspci: Unable to read the standard configuration space header of device 0003:09:00.3
...
0003:09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
----Below missing---
0003:09:00.3 ...
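
A function that has dropped out of the host's lspci listing can be spotted by comparing the listing against the expected IOMMU group members. The sketch below assumes lspci output that includes the PCI domain (as in the listing above; `lspci -D` forces this); the function name is hypothetical.

```python
import re

# A device line starts with a full address such as "0003:09:00.2" followed by a space.
ADDRESS_LINE = re.compile(
    r"^([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s"
)


def missing_functions(expected, lspci_output):
    """Return the expected PCI addresses that have no device line in lspci output."""
    seen = set()
    for line in lspci_output.splitlines():
        match = ADDRESS_LINE.match(line)
        if match:
            seen.add(match.group(1).lower())
    return [addr for addr in expected if addr.lower() not in seen]


EXPECTED = ["0003:09:00.0", "0003:09:00.1", "0003:09:00.2", "0003:09:00.3"]

SAMPLE_LSPCI = """pcilib: Cannot open /sys/bus/pci/devices/0003:09:00.3/config
lspci: Unable to read the standard configuration space header of device 0003:09:00.3
0003:09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
"""

print(missing_functions(EXPECTED, SAMPLE_LSPCI))
```

Error lines that merely mention an address (the pcilib and lspci messages) are not counted as device lines, because they do not start with the address.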

Comment 7 Andrea Bolognani 2015-09-24 09:44:15 UTC
Yes, that's exactly the information I was looking for.

I just wanted to make sure that there's no obvious reason
why the setup you're using wouldn't work, and that doesn't
seem to be the case.

I'm now confident the issues you're facing will go away as
soon as Bug 1259556 has been fixed.

Thanks for your help.

Comment 9 RHEL Program Management 2015-10-05 11:36:44 UTC
Product Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 11 Dan Zheng 2015-10-13 02:38:06 UTC
Tested with the packages below:
libvirt-1.2.17-13.el7.ppc64le
qemu-kvm-rhev-2.3.0-29.el7.ppc64le
kernel-3.10.0-322.el7.ppc64le
Guest kernel: kernel-3.10.0-322.el7.ppc64le


1. Detach the device pci_0003_09_00_0 from the host.
# virsh nodedev-dumpxml pci_0003_09_00_0
<device>
  <name>pci_0003_09_00_0</name>
  <path>/sys/devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.0/0003:09:00.0</path>
  <parent>pci_0003_02_09_0</parent>
  <driver>
    <name>tg3</name>
  </driver>
  <capability type='pci'>
    <domain>3</domain>
    <bus>9</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x1657'>NetXtreme BCM5719 Gigabit Ethernet PCIe</product>
    <vendor id='0x14e4'>Broadcom Corporation</vendor>
    <iommuGroup number='1'>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x0'/>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x1'/>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x2'/>
      <address domain='0x0003' bus='0x09' slot='0x00' function='0x3'/>
    </iommuGroup>
    <numa node='0'/>
    <pci-express>
      <link validity='cap' port='0' speed='2.5' width='4'/>
      <link validity='sta' speed='2.5' width='4'/>
    </pci-express>
  </capability>
</device>
# virsh nodedev-detach pci_0003_09_00_0
...Successful.
# virsh nodedev-reset pci_0003_09_00_0
...Successful.

2. Start the guest with 3 host PCI devices. The guest is running.

     <hostdev mode='subsystem' type='pci' managed='yes'>
       <driver name='vfio'/>
       <source>
         <address domain='0x0003' bus='0x09' slot='0x00' function='0x1'/>
       </source>
     </hostdev>
     <hostdev mode='subsystem' type='pci' managed='yes'>
       <driver name='vfio'/>
       <source>
         <address domain='0x0003' bus='0x09' slot='0x00' function='0x2'/>
       </source>
     </hostdev>
     <hostdev mode='subsystem' type='pci' managed='yes'>
       <driver name='vfio'/>
       <source>
         <address domain='0x0003' bus='0x09' slot='0x00' function='0x3'/>
       </source>
     </hostdev>
3. Check that the PCI devices are displayed in the guest: they are.
4. Detach/attach a PCI device from/to the guest.
unplug.xml:
     <hostdev mode='subsystem' type='pci' managed='yes'>
       <driver name='vfio'/>
       <source>
         <address domain='0x0003' bus='0x09' slot='0x00' function='0x1'/>
       </source>
     </hostdev>
# virsh detach-device virt-tests-vm1 unplug.xml ---> using pci_0003_09_00_1
Successful.
# virsh attach-device virt-tests-vm1 unplug.xml ---> using pci_0003_09_00_0
Successful.
5. Check dumpxml of the guest and it does get updated.
6. Check the lspci within the guest and it does get updated.
7. Repeat steps 4-6 with other PCI devices in the same IOMMU group, such as pci_0003_09_00_3 and pci_0003_09_00_2. It works as expected, except for an unexpected guest crash and reboot, which is tracked by bug 1270636.
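
The repeated detach/attach cycle from steps 4-7 could be scripted roughly as follows. This sketch only builds the command lines and does not run them; the XML file names are hypothetical, and checking the result with `virsh domstate` after each operation is an addition here, not something the steps above performed.

```python
def hotplug_cycle_commands(domain, device_xml_files):
    """Build the virsh command lines for one detach/attach cycle per device file."""
    commands = []
    for xml_file in device_xml_files:
        commands.append(["virsh", "detach-device", domain, xml_file])
        # After each operation the guest should still report "running", not "paused".
        commands.append(["virsh", "domstate", domain])
        commands.append(["virsh", "attach-device", domain, xml_file])
        commands.append(["virsh", "domstate", domain])
    return commands


# Hypothetical file names, one <hostdev> fragment per PCI function.
PLAN = hotplug_cycle_commands(
    "virt-tests-vm1", ["unplug_fn1.xml", "unplug_fn2.xml", "unplug_fn3.xml"]
)

for argv in PLAN:
    print(" ".join(argv))
```

Each entry is an argument list, so it can be handed to `subprocess.run(argv, check=True)` on a test host where virsh is available.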

The issue of the guest getting paused no longer occurs.

I am therefore marking this as verified, since the original issue no longer happens.

Comment 13 errata-xmlrpc 2015-11-19 06:54:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html

