
Bug 1608298

Summary: Hot plug vhost-user NICs fail after "# kill -9 $testpmd_process_id" in guest [upstream qemu]
Product: Red Hat Enterprise Linux 7
Reporter: Pei Zhang <pezhang>
Component: qemu-kvm-rhev
Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED WONTFIX
QA Contact: Pei Zhang <pezhang>
Severity: medium
Priority: medium
Docs Contact:
Version: 7.6
CC: chayang, juzhang, ktraynor, maxime.coquelin, michen, siliu, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1608314 (view as bug list)
Environment:
Last Closed: 2018-07-25 09:58:51 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1608314
Attachments:
Guest XML (flags: none)

Description Pei Zhang 2018-07-25 09:28:29 UTC
Created attachment 1470478 [details]
Guest XML

Description of problem:
After stopping the testpmd process with "# kill -9", the vhost-user network cards can be hot unplugged, but the next hot plug fails. It seems the PCI address is not cleaned up properly.
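One way to confirm that the address is still claimed after the detach is to dump the live domain XML and search for the slot. A minimal sketch; it uses a stand-in XML fragment (assumed, not captured from the real guest) in place of an actual `virsh dumpxml rhel7.6_nonrt` capture:

```shell
# Sketch: check whether a PCI slot is still claimed in the domain XML.
# On a real host the XML would be captured with:
#   virsh dumpxml rhel7.6_nonrt > domain.xml
# Here a minimal stand-in fragment is used instead.
cat > domain.xml <<'EOF'
<domain>
  <devices>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
  </devices>
</domain>
EOF

if grep -q "slot='0x06'" domain.xml; then
  echo "slot 0x06 still claimed in domain XML"
else
  echo "slot 0x06 free"
fi
```

If the slot is still listed after a successful `virsh detach-device`, the next attach with the same pinned address will be rejected.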


Version-Release number of selected component (if applicable):

qemu-kvm version:

Repo: git://git.qemu.org/qemu.git, Branch: master

# git log -1
commit 3bae150448dbd888a480f892ebbf01caec0d8329
Merge: 0a7052b 042b757
Author: Peter Maydell <peter.maydell>
Date:   Tue Jul 24 15:26:01 2018 +0100

    Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
    
    Pull request
    
    Regression fix for host block devices with the file-posix driver when aio=native is in use.
    
    # gpg: Signature made Tue 24 Jul 2018 15:22:49 BST
    # gpg:                using RSA key 9CA4ABB381AB73C8
    # gpg: Good signature from "Stefan Hajnoczi <stefanha>"
    # gpg:                 aka "Stefan Hajnoczi <stefanha>"
    # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8
    
    * remotes/stefanha/tags/block-pull-request:
      block/file-posix: add bdrv_attach_aio_context callback for host dev and cdrom
    
    Signed-off-by: Peter Maydell <peter.maydell>

Other versions:
3.10.0-925.el7.x86_64
libvirt-4.5.0-4.el7.x86_64
tuned-2.9.0-1.el7.noarch
openvswitch-2.9.0-55.el7fdp.x86_64


How reproducible:
100%


Steps to Reproduce:
1. Start OVS; refer to [1].

2. Boot the guest without vhost-user NICs; the full XML is attached.

3. Hot plug 2 vhost-user NICs; for the NIC XML, refer to [3].

4. In the guest, start testpmd; refer to [4].

5. In the guest, kill testpmd:

# ps aux | grep /usr/bin/testpmd
root      1669  5.6  0.8 3355384 70488 pts/0   SLl+ 17:07   0:00 /usr/bin/testpmd -l 1,2,3,4,5 -n 4 -d /usr/lib64/librte_pmd_virtio.so.1 -w 0000:00:06.0 -w 0000:00:07.0 -- --nb-cores=4 --disable-hw-vlan -i --disable-rss --rxq=2 --txq=2
...

# kill -9 1669

6. Detach these 2 vhost-user NICs; this succeeds. Refer to [6].

7. Then attach these 2 vhost-user NICs again; this fails:

# virsh attach-device rhel7.6_nonrt nic1.xml
error: Failed to attach device from nic1.xml
error: XML error: Attempted double use of PCI Address 0000:00:06.0

# virsh attach-device rhel7.6_nonrt nic2.xml
error: Failed to attach device from nic2.xml
error: XML error: Attempted double use of PCI Address 0000:00:07.0


Actual results:
After the user runs "# kill -9 $testpmd_process_id", the subsequent hot unplug of the vhost-user network devices succeeds, but the hot plug that follows fails.

Expected results:
Hot plugging vhost-user network cards should continue to work, even after the user's application that uses these network cards has been killed.


Reference:
[1]
# ovs-vsctl show
3d7a1678-d24c-4fe2-9580-ecbdbaff4822
    Bridge "ovsbr1"
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser1.sock"}
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
                options: {dpdk-devargs="0000:04:00.1", n_rxq="2"}
    Bridge "ovsbr0"
        Port "vhost-user0"
            Interface "vhost-user0"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser0.sock"}
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:04:00.0", n_rxq="2"}
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal


[3]
# cat nic1.xml 
<interface type="vhostuser">
  <mac address="18:66:da:5f:dd:02" />
  <source mode="server" path="/tmp/vhostuser0.sock" type="unix" />
  <model type="virtio" />
  <driver name="vhost" queues="2" rx_queue_size="512" />
  <address bus="0x00" domain="0x0000" function="0x0" slot="0x6" type="pci" />
</interface>

# cat nic2.xml 
<interface type="vhostuser">
  <mac address="18:66:da:5f:dd:03" />
  <source mode="server" path="/tmp/vhostuser1.sock" type="unix" />
  <model type="virtio" />
  <driver name="vhost" queues="2" rx_queue_size="512" />
  <address bus="0x00" domain="0x0000" function="0x0" slot="0x7" type="pci" />
</interface>
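Note that both NIC definitions pin their PCI addresses (slots 0x6 and 0x7), so libvirt cannot fall back to auto-assigning a fresh address on re-attach; the pinned slot is exactly the address named in the "double use" error. A sketch that reads the slot back out of such an address line (the line is copied from nic1.xml above):

```shell
# Sketch: extract the pinned PCI slot from a vhost-user NIC definition.
# The address line is the one from nic1.xml in this report.
line='<address bus="0x00" domain="0x0000" function="0x0" slot="0x6" type="pci" />'
slot=$(echo "$line" | sed -n 's/.*slot="\([^"]*\)".*/\1/p')
echo "nic1 pinned to slot $slot"
# prints: nic1 pinned to slot 0x6
```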


[4]
# echo 4 >  /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages

# modprobe vfio enable_unsafe_noiommu_mode=Y
# modprobe vfio-pci

# python dpdk-devbind.py --bind=vfio-pci 0000:00:06.0
# python dpdk-devbind.py --bind=vfio-pci 0000:00:07.0

# /usr/bin/testpmd \
-l 1,2,3,4,5 \
-n 4 \
-d /usr/lib64/librte_pmd_virtio.so.1 \
-w 0000:00:06.0 -w 0000:00:07.0 \
-- \
--nb-cores=4 \
--disable-hw-vlan \
-i \
--disable-rss \
--rxq=2 --txq=2


[6]
# virsh detach-device rhel7.6_nonrt nic1.xml
Device detached successfully

# virsh detach-device rhel7.6_nonrt nic2.xml
Device detached successfully


Additional info:

1. This is an upstream qemu-kvm issue only; downstream qemu-kvm-rhev (qemu-kvm-rhev-2.12.0-8.el7.x86_64) works correctly.