Bug 1480446 - vhost-user/iommu: crash when backend disconnects [rhel-7.4.z]
Summary: vhost-user/iommu: crash when backend disconnects [rhel-7.4.z]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Maxime Coquelin
QA Contact: Pei Zhang
URL:
Whiteboard:
Depends On: 1468260
Blocks:
 
Reported: 2017-08-11 06:40 UTC by Jaroslav Reznik
Modified: 2017-10-16 10:45 UTC
CC List: 12 users

Fixed In Version: qemu-kvm-rhev-2.9.0-16.el7_4.6
Doc Type: Bug Fix
Doc Text:
Previously, the qemu-kvm service in some cases terminated unexpectedly when the vhost-user back end disconnected while the Input/Output Memory Management Unit (IOMMU) feature was enabled. This update ensures that all active connections are released in the proper order during cleanup. As a result, the back end no longer attempts to handle requests after the connections are released, which prevents the problem from occurring.
Clone Of: 1468260
Environment:
Last Closed: 2017-10-16 10:45:10 UTC
Target Upstream Version:
Embargoed:




Links
System ID: Red Hat Product Errata RHBA-2017:2891
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: qemu-kvm-rhev bug fix update
Last Updated: 2017-10-16 14:44:54 UTC

Description Jaroslav Reznik 2017-08-11 06:40:59 UTC
This bug has been copied from bug #1468260 and has been proposed to be backported to 7.4 z-stream (EUS).

Comment 6 Maxime Coquelin 2017-09-01 07:11:44 UTC
Two patches backported to the rhvirt-patches list:


commit b9ec9bd468b2c5b218d16642e8f8ea4df60418bb
Author: Maxime Coquelin <maxime.coquelin>
Date:   Fri Jun 30 18:04:22 2017 +0200

    vhost-user: unregister slave req handler at cleanup time
    
    If the backend sends a request just before closing the socket,
    the aio dispatcher might schedule its reading after the vhost
    device has been cleaned, leading to a NULL pointer dereference
    in slave_read();
    
    vhost_user_cleanup() already closes the socket but it is not
    enough, the handler has to be unregistered.
    
    Signed-off-by: Maxime Coquelin <maxime.coquelin>
    Reviewed-by: Marc-André Lureau <marcandre.lureau>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
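
For context, a minimal standalone sketch of the pattern this first patch applies (illustrative only, not the actual QEMU diff; fake_dev, fake_set_read_handler() and the rest are placeholder names for the example): closing the backend socket is not enough, the read handler registered with the event loop has to be dropped too, otherwise a request arriving just before the close can still be dispatched against freed state.

/* Illustrative sketch, not the QEMU code: unregister the handler before
 * tearing the device down. */
#include <stdio.h>
#include <stdlib.h>

typedef void (*read_handler_t)(void *opaque);

struct fake_dev {
    int slave_fd;            /* socket to the vhost-user backend        */
    read_handler_t on_read;  /* handler the event loop may still invoke */
};

/* Stand-in for the event-loop registration call; passing NULL unregisters. */
static void fake_set_read_handler(struct fake_dev *dev, read_handler_t h)
{
    dev->on_read = h;
}

static void slave_read(void *opaque)
{
    struct fake_dev *dev = opaque;
    printf("handling slave request on fd %d\n", dev->slave_fd);
}

static void fake_dev_cleanup(struct fake_dev *dev)
{
    /* The fix: drop the handler first, so the event loop cannot dispatch
     * slave_read() against a device that is about to be freed.  Only
     * closing the socket (the old behaviour) leaves that window open. */
    fake_set_read_handler(dev, NULL);
    /* close(dev->slave_fd); -- omitted in this sketch */
    free(dev);
}

int main(void)
{
    struct fake_dev *dev = calloc(1, sizeof(*dev));
    dev->slave_fd = 23;
    fake_set_read_handler(dev, slave_read);
    slave_read(dev);        /* normal operation */
    fake_dev_cleanup(dev);  /* safe teardown    */
    return 0;
}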

commit 384b557da1a44ce260cd0328c06a250507348f73
Author: Maxime Coquelin <maxime.coquelin>
Date:   Fri Jun 30 18:04:21 2017 +0200

    vhost: ensure vhost_ops are set before calling iotlb callback
    
    This patch fixes a crash that happens when vhost-user iommu
    support is enabled and vhost-user socket is closed.
    
    When it happens, if an IOTLB invalidation notification is sent
    by the IOMMU, vhost_ops's NULL pointer is dereferenced.
    
    Signed-off-by: Maxime Coquelin <maxime.coquelin>
    Reviewed-by: Marc-André Lureau <marcandre.lureau>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
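
And a minimal sketch of the guard the second patch adds (again illustrative only, not the upstream diff; demo_vhost_dev, demo_vhost_ops and demo_iotlb_invalidate() are invented names): once the backend disconnects, the ops pointer is cleared, so the IOTLB invalidation path has to check it before dereferencing.

/* Illustrative sketch, not the QEMU code: check that the backend ops are
 * still set before the IOTLB invalidation path dereferences them. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

struct demo_vhost_dev;

struct demo_vhost_ops {
    int (*send_iotlb_msg)(struct demo_vhost_dev *dev, uint64_t iova, uint64_t len);
};

struct demo_vhost_dev {
    const struct demo_vhost_ops *vhost_ops;  /* NULL once the backend is gone */
};

static int demo_iotlb_invalidate(struct demo_vhost_dev *dev,
                                 uint64_t iova, uint64_t len)
{
    /* Without this check, an IOTLB invalidation notification arriving
     * after the vhost-user socket closed dereferences a NULL pointer. */
    if (!dev->vhost_ops || !dev->vhost_ops->send_iotlb_msg) {
        return 0;  /* backend already disconnected; nothing to invalidate */
    }
    return dev->vhost_ops->send_iotlb_msg(dev, iova, len);
}

static int demo_send_msg(struct demo_vhost_dev *dev, uint64_t iova, uint64_t len)
{
    (void)dev;
    printf("invalidate iova=0x%" PRIx64 " len=0x%" PRIx64 "\n", iova, len);
    return 0;
}

static const struct demo_vhost_ops demo_ops = { .send_iotlb_msg = demo_send_msg };

int main(void)
{
    struct demo_vhost_dev dev = { .vhost_ops = &demo_ops };
    demo_iotlb_invalidate(&dev, 0x1000, 0x1000);  /* backend connected   */
    dev.vhost_ops = NULL;                         /* backend disconnects */
    demo_iotlb_invalidate(&dev, 0x2000, 0x1000);  /* now a safe no-op    */
    return 0;
}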

Comment 7 Miroslav Rezanina 2017-09-05 06:13:52 UTC
Fix included in qemu-kvm-rhev-2.9.0-16.el7_4.6

Comment 9 Pei Zhang 2017-09-28 04:31:36 UTC
Hi Maxime,

I'm verifying this bug. However, I tried the latest dpdk-17.08-1.el8+4.x86_64, and it seems the dpdk vhost-user backend still doesn't support vIOMMU; see [1].

Could you please share which dpdk version we should use for verifying this bug? Thanks.

[1]
Versions:
qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64
dpdk-17.08-1.el8+4.x86_64
3.10.0-713.el7.x86_64

Testing steps:
1. Boot testpmd as vhost-user client 
# testpmd -l 19,17,15 --socket-mem=1024,1024 -n 4 \
--vdev 'net_vhost0,iface=/tmp/vhost-user1,client=1' -- \
--portmask=3 --disable-hw-vlan -i --rxq=1 --txq=1 \
--nb-cores=2 --forward-mode=io

2. Boot VM as vhost-user server
# /usr/libexec/qemu-kvm -name rhel7.4 -M q35,kernel-irqchip=split \
-device intel-iommu,device-iotlb=on,intremap \
-cpu host -m 8G \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,slot=1 \
-device pcie-root-port,id=root.2,slot=2 \
-device pcie-root-port,id=root.3,slot=3 \
-chardev socket,id=char0,path=/tmp/vhost-user1,server \
-device virtio-net-pci,netdev=mynet1,mac=54:52:00:1a:2c:01,iommu_platform=on,ats=on \
-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.4_nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bus=root.1 \
-vnc :2 \
-monitor stdio \

3. Both testpmd and qemu repeatedly show the following output:
testpmd> 
...
VHOST_CONFIG: new device, handle is 0
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_GET_PROTOCOL_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_PROTOCOL_FEATURES
VHOST_CONFIG: read message VHOST_USER_GET_QUEUE_NUM
VHOST_CONFIG: read message VHOST_USER_SET_OWNER
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:26
VHOST_CONFIG: recvmsg failed
VHOST_CONFIG: vhost peer closed
VHOST_CONFIG: vhost-user client: socket created, fd: 23


(qemu)
...
vhost lacks feature mask 8589934592 for backend
qemu-kvm: failed to init vhost_net for queue 0
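
Note: if I decode it correctly, the missing feature mask 8589934592 is 1ULL << 33, i.e. virtio feature bit 33 (VIRTIO_F_IOMMU_PLATFORM), which a vhost backend built without vIOMMU support does not advertise. A quick check:

/* Decode the "vhost lacks feature mask 8589934592" message: print which
 * virtio feature bits are set in the mask (expected output: bit 33). */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t mask = 8589934592ULL;
    for (int bit = 0; bit < 64; bit++) {
        if (mask & (1ULL << bit)) {
            printf("missing virtio feature bit: %d\n", bit);
        }
    }
    return 0;
}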


Best Regards,
Pei

Comment 10 Maxime Coquelin 2017-09-29 12:13:35 UTC
(In reply to Pei Zhang from comment #9)
> Hi Maxime,
> 
> I'm verifying this bug. However, I tried the latest dpdk-17.08-1.el8+4.x86_64,
> and it seems the dpdk vhost-user backend still doesn't support vIOMMU; see [1].
> 
> Could you please share which dpdk version we should use for verifying this
> bug? Thanks.

Hi Pei,

This feature is not yet in dpdk's upstream repository.
You can, however, try the latest version of my series, which I made available on GitLab:

repo: https://gitlab.com/mcoquelin/dpdk-next-virtio.git
branch: https://gitlab.com/mcoquelin/dpdk-next-virtio/tree/vhost_iotlb_v2


> [1]
> Versions:
> qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64
> dpdk-17.08-1.el8+4.x86_64
> 3.10.0-713.el7.x86_64
> 
> Testing steps:
> 1. Boot testpmd as vhost-user client 
> # testpmd -l 19,17,15 --socket-mem=1024,1024 -n 4 \
> --vdev 'net_vhost0,iface=/tmp/vhost-user1,client=1' -- \
> --portmask=3 --disable-hw-vlan -i --rxq=1 --txq=1 \
> --nb-cores=2 --forward-mode=io
> 
> 2. Boot VM as vhost-user server
> # /usr/libexec/qemu-kvm -name rhel7.4 -M q35,kernel-irqchip=split \
> -device intel-iommu,device-iotlb=on,intremap \
> -cpu host -m 8G \
> -object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
> -numa node,memdev=mem -mem-prealloc \
> -smp 4,sockets=1,cores=4,threads=1 \
> -device pcie-root-port,id=root.1,slot=1 \
> -device pcie-root-port,id=root.2,slot=2 \
> -device pcie-root-port,id=root.3,slot=3 \
> -chardev socket,id=char0,path=/tmp/vhost-user1,server \
> -device
> virtio-net-pci,netdev=mynet1,mac=54:52:00:1a:2c:01,iommu_platform=on,ats=on \
> -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
> -drive
> file=/home/images_nfv-virt-rt-kvm/rhel7.4_nonrt.qcow2,format=qcow2,if=none,
> id=drive-virtio-blk0,werror=stop,rerror=stop \
> -device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bus=root.1 \
> -vnc :2 \
> -monitor stdio \


Your QEMU command line looks good; it should work with the branch I provided above.

Thanks,
Maxime

Comment 11 Pei Zhang 2017-09-30 09:35:27 UTC
(In reply to Maxime Coquelin from comment #10)
[...]
> 
> Hi Pei,
> 
> This feature is not yet in dpdk's upstream repository.
> You can, however, try the latest version of my series, which I made available
> on GitLab:
> 
> repo: https://gitlab.com/mcoquelin/dpdk-next-virtio.git
> branch: https://gitlab.com/mcoquelin/dpdk-next-virtio/tree/vhost_iotlb_v2

Thanks Maxime.

I hit a regression bug in the qemu z-stream build, but it was not caused by the patches for this bug. This regression blocks me from verifying this bug.

The regression is: the guest and qemu hang when the network is started with the kernel driver in the guest. The problem starts with qemu-kvm-rhev-2.9.0-16.el7_4.5.x86_64.

Here is the details:

Versions:
qemu-kvm-rhev-2.9.0-16.el7_4.5.x86_64
3.10.0-693.5.1.el7.x86_64
seabios-bin-1.10.2-3.el7_4.1.noarch

1. Compile dpdk from above repo and branch

2. Boot testpmd, then set the portlist and start
# /root/test/dpdk-next-virtio/x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd \
-l 19,17,15 --socket-mem=1024,1024 -n 4 \
-d /root/test/dpdk-next-virtio/x86_64-native-linuxapp-gcc/lib/librte_pmd_vhost.so \
--vdev 'net_vhost0,iface=/tmp/vhost-user1' -- \
--portmask=3 --disable-hw-vlan -i --rxq=1 --txq=1 \
--nb-cores=2 --forward-mode=io

testpmd> set portlist 0,1
testpmd> start 

3. Boot qemu
/usr/libexec/qemu-kvm -name rhel7.4 -M q35,kernel-irqchip=split \
-device intel-iommu,device-iotlb=on,intremap \
-cpu host -m 8G \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,slot=1 \
-device pcie-root-port,id=root.2,slot=2 \
-device pcie-root-port,id=root.3,slot=3 \
-chardev socket,id=char0,path=/tmp/vhost-user1 \
-device virtio-net-pci,netdev=mynet1,mac=54:52:00:1a:2c:01,iommu_platform=on,ats=on,bus=root.1 \
-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.4_nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bus=root.2 \
-vnc :2 \
-monitor stdio \

4. After the guest boots up, bring the network up; the guest and qemu then hang.
# ifconfig eth0 up


For this regression issue:
qemu-kvm-rhev-2.9.0-16.el7_4.2.x86_64   work
qemu-kvm-rhev-2.9.0-16.el7_4.3.x86_64   work
qemu-kvm-rhev-2.9.0-16.el7_4.4.x86_64   work
qemu-kvm-rhev-2.9.0-16.el7_4.5.x86_64   fail
qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64   fail

So this regression was probably caused by the fix for [1]:
[1]Bug 1482856 - Unable to start vhost if iommu_platform=on but intel_iommu=on not specified in guest [rhel-7.4.z]


Best Regards,
Pei

Comment 15 Maxime Coquelin 2017-10-04 07:44:00 UTC
Hi Pei,

(In reply to Pei Zhang from comment #11)
> (In reply to Maxime Coquelin from comment #10)
> 
> I hit a regression bug in the qemu z-stream build, but it was not caused by
> the patches for this bug. This regression blocks me from verifying this bug.
> 
> The regression is: the guest and qemu hang when the network is started with
> the kernel driver in the guest. The problem starts with
> qemu-kvm-rhev-2.9.0-16.el7_4.5.x86_64.
...
> For this regression issue:
> qemu-kvm-rhev-2.9.0-16.el7_4.2.x86_64   work
> qemu-kvm-rhev-2.9.0-16.el7_4.3.x86_64   work
> qemu-kvm-rhev-2.9.0-16.el7_4.4.x86_64   work
> qemu-kvm-rhev-2.9.0-16.el7_4.5.x86_64   fail
> qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64   fail
> 
> So probably this regression issue was caused by fix of [1]
> [1]Bug 1482856 - Unable to start vhost if iommu_platform=on but
> intel_iommu=on not specified in guest [rhel-7.4.z]

I confirm the regression starting with qemu-kvm-rhev-2.9.0-16.el7_4.4.x86_64,
which is caused by this patch:

commit d5ba92b697f81189c20aa672581ca4aadf3b8302
Author: Peter Xu <peterx>
Date:   Mon Aug 21 08:52:14 2017 +0200

    exec: abstract address_space_do_translate()

With this patch, my vhost-user IOMMU setup is broken. I still need to understand the root cause, but reverting the patch fixes my setup.
I need to test again, but upstream, which also contains this patch, does not seem to be broken.

IIUC, China is off this week, so I'll debug it further and provide info in
https://bugzilla.redhat.com/show_bug.cgi?id=1482856

Cheers,
Maxime
> 
> Best Regards,
> Pei

Comment 16 Chao Yang 2017-10-04 08:27:29 UTC
(In reply to Maxime Coquelin from comment #15)
> Hi Pei,
> 
> (In reply to Pei Zhang from comment #11)
> > (In reply to Maxime Coquelin from comment #10)
> > 
> > I hit a regression bug in the qemu z-stream build, but it was not caused by
> > the patches for this bug. This regression blocks me from verifying this bug.
> > 
> > The regression is: the guest and qemu hang when the network is started with
> > the kernel driver in the guest. The problem starts with
> > qemu-kvm-rhev-2.9.0-16.el7_4.5.x86_64.
> ...
> > For this regression issue:
> > qemu-kvm-rhev-2.9.0-16.el7_4.2.x86_64   work
> > qemu-kvm-rhev-2.9.0-16.el7_4.3.x86_64   work
> > qemu-kvm-rhev-2.9.0-16.el7_4.4.x86_64   work
> > qemu-kvm-rhev-2.9.0-16.el7_4.5.x86_64   fail
> > qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64   fail
> > 
> > So probably this regression issue was caused by fix of [1]
> > [1]Bug 1482856 - Unable to start vhost if iommu_platform=on but
> > intel_iommu=on not specified in guest [rhel-7.4.z]
> 
> I do confirm the regression starting qemu-kvm-rhev-2.9.0-16.el7_4.4.x86_64,
> which is caused by patch:
> 
> commit d5ba92b697f81189c20aa672581ca4aadf3b8302
> Author: Peter Xu <peterx>
> Date:   Mon Aug 21 08:52:14 2017 +0200
> 
>     exec: abstract address_space_do_translate()
> 
> With this patch, my vhost-user iommu setup is broken, I still need to
> understand the root cause, but reverting the patch fixes my setup.
> I need to test again, but upstream containing this patch does not seem to be
> broken.
> 

Thanks for your confirmation first.

> IIUC, China is off this week, so I'll debug it further and provide info in
> https://bugzilla.redhat.com/show_bug.cgi?id=1482856
> 

Shall I go ahead with patch review and close this bug? I think we could open a new bug (Pei said she would open it later) to fix the regression issue.

> Cheers,
> Maxime
> > 
> > Best Regards,
> > Pei

Comment 17 Maxime Coquelin 2017-10-04 08:35:46 UTC
(In reply to Chao Yang from comment #16)
> (In reply to Maxime Coquelin from comment #15)
> 
> > IIUC, China is off this week, so I'll debug it further and provide info in
> > https://bugzilla.redhat.com/show_bug.cgi?id=1482856
> > 
> 
> Shall I go ahead with patch review and close this bug? I think we could open
> a new bug(Pei said she would open it later) to fix the regression issue.

Yes, you can go ahead with patch review for this bug.

And let's open a new bug for the regression, I will do it this afternoon.

Thanks,
Maxime

> > Cheers,
> > Maxime
> > > 
> > > Best Regards,
> > > Pei

Comment 21 errata-xmlrpc 2017-10-16 10:45:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2891

