Description  Maxime Coquelin
2018-08-16 18:49:40 UTC
Description of problem:
The vhost-user backend may crash when it receives a VHOST_USER_SET_MEM_TABLE request
while it is processing the virtqueues.
Such a request can be issued by QEMU when binding a VF to testpmd within the
guest, as shown by the QEMU backtrace below:
#0 vhost_user_set_mem_table (dev=0x55ba6cea5c00, mem=0x55ba6d391900)
at /usr/src/debug/qemu-2.10.0/hw/virtio/vhost-user.c:294
#1 0x000055ba69962bfc in vhost_commit (listener=0x55ba6cea5c08) at /usr/src/debug/qemu-2.10.0/hw/virtio/vhost.c:653
#2 0x000055ba6991c5ae in memory_region_transaction_commit () at /usr/src/debug/qemu-2.10.0/memory.c:1063
#3 0x000055ba69a92c86 in pci_update_mappings (d=0x55ba6d1aec00) at hw/pci/pci.c:1303
#4 0x000055ba69a93219 in pci_default_write_config (d=d@entry=0x55ba6d1aec00, addr=4, val_in=val_in@entry=1024,
l=l@entry=2) at hw/pci/pci.c:1363
#5 0x000055ba69956aec in vfio_pci_write_config (pdev=0x55ba6d1aec00, addr=<optimized out>, val=1024, len=2)
at /usr/src/debug/qemu-2.10.0/hw/vfio/pci.c:1218
#6 0x000055ba6991aec3 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>,
value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...)
at /usr/src/debug/qemu-2.10.0/memory.c:530
#7 0x000055ba69918bd9 in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7f9ecbd39708,
size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>,
access=access@entry=0x55ba6991ae80 <memory_region_write_accessor>, mr=mr@entry=0x55ba6bfaa400,
attrs=attrs@entry=...) at /usr/src/debug/qemu-2.10.0/memory.c:596
#8 0x000055ba6991cb25 in memory_region_dispatch_write (mr=<optimized out>, addr=0, data=1024, size=<optimized out>,
attrs=...) at /usr/src/debug/qemu-2.10.0/memory.c:1472
#9 0x000055ba698d4312 in flatview_write (fv=0x55ba6c664c80, addr=<optimized out>, attrs=..., buf=<optimized out>,
len=<optimized out>) at /usr/src/debug/qemu-2.10.0/exec.c:2908
#10 0x000055ba698d790f in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=...,
buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-2.10.0/exec.c:3074
#11 0x000055ba698d79a5 in address_space_rw (as=<optimized out>, addr=addr@entry=3324, attrs=..., attrs@entry=...,
buf=buf@entry=0x7f9ef0954000 "", len=len@entry=2, is_write=is_write@entry=true)
at /usr/src/debug/qemu-2.10.0/exec.c:3085
#12 0x000055ba6992aaaa in kvm_handle_io (count=1, size=2, direction=<optimized out>, data=<optimized out>,
attrs=..., port=3324) at /usr/src/debug/qemu-2.10.0/accel/kvm/kvm-all.c:1807
#13 kvm_cpu_exec (cpu=cpu@entry=0x55ba6c014000) at /usr/src/debug/qemu-2.10.0/accel/kvm/kvm-all.c:2047
#14 0x000055ba69909b52 in qemu_kvm_cpu_thread_fn (arg=0x55ba6c014000) at /usr/src/debug/qemu-2.10.0/cpus.c:1138
#15 0x00007f9ed7fdddd5 in start_thread (arg=0x7f9ecbd3c700) at pthread_create.c:308
#16 0x00007f9ed7d07b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
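For context, here is a minimal sketch of what a vhost-user backend typically does when it receives VHOST_USER_SET_MEM_TABLE: it drops its existing mappings of guest memory and maps the regions described in the new message. All structure and field names below are illustrative assumptions, not the actual rte_vhost code.

/* Illustrative sketch only: how a vhost-user backend might handle
 * VHOST_USER_SET_MEM_TABLE.  Structures and names are hypothetical. */
#include <stdint.h>
#include <stddef.h>
#include <sys/mman.h>

struct mem_region {
    uint64_t guest_phys_addr;   /* start of the region in guest physical space */
    uint64_t size;              /* region size in bytes                        */
    uint64_t mmap_offset;       /* offset into the shared-memory fd            */
    void    *host_addr;         /* where the backend mapped it (mmap result)   */
    int      fd;                /* memfd/hugepage fd received over the socket  */
};

struct mem_table {
    uint32_t nregions;
    struct mem_region regions[8];
};

/* Replace the old guest memory mappings with the ones from the new message.
 * Any host pointer derived from the old mappings (for example cached
 * virtqueue pointers) is invalid after this function returns. */
static int set_mem_table(struct mem_table *old, const struct mem_table *new_tbl)
{
    if (new_tbl->nregions > 8)
        return -1;

    for (uint32_t i = 0; i < old->nregions; i++)
        munmap(old->regions[i].host_addr, old->regions[i].size);

    for (uint32_t i = 0; i < new_tbl->nregions; i++) {
        void *addr = mmap(NULL, new_tbl->regions[i].size,
                          PROT_READ | PROT_WRITE, MAP_SHARED,
                          new_tbl->regions[i].fd,
                          (off_t)new_tbl->regions[i].mmap_offset);
        if (addr == MAP_FAILED)
            return -1;
        old->regions[i] = new_tbl->regions[i];
        old->regions[i].host_addr = addr;
    }
    old->nregions = new_tbl->nregions;
    return 0;
}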
When this happens while OVS is processing the Virtio rings, the following crash is likely:
[root@overcloud-compute-0 openvswitch]# gdb --args ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid
…/...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffea8ff9700 (LWP 317423)]
rte_vhost_dequeue_burst (vid=<optimized out>, queue_id=<optimized out>, mbuf_pool=0x7ff27fe23d40, pkts=pkts@entry=0x7ffea8ff8770, count=count@entry=32)
at /usr/src/debug/openvswitch-2.9.0/dpdk-17.11/lib/librte_vhost/virtio_net.c:1567
1567 free_entries = *((volatile uint16_t *)&vq->avail->idx) -
(gdb) bt
#0 rte_vhost_dequeue_burst (vid=<optimized out>, queue_id=<optimized out>, mbuf_pool=0x7ff27fe23d40, pkts=pkts@entry=0x7ffea8ff8770,
count=count@entry=32) at /usr/src/debug/openvswitch-2.9.0/dpdk-17.11/lib/librte_vhost/virtio_net.c:1567
#1 0x00005555558542e4 in netdev_dpdk_vhost_rxq_recv (rxq=<optimized out>, batch=0x7ffea8ff8760) at lib/netdev-dpdk.c:1849
#2 0x00005555557a0671 in netdev_rxq_recv (rx=<optimized out>, batch=batch@entry=0x7ffea8ff8760) at lib/netdev.c:701
#3 0x0000555555779c1f in dp_netdev_process_rxq_port (pmd=pmd@entry=0x7fff88358010, rxq=0x5555560a8810, port_no=7) at lib/dpif-netdev.c:3279
#4 0x000055555577a02a in pmd_thread_main (f_=<optimized out>) at lib/dpif-netdev.c:4145
#5 0x00005555557f6cb6 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:348
#6 0x00007ffff70eadd5 in start_thread (arg=0x7ffea8ff9700) at pthread_create.c:308
#7 0x00007ffff64e8b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) l
1562 } else {
1563 count -= 1;
1564 }
1565 }
1566
1567 free_entries = *((volatile uint16_t *)&vq->avail->idx) -
1568 vq->last_avail_idx;
1569 if (free_entries == 0)
1570 goto out;
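The crash site dereferences vq->avail, a host virtual address that was translated from a guest address when the ring was set up. In rough, hypothetical terms (not the actual rte_vhost structures), the data-path side of the race looks like the sketch below: once the memory-table handler has unmapped the regions backing that translation, the volatile read faults.

/* Illustrative sketch of the data-path side of the race.
 * Types and names are hypothetical, not the rte_vhost internals. */
#include <stdint.h>

struct vring_avail_hdr {
    uint16_t flags;
    uint16_t idx;               /* index written by the guest driver */
};

struct virtqueue_sketch {
    struct vring_avail_hdr *avail;  /* host VA cached at ring setup time */
    uint16_t last_avail_idx;        /* how far the backend has consumed  */
};

/* Polled continuously by the PMD thread.  If another thread has just
 * unmapped and remapped guest memory (VHOST_USER_SET_MEM_TABLE), the
 * cached vq->avail pointer may refer to an unmapped range, and this
 * volatile read is exactly where the SIGSEGV in the backtrace occurs. */
static uint16_t count_free_entries(struct virtqueue_sketch *vq)
{
    uint16_t free_entries =
        *((volatile uint16_t *)&vq->avail->idx) - vq->last_avail_idx;
    return free_entries;
}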
Version-Release number of selected component (if applicable):
How reproducible:
Create a guest with a Virtio-net device relying on a vhost-user backend,
and with an assigned VF. Connect the vhost-user port to OVS.
In the guest, bind the VF to testpmd; the virtio-net device remains bound to
the kernel virtio-net driver.
Steps to Reproduce:
1.
2.
3.
Actual results:
OVS crashes with the above backtrace.
Expected results:
No crash
Additional info:
The patch fixing the issue is already in upstream master, but wasn't
backported because at the time it was only known to fix a new feature.
Patch backported to v17.11.4-rc1:
commit 96935c61631fe2095246b5dce5c6fea960e34c87
Author: Maxime Coquelin <maxime.coquelin>
Date: Thu Aug 16 19:29:22 2018 +0200
vhost: retranslate vring addr when memory table changes
[ backported from upstream commit d5022533c20aed365d513663806a999459037015 ]
When the vhost-user master sends memory updates using
VHOST_USER_SET_MEM request, the user backends unmap and then
mmap again the memory regions in its address space.
If the ring addresses have already been translated, it needs to
be translated again as they point to unmapped memory.
Signed-off-by: Maxime Coquelin <maxime.coquelin>
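A rough sketch of what the fix does, following the commit message above: when the memory table is replaced, the previously translated ring addresses are dropped and translated again against the new mappings before the data path touches them. The helper and field names below are illustrative assumptions, not the exact rte_vhost API.

/* Illustrative sketch of the fix: re-translate the vring addresses
 * whenever the memory table changes.  Names are hypothetical. */
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

struct vq_sketch {
    uint64_t desc_guest_addr;   /* addresses provided via SET_VRING_ADDR */
    uint64_t avail_guest_addr;
    uint64_t used_guest_addr;
    void    *desc;              /* cached host virtual addresses */
    void    *avail;
    void    *used;
    bool     enabled;
};

/* Stand-in for the real translation routine: it would look the guest
 * address up in the currently mapped regions (omitted here) and return
 * the corresponding host VA, or NULL if the address is not covered. */
static void *gpa_to_hva(uint64_t guest_addr)
{
    (void)guest_addr;
    return NULL;                /* placeholder for the memory-table walk */
}

/* Called from the VHOST_USER_SET_MEM_TABLE handler after the new regions
 * have been mapped: any ring that was already translated must be
 * translated again, because its old host pointers refer to unmapped memory. */
static int retranslate_vring(struct vq_sketch *vq)
{
    if (!vq->enabled)
        return 0;               /* nothing translated yet, nothing to redo */

    vq->desc  = gpa_to_hva(vq->desc_guest_addr);
    vq->avail = gpa_to_hva(vq->avail_guest_addr);
    vq->used  = gpa_to_hva(vq->used_guest_addr);

    return (vq->desc && vq->avail && vq->used) ? 0 : -1;
}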
Patch backported downstream and pushed to the private-mcoqueli-bz1618488 branch.
Comment 13  Jean-Tsung Hsiao
2018-10-17 14:46:19 UTC
First of all, I can't reproduce the issue with OVS 2.9.0-55 fdP with NIC=X710.
However, the fix passed netperf testing between guests over an OVS-DPDK/{vxlan,geneve}/i40e (X710) tunnel, so there is no regression with this fix.
When debugging the issue, we saw that binding the VF to DPDK
causes a guest memory change, which results in QEMU sending
a memory update notification to the vhost-user backend.
In the vhost-user backend, the memory table update notification
results in all guest memory regions being unmapped and then
remapped. The crash happens because the virtqueue pointers
weren't re-translated (and so still point to the unmapped memory
area), causing an invalid pointer dereference.
The OVS logs from Jean's attempts to reproduce the issue show that
it does not reproduce there because the guest memory regions
get remapped at the exact same virtual addresses.
I think we can consider the fix as tested, because I
tested it on Franck's setup before posting it upstream.
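The non-reproduction is plausible because mmap() gives no guarantee of reusing the previous address; whether stale cached pointers still "work" after a remap is down to luck. A tiny standalone demo (unrelated to OVS/DPDK, purely to illustrate the point):

/* Small standalone demo: after munmap()+mmap(), the kernel may or may not
 * hand back the same virtual address.  When it does, stale cached pointers
 * keep "working" by accident, which is why the crash is layout dependent
 * and did not reproduce on every setup. */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 4096;

    void *first = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (first == MAP_FAILED) { perror("mmap"); return 1; }
    munmap(first, len);

    void *second = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (second == MAP_FAILED) { perror("mmap"); return 1; }

    printf("first=%p second=%p (%s)\n", first, second,
           first == second ? "same address, stale pointers survive"
                           : "different address, stale pointers fault");
    munmap(second, len);
    return 0;
}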
Comment 15  Jean-Tsung Hsiao
2018-10-17 23:29:36 UTC
Per comments #13 and #14, setting the bug status to VERIFIED.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:3500