This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at the Red Hat Issue Tracker.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you are a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you are not, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED".

If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry; the e-mail creates a ServiceNow ticket with Red Hat.

Individual Bugzilla bugs that are migrated are moved to status "CLOSED" with resolution "MIGRATED" and "MigratedToJIRA" added to "Keywords". The link to the successor Jira issue can be found under "Links", has a little "two-footprint" icon next to it, and directs you to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). The same link is available in a blue banner at the top of the page informing you that the bug has been migrated.
Bug 2213416 - The domain with vhost-user interface + iommu throws "call trace" when running the netperf tests
Summary: The domain with vhost-user interface + iommu throws "call trace" when running the netperf tests
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Laurent Vivier
QA Contact: Yanghang Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-06-08 06:04 UTC by Yanghang Liu
Modified: 2023-09-22 16:07 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-09-22 16:07:38 UTC
Type: ---
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments: none


Links
System ID                                Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker  RHEL-7307         0        None      None    None     2023-09-22 16:07:37 UTC
Red Hat Issue Tracker  RHELPLAN-159251   0        None      None    None     2023-06-08 06:13:30 UTC

Description Yanghang Liu 2023-06-08 06:04:39 UTC
Description of problem:
The domain with a vhost-user interface + iommu throws a "call trace" when running the netperf tests

Version-Release number of selected component (if applicable):
qemu-kvm-8.0.0-4.el9.x86_64
5.14.0-323.el9.x86_64
dpdk-22.11-3.el9_2.x86_64
openvswitch3.1-3.1.0-28.el9fdp.x86_64

How reproducible:
100%


Steps to Reproduce:
1. Set up the host kernel options (CPU isolation, huge pages, IOMMU):
# grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G" --update-kernel=`grubby --default-kernel`
# echo "isolated_cores=2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,31,29,27,25,23,21,19,17,15,13,11" >> /etc/tuned/cpu-partitioning-variables.conf
# tuned-adm profile cpu-partitioning
# reboot
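
After the reboot, the settings can be verified with a quick check like the following (standard /proc and tuned-adm paths; the expected values are the ones configured above):

# cat /proc/cmdline                  <-- should include iommu=pt intel_iommu=on default_hugepagesz=1G
# grep Hugepagesize /proc/meminfo    <-- should report 1048576 kB
# tuned-adm active                   <-- should report cpu-partitioning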

2. Start OVS-DPDK on the host:

# echo 20 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
# echo 20 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
# modprobe vfio
# modprobe vfio-pci
# dpdk-devbind.py --bind=vfio-pci 0000:5e:00.0
# dpdk-devbind.py --bind=vfio-pci 0000:5e:00.1
...
# ovs-vsctl get Open_vSwitch . other_config
{dpdk-init="true", dpdk-lcore-mask="0x2", dpdk-socket-mem="1024,1024", pmd-cpu-mask="0x15554", vhost-iommu-support="true"}
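
For reference, other_config values like these are typically set with the standard ovs-vsctl syntax, e.g.:

# ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
# ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x2
# ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
# ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x15554
# ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true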

# ovs-vsctl show 
1e271d29-308d-4201-be11-d898617cc592
    Bridge ovsbr0
        datapath_type: netdev
        Port ovsbr0
            Interface ovsbr0
                type: internal
        Port dpdk0
            Interface dpdk0
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.0", n_rxq="2", n_txq="2"}
        Port vhost-user0
            Interface vhost-user0
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser0.sock"}
    Bridge ovsbr1
        datapath_type: netdev
        Port ovsbr1
            Interface ovsbr1
                type: internal
        Port dpdk1
            Interface dpdk1
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.1", n_rxq="2", n_txq="2"}
        Port vhost-user1
            Interface vhost-user1
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser1.sock"}
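
For reference, a bridge/port layout like the one above can be created with standard ovs-vsctl commands; a sketch for ovsbr0 (ovsbr1 is analogous, with 0000:5e:00.1 and /tmp/vhostuser1.sock):

# ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
# ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:5e:00.0 options:n_rxq=2 options:n_txq=2
# ovs-vsctl add-port ovsbr0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser0.sock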


3. Start an NFV virt domain with an iommu device and vhost-user interfaces:

   <interface type='vhostuser'>
      <mac address='18:66:da:5f:dd:22'/>
      <source type='unix' path='/tmp/vhostuser0.sock' mode='server'/>
      <target dev='vhost-user0'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2' rx_queue_size='1024' iommu='on' ats='on'/>
      <alias name='net1'/>
    </interface>
    
    <iommu model='intel'>
      <driver intremap='on' caching_mode='on' iotlb='on'/>
    </iommu>
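
(Not shown above: a vhostuser interface also requires the guest memory to be backed by shared hugepages. A minimal domain XML fragment for that, assuming the 1 GiB pages configured on the host:)

    <memoryBacking>
      <hugepages>
        <page size='1048576' unit='KiB'/>
      </hugepages>
      <access mode='shared'/>
    </memoryBacking>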

4. Set up the kernel options in the domain:
# grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G" --update-kernel=`grubby --default-kernel` 	
# echo "isolated_cores=1,2,3,4,5"  >> /etc/tuned/cpu-partitioning-variables.conf 
# tuned-adm profile cpu-partitioning
# reboot

5. Run the netperf tests between the domain client and the host server:
(5.1) The host is the netperf server
# ip addr add 192.168.1.3/24 dev ens3f1
# netserver 
Starting netserver with host 'IN(6)ADDR_ANY' port '12865' and family AF_UNSPEC

(5.2) The domain is the netperf client:
# ip addr add 192.168.1.2/24 dev enp6s0  <-- the domain can ping 192.168.1.3 successfully, but with some packet loss
# netperf -H 192.168.1.3
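
netperf defaults to a 10-second TCP_STREAM test; the test type and duration can be made explicit with the standard options, e.g.:

# netperf -H 192.168.1.3 -t TCP_STREAM -l 60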

6. Check the domain dmesg:
# dmesg
[ 4802.234530] ------------[ cut here ]------------
[ 4802.234532] NETDEV WATCHDOG: enp6s0 (virtio_net): transmit queue 0 timed out
[ 4802.234549] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x1f9/0x200
[ 4802.236690] Modules linked in: intel_rapl_msr intel_rapl_common isst_if_common nfit libnvdimm kvm_intel kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge ip_set stp llc iTCO_wdt rfkill iTCO_vendor_support nf_tables irqbypass nfnetlink rapl virtio_balloon i2c_i801 i2c_smbus lpc_ich qrtr pcspkr vfat fat drm fuse xfs libcrc32c ahci libahci nvme_tcp nvme_fabrics nvme libata nvme_core nvme_common t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel virtio_net ghash_clmulni_intel net_failover virtio_blk failover serio_raw sunrpc dm_mirror dm_region_hash dm_log dm_mod
[ 4802.243011] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 5.14.0-323.el9.x86_64 #1
[ 4802.243900] Hardware name: Red Hat KVM/RHEL, BIOS edk2-20230301gitf80f052277c8-5.el9 03/01/2023
[ 4802.244809] RIP: 0010:dev_watchdog+0x1f9/0x200
[ 4802.245284] Code: 00 e9 40 ff ff ff 48 89 ef c6 05 03 af 7a 01 01 e8 3c c5 fa ff 44 89 e9 48 89 ee 48 c7 c7 a0 b1 6d 97 48 89 c2 e8 17 82 77 ff <0f> 0b e9 22 ff ff ff 0f 1f 44 00 00 55 53 48 89 fb 48 8b 6f 18 0f
[ 4802.247210] RSP: 0018:ffffb32980003eb0 EFLAGS: 00010286
[ 4802.247766] RAX: 0000000000000000 RBX: ffff99428b8ff488 RCX: 0000000000000027
[ 4802.248511] RDX: 0000000000000027 RSI: ffffffff97e67460 RDI: ffff994337c1f8c8
[ 4802.249262] RBP: ffff99428b8ff000 R08: ffff994337c1f8c0 R09: 0000000000000000
[ 4802.250016] R10: ffffffffffffffff R11: ffffffff98b6f070 R12: ffff99428b8ff3dc
[ 4802.250766] R13: 0000000000000000 R14: ffffffff96b7e5b0 R15: ffffb32980003f08
[ 4802.251516] FS:  0000000000000000(0000) GS:ffff994337c00000(0000) knlGS:0000000000000000
[ 4802.252364] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4802.252977] CR2: 00007ffe7a749000 CR3: 0000000101d54004 CR4: 0000000000770ef0
[ 4802.253732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4802.254479] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4802.255230] PKRU: 55555554
[ 4802.255532] Call Trace:
[ 4802.255805]  <IRQ>
[ 4802.256031]  ? pfifo_fast_change_tx_queue_len+0x70/0x70
[ 4802.256586]  call_timer_fn+0x24/0x130
[ 4802.256986]  __run_timers.part.0+0x1ee/0x280
[ 4802.257444]  ? enqueue_hrtimer+0x2f/0x80
[ 4802.257870]  ? __hrtimer_run_queues+0x159/0x2c0
[ 4802.258358]  run_timer_softirq+0x26/0x50
[ 4802.258785]  __do_softirq+0xc7/0x2ac
[ 4802.259173]  __irq_exit_rcu+0xb9/0xf0
[ 4802.259573]  sysvec_apic_timer_interrupt+0x72/0x90
[ 4802.260084]  </IRQ>
[ 4802.260318]  <TASK>
[ 4802.260559]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 4802.261103] RIP: 0010:default_idle+0x10/0x20
[ 4802.261571] Code: 8b 04 25 40 ef 01 00 f0 80 60 02 df c3 cc cc cc cc 0f ae 38 eb bb 0f 1f 40 00 0f 1f 44 00 00 66 90 0f 00 2d be da 47 00 fb f4 <c3> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 65
[ 4802.263496] RSP: 0018:ffffffff97e03ea8 EFLAGS: 00000252
[ 4802.264050] RAX: ffffffff96d8d320 RBX: ffffffff97e1a940 RCX: 0000000000000000
[ 4802.264803] RDX: 4000000000000000 RSI: ffff994337c22b20 RDI: 000000000497eebc
[ 4802.265554] RBP: 0000000000000000 R08: 0000045e163d1cbb R09: ffff9941d6202400
[ 4802.266301] R10: 0000000000020604 R11: 0000000000000000 R12: 0000000000000000
[ 4802.267054] R13: 000000006dc53d18 R14: 000000006d3c47a8 R15: 000000006d3c47b0
[ 4802.267810]  ? mwait_idle+0x70/0x70
[ 4802.268189]  default_idle_call+0x33/0xe0
[ 4802.268615]  cpuidle_idle_call+0x125/0x160
[ 4802.269051]  ? kvm_sched_clock_read+0x14/0x30
[ 4802.269519]  do_idle+0x78/0xe0
[ 4802.269891]  cpu_startup_entry+0x19/0x20
[ 4802.270311]  rest_init+0xca/0xd0
[ 4802.270671]  arch_call_rest_init+0xa/0x14
[ 4802.271099]  start_kernel+0x4a3/0x4c2
[ 4802.271495]  secondary_startup_64_no_verify+0xe5/0xeb
[ 4802.272037]  </TASK>
[ 4802.272279] ---[ end trace 87fb221169225dfd ]---

Besides the above call trace, the domain keeps printing messages like "virtio_net virtio3 enp6s0: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 7820000 usecs ago".
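
These messages can be followed live from inside the domain while the test runs (standard dmesg options):

# dmesg -wT | grep -E 'TX timeout|timed out'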

7. Run the ping tests:
# ping 192.168.1.3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
...
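
The stalled transmit queue is also visible in the interface statistics inside the domain (standard iproute2 commands):

# ip -s link show enp6s0       <-- TX byte/packet counters stop increasing
# tc -s qdisc show dev enp6s0  <-- growing backlog/drops on the stuck queue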

Actual results:
The domain with a vhost-user interface + iommu throws a "call trace" when running the netperf tests

Expected results:
No call trace.

Additional info:

Comment 1 RHEL Program Management 2023-09-22 15:51:51 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

