Bug 1949887

Summary: [broadcom][ovs dpdk bonding] Both openvswitch-2.9.9-1.el7fdp.x86_64 and kernel crashed after creating an OVS DPDK bond
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Hekai Wang <hewang>
Component: openvswitch
Assignee: Timothy Redaelli <tredaelli>
openvswitch sub component: daemons and tools
QA Contact: Hekai Wang <hewang>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
CC: ctrautma, jhsiao, qding
Version: FDP 19.C   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2022-11-09 09:01:23 UTC
Type: Bug
Attachments:
Description    Flags
ovs log        none
console log    none

Description Hekai Wang 2021-04-15 10:33:02 UTC
Created attachment 1772110 [details]
ovs log

Description of problem:
ovs-vswitchd crashed after creating an OVS DPDK bond with Broadcom NICs.
At the same time, the kernel dumped a call trace as well.

Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00101|dpdk|ERR|PMD: bnxt_hwrm_vnic_rss_cfg error 2:0:00000000:0310
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00102|dpdk|ERR|PMD: HWRM vnic 0 set RSS failure rc: ffffffea
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00103|dpdk|ERR|PMD: bnxt_hwrm_vnic_tpa_cfg error 2:0:00000000:0293
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00104|dpdk|EMER|PANIC in rte_free():
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-ctl[97389]: Starting ovs-vswitchd 2021-04-15T10:16:48Z|00104|dpdk|EMER|PANIC in rte_free():
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-ctl[97389]: 2021-04-15T10:16:48Z|00105|dpdk|EMER|Fatal error: Invalid memory
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00105|dpdk|EMER|Fatal error: Invalid memory
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00106|dpdk|ERR|19: [ovs-vswitchd(+0x40cd5) [0x5644fcad9cd5]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00107|dpdk|ERR|18: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f2f8c108555]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00108|dpdk|ERR|17: [ovs-vswitchd(+0x3fedd) [0x5644fcad8edd]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00109|dpdk|ERR|16: [ovs-vswitchd(+0x1ff839) [0x5644fcc98839]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00110|dpdk|ERR|15: [ovs-vswitchd(+0x1fc2e5) [0x5644fcc952e5]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00111|dpdk|ERR|14: [ovs-vswitchd(+0x1fa6b5) [0x5644fcc936b5]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00112|dpdk|ERR|13: [ovs-vswitchd(+0x20c991) [0x5644fcca5991]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00113|dpdk|ERR|12: [ovs-vswitchd(+0x216180) [0x5644fccaf180]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00114|dpdk|ERR|11: [ovs-vswitchd(+0x2659ae) [0x5644fccfe9ae]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00115|dpdk|ERR|10: [ovs-vswitchd(+0x25f3ed) [0x5644fccf83ed]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00116|dpdk|ERR|9: [ovs-vswitchd(+0x25f267) [0x5644fccf8267]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00117|dpdk|ERR|8: [ovs-vswitchd(+0x25e6e9) [0x5644fccf76e9]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00118|dpdk|ERR|7: [ovs-vswitchd(+0x33cccc) [0x5644fcdd5ccc]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00119|dpdk|ERR|6: [ovs-vswitchd(+0x63b35) [0x5644fcafcb35]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00120|dpdk|ERR|5: [ovs-vswitchd(+0x93764) [0x5644fcb2c764]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00121|dpdk|ERR|4: [ovs-vswitchd(+0xa0f78) [0x5644fcb39f78]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00122|dpdk|ERR|3: [ovs-vswitchd(+0x58300) [0x5644fcaf1300]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00123|dpdk|ERR|2: [ovs-vswitchd(+0x3581d) [0x5644fcace81d]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97415]: ovs|00124|dpdk|ERR|1: [ovs-vswitchd(+0x5099d) [0x5644fcae999d]]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97414]: ovs|00002|daemon_unix|ERR|fork child died before signaling startup (killed (Aborted))
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-vswitchd[97414]: ovs|00003|daemon_unix|EMER|could not detach from foreground session
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-ctl[97389]: ovs-vswitchd: could not detach from foreground session
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com ovs-ctl[97389]: [FAILED]
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com systemd[1]: ovs-vswitchd.service: control process exited, code=exited status=1
Apr 15 06:16:48 dell-per740-10.rhts.eng.pek2.redhat.com systemd[1]: Failed to start Open vSwitch Forwarding Unit.
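
When triaging a log like the one above, two pieces are worth pulling out mechanically: the `rc: ffffffea` value and the numbered backtrace frames. The following is a minimal Python sketch, not part of the original report. The names `decode_rc`, `parse_frames` and `FRAME_RE` are hypothetical helpers. It assumes the return code is a 32-bit two's-complement value following the common negative-errno convention, which the log itself does not confirm. Read that way, `ffffffea` is -22, which corresponds to EINVAL on Linux. The extracted offsets are relative to the named binary and would still need a symbolizer plus matching debuginfo to become function names.

```python
import errno
import re

# Matches the frame part of a DPDK backtrace line such as
#   ...|dpdk|ERR|19: [ovs-vswitchd(+0x40cd5) [0x5644fcad9cd5]]
FRAME_RE = re.compile(
    r"\|ERR\|(?P<index>\d+): "
    r"\[(?P<module>[^(\s]+)\((?P<symbol>[^+)]*)\+0x(?P<offset>[0-9a-f]+)\) "
    r"\[0x(?P<address>[0-9a-f]+)\]\]"
)


def decode_rc(hex_rc):
    """Interpret a hex string as a signed 32-bit integer (assumed encoding)."""
    value = int(hex_rc, 16)
    return value - (1 << 32) if value & 0x80000000 else value


def parse_frames(lines):
    """Return (frame index, module, symbol, offset) for each backtrace line."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (
                    int(match["index"]),
                    match["module"],
                    match["symbol"],  # empty when the binary is stripped
                    int(match["offset"], 16),
                )
            )
    return frames


if __name__ == "__main__":
    rc = decode_rc("ffffffea")
    # Look up the errno name, assuming rc is a negated errno value.
    print(rc, errno.errorcode.get(-rc, "unknown"))

    sample = [
        "ovs-vswitchd[97415]: ovs|00106|dpdk|ERR|19: [ovs-vswitchd(+0x40cd5) [0x5644fcad9cd5]]",
        "ovs-vswitchd[97415]: ovs|00107|dpdk|ERR|18: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f2f8c108555]]",
    ]
    for index, module, symbol, offset in parse_frames(sample):
        print(index, module, symbol or "?", hex(offset))
```

Lines that do not carry a frame, such as the `PMD:` error lines, are skipped because the pattern requires a frame number directly after `|ERR|`.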

Here is the kernel call trace:
[  140.149870] ------------[ cut here ]------------ 
[  140.154501] WARNING: CPU: 26 PID: 5138 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80 
[  140.162239] sysfs: cannot create duplicate filename '/class/net/bonding_masters' 
[  140.169619] Modules linked in: bonding openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 sctp mlx4_ib mlx4_en mlx4_core xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt i40iw rpcrdma sunrpc rdma_ucm ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs dell_smbios dell_wmi_descriptor iTCO_wdt iTCO_vendor_support dcdbas skx_edac bnxt_re intel_powerclamp ib_core coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_ssif cryptd pcspkr sg wmi ipmi_si ipmi_devintf ipmi_msghandler mei_me mei i2c_i801 lpc_ich acpi_power_meter acpi_pad ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper ahci syscopyarea sysfillrect sysimgblt fb_sys_fops libahci ttm mlx5_core i40e tg3 crct10dif_pclmul crct10dif_common mlxfw drm bnxt_en crc32c_intel libata ptp devlink megaraid_sas pps_core drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod 
[  140.294971] CPU: 26 PID: 5138 Comm: ovs-dpctl Kdump: loaded Not tainted 3.10.0-1160.24.1.el7.x86_64 #1 
[  140.304246] Hardware name: Dell Inc. PowerEdge R740/0JMK61, BIOS 2.10.2 02/24/2021 
[  140.311796] Call Trace: 
[  140.314251]  [<ffffffffbc38308a>] dump_stack+0x19/0x1b 
[  140.319381]  [<ffffffffbbc9b1b8>] __warn+0xd8/0x100 
[  140.324254]  [<ffffffffbbc9b23f>] warn_slowpath_fmt+0x5f/0x80 
[  140.329987]  [<ffffffffbbed9338>] ? kernfs_path+0x48/0x60 
[  140.335372]  [<ffffffffbbedc044>] sysfs_warn_dup+0x64/0x80 
[  140.340846]  [<ffffffffbbedc404>] sysfs_do_create_link_sd.isra.2+0xb4/0xc0 
[  140.347702]  [<ffffffffbbedc435>] sysfs_create_link+0x25/0x50 
[  140.353435]  [<ffffffffbc0b7f77>] device_add+0x397/0x7c0 
[  140.358735]  [<ffffffffbc0b5e67>] ? dev_set_name+0x57/0x80 
[  140.364208]  [<ffffffffbc277f16>] netdev_register_kobject+0x96/0x190 
[  140.370546]  [<ffffffffbbccbf86>] ? raw_notifier_call_chain+0x16/0x20 
[  140.376980]  [<ffffffffbc25c204>] register_netdevice+0x524/0x770 
[  140.382978]  [<ffffffffc0d10766>] internal_dev_create+0x106/0x1a0 [openvswitch] 
[  140.390264]  [<ffffffffc0d0fd2c>] ovs_vport_add+0xdc/0x140 [openvswitch] 
[  140.396948]  [<ffffffffc0d02cf2>] new_vport+0x12/0x50 [openvswitch] 
[  140.403201]  [<ffffffffc0d05eb3>] ovs_dp_cmd_new+0x1a3/0x360 [openvswitch] 
[  140.410055]  [<ffffffffbc295ea8>] genl_family_rcv_msg+0x208/0x430 
[  140.416131]  [<ffffffffbc29082f>] ? __netlink_sendskb+0x5f/0x180 
[  140.422125]  [<ffffffffbbf07bac>] ? security_sock_rcv_skb+0x1c/0x20 
[  140.428373]  [<ffffffffbc29612b>] genl_rcv_msg+0x5b/0xc0 
[  140.433674]  [<ffffffffbc2960d0>] ? genl_family_rcv_msg+0x430/0x430 
[  140.439925]  [<ffffffffbc29411b>] netlink_rcv_skb+0xab/0xc0 
[  140.445482]  [<ffffffffbc294658>] genl_rcv+0x28/0x40 
[  140.450438]  [<ffffffffbc293aa0>] netlink_unicast+0x170/0x210 
[  140.456168]  [<ffffffffbc293e48>] netlink_sendmsg+0x308/0x420 
[  140.461903]  [<ffffffffbc2363a6>] sock_sendmsg+0xb6/0xf0 
[  140.467203]  [<ffffffffbbf8d622>] ? radix_tree_lookup_slot+0x22/0x50 
[  140.473542]  [<ffffffffbbdc0c8d>] ? filemap_fault+0x17d/0x420 
[  140.479279]  [<ffffffffbc237269>] ___sys_sendmsg+0x3e9/0x400 
[  140.484924]  [<ffffffffbc292198>] ? netlink_insert+0x1b8/0x340 
[  140.490746]  [<ffffffffbbf0c2a5>] ? sock_has_perm+0x75/0x90 
[  140.496358]  [<ffffffffbbe4a9aa>] ? __check_object_size+0x1ca/0x250 
[  140.502613]  [<ffffffffbc235e81>] ? move_addr_to_user+0xa1/0xe0 
[  140.508518]  [<ffffffffbc2361c2>] ? SYSC_getsockname+0xd2/0xf0 
[  140.514338]  [<ffffffffbc238921>] __sys_sendmsg+0x51/0x90 
[  140.519724]  [<ffffffffbc238972>] SyS_sendmsg+0x12/0x20 
[  140.524940]  [<ffffffffbc396226>] tracesys+0xa6/0xcc 
[  140.529890] ---[ end trace fb8683c8f97605f6 ]--- 
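
The header of the trace above carries most of the triage information: it is tagged `WARNING`, it names the source location and function that raised it, and the following line names the sysfs entry that already existed. Because the trace runs through `ovs_dp_cmd_new` and `register_netdevice`, the duplicate name appears to be the name of the network device being registered. Whether the kernel additionally panicked cannot be told from this excerpt alone. The Python sketch below, which is not part of the original report, extracts those fields; `summarize_warning`, `WARN_RE` and `DUP_RE` are hypothetical names.

```python
import re

# Header line of a kernel warning, e.g.
#   WARNING: CPU: 26 PID: 5138 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) "
    r"at (?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)
# Message printed by sysfs when a directory entry already exists.
DUP_RE = re.compile(r"sysfs: cannot create duplicate filename '(?P<path>[^']+)'")


def summarize_warning(lines):
    """Collect the warning location and the duplicate sysfs path, if present."""
    summary = {}
    for text in lines:
        warn = WARN_RE.search(text)
        if warn:
            summary["cpu"] = int(warn["cpu"])
            summary["pid"] = int(warn["pid"])
            summary["location"] = f"{warn['file']}:{warn['line']}"
            summary["function"] = warn["func"]
        dup = DUP_RE.search(text)
        if dup:
            summary["duplicate"] = dup["path"]
    return summary


if __name__ == "__main__":
    console = [
        "[  140.154501] WARNING: CPU: 26 PID: 5138 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80",
        "[  140.162239] sysfs: cannot create duplicate filename '/class/net/bonding_masters'",
    ]
    print(summarize_warning(console))
```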



Version-Release number of selected component (if applicable):
[root@dell-per740-10 ~]# rpm -qa | grep openv
openvswitch-selinux-extra-policy-1.0-18.el7fdp.noarch
kernel-kernel-networking-openvswitch-common-2.0-122.noarch
openvswitch-2.9.9-1.el7fdp.x86_64
[root@dell-per740-10 ~]# rpm -qa | grep dpdk
kernel-kernel-networking-ovs-dpdk-bonding-new-bonding-1.0-208.noarch
dpdk-tools-18.11.8-1.el7_8.x86_64
dpdk-18.11.8-1.el7_8.x86_64
[root@dell-per740-10 ~]# uname  -a
Linux dell-per740-10.rhts.eng.pek2.redhat.com 3.10.0-1160.24.1.el7.x86_64 #1 SMP Thu Mar 25 21:21:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:
Always

Steps to Reproduce:
1. Install the host.
2. Enable DPDK on the Broadcom NIC.
3. Create an OVS DPDK bond.

Job link: https://beaker.engineering.redhat.com/jobs/5274038

Actual results:
ovs-vswitchd aborts with a DPDK panic in rte_free() and fails to start, and the kernel dumps the call trace shown above.

Expected results:
The OVS DPDK bond is created and neither ovs-vswitchd nor the kernel reports errors.

Additional info:

Comment 1 Hekai Wang 2021-04-15 10:34:30 UTC
Created attachment 1772111 [details]
console log