Bug 1589264 - [OVS] [bnxt] OVS daemon got segfault when adding bnxt dpdk interface to OVS-dpdk bridge
Summary: [OVS] [bnxt] OVS daemon got segfault when adding bnxt dpdk interface to OVS-dpdk bridge
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openvswitch
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Davide Caratti
QA Contact: Jean-Tsung Hsiao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-08 14:41 UTC by Jean-Tsung Hsiao
Modified: 2018-08-15 13:53 UTC (History)
7 users

Fixed In Version: openvswitch-2.9.0-51.el7fdn
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-15 13:53:04 UTC
Target Upstream Version:
Embargoed:


Attachments
If new MTU is not greater than mbuf size don't update HW (1.39 KB, patch)
2018-06-15 20:58 UTC, Ajit Khaparde
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:2432 0 None None None 2018-08-15 13:53:51 UTC

Description Jean-Tsung Hsiao 2018-06-08 14:41:55 UTC
Description of problem: [OVS] [bnxt] Encountered bnxt-specific ERRs while configuring OVS-dpdk with a bnxt dpdk interface

2018-06-08T14:17:42.467Z|00151|dpif_netdev|INFO|PMD thread on numa_id: 0, core id: 10 destroyed.
2018-06-08T14:17:42.469Z|00152|dpif_netdev|INFO|PMD thread on numa_id: 0, core id: 20 destroyed.
2018-06-08T14:17:42.471Z|00153|dpif_netdev|INFO|PMD thread on numa_id: 0, core id:  8 destroyed.
2018-06-08T14:17:42.472Z|00154|dpif_netdev|INFO|PMD thread on numa_id: 0, core id: 22 destroyed.
2018-06-08T14:17:42.474Z|00155|dpif_netdev|INFO|PMD thread on numa_id: 0, core id:  0 created.
2018-06-08T14:17:42.475Z|00156|dpif_netdev|INFO|PMD thread on numa_id: 1, core id:  1 created.
2018-06-08T14:17:42.475Z|00157|dpif_netdev|INFO|There are 1 pmd threads on numa node 0
2018-06-08T14:17:42.475Z|00158|dpif_netdev|INFO|There are 1 pmd threads on numa node 1
2018-06-08T14:17:42.475Z|00159|dpdk|INFO|PMD: Force Link Down
2018-06-08T14:17:42.477Z|00160|dpdk|ERR|PMD: bnxt_hwrm_port_clr_stats error 65535:0:00000000:0000
2018-06-08T14:17:42.482Z|00161|dpdk|ERR|PMD: bnxt_hwrm_vnic_tpa_cfg error 2:0:00000000:01f2
2018-06-08T14:17:42.502Z|00162|dpdk|INFO|PMD: New MTU is 1500
2018-06-08T14:17:42.503Z|00163|dpdk|ERR|PMD: bnxt_hwrm_vnic_plcmode_cfg error 2:0:00000000:01cb
2018-06-08T14:17:42.503Z|00164|netdev_dpdk|ERR|Interface dpdk-10 MTU (1500) setup error: Unknown error -2
2018-06-08T14:17:42.503Z|00165|netdev_dpdk|ERR|Interface dpdk-10(rxq:1 txq:3) configure error: Unknown error -2
2018-06-08T14:17:42.503Z|00166|dpif_netdev|ERR|Failed to set interface dpdk-10 new configuration



Version-Release number of selected component (if applicable):
[root@netqe22 jhsiao]# ethtool -i p5p1
driver: bnxt_en
version: 1.8.0
firmware-version: 212.0.92.0
expansion-rom-version: 
bus-info: 0000:07:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: no
supports-priv-flags: no


[root@netqe22 jhsiao]# rpm -q openvswitch
openvswitch-2.9.0-37.el7fdp.x86_64
[root@netqe22 jhsiao]# uname -a
Linux netqe22.knqe.lab.eng.bos.redhat.com 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux


How reproducible: Reproducible


Steps to Reproduce:
1. Configure an OVS-dpdk bridge using bnxt with the same firmware mentioned above.

Actual results:
Got ERR messages that resulted in failure to add the bnxt dpdk interface to the OVS-dpdk bridge.

Expected results:
Should succeed!

Additional info:

Comment 2 Jean-Tsung Hsiao 2018-06-08 15:59:54 UTC
The daemon also got a segfault. Below is a gdb backtrace.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f3bfa7fc700 (LWP 3667)]
bnxt_recv_pkts (rx_queue=0x0, rx_pkts=0x7f3bfa7fb770, nb_pkts=32)
    at /usr/src/debug/openvswitch-2.9.0/dpdk-17.11/drivers/net/bnxt/bnxt_rxr.c:536
536		struct bnxt_rx_ring_info *rxr = rxq->rx_ring;
(gdb) bt
#0  bnxt_recv_pkts (rx_queue=0x0, rx_pkts=0x7f3bfa7fb770, nb_pkts=32)
    at /usr/src/debug/openvswitch-2.9.0/dpdk-17.11/drivers/net/bnxt/bnxt_rxr.c:536
#1  0x00005627206d6d4b in rte_eth_rx_burst (nb_pkts=32, rx_pkts=0x7f3bfa7fb770, 
    queue_id=1, port_id=0)
    at /usr/src/debug/openvswitch-2.9.0/dpdk-17.11/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:2897
#2  netdev_dpdk_rxq_recv (rxq=<optimized out>, batch=0x7f3bfa7fb760)
    at lib/netdev-dpdk.c:1923
#3  0x0000562720624281 in netdev_rxq_recv (rx=<optimized out>, 
    batch=batch@entry=0x7f3bfa7fb760) at lib/netdev.c:701
#4  0x00005627205fd82f in dp_netdev_process_rxq_port (pmd=pmd@entry=0x7f3cb8358010, 
    rxq=0x562721bd3110, port_no=2) at lib/dpif-netdev.c:3279
#5  0x00005627205fdc3a in pmd_thread_main (f_=<optimized out>)
    at lib/dpif-netdev.c:4145
#6  0x000056272067a8c6 in ovsthread_wrapper (aux_=<optimized out>)
    at lib/ovs-thread.c:348
#7  0x00007f3cd7c78dd5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f3cd7076b3d in clone () from /lib64/libc.so.6
(gdb) q
A debugging session is active.

	Inferior 1 [process 3506] will be detached.

Quit anyway? (y or n) y
Quitting: Can't detach Thread 0x7f3c197fa700 (LWP 3665): No such process
[root@netqe22 ~]#

Comment 3 Jean-Tsung Hsiao 2018-06-08 16:19:38 UTC
For bnxt with different firmware there is no such issue.

[root@netqe16 jhsiao]# ethtool -i p7p1
driver: bnxt_en
version: 1.8.0
firmware-version: 20.6.55.0
expansion-rom-version: 
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: yes
supports-register-dump: no
supports-priv-flags: no
[root@netqe16 jhsiao]#

Comment 4 Davide Caratti 2018-06-13 07:21:20 UTC
hello Jean-Tsung,

- is this issue systematic?
- do you have a reproducer script for the segfault at comment #2?

thank you in advance!
-- 
davide

Comment 5 Jean-Tsung Hsiao 2018-06-13 15:28:19 UTC
Nothing special. Just add bnxt to OVS-dpdk bridge. Below is my example.


# Please change parameters accordingly
ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x000004
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="4096,4096"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true

ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x500500
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 dpdk-10 \
    -- set interface dpdk-10 type=dpdk ofport_request=10 options:dpdk-devargs=0000:05:00.0
ovs-vsctl add-port ovsbr0 dpdk-11 \
    -- set interface dpdk-11 type=dpdk ofport_request=11 options:dpdk-devargs=0000:05:00.1

ovs-vsctl --timeout 10 set Interface dpdk-10 options:n_rxq=2
ovs-vsctl --timeout 10 set Interface dpdk-11 options:n_rxq=2

ovs-ofctl del-flows ovsbr0
ovs-ofctl add-flow ovsbr0 in_port=10,actions=output:11
ovs-ofctl add-flow ovsbr0 in_port=11,actions=output:10
ovs-ofctl dump-flows ovsbr0

Comment 6 Ajit Khaparde 2018-06-13 15:39:56 UTC
And this is a VF and not a PF. Right?

Comment 7 Davide Caratti 2018-06-13 15:47:03 UTC
(In reply to Ajit Khaparde from comment #6)
> And this is a VF and not a PF. Right?

It does not look like a VF, see comment #3. But I don't have FW 20.x, can you please check if this happens with version 20.8.x ?

thanks!
-- 
davide

Comment 8 Davide Caratti 2018-06-13 15:51:23 UTC
(In reply to Davide Caratti from comment #7)
> (In reply to Ajit Khaparde from comment #6)
> > And this is a VF and not a PF. Right?
> 
> It does not look like a VF, see comment #3. But I don't have FW 20.x, can
> you please check if this happens with version 20.8.x ?
> 
> thanks!
> -- 
> davide

scratch my question. I was assuming that the fault was reproducible on old FWs, not new FWs, but now I read correctly:

FW 20.x -> no segfault 
FW 212.x -> segfault

Comment 9 Jean-Tsung Hsiao 2018-06-13 16:00:27 UTC
> (In reply to Davide Caratti from comment #8)
> > scratch my question. I was assuming that the fault was reproducible on old
> > FWs, not new FWs, but now I read correctly:
> > 
> > FW 20.x -> no segfault 
> > FW 212.x -> segfault

Correct! I am very surprised.

NOTE: I don't own the server at this moment. It's being used for other testing.

Comment 10 Ajit Khaparde 2018-06-15 00:51:24 UTC
(In reply to Jean-Tsung Hsiao from comment #9)
> (In reply to Davide Caratti from comment #8)
> > (In reply to Davide Caratti from comment #7)
> > > (In reply to Ajit Khaparde from comment #6)
> > > > And this is a VF and not a PF. Right?
> > > 
> > > It does not look like a VF, see comment #3. But I don't have FW 20.x, can
> > > you please check if this happens with version 20.8.x ?
> > > 
> > > thanks!
> > > -- 
> > > davide
> > 
> > scratch my question. I was assuming that the fault was reproducible on old
> > FWs, not new FWs, but now I read correctly:
> > 
> > FW 20.x -> no segfault 
> > FW 212.x -> segfault
> 
> Correct! I am very surprised.
> 
> NOTE: I don't own the server at this moment. It's being used for other
> testing.

When you get your server back, can you try a patch?
I am yet to see a segfault with the firmware I have on my setup.
So I am trying to get to the exact version you are using and try again.

Comment 11 Davide Caratti 2018-06-15 10:22:49 UTC
(In reply to Ajit Khaparde from comment #10)
> (In reply to Jean-Tsung Hsiao from comment #9)
> > (In reply to Davide Caratti from comment #8)
> > > (In reply to Davide Caratti from comment #7)
> > > > (In reply to Ajit Khaparde from comment #6)
> > > > > And this is a VF and not a PF. Right?
> > > > 
> > > > It does not look like a VF, see comment #3. But I don't have FW 20.x, can
> > > > you please check if this happens with version 20.8.x ?
> > > > 
> > > > thanks!
> > > > -- 
> > > > davide
> > > 
> > > scratch my question. I was assuming that the fault was reproducible on old
> > > FWs, not new FWs, but now I read correctly:
> > > 
> > > FW 20.x -> no segfault 
> > > FW 212.x -> segfault
> > 
> > Correct! I am very surprised.
> > 
> > NOTE: I don't own the server at this moment. It's being used for other
> > testing.
> 
> When you get your server back, can you try a patch?
> I am yet to see a segfault with the firmware I have on my setup.
> So I am trying to get to the exact version you are using and try again.

hello Ajit,

thanks for looking at this! I can reproduce the segfault and the reported errors on netdev90, using the latest FDP openvswitch. If you share the code, I can build and test the patch you mention in comment #10 and give you feedback; please let me know how you want to proceed.

regards,
-- 
davide

Comment 12 Jean-Tsung Hsiao 2018-06-15 18:32:06 UTC
(In reply to Ajit Khaparde from comment #10)
> (In reply to Jean-Tsung Hsiao from comment #9)
> > (In reply to Davide Caratti from comment #8)
> > > (In reply to Davide Caratti from comment #7)
> > > > (In reply to Ajit Khaparde from comment #6)
> > > > > And this is a VF and not a PF. Right?
> > > > 
> > > > It does not look like a VF, see comment #3. But I don't have FW 20.x, can
> > > > you please check if this happens with version 20.8.x ?
> > > > 
> > > > thanks!
> > > > -- 
> > > > davide
> > > 
> > > scratch my question. I was assuming that the fault was reproducible on old
> > > FWs, not new FWs, but now I read correctly:
> > > 
> > > FW 20.x -> no segfault 
> > > FW 212.x -> segfault
> > 
> > Correct! I am very surprised.
> > 
> > NOTE: I don't own the server at this moment. It's being used for other
> > testing.
> 
> When you get your server back, can you try a patch?
> I am yet to see a segfault with the firmware I have on my setup.
> So I am trying to get to the exact version you are using and try again.

Where can I get the test build?

Comment 13 Ajit Khaparde 2018-06-15 20:58:23 UTC
Created attachment 1452067 [details]
If new MTU is not greater than mbuf size don't update HW

Can you try the attached patch?

Comment 14 Jean-Tsung Hsiao 2018-06-16 18:06:41 UTC
(In reply to Ajit Khaparde from comment #13)
> Created attachment 1452067 [details]
> If new MTU is not greater than mbuf size don't update HW
> 
> Can you try the attached patch.

Sorry, I am waiting for a test build, not a patch.

Comment 15 Davide Caratti 2018-06-18 10:10:44 UTC
(In reply to Jean-Tsung Hsiao from comment #14)
> Sorry, I am waiting for a test build, not a patch.
hello, I made a build applying the attached patch on top of the latest FDN and did a quick retest. There are still some ERR messages in the PMD log:

2018-06-18T10:03:14.528Z|00466|dpdk|INFO|PMD: Force Link Down
2018-06-18T10:03:14.529Z|00467|dpdk|ERR|PMD: bnxt_hwrm_port_clr_stats error 65535:0:00000000:0000
2018-06-18T10:03:14.530Z|00468|dpdk|ERR|PMD: bnxt_hwrm_vnic_tpa_cfg error 2:0:00000000:01f2
2018-06-18T10:03:14.550Z|00469|dpdk|INFO|PMD: New MTU is 1500
2018-06-18T10:03:14.571Z|00470|dpdk|ERR|PMD: bnxt_hwrm_vnic_tpa_cfg error 2:0:00000000:01f2
2018-06-18T10:03:14.576Z|00471|dpdk|ERR|PMD: bnxt_hwrm_vnic_tpa_cfg error 2:0:00000000:01f2
2018-06-18T10:03:14.580Z|00472|dpdk|INFO|PMD: bnxt_init_chip(): intr_vector = 2
2018-06-18T10:03:14.589Z|00473|dpdk|INFO|PMD: Port 1 Link Down

but the segfault does not seem to happen anymore. @Jean, can you confirm?

Comment 17 Ajit Khaparde 2018-06-18 13:47:58 UTC
Thanks for the update Davide.
We have a firmware fix and a PMD change for bnxt_hwrm_port_clr_stats.
I will take a look at bnxt_hwrm_vnic_tpa_cfg error.

Comment 18 Jean-Tsung Hsiao 2018-06-18 14:20:30 UTC
(In reply to Davide Caratti from comment #15)
> hello, I made a build applying the attached patch on top of latest FDN and
> did a quick retest. There are still some ERR messages in the PMD log,
> [...]
> but the segfault does not seem to happen anymore. @Jean, can you confirm?

Hi Davide,
Yes, I got the same result as you did.
Thanks for the test build.
Jean

Comment 22 Ajit Khaparde 2018-06-21 21:21:32 UTC
(In reply to Davide Caratti from comment #20)
> https://mails.dpdk.org/archives/dev/2018-June/104698.html

Thanks for the update, Davide.
I was planning to update the bug once the patch was applied.
But this will work as well.

Comment 24 Jean-Tsung Hsiao 2018-07-25 02:58:03 UTC
Waiting for netqe22 to verify the fix.

Comment 25 Jean-Tsung Hsiao 2018-07-25 14:04:28 UTC
The fix has been verified using OVS 2.9.0-55.

Comment 26 Timothy Redaelli 2018-08-10 13:45:33 UTC
The openvswitch component is delivered through the Fast Datapath channel; it is not documented in the release notes.

Comment 28 errata-xmlrpc 2018-08-15 13:53:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2432

