Bug 1558328 - Kernel data path test with OVS 2.9 + DPDK 17.11 fails with low throughput
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Rasesh Mody
QA Contact: Christian Trautman
URL:
Whiteboard:
Duplicates: 1568523
Depends On:
Blocks:
 
Reported: 2018-03-20 03:39 UTC by Rasesh Mody
Modified: 2023-03-24 14:01 UTC (History)
CC List: 14 users

Fixed In Version: kernel-3.10.0-898.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-30 08:50:08 UTC
Target Upstream Version:
Embargoed:


Attachments
ethtool statistics (1.33 KB, text/plain)
2018-03-20 05:12 UTC, Rasesh Mody
traffic profile (588.93 KB, image/png)
2018-03-20 11:56 UTC, Christian Trautman
Potential fix (1.43 KB, patch)
2018-03-21 21:33 UTC, Rasesh Mody

Description Rasesh Mody 2018-03-20 03:39:10 UTC
Description of problem:
The kernel datapath tests fail to achieve required performance numbers

Version-Release number of selected component (if applicable):
RHEL 7.5 snapshot 5

How reproducible:
Always

Steps to Reproduce:
Run either of following tests:
1. ovs_perf for kernel datapath
2. VSPerf, sub type kernel datapath

Actual results:
# 64   Byte OVS Kernel PVP test result: 0.000 #
# 1500 Byte OVS Kernel PVP test result: 0.000 #

Expected results:
PASS

Additional info:

Comment 2 Rasesh Mody 2018-03-20 05:12:58 UTC
Created attachment 1410221 [details]
ethtool statistics

Comment 3 Rasesh Mody 2018-03-20 05:20:32 UTC
With RHEL 7.4 we saw packet drops for kernel datapath tests...

 - For VSPerf test failure, we collected ethtool statistic diffs as attached
 - packet drops are due to checksum errors
 - we believe it's a transmit side issue

With RHEL 7.5 we observed the same failure...

 - we need to collect the ethtool statistics to determine if it's the same issue

Comment 4 Rasesh Mody 2018-03-20 05:49:22 UTC
The complete logs with RHEL 7.4 (vsperf_results_BB40G_2018-02-09_101038.tar.gz) are uploaded to ftp://10.16.187.188/pub/RHEL_NIC_QUALIFICATION_FastLinQ_test_results.

The same logs are also uploaded to Cavium SFT location.

Comment 5 Christian Trautman 2018-03-20 11:55:29 UTC
Ran test with Xena traffic generator using very simple traffic profile.

Will attach details on traffic being sent.

Running just a phy2phy OVS (no VM) test, the qede driver starts dropping packets after 3 packets and the system slows to a crawl. The network stack on the system appears to become unresponsive, and the system does not return to normal behaviour until the qede driver has dropped (processed) all the packets that were sent.

The messages log shows many errors.

No other card has this issue that I've seen.

Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet

In my opinion this is not a traffic generator issue but a driver problem.

Assigning back to qlogic.
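
To quantify how often the message quoted above appears, the log can be tallied per interface. The following is a minimal sketch, not part of the original report: the function name `count_cqe_errors` and the regular expression are illustrative assumptions based only on the message format shown in this comment, and the flags value is kept as an opaque string because the log does not state its number base.

```python
import re
from collections import Counter

# Pattern derived from the message format quoted in this comment (assumption:
# other kernel versions may word or number the message differently).
CQE_ERROR = re.compile(
    r"\[qede_rx_process_cqe:\d+\((?P<ifname>[^)]+)\)\]"
    r"CQE has error, flags = (?P<flags>[0-9a-fA-F]+)"
)

def count_cqe_errors(lines):
    """Count 'CQE has error' messages per (interface, flags) pair."""
    counts = Counter()
    for line in lines:
        match = CQE_ERROR.search(line)
        if match:
            counts[(match.group("ifname"), match.group("flags"))] += 1
    return counts

# The same line appears 15 times in the excerpt above.
sample = [
    "Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]"
    "CQE has error, flags = 449, dropping incoming packet"
] * 15

for (ifname, flags), n in count_cqe_errors(sample).items():
    print(f"{ifname}: {n} dropped CQEs with flags = {flags}")
```

In practice the lines would come from the messages log or `dmesg` output rather than from a hard-coded list.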

Comment 6 Christian Trautman 2018-03-20 11:56:55 UTC
Created attachment 1410443 [details]
traffic profile

Comment 7 Christian Trautman 2018-03-20 12:01:26 UTC
NIC statistics:
     0: rcv_pkts: 3
     0: rx_hw_errors: 0
     0: rx_alloc_errors: 0
     0: rx_ip_frags: 0
     0: xmit_pkts: 2
     0: stopped_cnt: 0
     1: rcv_pkts: 0
     1: rx_hw_errors: 0
     1: rx_alloc_errors: 0
     1: rx_ip_frags: 0
     1: xmit_pkts: 0
     1: stopped_cnt: 0
     2: rcv_pkts: 0
     2: rx_hw_errors: 0
     2: rx_alloc_errors: 0
     2: rx_ip_frags: 0
     2: xmit_pkts: 0
     2: stopped_cnt: 0
     3: rcv_pkts: 0
     3: rx_hw_errors: 0
     3: rx_alloc_errors: 0
     3: rx_ip_frags: 0
     3: xmit_pkts: 0
     3: stopped_cnt: 0
     4: rcv_pkts: 0
     4: rx_hw_errors: 0
     4: rx_alloc_errors: 0
     4: rx_ip_frags: 0
     4: xmit_pkts: 0
     4: stopped_cnt: 0
     5: rcv_pkts: 0
     5: rx_hw_errors: 0
     5: rx_alloc_errors: 0
     5: rx_ip_frags: 0
     5: xmit_pkts: 0
     5: stopped_cnt: 0
     6: rcv_pkts: 0
     6: rx_hw_errors: 0
     6: rx_alloc_errors: 0
     6: rx_ip_frags: 0
     6: xmit_pkts: 1
     6: stopped_cnt: 0
     7: rcv_pkts: 0
     7: rx_hw_errors: 0
     7: rx_alloc_errors: 0
     7: rx_ip_frags: 0
     7: xmit_pkts: 0
     7: stopped_cnt: 0
     rx_ucast_bytes: 192
     rx_mcast_bytes: 0
     rx_bcast_bytes: 0
     rx_ucast_pkts: 3
     rx_mcast_pkts: 0
     rx_bcast_pkts: 0
     tx_ucast_bytes: 192
     tx_mcast_bytes: 0
     tx_bcast_bytes: 0
     tx_ucast_pkts: 3
     tx_mcast_pkts: 0
     tx_bcast_pkts: 0
     rx_64_byte_packets: 3
     rx_65_to_127_byte_packets: 0
     rx_128_to_255_byte_packets: 0
     rx_256_to_511_byte_packets: 0
     rx_512_to_1023_byte_packets: 0
     rx_1024_to_1518_byte_packets: 0
     rx_1519_to_max_byte_packets: 0
     tx_64_byte_packets: 3
     tx_65_to_127_byte_packets: 36
     tx_128_to_255_byte_packets: 0
     tx_256_to_511_byte_packets: 0
     tx_512_to_1023_byte_packets: 0
     tx_1024_to_1518_byte_packets: 0
     tx_1519_to_max_byte_packets: 0
     rx_mac_crtl_frames: 0
     tx_mac_ctrl_frames: 0
     rx_pause_frames: 0
     tx_pause_frames: 0
     rx_pfc_frames: 0
     tx_pfc_frames: 0
     rx_crc_errors: 0
     rx_align_errors: 0
     rx_carrier_errors: 0
     rx_oversize_packets: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_fragments: 0
     brb_truncates: 0
     brb_discards: 0
     no_buff_discards: 0
     mftag_filter_discards: 0
     mac_filter_discards: 0
     tx_err_drop_pkts: 0
     ttl0_discard: 0
     packet_too_big_discard: 0
     coalesced_pkts: 0
     coalesced_events: 0
     coalesced_aborts_num: 0
     non_coalesced_pkts: 0
     coalesced_bytes: 0

Comment 8 Christian Trautman 2018-03-20 12:13:03 UTC
One last observation: when running at 1 pps, a packet is occasionally processed and goes through the flow tables correctly to return to the traffic generator, but almost all of them are dropped.

ovs-ofctl dump-ports br0
OFPST_PORT reply (xid=0x2): 3 ports
  port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=0, bytes=0, drop=0, errs=0, coll=0
  port  p7p1: rx pkts=6, bytes=384, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=3, bytes=192, drop=0, errs=0, coll=0
  port  p7p2: rx pkts=9186, bytes=587904, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=6, bytes=384, drop=0, errs=0, coll=0
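
The port counters above can be compared programmatically. The sketch below is illustrative only: `parse_dump_ports` is a hypothetical helper, the regular expression is derived from the output format shown in this comment, and it assumes the flows forward traffic received on p7p2 out of p7p1, which the comment does not state.

```python
import re

# Matches one "port NAME: rx pkts=N, ... tx pkts=M," entry, which spans two lines.
PORT_RE = re.compile(
    r"port\s+(?P<name>\S+):\s+rx pkts=(?P<rx>\d+),.*?tx pkts=(?P<tx>\d+),",
    re.DOTALL,
)

def parse_dump_ports(text):
    """Return {port name: {"rx": packets, "tx": packets}}."""
    return {
        m.group("name"): {"rx": int(m.group("rx")), "tx": int(m.group("tx"))}
        for m in PORT_RE.finditer(text)
    }

sample = """OFPST_PORT reply (xid=0x2): 3 ports
  port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=0, bytes=0, drop=0, errs=0, coll=0
  port  p7p1: rx pkts=6, bytes=384, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=3, bytes=192, drop=0, errs=0, coll=0
  port  p7p2: rx pkts=9186, bytes=587904, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=6, bytes=384, drop=0, errs=0, coll=0
"""

ports = parse_dump_ports(sample)
received = ports["p7p2"]["rx"]   # packets OVS saw arriving on p7p2
forwarded = ports["p7p1"]["tx"]  # packets OVS sent out of p7p1 (assumed egress)
print(f"{forwarded} of {received} packets forwarded")
```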

Comment 9 Rasesh Mody 2018-03-21 21:31:34 UTC
The captured wire trace shows that the traffic generator is sending packets with an incorrect UDP checksum. The driver drops received packets that the device marks as having an invalid checksum.

The attached patch is a potential fix for the problem. It changes the behavior so that such packets are no longer dropped; instead they are passed to the stack with a CHECKSUM_NONE indication in the skb, letting the stack handle them.
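
The behavior change described above can be modelled in a few lines. This is a toy model, not the qede driver code: `RxPacket`, `hw_csum_error`, `rx_old`, and `rx_patched` are hypothetical names, and the good path is simplified to always report CHECKSUM_UNNECESSARY. Only the CHECKSUM_NONE and CHECKSUM_UNNECESSARY names correspond to real kernel skb checksum states.

```python
from dataclasses import dataclass
from enum import Enum

class Csum(Enum):
    NONE = "CHECKSUM_NONE"                # stack must verify the checksum itself
    UNNECESSARY = "CHECKSUM_UNNECESSARY"  # device already validated the checksum

@dataclass
class RxPacket:
    payload: bytes
    hw_csum_error: bool  # hypothetical: device flagged an invalid checksum

def rx_old(pkt):
    """Before the patch: flagged packets are dropped in the driver."""
    if pkt.hw_csum_error:
        return None  # the stack never sees the packet
    return (pkt.payload, Csum.UNNECESSARY)

def rx_patched(pkt):
    """After the patch: flagged packets go up with CHECKSUM_NONE."""
    if pkt.hw_csum_error:
        return (pkt.payload, Csum.NONE)  # the stack decides what to do
    return (pkt.payload, Csum.UNNECESSARY)

bad = RxPacket(payload=b"udp datagram", hw_csum_error=True)
print("old:    ", rx_old(bad))
print("patched:", rx_patched(bad))
```

Handing the packet up leaves the decision to the stack, which fits the forwarding case in this bug, where the datapath was expected to pass the generator's packets on rather than have the driver discard them.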

Comment 10 Rasesh Mody 2018-03-21 21:33:35 UTC
Created attachment 1411471 [details]
Potential fix

Comment 15 Christian Trautman 2018-03-22 18:49:16 UTC
The patch resolves the issue. I was able to get >200 kpps with 64-byte packets at a loss rate of 0.002.

Comment 16 Ameen Rahman 2018-04-04 02:30:18 UTC
Upstream commit details for this fix/patch.

commit 58f101bf87e32753342a6924772c6ebb0fbde24a
Author: Manish Chopra <manish.chopra>
Date:   Wed Mar 28 03:35:52 2018 -0700

    qede: Do not drop rx-checksum invalidated packets.

    Today, driver drops received packets which are indicated as
    invalid checksum by the device. Instead of dropping such packets,
    pass them to the stack with CHECKSUM_NONE indication in skb.

    Signed-off-by: Ariel Elior <ariel.elior>
    Signed-off-by: Manish Chopra <manish.chopra>
    Signed-off-by: David S. Miller <davem>

Comment 17 Ameen Rahman 2018-04-04 02:36:42 UTC
Harish, Please submit this patch to RHEL

Comment 18 Harish Patil 2018-04-17 23:01:26 UTC
*** Bug 1568523 has been marked as a duplicate of this bug. ***

Comment 19 Harish Patil 2018-04-17 23:02:36 UTC
(In reply to Ameen Rahman from comment #17)
> Harish, Please submit this patch to RHEL

The patch has been posted.

Comment 20 Chad Dupuis 2018-05-07 23:30:00 UTC
Adding comment to flag for 7.5 z-stream for this commit:

58f101bf87e32753342a6924772c6ebb0fbde24a
qede: Do not drop rx-checksum invalidated packets.

Comment 21 Bruno Meneguele 2018-06-06 13:32:48 UTC
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 23 Bruno Meneguele 2018-06-07 19:58:29 UTC
Patch(es) available on kernel-3.10.0-898.el7

Comment 25 Christian Trautman 2018-09-27 21:09:16 UTC
This has been verified by running VSPerf kernel testing on kernel-3.10.0-898.el7 with QLogic cards. The issue is not present.

Comment 27 errata-xmlrpc 2018-10-30 08:50:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3083

