Bug 1558328 - Kernel data path test with OVS 2.9 + DPDK 17.11 fails with low throughput
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.5
Hardware: Unspecified Unspecified
Priority: unspecified Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Rasesh Mody
QA Contact: Christian Trautman
Docs Contact:
Duplicates: 1568523
Depends On:
Blocks:
Reported: 2018-03-19 23:39 EDT by Rasesh Mody
Modified: 2018-10-30 04:50 EDT
CC List: 14 users

See Also:
Fixed In Version: kernel-3.10.0-898.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-30 04:50:08 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
ethtool statistics (1.33 KB, text/plain)
2018-03-20 01:12 EDT, Rasesh Mody
traffic profile (588.93 KB, image/png)
2018-03-20 07:56 EDT, Christian Trautman
Potential fix (1.43 KB, patch)
2018-03-21 17:33 EDT, Rasesh Mody

Description Rasesh Mody 2018-03-19 23:39:10 EDT
Description of problem:
The kernel datapath tests fail to achieve the required performance numbers.

Version-Release number of selected component (if applicable):
RHEL 7.5 snapshot 5

How reproducible:
Always

Steps to Reproduce:
Run either of the following tests:
1. ovs_perf for kernel datapath
2. VSPerf, sub type kernel datapath

Actual results:
# 64   Byte OVS Kernel PVP test result: 0.000 #
# 1500 Byte OVS Kernel PVP test result: 0.000 #

Expected results:
PASS

Additional info:
Comment 2 Rasesh Mody 2018-03-20 01:12 EDT
Created attachment 1410221
ethtool statistics
Comment 3 Rasesh Mody 2018-03-20 01:20:32 EDT
With RHEL 7.4 we saw packet drops in the kernel datapath tests:

 - For the VSPerf test failure, we collected ethtool statistics diffs (attached).
 - The packet drops are due to checksum errors.
 - We believe it is a transmit-side issue.

With RHEL 7.5 we observed the same failure:

 - We still need to collect the ethtool statistics to determine whether it is the same issue.
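
An ethtool statistics diff of the kind mentioned here can be produced by saving the output of `ethtool -S <iface>` before and after a test run and comparing the counters. The following is a minimal Python sketch, not the tooling used for this bug; the function names `parse_ethtool_stats` and `diff_stats` and the sample counters are hypothetical.

```python
def parse_ethtool_stats(text):
    """Parse `ethtool -S <iface>` output into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "NIC statistics:" header.
        if not line or line.endswith(":"):
            continue
        # Split on the last colon so per-queue names such as
        # "0: rcv_pkts" are kept intact as the counter name.
        name, _, value = line.rpartition(":")
        try:
            stats[name.strip()] = int(value)
        except ValueError:
            continue  # ignore lines whose value is not an integer
    return stats


def diff_stats(before, after):
    """Return only the counters that changed, as {name: after - before}."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


# Hypothetical snapshots taken before and after a short test run.
before = parse_ethtool_stats("""NIC statistics:
     0: rcv_pkts: 0
     rx_ucast_pkts: 0
     rx_crc_errors: 0
""")
after = parse_ethtool_stats("""NIC statistics:
     0: rcv_pkts: 3
     rx_ucast_pkts: 3
     rx_crc_errors: 0
""")
changed = diff_stats(before, after)
print(changed)
```

Counters that did not move are left out of the result, which makes it easier to spot which drop or error counter grows during a failing run.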
Comment 4 Rasesh Mody 2018-03-20 01:49:22 EDT
The complete logs with RHEL 7.4 (vsperf_results_BB40G_2018-02-09_101038.tar.gz) are uploaded to ftp://10.16.187.188/pub/RHEL_NIC_QUALIFICATION_FastLinQ_test_results.

The same logs are also uploaded to Cavium SFT location.
Comment 5 Christian Trautman 2018-03-20 07:55:29 EDT
Ran the test with a Xena traffic generator using a very simple traffic profile.

Will attach details on the traffic being sent.

Running just a phy2phy OVS test (no VM), the qede driver starts dropping packets after 3 packets and the system slows to a crawl. The network stack on the system appears to become unresponsive, and the system only returns to normal behaviour once the qede driver has dropped (processed) all the packets that were sent.

The messages log shows lots of errors.

I have not seen this issue with any other card.

Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet

In my opinion, this is a driver problem, not a traffic generator issue.

Assigning back to QLogic.
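
To quantify how often the driver logs these drops, the messages log can be scanned for the error line quoted above. This is a minimal Python sketch; the regular expression is modelled only on the lines in this comment, and the helper name `count_cqe_errors` is hypothetical. The flags value is kept as an opaque string because its encoding is not stated here.

```python
import re
from collections import Counter

# Pattern modelled on the qede messages quoted in this comment.
CQE_ERROR = re.compile(
    r"\[qede_rx_process_cqe:\d+\((?P<iface>[^)]+)\)\]"
    r"CQE has error, flags = (?P<flags>\w+), dropping incoming packet"
)


def count_cqe_errors(log_text):
    """Count CQE error messages per (interface, flags) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CQE_ERROR.search(line)
        if match:
            counts[(match.group("iface"), match.group("flags"))] += 1
    return counts


sample = """\
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:15 netqe22 kernel: some unrelated message
"""
counts = count_cqe_errors(sample)
for (iface, flags), n in counts.items():
    print(f"{iface}: {n} dropped packet(s) logged with flags = {flags}")
```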
Comment 6 Christian Trautman 2018-03-20 07:56 EDT
Created attachment 1410443
traffic profile
Comment 7 Christian Trautman 2018-03-20 08:01:26 EDT
NIC statistics:
     0: rcv_pkts: 3
     0: rx_hw_errors: 0
     0: rx_alloc_errors: 0
     0: rx_ip_frags: 0
     0: xmit_pkts: 2
     0: stopped_cnt: 0
     1: rcv_pkts: 0
     1: rx_hw_errors: 0
     1: rx_alloc_errors: 0
     1: rx_ip_frags: 0
     1: xmit_pkts: 0
     1: stopped_cnt: 0
     2: rcv_pkts: 0
     2: rx_hw_errors: 0
     2: rx_alloc_errors: 0
     2: rx_ip_frags: 0
     2: xmit_pkts: 0
     2: stopped_cnt: 0
     3: rcv_pkts: 0
     3: rx_hw_errors: 0
     3: rx_alloc_errors: 0
     3: rx_ip_frags: 0
     3: xmit_pkts: 0
     3: stopped_cnt: 0
     4: rcv_pkts: 0
     4: rx_hw_errors: 0
     4: rx_alloc_errors: 0
     4: rx_ip_frags: 0
     4: xmit_pkts: 0
     4: stopped_cnt: 0
     5: rcv_pkts: 0
     5: rx_hw_errors: 0
     5: rx_alloc_errors: 0
     5: rx_ip_frags: 0
     5: xmit_pkts: 0
     5: stopped_cnt: 0
     6: rcv_pkts: 0
     6: rx_hw_errors: 0
     6: rx_alloc_errors: 0
     6: rx_ip_frags: 0
     6: xmit_pkts: 1
     6: stopped_cnt: 0
     7: rcv_pkts: 0
     7: rx_hw_errors: 0
     7: rx_alloc_errors: 0
     7: rx_ip_frags: 0
     7: xmit_pkts: 0
     7: stopped_cnt: 0
     rx_ucast_bytes: 192
     rx_mcast_bytes: 0
     rx_bcast_bytes: 0
     rx_ucast_pkts: 3
     rx_mcast_pkts: 0
     rx_bcast_pkts: 0
     tx_ucast_bytes: 192
     tx_mcast_bytes: 0
     tx_bcast_bytes: 0
     tx_ucast_pkts: 3
     tx_mcast_pkts: 0
     tx_bcast_pkts: 0
     rx_64_byte_packets: 3
     rx_65_to_127_byte_packets: 0
     rx_128_to_255_byte_packets: 0
     rx_256_to_511_byte_packets: 0
     rx_512_to_1023_byte_packets: 0
     rx_1024_to_1518_byte_packets: 0
     rx_1519_to_max_byte_packets: 0
     tx_64_byte_packets: 3
     tx_65_to_127_byte_packets: 36
     tx_128_to_255_byte_packets: 0
     tx_256_to_511_byte_packets: 0
     tx_512_to_1023_byte_packets: 0
     tx_1024_to_1518_byte_packets: 0
     tx_1519_to_max_byte_packets: 0
     rx_mac_crtl_frames: 0
     tx_mac_ctrl_frames: 0
     rx_pause_frames: 0
     tx_pause_frames: 0
     rx_pfc_frames: 0
     tx_pfc_frames: 0
     rx_crc_errors: 0
     rx_align_errors: 0
     rx_carrier_errors: 0
     rx_oversize_packets: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_fragments: 0
     brb_truncates: 0
     brb_discards: 0
     no_buff_discards: 0
     mftag_filter_discards: 0
     mac_filter_discards: 0
     tx_err_drop_pkts: 0
     ttl0_discard: 0
     packet_too_big_discard: 0
     coalesced_pkts: 0
     coalesced_events: 0
     coalesced_aborts_num: 0
     non_coalesced_pkts: 0
     coalesced_bytes: 0
Comment 8 Christian Trautman 2018-03-20 08:13:03 EDT
One last thing I noticed: when running at 1 pps, a packet is occasionally processed, goes through the flow tables correctly and returns to the traffic generator, but almost all of them are dropped.

ovs-ofctl dump-ports br0
OFPST_PORT reply (xid=0x2): 3 ports
  port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=0, bytes=0, drop=0, errs=0, coll=0
  port  p7p1: rx pkts=6, bytes=384, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=3, bytes=192, drop=0, errs=0, coll=0
  port  p7p2: rx pkts=9186, bytes=587904, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=6, bytes=384, drop=0, errs=0, coll=0
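
The port counters above can be compared programmatically. The Python sketch below parses the `ovs-ofctl dump-ports` output shown in this comment and relates packets received on one port to packets transmitted on the other. It assumes the test traffic enters on p7p2 and should leave on p7p1, which the comment does not state explicitly; the helper name `parse_dump_ports` is hypothetical.

```python
import re

# Each port is reported on two lines: an "rx" line followed by a "tx" line.
PORT_RE = re.compile(
    r"port\s+(?P<name>\S+):\s+rx pkts=(?P<rx>\d+),.*?\n"
    r"\s+tx pkts=(?P<tx>\d+),"
)


def parse_dump_ports(text):
    """Return {port_name: {"rx": packets, "tx": packets}}."""
    return {
        m.group("name"): {"rx": int(m.group("rx")), "tx": int(m.group("tx"))}
        for m in PORT_RE.finditer(text)
    }


sample = """OFPST_PORT reply (xid=0x2): 3 ports
  port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=0, bytes=0, drop=0, errs=0, coll=0
  port  p7p1: rx pkts=6, bytes=384, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=3, bytes=192, drop=0, errs=0, coll=0
  port  p7p2: rx pkts=9186, bytes=587904, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=6, bytes=384, drop=0, errs=0, coll=0
"""
ports = parse_dump_ports(sample)

# Assumed direction: in on p7p2, out on p7p1.
received = ports["p7p2"]["rx"]
forwarded = ports["p7p1"]["tx"]
print(f"forwarded {forwarded} of {received} packets ({forwarded / received:.4%})")
```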
Comment 9 Rasesh Mody 2018-03-21 17:31:34 EDT
The captured wire trace shows that the traffic generator is sending packets with an incorrect UDP checksum. The driver drops received packets that the device marks as having an invalid checksum.

The attached patch is a potential fix for the problem. Instead of dropping such packets, the driver passes them to the stack with a CHECKSUM_NONE indication in the skb and lets the stack handle them.
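
Whether a captured packet really carries a bad UDP checksum can be checked offline. The Python sketch below implements the standard UDP checksum over the IPv4 pseudo-header (RFC 768, using the Internet checksum of RFC 1071); it is an illustration, not the analysis performed for this bug. The addresses, ports and payload are hypothetical, and `udp_segment` means the UDP header plus payload extracted from a capture.

```python
import socket
import struct


def ones_complement_sum(data):
    """16-bit one's-complement sum used by the Internet checksum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip, dst_ip, udp_length):
    """IPv4 pseudo-header: source, destination, zero, protocol 17, UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))


def udp_checksum(src_ip, dst_ip, udp_segment):
    """Compute the checksum for a segment whose checksum field is zero."""
    data = pseudo_header(src_ip, dst_ip, len(udp_segment)) + udp_segment
    checksum = ~ones_complement_sum(data) & 0xFFFF
    return checksum or 0xFFFF  # a computed zero is transmitted as 0xFFFF


def udp_checksum_is_valid(src_ip, dst_ip, udp_segment):
    """Verify the checksum stored in bytes 6..7 of the UDP header."""
    (stored,) = struct.unpack("!H", udp_segment[6:8])
    if stored == 0:  # over IPv4, zero means "no checksum computed"
        return True
    data = pseudo_header(src_ip, dst_ip, len(udp_segment)) + udp_segment
    return ones_complement_sum(data) == 0xFFFF


# Hypothetical test packet: ports 1024 -> 2048, 4-byte payload.
src, dst, payload = "192.0.2.1", "192.0.2.2", b"test"
header = struct.pack("!HHHH", 1024, 2048, 8 + len(payload), 0)
csum = udp_checksum(src, dst, header + payload)
good = header[:6] + struct.pack("!H", csum) + payload
bad = good[:-1] + b"u"  # corrupt one payload byte, keep the old checksum

print(udp_checksum_is_valid(src, dst, good))
print(udp_checksum_is_valid(src, dst, bad))
```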
Comment 10 Rasesh Mody 2018-03-21 17:33 EDT
Created attachment 1411471
Potential fix
Comment 15 Christian Trautman 2018-03-22 14:49:16 EDT
The patch resolves the issue. I was able to get >200 kpps with 64-byte packets at a loss rate of 0.002.
Comment 16 Ameen Rahman 2018-04-03 22:30:18 EDT
Upstream commit details for this fix/patch.

commit 58f101bf87e32753342a6924772c6ebb0fbde24a
Author: Manish Chopra <manish.chopra@cavium.com>
Date:   Wed Mar 28 03:35:52 2018 -0700

    qede: Do not drop rx-checksum invalidated packets.

    Today, driver drops received packets which are indicated as
    invalid checksum by the device. Instead of dropping such packets,
    pass them to the stack with CHECKSUM_NONE indication in skb.

    Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
    Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
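
The behaviour change described in this commit message can be illustrated with a toy model. The Python sketch below is not the driver code: `RxCompletion`, `hw_csum_error`, `rx_before_fix` and `rx_after_fix` are hypothetical names, and marking good packets as CHECKSUM_UNNECESSARY is an assumption about how a checksum-offloading driver typically reports validated packets.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Names of the kernel's skb checksum states, used here as plain labels.
CHECKSUM_NONE = "CHECKSUM_NONE"                # stack must verify the checksum
CHECKSUM_UNNECESSARY = "CHECKSUM_UNNECESSARY"  # hardware already verified it


@dataclass
class RxCompletion:
    """Toy stand-in for a receive completion reported by the device."""
    payload: bytes
    hw_csum_error: bool  # hypothetical flag: device saw an invalid checksum


def rx_before_fix(cqe: RxCompletion) -> Optional[Tuple[bytes, str]]:
    """Old behaviour: packets flagged with a checksum error are dropped."""
    if cqe.hw_csum_error:
        return None  # dropped, the stack never sees the packet
    return cqe.payload, CHECKSUM_UNNECESSARY


def rx_after_fix(cqe: RxCompletion) -> Tuple[bytes, str]:
    """New behaviour: flagged packets go up the stack as CHECKSUM_NONE."""
    if cqe.hw_csum_error:
        return cqe.payload, CHECKSUM_NONE  # let the stack decide
    return cqe.payload, CHECKSUM_UNNECESSARY


flagged = RxCompletion(payload=b"udp datagram", hw_csum_error=True)
print(rx_before_fix(flagged))
print(rx_after_fix(flagged))
```

With this change the decision about a packet with a bad checksum moves from the driver to the network stack.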
Comment 17 Ameen Rahman 2018-04-03 22:36:42 EDT
Harish, Please submit this patch to RHEL
Comment 18 Harish Patil 2018-04-17 19:01:26 EDT
*** Bug 1568523 has been marked as a duplicate of this bug. ***
Comment 19 Harish Patil 2018-04-17 19:02:36 EDT
(In reply to Ameen Rahman from comment #17)
> Harish, Please submit this patch to RHEL

The patch has been posted.
Comment 20 Chad Dupuis 2018-05-07 19:30:00 EDT
Adding comment to flag for 7.5 z-stream for this commit:

58f101bf87e32753342a6924772c6ebb0fbde24a
qede: Do not drop rx-checksum invalidated packets.
Comment 21 Bruno Meneguele 2018-06-06 09:32:48 EDT
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing
Comment 23 Bruno Meneguele 2018-06-07 15:58:29 EDT
Patch(es) available on kernel-3.10.0-898.el7
Comment 25 Christian Trautman 2018-09-27 17:09:16 EDT
This has been verified by running VSPerf kernel testing on kernel-3.10.0-898.el7 with QLogic cards. The issue is not present.
Comment 27 errata-xmlrpc 2018-10-30 04:50:08 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3083
