Bug 1558328
Summary: Kernel data path test with OVS 2.9 + DPDK 17.11 fails with low throughput
Product: Red Hat Enterprise Linux 7
Component: kernel
kernel sub component: NIC Drivers
Version: 7.5
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Fixed In Version: kernel-3.10.0-898.el7
Type: Bug
Reporter: Rasesh Mody <rasesh.mody>
Assignee: Rasesh Mody <rasesh.mody>
QA Contact: Christian Trautman <ctrautma>
CC: ameen.rahman, arahman, atragler, brdeoliv, chad.dupuis, ctrautma, Harish.Patil, jean-mickael.guerin, network-qe, pabeni, pvauter, qding, rasesh.mody, tredaelli
Last Closed: 2018-10-30 08:50:08 UTC

Description
Rasesh Mody
2018-03-20 03:39:10 UTC
Created attachment 1410221: ethtool statistics
With RHEL 7.4 we saw packet drops in the kernel datapath tests:
- For the VSPerf test failure, we collected ethtool statistic diffs (attached).
- The packet drops are due to checksum errors.
- We believe it's a transmit-side issue.

With RHEL 7.5 we observed the same failure:
- We need to collect the ethtool statistics to determine whether it's the same issue.

The complete logs with RHEL 7.4 (vsperf_results_BB40G_2018-02-09_101038.tar.gz) are uploaded to ftp://10.16.187.188/pub/RHEL_NIC_QUALIFICATION_FastLinQ_test_results. The same logs are also uploaded to the Cavium SFT location.

Ran the test with a Xena traffic generator using a very simple traffic profile. Will attach details on the traffic being sent.

Running just a phy2phy OVS (no VM) test, after 3 packets the qede driver starts dropping packets and the system slows to a crawl. The network stack on the system appears to become unresponsive. The system only returns to normal behaviour once the qede driver has dropped (processed) all the packets that were sent. No other card that I've seen has this issue. The messages log shows lots of errors:
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet
Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]CQE has error, flags = 449, dropping incoming packet

In my opinion this is not a traffic generator issue but a driver problem. Assigning back to QLogic.

Created attachment 1410443: traffic profile
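
To gauge how many packets the driver reported dropping, the messages log can be tallied per interface. The following is a minimal sketch, assuming the message format matches the lines quoted in this report; the regular expression is inferred from those lines only (not from the driver source), the helper name `count_cqe_drops` is hypothetical, and the third sample line is made up to show that unrelated messages are ignored. The `flags` value is only reported, not interpreted.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this report; it is an
# assumption about the message layout, not taken from the qede source.
CQE_ERROR = re.compile(
    r"\[qede_rx_process_cqe:\d+\((?P<iface>[^)]+)\)\]"
    r"CQE has error, flags = (?P<flags>\d+), dropping incoming packet"
)


def count_cqe_drops(lines):
    """Count CQE error messages, keyed by (interface, flags value)."""
    drops = Counter()
    for line in lines:
        match = CQE_ERROR.search(line)
        if match:
            drops[(match.group("iface"), int(match.group("flags")))] += 1
    return drops


# Two lines copied from the report plus one invented, unrelated line.
sample = [
    "Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]"
    "CQE has error, flags = 449, dropping incoming packet",
    "Mar 20 07:48:14 netqe22 kernel: [qede_rx_process_cqe:1115(p7p2)]"
    "CQE has error, flags = 449, dropping incoming packet",
    "Mar 20 07:48:15 netqe22 kernel: some unrelated message",
]

drops = count_cqe_drops(sample)
for (iface, flags), count in sorted(drops.items()):
    # Print the flags in hex as well, since bit fields are easier to read that way.
    print(f"{iface}: {count} drops with flags = {flags} (0x{flags:x})")
```

On a real system the same function could be fed the lines of the messages log instead of `sample`.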
NIC statistics:
     0: rcv_pkts: 3  0: rx_hw_errors: 0  0: rx_alloc_errors: 0  0: rx_ip_frags: 0  0: xmit_pkts: 2  0: stopped_cnt: 0
     1: rcv_pkts: 0  1: rx_hw_errors: 0  1: rx_alloc_errors: 0  1: rx_ip_frags: 0  1: xmit_pkts: 0  1: stopped_cnt: 0
     2: rcv_pkts: 0  2: rx_hw_errors: 0  2: rx_alloc_errors: 0  2: rx_ip_frags: 0  2: xmit_pkts: 0  2: stopped_cnt: 0
     3: rcv_pkts: 0  3: rx_hw_errors: 0  3: rx_alloc_errors: 0  3: rx_ip_frags: 0  3: xmit_pkts: 0  3: stopped_cnt: 0
     4: rcv_pkts: 0  4: rx_hw_errors: 0  4: rx_alloc_errors: 0  4: rx_ip_frags: 0  4: xmit_pkts: 0  4: stopped_cnt: 0
     5: rcv_pkts: 0  5: rx_hw_errors: 0  5: rx_alloc_errors: 0  5: rx_ip_frags: 0  5: xmit_pkts: 0  5: stopped_cnt: 0
     6: rcv_pkts: 0  6: rx_hw_errors: 0  6: rx_alloc_errors: 0  6: rx_ip_frags: 0  6: xmit_pkts: 1  6: stopped_cnt: 0
     7: rcv_pkts: 0  7: rx_hw_errors: 0  7: rx_alloc_errors: 0  7: rx_ip_frags: 0  7: xmit_pkts: 0  7: stopped_cnt: 0
     rx_ucast_bytes: 192  rx_mcast_bytes: 0  rx_bcast_bytes: 0
     rx_ucast_pkts: 3  rx_mcast_pkts: 0  rx_bcast_pkts: 0
     tx_ucast_bytes: 192  tx_mcast_bytes: 0  tx_bcast_bytes: 0
     tx_ucast_pkts: 3  tx_mcast_pkts: 0  tx_bcast_pkts: 0
     rx_64_byte_packets: 3  rx_65_to_127_byte_packets: 0  rx_128_to_255_byte_packets: 0  rx_256_to_511_byte_packets: 0
     rx_512_to_1023_byte_packets: 0  rx_1024_to_1518_byte_packets: 0  rx_1519_to_max_byte_packets: 0
     tx_64_byte_packets: 3  tx_65_to_127_byte_packets: 36  tx_128_to_255_byte_packets: 0  tx_256_to_511_byte_packets: 0
     tx_512_to_1023_byte_packets: 0  tx_1024_to_1518_byte_packets: 0  tx_1519_to_max_byte_packets: 0
     rx_mac_crtl_frames: 0  tx_mac_ctrl_frames: 0  rx_pause_frames: 0  tx_pause_frames: 0  rx_pfc_frames: 0  tx_pfc_frames: 0
     rx_crc_errors: 0  rx_align_errors: 0  rx_carrier_errors: 0  rx_oversize_packets: 0  rx_jabbers: 0  rx_undersize_packets: 0  rx_fragments: 0
     brb_truncates: 0  brb_discards: 0  no_buff_discards: 0  mftag_filter_discards: 0  mac_filter_discards: 0
     tx_err_drop_pkts: 0  ttl0_discard: 0  packet_too_big_discard: 0
     coalesced_pkts: 0  coalesced_events: 0  coalesced_aborts_num: 0  non_coalesced_pkts: 0  coalesced_bytes: 0

The last thing I noticed, running at 1 pps, is that occasionally a packet gets processed, goes through the flow tables correctly, and returns to the traffic generator. Almost all of them are dropped, though.

ovs-ofctl dump-ports br0
OFPST_PORT reply (xid=0x2): 3 ports
  port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
              tx pkts=0, bytes=0, drop=0, errs=0, coll=0
  port p7p1:  rx pkts=6, bytes=384, drop=0, errs=0, frame=0, over=0, crc=0
              tx pkts=3, bytes=192, drop=0, errs=0, coll=0
  port p7p2:  rx pkts=9186, bytes=587904, drop=0, errs=0, frame=0, over=0, crc=0
              tx pkts=6, bytes=384, drop=0, errs=0, coll=0

The captured wire trace shows that the traffic generator is sending packets with an incorrect UDP checksum. The driver drops received packets that the device marks as having an invalid checksum. The attached patch is a potential fix for the problem: instead of dropping such packets, it passes them to the stack with a CHECKSUM_NONE indication in the skb and lets the stack handle them.

Created attachment 1411471: Potential fix
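
The behaviour change described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not driver code: `rx_verdict`, `build_udp` and `udp_checksum_ok` are hypothetical helpers, the addresses, ports and payload are made up, and returning `CHECKSUM_UNNECESSARY` for packets the hardware validated is an assumption about the good path. The checksum routine is the standard Internet checksum over the UDP/IPv4 pseudo-header, which is what a software receiver would have to compute when the hardware verdict is not trusted.

```python
import struct

# The two names mirror kernel skb->ip_summed values; the rest is a
# hypothetical model for illustration, not code from the qede driver.
CHECKSUM_NONE = "CHECKSUM_NONE"
CHECKSUM_UNNECESSARY = "CHECKSUM_UNNECESSARY"
UDP_PROTO = 17


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: bytes, dst_ip: bytes, udp_len: int) -> bytes:
    """IPv4 pseudo-header that is covered by the UDP checksum."""
    return src_ip + dst_ip + struct.pack("!BBH", 0, UDP_PROTO, udp_len)


def build_udp(src_ip, dst_ip, sport, dport, payload: bytes) -> bytes:
    """Build a UDP datagram with a correct checksum."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", sport, dport, length, 0)
    csum = internet_checksum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    csum = csum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF
    return struct.pack("!HHHH", sport, dport, length, csum) + payload


def udp_checksum_ok(src_ip, dst_ip, segment: bytes) -> bool:
    """Software verification of a UDP/IPv4 checksum."""
    if segment[6:8] == b"\x00\x00":
        return True  # over IPv4, zero means the sender computed no checksum
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return internet_checksum(data) == 0


def rx_verdict(hw_csum_error: bool, patched: bool):
    """Model of the receive decision: None means the driver drops the packet."""
    if not hw_csum_error:
        return CHECKSUM_UNNECESSARY  # assumption: hardware validated the checksum
    return CHECKSUM_NONE if patched else None


# Made-up example traffic: one good datagram and one with a corrupted payload.
SRC, DST = bytes([192, 168, 0, 1]), bytes([192, 168, 0, 2])
good = build_udp(SRC, DST, 1024, 2048, b"test")
bad = good[:-1] + bytes([good[-1] ^ 0x01])

for name, segment in (("good", good), ("bad", bad)):
    hw_error = not udp_checksum_ok(SRC, DST, segment)
    print(name, "before patch:", rx_verdict(hw_error, patched=False),
          "| after patch:", rx_verdict(hw_error, patched=True))
```

Whether anything downstream actually verifies the checksum of a packet handed up with `CHECKSUM_NONE` depends on the path the packet takes through the stack; that part is outside this sketch.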
Patch resolves the issue. I was able to get >200kkps with 64-byte packets at a 0.002 loss rate.

Upstream commit details for this fix:

commit 58f101bf87e32753342a6924772c6ebb0fbde24a
Author: Manish Chopra <manish.chopra>
Date:   Wed Mar 28 03:35:52 2018 -0700

    qede: Do not drop rx-checksum invalidated packets.

    Today, driver drops received packets which are indicated as
    invalid checksum by the device. Instead of dropping such packets,
    pass them to the stack with CHECKSUM_NONE indication in skb.

    Signed-off-by: Ariel Elior <ariel.elior>
    Signed-off-by: Manish Chopra <manish.chopra>
    Signed-off-by: David S. Miller <davem>

Harish, please submit this patch to RHEL.

*** Bug 1568523 has been marked as a duplicate of this bug. ***

(In reply to Ameen Rahman from comment #17)
> Harish, Please submit this patch to RHEL

The patch has been posted.

Adding comment to flag for 7.5 z-stream for this commit:
58f101bf87e32753342a6924772c6ebb0fbde24a qede: Do not drop rx-checksum invalidated packets.

Patch(es) committed on kernel repository and an interim kernel build is undergoing testing.

Patch(es) available on kernel-3.10.0-898.el7

This has been verified by running VSPerf kernel testing on kernel-3.10.0-898.el7 with QLogic cards. The issue is not present.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3083