Bug 1834918
Summary: | High number of TX errors on geneve interfaces | |
---|---|---|---
Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Sai Sindhur Malleni <smalleni>
Component: | OVN | Assignee: | Ben Nemec <bnemec>
Status: | CLOSED DUPLICATE | QA Contact: | Jianlin Shi <jishi>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | RHEL 8.0 | CC: | asegurap, bbennett, ctrautma, dblack, dcbw, gnault, jbenc, jtaleric, mcambria, mcornea, mkarg, mmichels, rkhan
Target Milestone: | --- | Keywords: | UpcomingSprint
Target Release: | RHEL 8 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
 | 1841214 1843412 (view as bug list) | Environment: |
Last Closed: | 2020-06-01 15:28:23 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description (Sai Sindhur Malleni, 2020-05-12 16:27:44 UTC)
Hi, I'm from the OVN team. It's pretty difficult to get much information based on what's been presented here. OVN sets up the geneve tunnels that OCP uses, so it would be good to see logs from ovn-controller on the node(s) where you see the transmission errors. That way we can see whether OVN encountered any errors while setting up the tunnels. Similarly, the logs from ovs-vswitchd on the node(s) with transmission errors may give more information on what's going wrong here. In addition, the contents of the OVN southbound database may also be useful, so we can see how the interfaces and chassis have been configured.

One interesting data point here is that the ens2f0 and ens2f1 interfaces have lots of transmission errors, too. It's not just the geneve interfaces. So I'm curious whether something more widespread is going wrong here. I have a feeling someone from the networking-services kernel team will need to look into this if there's nothing obvious in the OVN logs that indicates errors. I'd say be prepared to dump more information for them as well.

The geneve tunnel is using ens2f0, just as an additional note.

As another data point, what is the kernel version?

Which kernel version, and which NIC? Also:
ethtool -k ens2f0
ethtool -k ens2f1

Seeing this on a newly deployed OCP cluster, with no workloads/pods running. Going to get must-gather data shortly.

The geneve tunnel is using ens2f0; ens2f1 is not being used.

2: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 3c:fd:fe:ee:49:08 brd ff:ff:ff:ff:ff:ff
    inet 192.168.222.10/24 brd 192.168.222.255 scope global dynamic noprefixroute ens2f0
       valid_lft 2727sec preferred_lft 2727sec
    inet6 fe80::6b06:748a:ef56:1ae0/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: ens2f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 3c:fd:fe:ee:49:09 brd ff:ff:ff:ff:ff:ff

NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-05-08-222601   True        False         178m    Cluster version is 4.5.0-0.nightly-2020-05-08-222601

Kernel: 4.18.0-147.8.1.el8_1.x86_64

sh-4.2# ethtool ens2f0
Settings for ens2f0:
        Supported ports: [ FIBRE ]
        Supported link modes:   25000baseSR/Full
                                10000baseSR/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Supported FEC modes: None BaseR RS
        Advertised link modes:  25000baseSR/Full
                                10000baseSR/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Advertised FEC modes: None BaseR RS
        Speed: 25000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: off
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes
===========================================
sh-4.2# ethtool -i ens2f0
driver: i40e
version: 2.8.20-k
firmware-version: 6.01 0x80003554 1.1747.0
expansion-rom-version:
bus-info: 0000:5e:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
==============================================
sh-4.2# ethtool -k ens2f0
Features for ens2f0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tls-hw-rx-offload: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: on
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]

As an additional data point, running any workloads (creating projects or pods) on the cluster fails with either TLS handshake and EOF errors or i/o timeout errors. Here are example errors from the client:

1. Unexpected error:
   <*url.Error | 0xc001bcb560>: {
       Op: "Post",
       URL: "https://api.test769.myocp4.com:6443/api/v1/namespaces/nodevertical0/pods",
       Err: {s: "EOF"},
   }
   Post https://api.test769.myocp4.com:6443/api/v1/namespaces/nodevertical0/pods: EOF

2. Get https://api.test714.myocp4.com:6443/api?timeout=32s: dial tcp 192.168.222.3:6443: i/o timeout

In Prometheus we continuously see node network transmit errors: https://snapshot.raintank.io/dashboard/snapshot/vmqPeuQ3AL8TDkorrC5wqDNe60Ap8tlp

I made sure we don't have any unexpected IPs/hosts in the baremetal environment by running an nmap scan. Happy to give access to the environment and help debug further.

Can these be turned off and try again?

tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
generic-segmentation-offload: on
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
hw-tc-offload: on

I think the commands are (adjust ens2f0 accordingly):

ethtool -K ens2f0 tx off
ethtool -K ens2f0 sg off
ethtool -K ens2f0 tso off
ethtool -K ens2f0 gso off
ethtool -K ens2f0 tx-gre-segmentation off
ethtool -K ens2f0 tx-gre-csum-segmentation off
ethtool -K ens2f0 tx-udp_tnl-segmentation off
ethtool -K ens2f0 tx-udp_tnl-csum-segmentation off
ethtool -K ens2f0 tx-gso-partial off
ethtool -K ens2f0 hw-tc-offload off

But check to see if the values change to be sure.
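For anyone repeating the experiment suggested above, the same toggles can be applied to both NICs in one pass and the resulting state checked immediately. This is only a sketch using standard ethtool keywords; some features may report [fixed] and refuse to change, and toggles can be reverted by replacing 'off' with 'on':

# apply the suggested offload toggles and verify the result
for dev in ens2f0 ens2f1; do
    for feat in tx sg tso gso tx-gre-segmentation tx-gre-csum-segmentation \
                tx-udp_tnl-segmentation tx-udp_tnl-csum-segmentation \
                tx-gso-partial hw-tc-offload; do
        ethtool -K "$dev" "$feat" off
    done
    ethtool -k "$dev" | grep -E 'checksum|scatter|segmentation|gso|tc-offload'
done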
Actually, I already attached must-gather data earlier, never mind. Please use the first link.

*** Bug 1835376 has been marked as a duplicate of this bug. ***

Still seeing errors on the geneve interface. Do note that there are no errors seen on ens2f0 before and after the changes.

sh-4.2# ethtool -k genev_sys_6081
Features for genev_sys_6081:
rx-checksumming: off
tx-checksumming: off
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: off
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: off
tx-scatter-gather: off
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off
tx-tcp-ecn-segmentation: off
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tls-hw-rx-offload: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]

For ens2f0:

sh-4.2# ethtool -k ens2f0
Features for ens2f0:
rx-checksumming: off
tx-checksumming: off
tx-checksum-ipv4: off
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: off
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off
tx-tcp-ecn-segmentation: off
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: off
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off
tx-gre-csum-segmentation: off [requested on]
tx-ipxip4-segmentation: off
tx-ipxip6-segmentation: off
tx-udp_tnl-segmentation: off
tx-udp_tnl-csum-segmentation: off
tx-gso-partial: off
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tls-hw-rx-offload: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off
tls-hw-tx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]

*** Bug 1809281 has been marked as a duplicate of this bug. ***

ens2f0 does not have any TX errors in the excerpt, unless we are looking at the wrong column.
Random question (and not a cause/fix for anything): should the bare metal deployments be using jumbo frames? I noticed the NICs were at 1500 MTU.

Irrespective of jumbo frames or not, shouldn't the MTU be set accordingly for geneve?

Even if the MTU isn't set accordingly, PMTU discovery should fix all this after a few drops. This assumes that:
a) TCP PMTU discovery is enabled,
b) ICMP messages are being generated at the geneve interface (doubtful, see below), and
c) iptables lets the ICMP "would frag" packet back to the sending pod.

Looking at a working AWS cluster, the veth interface of a pod uses an MTU of 8901 (likely the 9001-byte underlay MTU minus the roughly 100 bytes of headroom OVN-Kubernetes reserves for geneve encapsulation):

sh-4.4# ip -d link show b7da53d9b260336
13: b7da53d9b260336@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8901 qdisc noqueue master ovs-system state UP mode DEFAULT group default
    link/ether be:d4:0a:60:04:80 brd ff:ff:ff:ff:ff:ff link-netns 7f80269a-18ea-4fb7-b7d4-61be655894f1 promiscuity 1
    veth
    openvswitch_slave addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
sh-4.4#

It might be worth turning on jumbo frames to see what's different. This could be a problem, but not "the" problem.

Re (b) above: normally, a TCP segment using PMTU discovery would have the kernel automatically see that the TCP segment is larger than the MTU of the egress interface and trigger an ICMP "would frag" message. For geneve, the TCP segment is put in a UDP datagram after the geneve headers. Technically, the MTU of the egress interface isn't even known until later, after a route lookup is done to figure out what the next hop is. And if the UDP datagram is larger than the MTU of the egress interface, the source of that datagram is the local UDP stack, not the sending pod.

Coffee kicked in... Quick update to #c37: what's described above is accurate, but it only takes place at the L3-to-L2 boundary. Here we are already inside L2. The best that can be done is to IP-fragment every UDP/geneve datagram that is larger than the egress MTU, which will kill performance. To avoid this, as suggested in #c36, make the MTU of each interface connected to the bridge less than or equal to the smallest MTU of any (current or future) member of the bridge.

We'll need access to the cluster to continue digging into this. From the notes saved off from last time, we believe bare metal is using 1500 MTU everywhere, so things are consistent; we don't see a combination of jumbo and 1500 MTU. Focus is back to why the pod/veth is sending packets larger than the MTU (with a bad checksum). From an email from Michael:
> sh-4.4# ip -s link show genev_sys_6081
> 5: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc
> noqueue master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
> link/ether fe:b8:50:24:dd:9a brd ff:ff:ff:ff:ff:ff
> RX: bytes packets errors dropped overrun mcast
> 568426 3239 0 0 0 0
> TX: bytes packets errors dropped carrier collsns
> 1084980 3040 14736 0 0 0
>
> I think for geneve this counter is incremented here only:
> https://github.com/torvalds/linux/blob/master/drivers/net/geneve.c#L981
It's easy to confirm using dynamic debug:
ip -s -s a s dev genev_sys_6081
echo -n 'file drivers/net/geneve.c +p' > /sys/kernel/debug/dynamic_debug/control
sleep 5
echo -n 'file drivers/net/geneve.c -p' > /sys/kernel/debug/dynamic_debug/control
ip -s -s a s dev genev_sys_6081
Indeed, this produces dmesg messages:
[61503.656184] genev_sys_6081: no tunnel metadata
[61503.984187] genev_sys_6081: no tunnel metadata
...
The number of the messages matches the increment in tx_error stats. This confirms the drops happen due to no tunnel metadata.
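For reference, a minimal sketch of that cross-check, assuming root access on the node and the same interface name; it reuses the dynamic debug toggle from above and compares deltas over the same window:

# compare the tx_errors increment with the "no tunnel metadata" messages logged
dev=genev_sys_6081
before_err=$(cat /sys/class/net/$dev/statistics/tx_errors)
before_msg=$(dmesg | grep -c "$dev: no tunnel metadata")
echo 'file drivers/net/geneve.c +p' > /sys/kernel/debug/dynamic_debug/control
sleep 5
echo 'file drivers/net/geneve.c -p' > /sys/kernel/debug/dynamic_debug/control
after_err=$(cat /sys/class/net/$dev/statistics/tx_errors)
after_msg=$(dmesg | grep -c "$dev: no tunnel metadata")
echo "tx_errors delta: $((after_err - before_err)), messages logged: $((after_msg - before_msg))"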
Finally I was able to install perf and get some meaningful info out of the box. (For the sake of anyone else debugging this, the key command to run after sshing to a node is 'toolbox'.)

The tx_error messages are mostly caused by the 'coredns' and 'mdns-publisher' processes. They send UDP packets directly to the genev_sys_6081 interface (likely, they send to all interfaces). Understandably, those packets are dropped, as they don't (and can't) contain the lwt metadata. This is a misconfiguration of those two applications.

I'm also seeing some dropped packets sent by mld_ifc_timer_expire in the kernel. I'll look more into those.

(In reply to Jiri Benc from comment #44)
> Finally I was able to install perf and get some meaningful info out of the
> box. (For the sake of anyone else debugging this, the key command to run
> after sshing to a node is 'toolbox'.)
>
> The tx_error messages are mostly caused by the 'coredns' and 'mdns-publisher'
> processes. They send UDP packets directly to the genev_sys_6081 interface
> (likely, they send to all interfaces). Understandably, those packets are
> dropped, as they don't (and can't) contain the lwt metadata. This is a
> misconfiguration of those two applications.
>
> I'm also seeing some dropped packets sent by mld_ifc_timer_expire in the
> kernel. I'll look more into those.

Thanks Jiri. Who needs to do what to stop the UDP packets being sent to genev_sys_6081 and the other interfaces? Even if they might not be getting in the way of the scale testing, they will cause alarms for our customers. So we should try to find a cure for this large number of dropped packets, e.g. stop sending them. Are coredns and mdns-publisher in OVS or OVN, or somewhere else?

(In reply to Rashid Khan from comment #45)
> (In reply to Jiri Benc from comment #44)
> > Finally I was able to install perf and get some meaningful info out of the
> > box. (For the sake of anyone else debugging this, the key command to run
> > after sshing to a node is 'toolbox'.)
> >
> > The tx_error messages are mostly caused by the 'coredns' and 'mdns-publisher'
> > processes. They send UDP packets directly to the genev_sys_6081 interface
> > (likely, they send to all interfaces). Understandably, those packets are
> > dropped, as they don't (and can't) contain the lwt metadata. This is a
> > misconfiguration of those two applications.
> >
> > I'm also seeing some dropped packets sent by mld_ifc_timer_expire in the
> > kernel. I'll look more into those.
>
> Thanks Jiri. Who needs to do what to stop the UDP packets being sent to
> genev_sys_6081 and the other interfaces? Even if they might not be getting in
> the way of the scale testing, they will cause alarms for our customers. So we
> should try to find a cure for this large number of dropped packets, e.g. stop
> sending them. Are coredns and mdns-publisher in OVS or OVN, or somewhere else?

I cloned this bug to https://bugzilla.redhat.com/show_bug.cgi?id=1841214 for the Network Edge team to investigate getting CoreDNS/mdns-publisher to stop whatever they are doing. It's likely we can close this bug soon, but I'd like to make sure there aren't other issues to look at (MTU, mostly).

CoreDNS-mDNS and mdns-publisher are handled by my team. We're on it!

(In reply to Antoni Segura Puimedon from comment #47)
> CoreDNS-mDNS and mdns-publisher are handled by my team. We're on it!

I filed https://bugzilla.redhat.com/show_bug.cgi?id=1841214 as a clone for the network edge team (because DNS). Should that one get closed, and this one moved to the DNS component?
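As a side note for anyone reproducing the analysis above, a quick way to see the stray traffic directly is to capture on the geneve interface itself from the node's toolbox container. This is only a sketch, and it assumes the offending packets are mDNS (UDP port 5353), which is what CoreDNS-mDNS and mdns-publisher speak:

# capture a few of the packets that hit genev_sys_6081 directly
tcpdump -i genev_sys_6081 -nn -c 20 udp port 5353

If packets show up here, they are being handed to the geneve device without tunnel metadata and will be counted as TX errors, matching the behaviour described above.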
Closing this one as a duplicate of bug 1841214 since that bug now has a patch.

*** This bug has been marked as a duplicate of bug 1841214 ***