Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1841214

Summary: High number of TX errors on geneve interfaces [coredns misconfiguration]
Product: OpenShift Container Platform Reporter: Dan Williams <dcbw>
Component: Networking Assignee: Daneyon Hansen <dhansen>
Networking sub component: mDNS QA Contact: Victor Voronkov <vvoronko>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: urgent CC: amcdermo, andbartl, aos-bugs, bbennett, beth.white, bnemec, ctrautma, dblack, dcbw, dhansen, gnault, jbenc, jishi, jnordell, jtaleric, mcambria, mkarg, mmasters, mmichels, rkhan, smalleni, tim-redhat, vvoronko
Version: 4.4 Keywords: Performance, Triaged
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1834918 Environment:
Last Closed: 2020-10-27 16:01:56 UTC Type: ---
Bug Depends On:    
Bug Blocks: 1848094, 1852126, 1890797    

Comment 1 Ben Bennett 2020-05-28 16:24:58 UTC
Pulling the relevant comment for the DNS bug out:


--- Additional comment from Jiri Benc on 2020-05-28 10:59:24 CDT ---

Finally was able to install perf and get some meaningful info out of the box. (For the sake of anyone else debugging this, the key command to run after sshing to a node is 'toolbox'.)

The tx_error messages are mostly caused by the 'coredns' and 'mdns-publisher' processes. They send UDP packets directly to the genev_sys_6081 interface (most likely because they send to all interfaces). Understandably, those packets are dropped, as they don't (and can't) contain the lwt metadata. This is a misconfiguration of those two applications.

I'm also seeing some dropped packets sent by mld_ifc_timer_expire in the kernel. I'll look more into those.
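
The misconfiguration described above comes down to sending on every interface instead of only the one that carries the node's address. A minimal sketch of that selection step follows. It is an illustration only, not the code that was merged for this bug: the function name, the address table, and the assumption that br-ex holds the node address are all hypothetical.

```python
import ipaddress


def interfaces_for_address(addresses_by_interface, bind_address):
    """Return the names of interfaces that hold bind_address.

    addresses_by_interface maps an interface name to a list of address
    strings. How that table is collected is left out of this sketch.
    """
    target = ipaddress.ip_address(bind_address)
    selected = []
    for name, addresses in addresses_by_interface.items():
        for raw in addresses:
            # Drop a zone suffix such as "%br-ex" before parsing.
            candidate = ipaddress.ip_address(raw.split("%", 1)[0])
            if candidate == target:
                selected.append(name)
                break
    return selected


# Hypothetical address table; which interface holds which address is an
# assumption made for illustration.
SAMPLE_ADDRESSES = {
    "br-ex": ["192.168.123.113"],
    "genev_sys_6081": ["fe80::580e:22ff:fe27:7c0f"],
    "lo": ["127.0.0.1", "::1"],
}

print(interfaces_for_address(SAMPLE_ADDRESSES, "192.168.123.113"))
# prints ['br-ex']
```

With this kind of filter the tunnel device is never selected, because it does not hold the bind address.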

Comment 3 Andrew McDermott 2020-05-29 16:10:03 UTC
Moved back to 4.5 to take another look.

Comment 4 Dan Williams 2020-06-01 15:28:23 UTC
*** Bug 1834918 has been marked as a duplicate of this bug. ***

Comment 8 Ben Nemec 2020-06-18 16:47:24 UTC
Until this gets vendored into coredns it isn't complete.

Comment 10 Miciah Dashiel Butler Masters 2020-07-01 14:46:19 UTC
The fix is in the coredns-mdns plugin, so I'm setting the component to "Networking" with subcomponent "mDNS".

Comment 17 Victor Voronkov 2020-10-04 18:02:30 UTC
[kni@provisionhost-0-0 ~]$ oc version
Client Version: 4.6.0-0.nightly-2020-10-02-160623
Server Version: 4.6.0-0.ci-2020-10-02-054056
Kubernetes Version: v1.19.0-rc.2.1064+1fc699e9f6becb-dirty

IPv6:
[core@master-0-0 ~]$ cat /etc/coredns/Corefile
. {
    errors
    health :18080
    mdns ocp-edge-cluster-0.qe.lab.redhat.com 0 ocp-edge-cluster-0 fd2e:6f44:5dd8::109
    forward . fe80::5054:ff:fe1e:3b18%br-ex fd2e:6f44:5dd8::1
... Output omitted

[core@master-0-0 ~]$ ifconfig genev_sys_6081
genev_sys_6081: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65000
        inet6 fe80::b837:95ff:fe2c:b28  prefixlen 64  scopeid 0x20<link>
        ether ba:37:95:2c:0b:28  txqueuelen 1000  (Ethernet)
        RX packets 2460982  bytes 552286598 (526.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2239232  bytes 983737284 (938.1 MiB)
        TX errors 7  dropped 0 overruns 0  carrier 0  collisions 0

IPv4:
[core@master-0-0 ~]$ cat /etc/coredns/Corefile
. {
    errors
    health :18080
    mdns ocp-edge-cluster-0.qe.lab.redhat.com 0 ocp-edge-cluster-0 192.168.123.113
    forward . 192.168.123.1
... Output omitted

[core@master-0-0 ~]$ ifconfig genev_sys_6081
genev_sys_6081: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65000
        inet6 fe80::580e:22ff:fe27:7c0f  prefixlen 64  scopeid 0x20<link>
        ether 5a:0e:22:27:7c:0f  txqueuelen 1000  (Ethernet)
        RX packets 270685  bytes 78656649 (75.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 264560  bytes 109358304 (104.2 MiB)
        TX errors 7  dropped 0 overruns 0  carrier 0  collisions 0
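
For anyone repeating this check across several nodes or runs, the TX counters can be pulled out of the ifconfig text instead of being read by eye. The following is a rough sketch that assumes the output layout shown above; the function and variable names are hypothetical, and the sample text is copied from the IPv6 output in this comment.

```python
import re

# Matches the "TX errors ..." line in the ifconfig layout shown above.
TX_LINE = re.compile(
    r"TX errors (?P<errors>\d+)\s+dropped (?P<dropped>\d+)"
    r"\s+overruns (?P<overruns>\d+)\s+carrier (?P<carrier>\d+)"
    r"\s+collisions (?P<collisions>\d+)"
)
TX_PACKETS = re.compile(r"TX packets (?P<packets>\d+)")


def parse_tx_counters(ifconfig_output):
    """Return the TX counters from one interface's ifconfig output."""
    errors = TX_LINE.search(ifconfig_output)
    packets = TX_PACKETS.search(ifconfig_output)
    if errors is None or packets is None:
        raise ValueError("TX counters not found in ifconfig output")
    counters = {key: int(value) for key, value in errors.groupdict().items()}
    counters["packets"] = int(packets.group("packets"))
    return counters


SAMPLE_OUTPUT = """\
        RX packets 2460982  bytes 552286598 (526.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2239232  bytes 983737284 (938.1 MiB)
        TX errors 7  dropped 0 overruns 0  carrier 0  collisions 0
"""

counters = parse_tx_counters(SAMPLE_OUTPUT)
print(counters["errors"], counters["packets"])
# prints 7 2239232
```

Dividing errors by packets gives a rate that can be compared between runs even when uptime differs.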

Comment 20 errata-xmlrpc 2020-10-27 16:01:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 24 Red Hat Bugzilla 2023-09-15 00:32:11 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.