Bug 1841214 - High number of TX errors on geneve interfaces [coredns misconfiguration] [NEEDINFO]
Summary: High number of TX errors on geneve interfaces [coredns misconfiguration]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Daneyon Hansen
QA Contact: Victor Voronkov
URL:
Whiteboard:
Duplicates: 1834918
Depends On:
Blocks: 1848094 1852126 1890797
Reported: 2020-05-28 16:21 UTC by Dan Williams
Modified: 2021-01-06 18:17 UTC
CC List: 23 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1834918
Environment:
Last Closed: 2020-10-27 16:01:56 UTC
Target Upstream Version:
rkhan: needinfo? (bbennett)


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift coredns-mdns pull 60 | 0 | None | closed | Bug 1841214: Add bind address configuration setting | 2021-02-19 11:17:20 UTC
Github | openshift coredns pull 31 | 0 | None | closed | Bug 1841214: Update vendoring of coredns-mdns | 2021-02-19 11:17:20 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:02:36 UTC

Comment 1 Ben Bennett 2020-05-28 16:24:58 UTC
Pulling the relevant comment for the DNS bug out:


--- Additional comment from Jiri Benc on 2020-05-28 10:59:24 CDT ---

I was finally able to install perf and get some meaningful info out of the box. (For anyone else debugging this, the key command to run after sshing to a node is 'toolbox'.)

The tx_error messages are mostly caused by the 'coredns' and 'mdns-publisher' processes. They send UDP packets directly to the genev_sys_6081 interface (most likely because they send to all interfaces). Understandably, those packets are dropped, as they don't (and can't) contain the lwt metadata. This is a misconfiguration of those two applications.

I'm also seeing some dropped packets sent by mld_ifc_timer_expire in the kernel. I'll look into those further.
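
To illustrate the difference between sending on every interface and sending only where a configured bind address lives, here is a minimal Python sketch. It is not the coredns-mdns implementation: the function name, the dictionary layout, and the pairing of 192.168.123.113 with br-ex are assumptions made for illustration only.

```python
import ipaddress


def interfaces_for_bind_address(interfaces, bind_address):
    """Return the names of interfaces that own bind_address.

    interfaces is a hypothetical mapping of interface name to a list of
    address strings; a real program would obtain it from the OS.
    """
    target = ipaddress.ip_address(bind_address)
    matches = []
    for name, addresses in interfaces.items():
        for address in addresses:
            # Drop an IPv6 zone suffix such as "%br-ex" before parsing.
            if ipaddress.ip_address(address.split("%", 1)[0]) == target:
                matches.append(name)
                break
    return matches


# Hypothetical interface table; only the genev_sys_6081 link-local
# address is taken from this report.
example_interfaces = {
    "br-ex": ["192.168.123.113"],
    "genev_sys_6081": ["fe80::580e:22ff:fe27:7c0f"],
}

selected = interfaces_for_bind_address(example_interfaces, "192.168.123.113")
print(selected)
```

With a selection step like this, a publisher would never pick genev_sys_6081 unless the bind address were actually configured on it.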

Comment 3 Andrew McDermott 2020-05-29 16:10:03 UTC
Moved back to 4.5 to take another look.

Comment 4 Dan Williams 2020-06-01 15:28:23 UTC
*** Bug 1834918 has been marked as a duplicate of this bug. ***

Comment 8 Ben Nemec 2020-06-18 16:47:24 UTC
Until this gets vendored into coredns it isn't complete.

Comment 10 Miciah Dashiel Butler Masters 2020-07-01 14:46:19 UTC
The fix is in the coredns-mdns plugin, so I'm setting the component to "Networking" with subcomponent "mDNS".

Comment 17 Victor Voronkov 2020-10-04 18:02:30 UTC
[kni@provisionhost-0-0 ~]$ oc version
Client Version: 4.6.0-0.nightly-2020-10-02-160623
Server Version: 4.6.0-0.ci-2020-10-02-054056
Kubernetes Version: v1.19.0-rc.2.1064+1fc699e9f6becb-dirty

IPv6:
[core@master-0-0 ~]$ cat /etc/coredns/Corefile
. {
    errors
    health :18080
    mdns ocp-edge-cluster-0.qe.lab.redhat.com 0 ocp-edge-cluster-0 fd2e:6f44:5dd8::109
    forward . fe80::5054:ff:fe1e:3b18%br-ex fd2e:6f44:5dd8::1
... Output omitted

[core@master-0-0 ~]$ ifconfig genev_sys_6081
genev_sys_6081: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65000
        inet6 fe80::b837:95ff:fe2c:b28  prefixlen 64  scopeid 0x20<link>
        ether ba:37:95:2c:0b:28  txqueuelen 1000  (Ethernet)
        RX packets 2460982  bytes 552286598 (526.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2239232  bytes 983737284 (938.1 MiB)
        TX errors 7  dropped 0 overruns 0  carrier 0  collisions 0

IPv4:
[core@master-0-0 ~]$ cat /etc/coredns/Corefile
. {
    errors
    health :18080
    mdns ocp-edge-cluster-0.qe.lab.redhat.com 0 ocp-edge-cluster-0 192.168.123.113
    forward . 192.168.123.1
... Output omitted

[core@master-0-0 ~]$ ifconfig genev_sys_6081
genev_sys_6081: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65000
        inet6 fe80::580e:22ff:fe27:7c0f  prefixlen 64  scopeid 0x20<link>
        ether 5a:0e:22:27:7c:0f  txqueuelen 1000  (Ethernet)
        RX packets 270685  bytes 78656649 (75.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 264560  bytes 109358304 (104.2 MiB)
        TX errors 7  dropped 0 overruns 0  carrier 0  collisions 0
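
For anyone repeating this check on several nodes, the TX error counter can be pulled out of the ifconfig output with a small script. The following is only a sketch: the function name is hypothetical, and it assumes the net-tools output format shown above.

```python
import re


def tx_errors(ifconfig_output):
    """Extract the 'TX errors' counter from ifconfig output for one interface."""
    match = re.search(r"TX errors\s+(\d+)", ifconfig_output)
    if match is None:
        raise ValueError("no 'TX errors' counter found")
    return int(match.group(1))


# Sample lines copied from the IPv4 output in this comment.
sample = """\
genev_sys_6081: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65000
        TX packets 264560  bytes 109358304 (104.2 MiB)
        TX errors 7  dropped 0 overruns 0  carrier 0  collisions 0
"""

count = tx_errors(sample)
print(count)
```

On Linux the same counter is also exposed at /sys/class/net/genev_sys_6081/statistics/tx_errors, which avoids parsing text. Sampling it twice, some time apart, shows whether errors are still accumulating.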

Comment 20 errata-xmlrpc 2020-10-27 16:01:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

