1921797 – [OCP4.6 on Azure] packets dropped between master and worker

Summary: [OCP4.6 on Azure] packets dropped between master and worker

Keywords:
Status:	CLOSED DUPLICATE of bug 1825219
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	mcambria@redhat.com
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:
Depends On:	1941753
Blocks:

Reported:	2021-01-28 15:58 UTC by Angelo Gabrieli
Modified:	2024-06-14 00:04 UTC
CC List:	17 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1941753
Environment:
Last Closed:	2021-05-10 17:59:23 UTC
Target Upstream Version:
Embargoed:

Description Angelo Gabrieli 2021-01-28 15:58:06 UTC

Description of problem:

OCP version: 4.6.12
Cloud provider: Azure
RHCOS kernel version: 4.18.0-193.40.1.el8_2.x86_64
Network plugin: SDN

- create a new test namespace
- create a new dummy (sleep) pod on a worker node and open a shell in it with `oc rsh`
- ping the IP of a DNS pod
- run dig against the same DNS pod IP


# oc rsh sleep
sh-4.2#
sh-4.2# ping 10.130.0.38
PING 10.130.0.38 (10.130.0.38) 56(84) bytes of data.
^C
--- 10.130.0.38 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4117ms

sh-4.2#
sh-4.2# dig @10.130.0.38 -p 5353 <some DNS name>

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.3 <<>> @10.130.0.38 -p 5353 <some DNS name>
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
sh-4.2#


No NetworkPolicy in place:


# oc get networkpolicy -A
No resources found


Version-Release number of selected component (if applicable):
OCP version: 4.6.12
Cloud provider: Azure
RHCOS kernel version: 4.18.0-193.40.1.el8_2.x86_64


How reproducible:


Steps to Reproduce:
1.
2.
3.


Actual results:
VXLAN network traffic is blocked


Expected results:
VXLAN network traffic is allowed


Additional info:

Comment 17 Ben Bennett 2021-03-04 14:56:45 UTC


*** This bug has been marked as a duplicate of bug 1933761 ***

Comment 18 Ben Bennett 2021-03-04 14:57:41 UTC


*** This bug has been marked as a duplicate of bug 1928773 ***

Comment 19 Ben Bennett 2021-03-04 15:02:23 UTC

https://bugzilla.redhat.com/show_bug.cgi?id=1928773 will change the DNS service to prefer the resolver on the local node if present.  This should prevent a lot of cross-node DNS traffic.
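
The "prefer the local resolver" behaviour can be pictured with a minimal sketch. This is not the code from bug 1928773 and makes no claim about how that fix is implemented; the `Endpoint` type, the `pick_dns_endpoints` function and the node names are hypothetical and only illustrate the selection rule (use a DNS pod on the same node if there is one, otherwise fall back to all of them).

```python
from typing import List, NamedTuple


class Endpoint(NamedTuple):
    """A DNS pod endpoint: its pod IP and the node it runs on (illustrative type)."""
    ip: str
    node: str


def pick_dns_endpoints(endpoints: List[Endpoint], local_node: str) -> List[Endpoint]:
    """Return the endpoints on local_node if any exist, else all endpoints.

    Hypothetical helper: it models the preference only, not the real
    service routing in OpenShift.
    """
    local = [ep for ep in endpoints if ep.node == local_node]
    # Fall back to every endpoint when no resolver runs on this node.
    return local if local else list(endpoints)


# Example data: 10.130.0.38 is the DNS pod IP from the report; the second IP
# and both node names are made up for illustration.
dns_pods = [
    Endpoint(ip="10.130.0.38", node="master-0"),
    Endpoint(ip="10.131.0.5", node="worker-1"),
]
```

With this rule, a pod on `worker-1` would only be offered `10.131.0.5`, so its queries never cross the node boundary; a pod on a node without a DNS pod would still see both endpoints.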

https://bugzilla.redhat.com/show_bug.cgi?id=1933761 is also related (and will be backported to 4.6) since it raises the maximum TTL that CoreDNS will return to clients from 30s to 900s.  Today, if an upstream resolver sets a high TTL, we cap it at 30s.  After the change we will cap it at 15m.  That allows the pods' resolvers to cache responses for much longer, and should avoid repeated DNS requests.
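
The effect of the TTL cap can be estimated with a small sketch. This is not CoreDNS code; `capped_ttl` and `lookups_needed` are hypothetical helpers, and the model assumes a single client that queries one name and re-queries exactly when its cached answer expires.

```python
import math

OLD_MAX_TTL = 30    # seconds: cap before the change, per the comment above
NEW_MAX_TTL = 900   # seconds: cap after the change (15 minutes)


def capped_ttl(upstream_ttl: int, max_ttl: int) -> int:
    """TTL handed to the client: the upstream TTL, limited to max_ttl."""
    return min(upstream_ttl, max_ttl)


def lookups_needed(upstream_ttl: int, max_ttl: int, window: int) -> int:
    """Lookups one client needs for one name over `window` seconds.

    Simplified model (an assumption, not measured behaviour): the client
    caches each answer for its full TTL and re-queries as soon as it expires.
    """
    ttl = capped_ttl(upstream_ttl, max_ttl)
    if ttl <= 0:
        raise ValueError("TTL must be positive in this simplified model")
    return math.ceil(window / ttl)
```

Under these assumptions, a name with an upstream TTL of 3600s needs `lookups_needed(3600, OLD_MAX_TTL, 3600)` = 120 lookups per hour with the 30s cap and `lookups_needed(3600, NEW_MAX_TTL, 3600)` = 4 with the 900s cap. Names whose upstream TTL is already below 30s are unaffected by the change.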

Comment 26 Dan Winship 2021-03-22 17:09:24 UTC

It seems like this is a bug in OVS. I have cloned this bug to bug 1941753 for the OVS team to investigate. (Assuming it is an OVS bug, that bug will track fixing it in OVS and then this bug will track getting the fixed OVS package into OCP.)

Comment 27 zhaozhanqi 2021-03-26 05:08:57 UTC

Is this the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1825219 ?

Comment 33 Ben Bennett 2021-05-10 17:59:23 UTC


*** This bug has been marked as a duplicate of bug 1825219 ***