Bug 1898097

Summary: mDNS floods the baremetal network
Product: OpenShift Container Platform Reporter: Yuval Kashtan <ykashtan>
Component: Networking Assignee: Ben Nemec <bnemec>
Networking sub component: mDNS QA Contact: Oleg Sher <osher>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: afasano, asegurap, augol, bbennett, dblack, jlema, kseremet, smalleni
Version: 4.6 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The library used to provide mDNS services in the cluster did not properly implement the mDNS protocol. Consequence: Excessive multicast traffic was generated. Fix: Limit multicast frequency to once per second. Result: Significantly reduced multicast traffic.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:33:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1936539    
Attachments:
Description Flags
mdns tcpdump capture none

Description Yuval Kashtan 2020-11-16 11:30:52 UTC
Created attachment 1729743
mdns tcpdump capture

Version: 4.6

$ openshift-install version
openshift-baremetal-install 4.6.0

Platform: IPI

What happened?
The PnT Lab had to disconnect us from the network because excessive multicasts were flooding it.
These multicasts are all mDNS traffic.
See the attachment for the tcpdump capture.

What did you expect to happen?
As all our servers have proper DNS records, we don't need mDNS.
I'd want an install-config parameter to disable mDNS.

How to reproduce it (as minimally and precisely as possible)?
Just install a bunch of clusters on the same broadcast domain
and observe tcpdump.

Comment 1 Andrea Fasano 2020-11-24 17:50:56 UTC
*** Bug 1898101 has been marked as a duplicate of this bug. ***

Comment 2 Yuval Kashtan 2020-11-25 11:09:14 UTC
The reason I opened 2 distinct BZs is that,
for the flood bug, even changing the frequency would be a solution,
but for the scalability one
we really need a different solution (than mDNS, or at least its current implementation).

Also, the other BZ includes some (very) good discussion which we're missing here.
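
To illustrate the "changing the frequency" idea (the fix recorded in the Doc Text limits multicasts to once per second), here is a minimal sketch of a minimum-interval limiter. It is not the code from the actual fix or from the mDNS library used in the cluster; the class name `MinIntervalLimiter`, its parameters, and the injectable `clock` are hypothetical and chosen only for illustration.

```python
import time


class MinIntervalLimiter:
    """Hypothetical helper: allow at most one send per `min_interval` seconds."""

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock          # injectable so the limiter can be tested
        self._last_sent = None      # time of the last allowed send, if any

    def allow(self):
        """Return True if a multicast may be sent now, and record the send."""
        now = self.clock()
        if self._last_sent is None or now - self._last_sent >= self.min_interval:
            self._last_sent = now
            return True
        return False
```

A sender would call `allow()` before each multicast and skip (or defer) the packet when it returns False, so that bursts collapse to at most one packet per interval.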

Comment 3 Sai Sindhur Malleni 2020-12-02 16:52:21 UTC
Even the temporary fix mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1898101#c4 does not fix the issue. Please see: https://bugzilla.redhat.com/show_bug.cgi?id=1898101#c19


Also, can we use https://bugzilla.redhat.com/show_bug.cgi?id=1898101 instead of this one as the main tracking BZ, as there is a lot of history in that one?

Comment 5 Sai Sindhur Malleni 2020-12-04 20:27:31 UTC
I had two nodes in the cluster that did not pick up the machineconfig change due to bad nodeselectors, and those were enough to DDoS the network :). After fixing that, I see a drastic drop in the number of mcast packets.

Comment 9 Ben Nemec 2021-02-02 17:32:49 UTC
*** Bug 1893670 has been marked as a duplicate of this bug. ***

Comment 12 errata-xmlrpc 2021-02-24 15:33:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633