Bug 1966116
| Summary: | DNS SRV request which worked in 4.7.9 stopped working in 4.7.11 | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Gabriel Stein <gferrazs> |
| Component: | Networking | Assignee: | Stephen Greene <sgreene> |
| Networking sub component: | DNS | QA Contact: | Hongan Li <hongli> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | ||
| Priority: | high | CC: | aos-bugs, mapandey, mjoseph, mmasters, sgreene |
| Version: | 4.6 | Keywords: | Regression |
| Target Milestone: | --- | ||
| Target Release: | 4.8.0 | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause:
The fix for Bug 1953097 enabled the CoreDNS bufsize plugin with a size of 1232 bytes. Some primitive DNS resolvers cannot receive DNS response messages larger than 512 bytes over UDP. Note that DNS resolvers that retry lookups using TCP (such as dig) are not affected by this bug.
Consequence:
Some DNS resolvers (such as Go's internal DNS library) are unable to receive DNS responses longer than 512 bytes from openshift-dns.
Fix:
Set the CoreDNS bufsize to 512 bytes for all servers.
Result:
DNS clients that require UDP DNS messages not to exceed 512 bytes function as expected.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-07-27 23:10:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1967766 | ||
|
Comment 4
Stephen Greene
2021-06-01 19:10:51 UTC
This appears to be a regression per comment 4, and we backported the change that caused it to 4.6.z, so we'll need to fix this in 4.8 and backport to 4.6.z.

> Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?

Customers running workloads that use Go's built-in DNS resolver, such as Grafana Loki, to resolve DNS records that exceed 512 bytes. This bug is a regression caused by the fix for Bug 1949361, which merged into 4.7.11 and 4.6.30. Other primitive DNS resolvers that cannot accept UDP DNS messages longer than 512 bytes would also be affected. Note that DNS resolvers that retry lookups using TCP (such as dig) are not affected by this bug.

> What is the impact? Is it serious enough to warrant blocking edges?

This bug could affect DNS queries of any type for primitive DNS resolvers, but lookups that return long responses, such as the SRV lookups used by Loki, are the most likely to hit this issue.

> How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?

https://access.redhat.com/solutions/5984291 details an immediate workaround that involves configuring CoreDNS to force the use of TCP for DNS. This workaround is entirely unsupported. As a potential mitigation for SRV lookups, service names can also be shortened. In some cases, a workload's DNS client could be replaced or reconfigured so that it either accepts UDP DNS messages longer than 512 bytes or retries failed queries over TCP.

> Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?

Yes, this is a regression caused by the fix merged for Bug 1949361.

Verified with 4.8.0-0.nightly-2021-06-03-221810 and passed.
The bufsize is set to 512 for all servers.
# oc -n openshift-dns get cm/dns-default -oyaml
apiVersion: v1
data:
  Corefile: |
    # test
    mytest.ocp:5353 {
        forward . 192.168.1.1
        errors
        bufsize 512
    }
    .:5353 {
        bufsize 512
        errors
        health {
            lameduck 20s
        }
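The manual check above (reading the Corefile and confirming `bufsize 512` in every server block) could be scripted. The following is a minimal sketch, not part of the fix or of any OpenShift tooling: the function name `bufsize_by_server` and the `SAMPLE` text are illustrative, and the parser only handles the simple brace layout shown in the output above.

```python
import re

# Illustrative Corefile text, modeled on the (truncated) output above.
SAMPLE = """\
# test
mytest.ocp:5353 {
    forward . 192.168.1.1
    errors
    bufsize 512
}
.:5353 {
    bufsize 512
    errors
    health {
        lameduck 20s
    }
"""


def bufsize_by_server(corefile: str) -> dict:
    """Map each top-level server block to its bufsize value, or None if unset."""
    sizes = {}
    current = None  # name of the server block being read
    depth = 0       # brace nesting depth; 1 means directly inside a server block
    for raw in corefile.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if depth == 0 and line.endswith("{"):
            current = line[:-1].strip()
            sizes[current] = None
            depth = 1
            continue
        if depth == 1:
            match = re.fullmatch(r"bufsize\s+(\d+)", line)
            if match:
                sizes[current] = int(match.group(1))
        depth += line.count("{") - line.count("}")
    return sizes


if __name__ == "__main__":
    for server, size in bufsize_by_server(SAMPLE).items():
        print(server, size)
```

In practice the Corefile text would come from the `dns-default` config map rather than from a constant; a server block that reports `None` has no `bufsize` directive at its top level.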
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438
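For workloads that cannot pick up the fixed release right away, comment 4 mentions reconfiguring the DNS client so that failed queries are retried over TCP. The sketch below illustrates that idea using only the Python standard library. It is not how Go's resolver or dig work internally; the helper names are hypothetical, and the decision rule (retry when the UDP reply is oversized, too short, or has the TC flag set) is an assumption made for illustration.

```python
import socket
import struct

UDP_LIMIT = 512  # classic DNS-over-UDP message size limit without EDNS0 (RFC 1035)
TYPE_SRV = 33    # SRV record type
TC_FLAG = 0x0200 # truncation bit in the DNS header flags


def build_query(name: str, qtype: int = TYPE_SRV, query_id: int = 0x1234) -> bytes:
    """Encode a single-question DNS query with recursion desired."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = [label for label in name.split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN


def needs_tcp_retry(response: bytes) -> bool:
    """True if a UDP reply is too short, larger than 512 bytes, or truncated."""
    if len(response) < 12 or len(response) > UDP_LIMIT:
        return True
    flags = struct.unpack("!H", response[2:4])[0]
    return bool(flags & TC_FLAG)


def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes each message with a two-byte length."""
    return struct.pack("!H", len(message)) + message


def _recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes from a stream socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full DNS message arrived")
        data += chunk
    return data


def lookup(name: str, server: str, port: int = 53, timeout: float = 2.0) -> bytes:
    """Try UDP first; fall back to TCP if the UDP reply is unusable or missing."""
    query = build_query(name)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
            udp.settimeout(timeout)
            udp.sendto(query, (server, port))
            reply, _ = udp.recvfrom(65535)
        if not needs_tcp_retry(reply):
            return reply
    except OSError:
        pass  # timeout or socket error: fall through to TCP
    with socket.create_connection((server, port), timeout=timeout) as tcp:
        tcp.sendall(frame_for_tcp(query))
        length = struct.unpack("!H", _recv_exact(tcp, 2))[0]
        return _recv_exact(tcp, length)
```

A call such as `lookup("_http._tcp.example.com", "192.168.1.1")` would return the raw response bytes; both the name and the server address are placeholders, and decoding the answer section is left out to keep the sketch short.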