Description of problem:
In oVirt e2e tests we noticed errors in the master/worker journals such as:
dial tcp: lookup api-int.ovirt1X.gcp.devcluster.openshift.com on 192.168.21X.1:53: no such host
2. ovirt17-kcphn-worker-0-c2rd4 journal on CI job https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-ovirt-4.6/1302427462155636736:
cat journal|grep "api-int.*1:53: no such host"|wc -l
On oVirt CI, 192.168.21X.1 is the upstream DNS, and api-int is resolvable only through CoreDNS.
That means that at some points CoreDNS is not available to the node, and the node queries the upstream DNS instead.
The NetworkManager-resolve-prepender is responsible for adding CoreDNS to resolv.conf. As shown in https://bugzilla.redhat.com/show_bug.cgi?id=1846529#c28,
the journal of worker ovirt11-wz8kt-worker-0-5686p
contains the following entries during this lookup failure:
# cat workers-journal | grep nm-dispatcher | grep 'worker-0-5686p' | grep resolv-prepender
Aug 16 02:45:15.570577 ovirt11-wz8kt-worker-0-5686p nm-dispatcher: <13>Aug 16 02:45:15 root: NM resolv-prepender triggered by ens3 dhcp4-change.
Aug 16 02:45:17.726649 ovirt11-wz8kt-worker-0-5686p nm-dispatcher: <13>Aug 16 02:45:17 root: NM resolv-prepender: Prepending 'nameserver 192.168.211.118' to /etc/resolv.conf (other nameservers from /var/run/NetworkManager/resolv.conf)
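The gap between the two journal entries above can be measured rather than eyeballed. The following is a minimal sketch, assuming the journal line layout shown above (the time of day is the third whitespace-separated field); resolv_prepender_window is a hypothetical helper name, and a window that crosses midnight is not handled.

```shell
# Hypothetical helper: print, in seconds, the time between each
# "triggered by" entry and the next "Prepending" entry.
# Reads journal lines from the given files, or from stdin if none are given.
resolv_prepender_window() {
    awk '
        # Convert HH:MM:SS.micro to seconds since midnight.
        function secs(t,  p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        /resolv-prepender triggered by/ { start = secs($3); have_start = 1 }
        /resolv-prepender: Prepending/ && have_start {
            printf "%.3f\n", secs($3) - start
            have_start = 0
        }
    ' "$@"
}
```

It could be used as: grep 'worker-0-5686p' workers-journal | resolv_prepender_window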
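The script takes 2s to finish. At the beginning of the script we copy /var/run/NetworkManager/resolv.conf to /etc/resolv.conf, and the CoreDNS nameserver is only prepended at the end. That means that during those 2 seconds /etc/resolv.conf contains the wrong DNS (the upstream servers only), which leads to lookup failures like the one above.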
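One way to close such a window is to build the complete file in a temporary location and rename it into place only once it is ready, so readers never see the intermediate content. The following is a minimal sketch of that idea, not the actual change made to the script; prepend_nameserver_atomically and its arguments are hypothetical, and it assumes the target is a regular file rather than a symlink.

```shell
# Hypothetical helper: prepend a nameserver to a resolv.conf without
# exposing an intermediate file.
# $1 = nameserver IP, $2 = source resolv.conf, $3 = target resolv.conf
prepend_nameserver_atomically() {
    ns="$1"
    src="$2"
    dst="$3"
    # Create the temp file next to the target so the final mv is a
    # rename on the same filesystem.
    tmp="$(mktemp "${dst}.XXXXXX")" || return 1
    {
        echo "nameserver ${ns}"
        # Copy the remaining lines, dropping an exact duplicate of the new entry.
        grep -v -x -F "nameserver ${ns}" "$src"
    } > "$tmp"
    chmod 0644 "$tmp"
    # Replace the target in a single step.
    mv -f "$tmp" "$dst"
}
```

With the values from the journal above, a call would look like: prepend_nameserver_atomically 192.168.211.118 /var/run/NetworkManager/resolv.conf /etc/resolv.conf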
*** Bug 1876215 has been marked as a duplicate of this bug. ***
We no longer see the issue occurring in CI.
Verified in OCP 4.6.0-0.nightly-2020-09-17-031725 with RHV 18.104.22.168-1.el8