Bug 1876215 - [oVirt] api-int not resolvable for a short period of time
Summary: [oVirt] api-int not resolvable for a short period of time
Keywords:
Status: CLOSED DUPLICATE of bug 1876216
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Martin André
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-09-06 12:39 UTC by Gal Zaidman
Modified: 2020-09-07 13:56 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-07 13:56:07 UTC
Target Upstream Version:
Embargoed:



Description Gal Zaidman 2020-09-06 12:39:12 UTC
Description of problem:

In the oVirt e2e tests we noticed errors in the master/worker journals such as:
"dial tcp: lookup api-int.ovirt1X.gcp.devcluster.openshift.com on 192.168.21X.1:53: no such host"

see:
1. https://bugzilla.redhat.com/show_bug.cgi?id=1846529#c18
2. ovirt17-kcphn-worker-0-c2rd4 journal on CI job https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-ovirt-4.6/1302427462155636736:
cat journal|grep "api-int.*1:53: no such host"|wc -l
945
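
To see whether those failures cluster in short bursts rather than being spread evenly, the same grep can be grouped by minute. This is only a sketch: the function name is hypothetical, and it assumes journal lines start with "Mon DD HH:MM:SS.micro" as in the excerpts quoted in this report.

```shell
# Hypothetical helper (not part of any OpenShift tooling): count api-int
# lookup failures per minute in a saved journal file.
# Assumption: each line starts with "Mon DD HH:MM:SS.micro host unit[pid]: ...".
count_api_int_failures_per_minute() {
    journal_file="$1"
    grep 'api-int.*1:53: no such host' "$journal_file" \
        | awk '{ print $1, $2, substr($3, 1, 5) }' \
        | sort \
        | uniq -c
}
```

Running count_api_int_failures_per_minute journal prints one line per minute with the number of failed lookups in that minute, which makes short windows of failures easy to spot.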

On oVirt CI, 192.168.21X.1 is the upstream DNS server, and api-int is resolvable only through CoreDNS.
This means that at certain moments CoreDNS is not available to the node, and the node falls back to the upstream DNS.
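
Since the resolver tries the nameservers in the order they are listed, a quick way to check which server a node would ask first is to read the first nameserver entry of its resolv.conf. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: print the first nameserver listed in a resolv.conf file.
# If this is the upstream DNS rather than the node-local CoreDNS address,
# api-int lookups are expected to fail with "no such host".
first_nameserver() {
    resolv_conf="$1"
    awk '$1 == "nameserver" { print $2; exit }' "$resolv_conf"
}
```

For example, first_nameserver /etc/resolv.conf on an affected worker should print the CoreDNS address (192.168.211.118 in the journal quoted in this report) outside of the failure window.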

Additional thoughts:
The NetworkManager-resolve-prepender is responsible for adding the CoreDNS address to resolv.conf, as we can see in https://bugzilla.redhat.com/show_bug.cgi?id=1846529#c28:
"""
by looking at ovirt11-wz8kt-worker-0-5686p workers journal [0]:

during this lookup failure we can see :

# cat workers-journal | grep nm-dispatcher | grep 'worker-0-5686p' | grep resolv-prepender
Aug 16 02:45:15.570577 ovirt11-wz8kt-worker-0-5686p nm-dispatcher[302731]: <13>Aug 16 02:45:15 root: NM resolv-prepender triggered by ens3 dhcp4-change.
Aug 16 02:45:17.726649 ovirt11-wz8kt-worker-0-5686p nm-dispatcher[302731]: <13>Aug 16 02:45:17 root: NM resolv-prepender: Prepending 'nameserver 192.168.211.118' to /etc/resolv.conf (other nameservers from /var/run/NetworkManager/resolv.conf)
 """

The script takes 2 seconds to finish. At the beginning of the script we copy /var/run/NetworkManager/resolv.conf to /etc/resolv.conf, which means that for those 2 seconds /etc/resolv.conf points to the wrong DNS (the CoreDNS entry is missing), and that leads to unexpected problems such as the lookup failures above.
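
One possible way to avoid that window is to build the final file in a temporary location and rename it into place, so /etc/resolv.conf never exists without the CoreDNS entry. The following is a sketch only, not the actual resolv-prepender script: the function name and its three parameters are hypothetical, and it assumes the target is a regular file on the same filesystem as the temporary file (symlinks and SELinux labels are not handled).

```shell
# Hypothetical sketch, not the real NetworkManager dispatcher script.
# Writes the complete resolv.conf to a temp file first, then renames it over
# the target, so readers see either the old file or the new one, never a
# version without the local nameserver.
prepend_nameserver_atomically() {
    nameserver_ip="$1"   # assumed: node-local CoreDNS address
    source_conf="$2"     # assumed: e.g. /var/run/NetworkManager/resolv.conf
    target_conf="$3"     # assumed: e.g. /etc/resolv.conf

    # Create the temp file next to the target so mv is a same-filesystem rename.
    tmp_conf="$(mktemp "$(dirname "$target_conf")/resolv.conf.XXXXXX")" || return 1

    {
        echo "nameserver ${nameserver_ip}"
        # Copy the remaining lines, dropping an existing identical entry
        # to avoid listing the same nameserver twice.
        grep -vxF "nameserver ${nameserver_ip}" "$source_conf"
    } > "$tmp_conf"

    chmod 0644 "$tmp_conf"
    mv -f "$tmp_conf" "$target_conf"
}
```

With this ordering, the slow part of the script (the 2 seconds seen in the journal) can run before the rename, and the switch to the new file is a single step.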

Comment 1 Gal Zaidman 2020-09-07 13:56:07 UTC

*** This bug has been marked as a duplicate of bug 1876216 ***

