Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2033550

Summary: [vsphere] api-int dns could not be resolved on bootstrap server
Product: OpenShift Container Platform
Reporter: jima
Component: Installer
Assignee: aos-install
Installer sub component: openshift-installer
QA Contact: jima
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
CC: mstaeble
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-12-17 14:10:57 UTC
Type: Bug

Description jima 2021-12-17 08:07:51 UTC
Version:
4.10.0-0.nightly-2021-12-14-083101

Platform:
vSphere IPI

What happened?
Hit the same issue reported in bug https://bugzilla.redhat.com/show_bug.cgi?id=1884435 on a recent 4.10 nightly build.

The loopback address 127.0.0.1 is not added to the resolver configuration (/etc/resolv.conf), so api-int cannot be resolved and bootstrap fails.

[root@ip-172-31-247-110 ~]# ls -ltr /etc/resolv.conf 
-rw-r--r--. 1 root root 87 Dec 17 03:17 /etc/resolv.conf

[root@ip-172-31-247-110 ~]# cat /etc/resolv.conf 
# Generated by NetworkManager
search us-west-2.compute.internal
nameserver 10.3.192.12
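A quick way to confirm this condition on a bootstrap host is to check whether 127.0.0.1 is the first `nameserver` entry in /etc/resolv.conf, which is the condition local-dns-prepender says it checks for. The following is a minimal Python sketch; `local_dns_is_first` is a hypothetical helper, not part of the installer or of the prepender script.

```python
def local_dns_is_first(resolv_conf: str, local_ip: str = "127.0.0.1") -> bool:
    """Return True if local_ip is the first 'nameserver' entry.

    Hypothetical helper; it only inspects the text it is given.
    """
    for line in resolv_conf.splitlines():
        fields = line.split()
        # Skip comments, 'search' lines, and blanks; only nameservers matter.
        if len(fields) >= 2 and fields[0] == "nameserver":
            return fields[1] == local_ip
    # No nameserver lines at all.
    return False


# Contents of /etc/resolv.conf as captured on the failing bootstrap host.
before = """# Generated by NetworkManager
search us-west-2.compute.internal
nameserver 10.3.192.12
"""
print(local_dns_is_first(before))  # False
```

Against the file captured above it returns False, because the only nameserver is the upstream 10.3.192.12.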


In the bootkube log:
Dec 17 03:29:19 ip-172-31-247-110.us-west-2.compute.internal bootkube.sh[2428]: Tearing down temporary bootstrap control plane...
Dec 17 03:29:19 ip-172-31-247-110.us-west-2.compute.internal bootkube.sh[2428]: Sending bootstrap-finished event.Skipped "secret-service-network-serving-signer.yaml" secrets.v1./service-network-serving-signer -n openshift-kube-apiserver-operator as it already exists
Dec 17 03:29:19 ip-172-31-247-110.us-west-2.compute.internal bootkube.sh[2428]: Restoring CVO overrides
Dec 17 03:29:19 ip-172-31-247-110.us-west-2.compute.internal bootkube.sh[2428]: Unable to connect to the server: dial tcp: lookup api-int.jim1217atest.qe.devcluster.openshift.com on 10.3.192.12:53: no such host
Dec 17 03:29:29 ip-172-31-247-110.us-west-2.compute.internal bootkube.sh[2428]: Trying again to restore CVO overrides
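The failing lookup names the server that answered, 10.3.192.12:53, i.e. the query went straight to the upstream nameserver rather than to 127.0.0.1. When scanning a longer journal for this symptom, a small regex can pull out the host and server. This is a hypothetical triage sketch in Python; it assumes the Go resolver message format shown above and IPv4 server addresses only.

```python
import re

# Matches resolver errors of the form seen in the bootkube log:
#   "lookup <host> on <server>:<port>: no such host"
# Assumption: the server is an IPv4 address.
LOOKUP_ERR = re.compile(r"lookup (\S+) on ([\d.]+):(\d+): no such host")


def parse_lookup_failure(line: str):
    """Return (host, server) for a failed lookup, or None if none is found."""
    m = LOOKUP_ERR.search(line)
    if m is None:
        return None
    return m.group(1), m.group(2)


line = ("bootkube.sh[2428]: Unable to connect to the server: dial tcp: lookup "
        "api-int.jim1217atest.qe.devcluster.openshift.com on 10.3.192.12:53: no such host")
print(parse_lookup_failure(line))
# ('api-int.jim1217atest.qe.devcluster.openshift.com', '10.3.192.12')
```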

According to the NetworkManager-dispatcher.service log, local-dns-prepender tried to add 127.0.0.1 to /etc/resolv.conf, but the entry was not actually added.

[root@ip-172-31-247-110 ~]# journalctl -u NetworkManager-dispatcher.service
-- Logs begin at Fri 2021-12-17 03:17:26 UTC, end at Fri 2021-12-17 06:57:09 UTC. --
Dec 17 03:17:36 localhost systemd[1]: Starting Network Manager Script Dispatcher Service...
Dec 17 03:17:36 localhost systemd[1]: Started Network Manager Script Dispatcher Service.
Dec 17 03:17:36 localhost root[1663]: NM local-dns-prepender triggered by ens192 up.
Dec 17 03:17:36 localhost nm-dispatcher[1559]: <13>Dec 17 03:17:36 root: NM local-dns-prepender triggered by ens192 up.
Dec 17 03:17:36 localhost nm-dispatcher[1559]: Failed to get unit file state for systemd-resolved.service: No such file or directory
Dec 17 03:17:36 localhost nm-dispatcher[1559]: <13>Dec 17 03:17:36 root: NM local-dns-prepender: Checking if local DNS IP is the first entry in resolv.conf
Dec 17 03:17:36 localhost nm-dispatcher[1559]: <13>Dec 17 03:17:36 root: NM local-dns-prepender: Looking for '# Generated by NetworkManager' in /etc/resolv.conf to place 'nameserver 127.0.0.1'
Dec 17 03:17:48 ip-172-31-247-110.us-west-2.compute.internal systemd[1]: NetworkManager-dispatcher.service: Succeeded.
Dec 17 04:21:21 ip-172-31-247-110.us-west-2.compute.internal systemd[1]: Starting Network Manager Script Dispatcher Service...
Dec 17 04:21:21 ip-172-31-247-110.us-west-2.compute.internal systemd[1]: Started Network Manager Script Dispatcher Service.
Dec 17 04:21:21 ip-172-31-247-110.us-west-2.compute.internal root[23517]: NM local-dns-prepender triggered by ens192 dhcp4-change.
Dec 17 04:21:21 ip-172-31-247-110.us-west-2.compute.internal nm-dispatcher[23504]: <13>Dec 17 04:21:21 root: NM local-dns-prepender triggered by ens192 dhcp4-change.
Dec 17 04:21:21 ip-172-31-247-110.us-west-2.compute.internal root[23520]: NM resolv-prepender: Checking for nameservers in /var/run/NetworkManager/resolv.conf
Dec 17 04:21:21 ip-172-31-247-110.us-west-2.compute.internal nm-dispatcher[23504]: <13>Dec 17 04:21:21 root: NM resolv-prepender: Checking for nameservers in /var/run/NetworkManager/resolv.conf
Dec 17 04:21:21 ip-172-31-247-110.us-west-2.compute.internal nm-dispatcher[23504]: nameserver 10.3.192.12
Dec 17 04:21:21 ip-172-31-247-110.us-west-2.compute.internal nm-dispatcher[23504]: Failed to get unit file state for systemd-resolved.service: No such file or directory
Dec 17 04:21:21 ip-172-31-247-110.us-west-2.compute.internal root[23523]: NM local-dns-prepender: Checking if local DNS IP is the first entry in resolv.conf
Dec 17 04:21:21 ip-172-31-247-110.us-west-2.compute.internal nm-dispatcher[23504]: <13>Dec 17 04:21:21 root: NM local-dns-prepender: Checking if local DNS IP is the first entry in resolv.conf
Dec 17 04:21:21 ip-172-31-247-110.us-west-2.compute.internal nm-dispatcher[23504]: <13>Dec 17 04:21:21 root: NM local-dns-prepender: Looking for '# Generated by NetworkManager' in /etc/resolv.conf to place 'nam>
Dec 17 04:21:31 ip-172-31-247-110.us-west-2.compute.internal systemd[1]: NetworkManager-dispatcher.service: Succeeded.
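The dispatcher log describes the intended step: look for the `# Generated by NetworkManager` header in /etc/resolv.conf and place `nameserver 127.0.0.1` after it. A minimal Python sketch of that step is shown below, applied to the file contents captured on the failing host. `prepend_local_dns` is a hypothetical name, and the real prepender script is not shown in this report, so its behaviour may differ.

```python
HEADER = "# Generated by NetworkManager"


def prepend_local_dns(resolv_conf: str, local_ip: str = "127.0.0.1") -> str:
    """Insert 'nameserver <local_ip>' directly below the NetworkManager header.

    Hypothetical sketch. Idempotent: if the entry is already present,
    the input is returned unchanged.
    """
    entry = f"nameserver {local_ip}"
    lines = resolv_conf.splitlines()
    if entry in lines:
        return resolv_conf
    if HEADER in lines:
        # Place the entry right after the header line.
        idx = lines.index(HEADER) + 1
    else:
        # No header found: put the entry at the very top.
        idx = 0
    lines.insert(idx, entry)
    return "\n".join(lines) + "\n"


before = """# Generated by NetworkManager
search us-west-2.compute.internal
nameserver 10.3.192.12
"""
print(prepend_local_dns(before), end="")
```

The printed result has the same layout as the /etc/resolv.conf that appears after the 04:21 dhcp4-change event.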

About one hour later, 127.0.0.1 was added by nm-dispatcher after the DHCP lease state changed (bound -> extended):

[root@ip-172-31-247-110 ~]# ls -ltr /etc/resolv.conf 
-rw-r--r--. 1 root root 108 Dec 17 04:21 /etc/resolv.conf
[root@ip-172-31-247-110 ~]# cat /etc/resolv.conf 
# Generated by NetworkManager
nameserver 127.0.0.1
search us-west-2.compute.internal
nameserver 10.3.192.12

This time bootkube.service completed:
Dec 17 04:21:27 ip-172-31-247-110.us-west-2.compute.internal bootkube.sh[2428]: Waiting for CEO to finish...
Dec 17 04:21:27 ip-172-31-247-110.us-west-2.compute.internal bootkube.sh[2428]: I1217 04:21:27.733238       1 waitforceo.go:64] Cluster etcd operator bootstrapped successfully
Dec 17 04:21:27 ip-172-31-247-110.us-west-2.compute.internal bootkube.sh[2428]: I1217 04:21:27.734789       1 waitforceo.go:58] cluster-etcd-operator bootstrap etcd
Dec 17 04:21:27 ip-172-31-247-110.us-west-2.compute.internal bootkube.sh[2428]: bootkube.service complete
Dec 17 04:21:27 ip-172-31-247-110.us-west-2.compute.internal systemd[1]: bootkube.service: Succeeded.


What did you expect to happen?
The installation succeeds. Instead, it fails at the bootstrap stage.

How to reproduce it (as minimally and precisely as possible)?
Hit several times.

Anything else we need to know?

Comment 3 Matthew Staebler 2021-12-17 14:10:57 UTC

*** This bug has been marked as a duplicate of bug 2029438 ***