Version:
OpenShift Installer 4.6.0-0.nightly-2020-10-01-181852
DEBUG Built from commit 540f6a9dc127936c1085511daf5961342ec1

Platform: vsphere ipi

What happened?
This script is supposed to add 127.0.0.1 to /etc/resolv.conf since coredns is running on bootstrap to provide DNS for api-int.
https://github.com/openshift/installer/blob/master/data/data/bootstrap/vsphere/files/etc/NetworkManager/dispatcher.d/30-local-dns-prepender.template

from bootkube...

1.Etcd: failed to list *v1.Etcd: Get "https://api-int.jcallen.vmc.devcluster.openshift.com:6443/apis/operator.openshift.io/v1/etcds?fieldSelector=metadata.name%3Dcluster&limit=500&resourceVersion=0": dial tcp: lookup api-int.jcallen.vmc.devcluster.openshift.com on 10.3.192.12:53: no such host
Oct 02 00:21:57 ip-172-31-251-83.us-west-2.compute.internal bootkube.sh[2362]: E1002 00:21:57.354155 1 reflector.go:127] k8s.io/client-go.0/tools/cache/reflector.go:156: Failed to watch *v1.Etcd: failed to list *v1.Etcd: Get "https://api-int.jcallen.vmc.devcluster.openshift.com:6443/apis/operator.openshift.io/v1/etcds?fieldSelector=metadata.name%3Dcluster&limit=500&resourceVersion=0": dial tcp: lookup api-int.jcallen.vmc.devcluster.openshift.com on 10.3.192.12:53: no such host
Oct 02 00:22:33 ip-172-31-251-83.us-west-2.compute.internal bootkube.sh[2362]: E1002 00:22:33.557896 1 reflector.go:127] k8s.io/client-go.0/tools/cache/reflector.go:156: Failed to watch *v1.Etcd: failed to list *v1.Etcd: Get "https://api-int.jcallen.vmc.devcluster.openshift.com:6443/apis/operator.openshift.io/v1/etcds?fieldSelector=metadata.name%3Dcluster&limit=500&resourceVersion=0": dial tcp: lookup api-int.jcallen.vmc.devcluster.openshift.com on 10.3.192.12:53: no such host
Oct 02 00:23:30 ip-172-31-251-83.us-west-2.compute.internal bootkube.sh[2362]: E1002 00:23:30.568147 1 reflector.go:127] k8s.io/client-go.0/tools/cache/reflector.go:156: Failed to watch *v1.Etcd: failed to list *v1.Etcd: Get "https://api-int.jcallen.vmc.devcluster.openshift.com:6443/apis/operator.openshift.io/v1/etcds?fieldSelector=metadata.name%3Dcluster&limit=500&resourceVersion=0": dial tcp: lookup api-int.jcallen.vmc.devcluster.openshift.com on 10.3.192.12:53: no such host
^C

[root@ip-172-31-251-83 ~]# journalctl -fu NetworkManager-dispatcher.service
-- Logs begin at Fri 2020-10-02 00:08:20 UTC. --
Oct 02 00:08:29 localhost systemd[1]: Starting Network Manager Script Dispatcher Service...
Oct 02 00:08:29 localhost systemd[1]: Started Network Manager Script Dispatcher Service.
Oct 02 00:08:29 localhost nm-dispatcher[1692]: <13>Oct 2 00:08:29 root: NM local-dns-prepender triggered by ens192 up.
Oct 02 00:08:29 localhost nm-dispatcher[1692]: <13>Oct 2 00:08:29 root: NM local-dns-prepender: Checking if local DNS IP is the first entry in resolv.conf
Oct 02 00:08:29 localhost nm-dispatcher[1692]: <13>Oct 2 00:08:29 root: NM local-dns-prepender: Looking for '# Generated by NetworkManager' in /etc/resolv.conf to place 'nameserver 127.0.0.1'
^C

[root@ip-172-31-251-83 ~]# cat /etc/resolv.conf
# Generated by NetworkManager
search us-west-2.compute.internal
nameserver 10.3.192.12
Created attachment 1718337: bootstrap log bundle
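For reference, the behavior that the dispatcher log in the description points to (check whether the local DNS IP is the first nameserver, otherwise place 'nameserver 127.0.0.1' after the '# Generated by NetworkManager' line) could be sketched roughly as below. This is a minimal illustration in Python, not the template's actual shell code; the function name prepend_local_dns and the fallback for a missing marker line are assumptions made for the example.

```python
MARKER = "# Generated by NetworkManager"


def _is_nameserver(line: str) -> bool:
    """True if the line is a resolv.conf 'nameserver' entry."""
    parts = line.split()
    return bool(parts) and parts[0] == "nameserver"


def prepend_local_dns(resolv_conf: str, local_ip: str = "127.0.0.1") -> str:
    """Return resolv.conf text with `nameserver <local_ip>` as the first nameserver.

    Hypothetical helper: it mirrors the steps named in the dispatcher log,
    not the real 30-local-dns-prepender implementation.
    """
    lines = resolv_conf.splitlines()
    wanted = ["nameserver", local_ip]

    # Step 1: check whether the local DNS IP is already the first nameserver.
    nameservers = [line for line in lines if _is_nameserver(line)]
    if nameservers and nameservers[0].split() == wanted:
        return resolv_conf

    # Remove any later occurrence so the entry is not duplicated.
    lines = [line for line in lines if line.split() != wanted]

    # Step 2: look for the NetworkManager marker and insert right after it.
    marker_idx = next(
        (i for i, line in enumerate(lines) if line.strip() == MARKER), None
    )
    if marker_idx is not None:
        insert_at = marker_idx + 1
    else:
        # Assumption: without the marker, insert before the first nameserver
        # (or at the top of the file if there is none).
        insert_at = next(
            (i for i, line in enumerate(lines) if _is_nameserver(line)), 0
        )

    lines.insert(insert_at, " ".join(wanted))
    return "\n".join(lines) + "\n"
```

Applied to the resolv.conf shown in the description, this sketch would put 'nameserver 127.0.0.1' directly under the '# Generated by NetworkManager' line, ahead of 10.3.192.12. On the failing bootstrap node the file still lists only 10.3.192.12, which matches the api-int lookups going to 10.3.192.12:53.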
Not sure there is much in the log; the bootstrap node is still available, though.

DEBUG Unable to connect to the server: dial tcp: lookup api-int.jcallen.vmc.devcluster.openshift.com on 10.3.192.12:53: no such host
DEBUG Unable to connect to the server: dial tcp: lookup api-int.jcallen.vmc.devcluster.openshift.com on 10.3.192.12:53: no such host
DEBUG Gather remote logs
DEBUG Collecting info from 172.31.251.122
DEBUG Unable to connect to the server: dial tcp: lookup api-int.jcallen.vmc.devcluster.openshift.com on 10.3.192.12:53: no such host
DEBUG lost connection
DEBUG Warning: Permanently added '172.31.251.122' (ECDSA) to the list of known hosts.
DEBUG core@172.31.251.122: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
DEBUG Collecting info from 172.31.251.18
DEBUG lost connection
DEBUG Warning: Permanently added '172.31.251.18' (ECDSA) to the list of known hosts.
DEBUG core@172.31.251.18: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
DEBUG Collecting info from 172.31.251.144
DEBUG lost connection
DEBUG Warning: Permanently added '172.31.251.144' (ECDSA) to the list of known hosts.
DEBUG core@172.31.251.144: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
DEBUG Log bundle written to /var/home/core/log-bundle-20201002004001.tar.gz
INFO Bootstrap gather logs captured here "/projects/installer-testing/vsphere-ipi/log-bundle-20201002004001.tar.gz"
FATAL Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition
Another example of this:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.6-e2e-vsphere/1311887212656201728

Oct 02 05:08:41 ip-172-31-254-133.us-west-2.compute.internal bootkube.sh[2344]: E1002 05:08:41.177442 1 reflector.go:127] k8s.io/client-go.0/tools/cache/reflector.go:156: Failed to watch *v1.Etcd: failed to list *v1.Etcd: Get "https://api-int.ci-op-64sd0h4w-0aec4.origin-ci-int-aws.dev.rhcloud.com:6443/apis/operator.openshift.io/v1/etcds?fieldSelector=metadata.name%3Dcluster&limit=500&resourceVersion=0": dial tcp: lookup api-int.ci-op-64sd0h4w-0aec4.origin-ci-int-aws.dev.rhcloud.com on 10.3.192.12:53: no such host
Moving to the mDNS team that usually knows how to handle this setup.
If this was the wrong component please help me move it to the team that handles the hosted DNS for baremetal deployments.
After just randomly going through MCO PRs I wonder if this is the _real_ fix: https://github.com/openshift/machine-config-operator/pull/2030/files
Installed IPI on vSphere with 4.7.0-0.nightly-2020-10-21-001511 and the install succeeded, so moving the bug to VERIFIED.
This was fixed as part of a different bug, so we don't need doc text on this one.