Bug 1944196

Summary: Installation fails for OCP 4.7 on vmware if api-int dns entry is missing
Product: OpenShift Container Platform Reporter: Victor Medina <vmedina>
Component: Installer Assignee: Ben Nemec <bnemec>
Installer sub component: openshift-installer QA Contact: Victor Voronkov <vvoronko>
Status: CLOSED DUPLICATE Docs Contact:
Severity: high    
Priority: high CC: bnemec, djuran, jcallen, nschuetz
Version: 4.7 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: x86_64   
OS: Unspecified   
Last Closed: 2021-06-10 17:37:48 UTC Type: Bug
Attachments:
Description           Flags
installation config   none
log-bundle            none

Description Victor Medina 2021-03-29 13:50:20 UTC
Description of problem:

Installation fails because a DNS entry is missing. The documentation does not state that this entry is required.
The entry that appears to be needed is the api-int entry; see below.

The following error message is visible in the bootstrap server during deploy:

Mar 24 07:47:45 sekiius00660.exilis.npee.seki.gic.ericsson.se bootkube.sh[2389]: E0324 07:47:45.301079       1 reflector.go:138] k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.Etcd: failed to list *v1.Etcd: Get "https://api-int.ocp007.exilis.npee.seki.gic.ericsson.se:6443/apis/operator.openshift.io/v1/etcds?fieldSelector=metadata.name%3Dcluster&limit=500&resourceVersion=0": dial tcp: lookup api-int.ocp007.exilis.npee.seki.gic.ericsson.se on 10.221.16.10:53: no such host

Version-Release number of selected component (if applicable):

4.7

Comment 1 Ben Nemec 2021-04-07 20:55:14 UTC
Is this IPI or UPI? In UPI api-int does need to be provided externally, so this would be expected. In IPI it should be provided by the internal coredns, so we would need to see the logs and Corefile from coredns on the bootstrap to determine why it isn't working.
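
A quick way to confirm whether the api-int record resolves from a given host is sketched below. This is only an illustration, not part of the installer: the helper names `api_int_name` and `resolves` are hypothetical, and splitting the failing hostname into cluster name "ocp007" and base domain "exilis.npee.seki.gic.ericsson.se" is an assumption based on the error message above.

```python
import socket


def api_int_name(cluster_name, base_domain):
    """Build the internal API hostname: api-int.<cluster name>.<base domain>."""
    return "api-int.{}.{}".format(cluster_name, base_domain)


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves, False on a name-resolution error.

    The resolver argument is injectable (a hypothetical convenience) so the
    check can be exercised without real DNS.
    """
    try:
        resolver(hostname, 6443)  # 6443 is the API port seen in the log line
        return True
    except socket.gaierror:
        return False
```

Calling `resolves(api_int_name("ocp007", "exilis.npee.seki.gic.ericsson.se"))` on the bootstrap host would be expected to return False while the "no such host" error persists.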

Comment 2 Victor Medina 2021-04-12 06:52:42 UTC
Created attachment 1771273 [details]
installation config

Comment 4 Ben Nemec 2021-04-15 15:35:03 UTC
The install-config confirms that this is IPI. I still need the logs from coredns to determine why the record isn't being found.

Looking at this again, it's also possible that resolv.conf is not correct. The first nameserver listed should be 127.0.0.1 so that the bootstrap uses the local coredns. If that is not the case, it would explain why the record isn't found, and I would then need the nm-dispatcher logs from the bootstrap journal. These can be collected with "journalctl | grep nm-dispatcher".

So in short, I need two things:
1) coredns logs
2) nm-dispatcher logs

One of those two should tell us what went wrong.
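
The resolv.conf check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `nameservers` and `uses_local_coredns` are hypothetical, and the caller is assumed to pass in the text of /etc/resolv.conf from the bootstrap host.

```python
from typing import List


def nameservers(resolv_conf_text: str) -> List[str]:
    """Return the nameserver addresses in the order they appear."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Only "nameserver <address>" lines count; comments and
        # search/options lines are skipped.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_local_coredns(resolv_conf_text: str) -> bool:
    """True if the first nameserver listed is 127.0.0.1."""
    servers = nameservers(resolv_conf_text)
    return bool(servers) and servers[0] == "127.0.0.1"
```

If `uses_local_coredns` returns False for the bootstrap's resolv.conf, lookups go straight to the external resolver (10.221.16.10 in the error message), which would be consistent with the "no such host" failure.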

Comment 6 Joseph Callen 2021-05-24 19:03:54 UTC
Created attachment 1786641 [details]
log-bundle

Comment 9 Ben Nemec 2021-06-10 17:37:48 UTC
https://github.com/openshift/installer/pull/4973 ended up being the fix for this. Duplicating to that bug.

*** This bug has been marked as a duplicate of bug 1966862 ***