Bug 1944196 - Installation fails for OCP 4.7 on vmware if api-int dns entry is missing
Summary: Installation fails for OCP 4.7 on vmware if api-int dns entry is missing
Keywords:
Status: CLOSED DUPLICATE of bug 1966862
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.7
Hardware: x86_64
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Ben Nemec
QA Contact: Victor Voronkov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-29 13:50 UTC by Victor Medina
Modified: 2021-11-02 06:24 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-10 17:37:48 UTC
Target Upstream Version:
Embargoed:


Attachments
installation config (1.09 KB, text/plain)
2021-04-12 06:52 UTC, Victor Medina
log-bundle (1.11 MB, application/gzip)
2021-05-24 19:03 UTC, Joseph Callen

Description Victor Medina 2021-03-29 13:50:20 UTC
Description of problem:

Installation fails because a DNS entry is missing. The documentation does not state that this entry is needed.
The entry that seems to be needed is the api-int entry; see below.

The following error message is visible in the bootstrap server during deploy:

Mar 24 07:47:45 sekiius00660.exilis.npee.seki.gic.ericsson.se bootkube.sh[2389]: E0324 07:47:45.301079       1 reflector.go:138] k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.Etcd: failed to list *v1.Etcd: Get "https://api-int.ocp007.exilis.npee.seki.gic.ericsson.se:6443/apis/operator.openshift.io/v1/etcds?fieldSelector=metadata.name%3Dcluster&limit=500&resourceVersion=0": dial tcp: lookup api-int.ocp007.exilis.npee.seki.gic.ericsson.se on 10.221.16.10:53: no such host

Version-Release number of selected component (if applicable):

4.7

Comment 1 Ben Nemec 2021-04-07 20:55:14 UTC
Is this IPI or UPI? In UPI api-int does need to be provided externally, so this would be expected. In IPI it should be provided by the internal coredns, so we would need to see the logs and Corefile from coredns on the bootstrap to determine why it isn't working.
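
The lookup that fails in the bootstrap log can be reproduced with a short script. This is a minimal sketch, not part of the installer: the function names are hypothetical, and the split of the host name into cluster name and base domain is assumed from the error message above.

```python
import socket


def api_int_hostname(cluster_name: str, base_domain: str) -> str:
    """Build the internal API host name (hypothetical helper)."""
    return f"api-int.{cluster_name}.{base_domain}"


def resolves(hostname: str) -> bool:
    """Return True if the system resolver can resolve hostname."""
    try:
        # 6443 is the API port that appears in the failing URL.
        socket.getaddrinfo(hostname, 6443)
    except socket.gaierror:
        # Corresponds to the "no such host" failure in the log.
        return False
    return True
```

Calling resolves(api_int_hostname("ocp007", "exilis.npee.seki.gic.ericsson.se")) on the bootstrap host would show whether the resolver configured there can find the record.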

Comment 2 Victor Medina 2021-04-12 06:52:42 UTC
Created attachment 1771273
installation config

Comment 4 Ben Nemec 2021-04-15 15:35:03 UTC
The install-config confirms that this is IPI. I still need the logs from coredns to determine why the record isn't being found.

Looking at this again, it's also possible that resolv.conf is not correct. The first nameserver listed should be 127.0.0.1 so that the bootstrap host uses the local coredns. If that is not the case, it would explain why the record isn't found. In that case, I need the nm-dispatcher logs from the bootstrap journal. These can be collected with "journalctl | grep nm-dispatcher".

So in short, I need two things:
1) coredns logs
2) nm-dispatcher logs

One of those two should tell us what went wrong.
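
The resolv.conf check described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the sample content is made up apart from the upstream resolver address 10.221.16.10, which is taken from the error message in the description.

```python
from typing import Optional


def first_nameserver(resolv_conf_text: str) -> Optional[str]:
    """Return the address on the first 'nameserver' line, or None."""
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments ('#' or ';' starts a comment).
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            return fields[1]
    return None


def uses_local_coredns(resolv_conf_text: str) -> bool:
    """True if the first nameserver is 127.0.0.1 (the local coredns)."""
    return first_nameserver(resolv_conf_text) == "127.0.0.1"


# Hypothetical sample in which the upstream resolver is listed first.
sample = "search example.test\nnameserver 10.221.16.10\n"
print(uses_local_coredns(sample))
```

On the bootstrap host, the same functions could be applied to the contents of /etc/resolv.conf. A False result would match the situation described above, in which lookups bypass the local coredns.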

Comment 6 Joseph Callen 2021-05-24 19:03:54 UTC
Created attachment 1786641
log-bundle

Comment 9 Ben Nemec 2021-06-10 17:37:48 UTC
https://github.com/openshift/installer/pull/4973 ended up being the fix for this. Duplicating to that bug.

*** This bug has been marked as a duplicate of bug 1966862 ***

