Bug 1862941 - Installation fails unless api-int entry is present in DNS records
Summary: Installation fails unless api-int entry is present in DNS records
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Roy Golan
QA Contact: Lucie Leistnerova
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-08-03 09:30 UTC by Jan Zmeskal
Modified: 2020-08-12 10:20 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-12 10:20:34 UTC
Target Upstream Version:
Embargoed:



Description Jan Zmeskal 2020-08-03 09:30:38 UTC
Description of problem:
When installing the latest version of OpenShift 4.6, the bootstrap process gets stuck and prints this error message over and over:

Aug 03 09:15:24 <bootstrap_hostname> bootkube.sh[2266]: E0803 09:15:24.829658       1 reflector.go:127] k8s.io/client-go.0-rc.2/tools/cache/reflector.go:156: Failed to watch *v1.Etcd: failed to list *v1.Etcd: Get "https://api-int.<cluster_name>.<base_domain>:6443/apis/operator.openshift.io/v1/etcds?fieldSelector=metadata.name%3Dcluster&limit=500&resourceVersion=0": dial tcp: lookup api-int.<cluster_name>.<base_domain> on <dns_server>:53: no such host

Without user intervention, the installation fails as a result.

Version-Release number of the following components:
4.6.0-0.nightly-2020-08-03-054919

How reproducible:
Reproduced once

Steps to Reproduce:
1. Start openshift-install with DNS records set up as the docs instruct. I have these two DNS records in my DNS system:
api.<cluster_name>.<base_domain>. IN A <api_vip>
*.apps.<cluster_name>.<base_domain>. IN A <ingress_vip>
2. Run journalctl -b -f -u release-image.service -u bootkube.service on the bootstrap VM
3. Wait until the above-mentioned message starts to appear
4. Reconfigure your DNS to contain the following entry:
api-int.<cluster_name>.<base_domain>. IN A <api_vip>
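
For context, after step 4 the zone ends up containing all three records below. This is just a sketch of the workaround state; the names and VIP values are the same placeholders used throughout this report, not real addresses:

api.<cluster_name>.<base_domain>.     IN A <api_vip>
api-int.<cluster_name>.<base_domain>. IN A <api_vip>
*.apps.<cluster_name>.<base_domain>.  IN A <ingress_vip>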

Actual results:
After adding the api-int entry to my DNS setup, bootkube.service completed and the installation proceeded.

Expected results:
The user is not expected to set up an api-int DNS entry, and the installation should work without it.

Additional info:
bootkube log: http://pastebin.test.redhat.com/889907
install-config.yaml: http://pastebin.test.redhat.com/889908

Comment 2 Jan Zmeskal 2020-08-03 14:48:18 UTC
DNS prepender script is present on the bootstrap VM: 

cat /etc/NetworkManager/dispatcher.d/30-local-dns-prepender
#!/bin/bash
IFACE=$1
STATUS=$2
case "$STATUS" in
    up)
    logger -s "NM local-dns-prepender triggered by ${1} ${2}."
    DNS_IP="127.0.0.1"
    set +e
    logger -s "NM local-dns-prepender: Checking if local DNS IP is the first entry in resolv.conf"
    if grep nameserver /etc/resolv.conf | head -n 1 | grep -q "$DNS_IP" ; then
        logger -s "NM local-dns-prepender: local DNS IP already is the first entry in resolv.conf"
        exit 0
    else
        logger -s "NM local-dns-prepender: Looking for '# Generated by NetworkManager' in /etc/resolv.conf to place 'nameserver $DNS_IP'"
        sed -i "/^# Generated by.*$/a nameserver $DNS_IP" /etc/resolv.conf
    fi
    ;;
    *)
    ;;
esac
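
The effect of the prepender's sed edit can be simulated against a throwaway resolv.conf. This is only an illustration of the script's logic; the temp file and the 10.0.0.1 upstream nameserver are made-up demo values, not taken from this cluster:

```shell
#!/bin/bash
# Build a sample resolv.conf like the one NetworkManager generates.
RESOLV=$(mktemp)
cat > "$RESOLV" <<'EOF'
# Generated by NetworkManager
search example.com
nameserver 10.0.0.1
EOF

DNS_IP="127.0.0.1"
# Same check-then-prepend logic as 30-local-dns-prepender, pointed at the temp file:
if grep nameserver "$RESOLV" | head -n 1 | grep -q "$DNS_IP"; then
    echo "local DNS IP already is the first entry"
else
    # Append "nameserver 127.0.0.1" right after the NetworkManager header line,
    # so it becomes the first nameserver entry in the file.
    sed -i "/^# Generated by.*$/a nameserver $DNS_IP" "$RESOLV"
fi

head -n 2 "$RESOLV"
rm -f "$RESOLV"
```

After the edit, the first nameserver entry is 127.0.0.1, which is why bootkube's api-int lookups are expected to be answered by the node-local resolver rather than the external DNS server.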

Comment 3 Jan Zmeskal 2020-08-03 15:00:42 UTC
Seems like the problem might be specific to the version in the description. Right now I'm trying with 4.6.0-0.nightly-2020-08-03-091920 (only a couple of hours apart from 4.6.0-0.nightly-2020-08-03-054919) and the problem does not reproduce. The content of /etc/NetworkManager/dispatcher.d/30-local-dns-prepender above is actually from my 4.6.0-0.nightly-2020-08-03-091920 attempt.

It's hard to believe there is a significant difference between those two builds. Hopefully this is not something flaky that appears seemingly at random.

Comment 4 Jan Zmeskal 2020-08-12 09:38:53 UTC
The issue no longer seems to reproduce; I last tried with 4.6.0-0.nightly-2020-08-12-062953. Therefore I suggest closing it.

