Bug 1798805 - bootkube fails on bare metal deployment with IPv6 control plane because it cannot resolve etcd* records
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.3.z
Assignee: Stephen Benjamin
QA Contact: Elena German
URL:
Whiteboard:
Duplicates: 1801873
Depends On: 1801436
Blocks: 1771572
Reported: 2020-02-06 03:38 UTC by Marius Cornea
Modified: 2020-04-20 04:11 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-10 23:53:27 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 3092 | 0 | None | closed | Bug 1798805: baremetal: Fix bootstrap local DNS for IPv6 | 2020-04-20 04:10:58 UTC
Red Hat Product Errata | RHBA-2020:0676 | 0 | None | None | None | 2020-03-10 23:53:43 UTC

Description Marius Cornea 2020-02-06 03:38:32 UTC
Description of problem:

bootkube fails on bare metal deployment with IPv6 control plane because it cannot resolve etcd* records.

Deployment is blocked at:

time="2020-02-06T03:30:59Z" level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.ocp-edge-cluster.qe.lab.redhat.com:6443..."
time="2020-02-06T03:30:59Z" level=debug msg="Still waiting for the Kubernetes API: the server could not find the requested resource"
time="2020-02-06T03:31:29Z" level=debug msg="Still waiting for the Kubernetes API: the server could not find the requested resource"

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2020-02-05-064223-ipv6.2

How reproducible:
always on 4.3.0-0.nightly-2020-02-05-064223-ipv6.2

Steps to Reproduce:
1. Deploy a bare metal cluster with an IPv6 control plane
2. Log in to the bootstrap VM
3. Check the bootkube log

Actual results:

Feb 06 03:32:32 localhost bootkube.sh[2510]: {"level":"warn","ts":"2020-02-06T03:32:32.298Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-08495e38-eb0d-4102-8b80-78ccf178d430/etcd-0.ocp-edge-cluster.qe.lab.redhat.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-0.ocp-edge-cluster.qe.lab.redhat.com on [fe80::5054:ff:fe3a:f406%ens3]:53: no such host\""}
Feb 06 03:32:32 localhost bootkube.sh[2510]: {"level":"warn","ts":"2020-02-06T03:32:32.298Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-1447de30-934a-4925-96cf-a6523b355f50/etcd-1.ocp-edge-cluster.qe.lab.redhat.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-1.ocp-edge-cluster.qe.lab.redhat.com on [fe80::5054:ff:fe3a:f406%ens3]:53: no such host\""}
Feb 06 03:32:32 localhost bootkube.sh[2510]: {"level":"warn","ts":"2020-02-06T03:32:32.298Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-6fa7cbc2-0e26-4781-978b-5335e7e186d3/etcd-2.ocp-edge-cluster.qe.lab.redhat.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-2.ocp-edge-cluster.qe.lab.redhat.com on [fe80::5054:ff:fe3a:f406%ens3]:53: no such host\""}
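When triaging journals like the one above, it can help to pull out just the hostname that failed to resolve and the nameserver that answered. The Go sketch below assumes each journal line has the shape shown here (a syslog-style prefix followed by one clientv3 JSON record). The names `failedLookup` and `clientv3Entry` are hypothetical and not part of any OpenShift tool; the regular expression matches Go's "lookup HOST on SERVER: no such host" error text.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// clientv3Entry holds the fields of interest from one etcd clientv3 JSON
// log record; other fields ("ts", "caller", "attempt", ...) are ignored.
type clientv3Entry struct {
	Level string `json:"level"`
	Error string `json:"error"`
}

// lookupErr matches Go resolver errors such as
// "lookup etcd-0.example.com on [fe80::1%ens3]:53: no such host".
var lookupErr = regexp.MustCompile(`lookup (\S+) on (\S+): no such host`)

// failedLookup (hypothetical helper) returns the unresolved hostname and
// the nameserver that reported it, or ok=false if the line does not match.
func failedLookup(line string) (host, server string, ok bool) {
	start := strings.Index(line, "{") // JSON record follows the journal prefix
	if start < 0 {
		return "", "", false
	}
	var e clientv3Entry
	if err := json.Unmarshal([]byte(line[start:]), &e); err != nil {
		return "", "", false
	}
	m := lookupErr.FindStringSubmatch(e.Error)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	line := `Feb 06 03:32:32 localhost bootkube.sh[2510]: {"level":"warn","error":"desc = \"transport: Error while dialing dial tcp: lookup etcd-0.ocp-edge-cluster.qe.lab.redhat.com on [fe80::5054:ff:fe3a:f406%ens3]:53: no such host\""}`
	if host, server, ok := failedLookup(line); ok {
		fmt.Printf("%s not found by %s\n", host, server)
	}
}
```

Applied to the three lines above, this points at the same link-local nameserver answering "no such host" for etcd-0, etcd-1 and etcd-2.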

Checking /etc/resolv.conf on the bootstrap VM, we can see:

[root@localhost core]# cat /etc/resolv.conf 
# Generated by NetworkManager
nameserver fe80::5054:ff:fe3a:f406%ens3
nameserver fd2e:6f44:5dd8:c956::1


Expected results:
bootkube runs without errors

Additional info:

After adding the DNS VIP to /etc/resolv.conf on the bootstrap VM, the deployment proceeds:

[root@localhost core]# cat /etc/resolv.conf 
# Generated by NetworkManager
nameserver fd2e:6f44:5dd8:c956::2
nameserver fe80::5054:ff:fe3a:f406%ens3
nameserver fd2e:6f44:5dd8:c956::1

where fd2e:6f44:5dd8:c956::2 is the DNS VIP
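The manual workaround above can be sketched in Go as a text transformation that puts the DNS VIP ahead of the existing nameserver lines. It places the VIP first on the inference (not stated in the report) that a stub resolver such as Go's treats a "no such host" (NXDOMAIN) reply from the first nameserver as final instead of trying the next one. `withDNSVIP` is a hypothetical helper; it is not the installer change from pull 3092, and because the file is generated by NetworkManager, a hand edit like this may be overwritten.

```go
package main

import (
	"fmt"
	"strings"
)

// withDNSVIP (hypothetical helper) returns resolv.conf content with
// "nameserver <vip>" placed before the first existing nameserver line.
// Any existing entry for the VIP is moved to the front, so the function
// is idempotent.
func withDNSVIP(conf, vip string) string {
	var kept []string
	first := -1 // index in kept of the first remaining nameserver line
	for _, l := range strings.Split(conf, "\n") {
		f := strings.Fields(l)
		if len(f) >= 2 && f[0] == "nameserver" {
			if f[1] == vip {
				continue // drop it; it is re-added at the front below
			}
			if first < 0 {
				first = len(kept)
			}
		}
		kept = append(kept, l)
	}
	if first < 0 { // no nameserver lines: append, keeping a trailing newline
		first = len(kept)
		if first > 0 && kept[first-1] == "" {
			first--
		}
	}
	out := make([]string, 0, len(kept)+1)
	out = append(out, kept[:first]...)
	out = append(out, "nameserver "+vip)
	out = append(out, kept[first:]...)
	return strings.Join(out, "\n")
}

func main() {
	before := "# Generated by NetworkManager\n" +
		"nameserver fe80::5054:ff:fe3a:f406%ens3\n" +
		"nameserver fd2e:6f44:5dd8:c956::1\n"
	fmt.Print(withDNSVIP(before, "fd2e:6f44:5dd8:c956::2"))
}
```

Running it on the original file reproduces the edited resolv.conf shown above, with the VIP first and the comment line kept on top.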

Comment 1 Russell Bryant 2020-02-06 15:44:07 UTC
This should be resolved by https://github.com/openshift/installer/pull/2982

This fix was accidentally dropped from the 4.3-ipv6 fork for a while. It is being re-added now and will be in a new 4.3-ipv6 build soon.

Comment 2 Russell Bryant 2020-02-06 18:16:22 UTC
Should be fixed in 4.3.0-0.nightly-2020-02-06-120247-ipv6.3

Comment 5 Scott Dodson 2020-02-14 16:56:00 UTC
*** Bug 1801873 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2020-03-10 23:53:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0676

