Bug 1798805

Summary: bootkube fails on bare metal deployment with IPv6 control plane because it cannot resolve etcd* records
Product: OpenShift Container Platform Reporter: Marius Cornea <mcornea>
Component: Installer Assignee: Stephen Benjamin <stbenjam>
Installer sub component: OpenShift on Bare Metal IPI QA Contact: Elena German <elgerman>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: unspecified CC: augol, elgerman, rbartal, rbryant, william.caban, yprokule
Version: 4.3.z Keywords: TestBlocker
Target Milestone: ---   
Target Release: 4.3.z   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-03-10 23:53:27 UTC Type: Bug
Bug Depends On: 1801436    
Bug Blocks: 1771572    

Description Marius Cornea 2020-02-06 03:38:32 UTC
Description of problem:

bootkube fails on bare metal deployment with IPv6 control plane because it cannot resolve etcd* records.

Deployment is blocked at:

time="2020-02-06T03:30:59Z" level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.ocp-edge-cluster.qe.lab.redhat.com:6443..."
time="2020-02-06T03:30:59Z" level=debug msg="Still waiting for the Kubernetes API: the server could not find the requested resource"
time="2020-02-06T03:31:29Z" level=debug msg="Still waiting for the Kubernetes API: the server could not find the requested resource"

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2020-02-05-064223-ipv6.2

How reproducible:
always on 4.3.0-0.nightly-2020-02-05-064223-ipv6.2

Steps to Reproduce:
1. Deploy a bare metal cluster with an IPv6 control plane
2. Log in to the bootstrap VM
3. Check the bootkube log

Actual results:

Feb 06 03:32:32 localhost bootkube.sh[2510]: {"level":"warn","ts":"2020-02-06T03:32:32.298Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-08495e38-eb0d-4102-8b80-78ccf178d430/etcd-0.ocp-edge-cluster.qe.lab.redhat.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-0.ocp-edge-cluster.qe.lab.redhat.com on [fe80::5054:ff:fe3a:f406%ens3]:53: no such host\""}
Feb 06 03:32:32 localhost bootkube.sh[2510]: {"level":"warn","ts":"2020-02-06T03:32:32.298Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-1447de30-934a-4925-96cf-a6523b355f50/etcd-1.ocp-edge-cluster.qe.lab.redhat.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-1.ocp-edge-cluster.qe.lab.redhat.com on [fe80::5054:ff:fe3a:f406%ens3]:53: no such host\""}
Feb 06 03:32:32 localhost bootkube.sh[2510]: {"level":"warn","ts":"2020-02-06T03:32:32.298Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-6fa7cbc2-0e26-4781-978b-5335e7e186d3/etcd-2.ocp-edge-cluster.qe.lab.redhat.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-2.ocp-edge-cluster.qe.lab.redhat.com on [fe80::5054:ff:fe3a:f406%ens3]:53: no such host\""}
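
The warnings above all share the same shape: a journal prefix followed by a JSON object whose "error" field names the hostname that failed to resolve and the resolver that was asked. As a rough triage aid, the following Python sketch pulls those two values out of such a line. The function name, the tuple type and the abbreviated sample line are illustrative assumptions, not part of bootkube or the installer.

```python
import json
import re
from typing import NamedTuple, Optional

# Matches the Go resolver message seen in the log:
#   lookup <hostname> on [<resolver>]:<port>: no such host
LOOKUP_RE = re.compile(r"lookup (\S+) on \[([^\]]+)\]:(\d+): no such host")


class FailedLookup(NamedTuple):
    hostname: str
    resolver: str
    port: int


def parse_failed_lookup(journal_line: str) -> Optional[FailedLookup]:
    """Return the failed DNS lookup described by one journal line, if any."""
    start = journal_line.find("{")  # the JSON payload follows the journal prefix
    if start == -1:
        return None
    try:
        entry = json.loads(journal_line[start:])
    except json.JSONDecodeError:
        return None
    if not isinstance(entry, dict):
        return None
    match = LOOKUP_RE.search(str(entry.get("error", "")))
    if match is None:
        return None
    return FailedLookup(match.group(1), match.group(2), int(match.group(3)))


# Abbreviated copy of the first warning above (some JSON fields omitted).
SAMPLE_LINE = (
    r'Feb 06 03:32:32 localhost bootkube.sh[2510]: {"level":"warn",'
    r'"msg":"retrying of unary invoker failed","attempt":0,'
    r'"error":"rpc error: code = DeadlineExceeded desc = latest connection error: '
    r'connection error: desc = \"transport: Error while dialing dial tcp: '
    r'lookup etcd-0.ocp-edge-cluster.qe.lab.redhat.com on '
    r'[fe80::5054:ff:fe3a:f406%ens3]:53: no such host\""}'
)

if __name__ == "__main__":
    print(parse_failed_lookup(SAMPLE_LINE))
```

Run over the whole bootkube log, this would show at a glance that every etcd-N lookup is going to the link-local resolver on ens3.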

Checking resolv.conf on the bootstrap VM shows:

[root@localhost core]# cat /etc/resolv.conf 
# Generated by NetworkManager
nameserver fe80::5054:ff:fe3a:f406%ens3
nameserver fd2e:6f44:5dd8:c956::1


Expected results:
bootkube runs without errors

Additional info:

After adding the DNS VIP to resolv.conf on the bootstrap VM, the deployment proceeds:

[root@localhost core]# cat /etc/resolv.conf 
# Generated by NetworkManager
nameserver fd2e:6f44:5dd8:c956::2
nameserver fe80::5054:ff:fe3a:f406%ens3
nameserver fd2e:6f44:5dd8:c956::1

where fd2e:6f44:5dd8:c956::2 is the DNS VIP
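
The difference between the two resolv.conf listings can be checked mechanically. The Python sketch below parses the nameserver entries and reports whether a given DNS VIP is among them; the helper names are hypothetical, and the two sample texts are copied from the listings in this report. The zone suffix (%ens3) is stripped before parsing because older Python versions do not accept scoped IPv6 addresses in the ipaddress module.

```python
import ipaddress
from typing import List


def parse_nameservers(resolv_conf: str) -> List[str]:
    """Return the nameserver addresses in file order, skipping comments."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def strip_zone(address: str) -> str:
    """Drop an IPv6 zone suffix such as '%ens3'."""
    return address.split("%", 1)[0]


def is_link_local(address: str) -> bool:
    return ipaddress.ip_address(strip_zone(address)).is_link_local


def lists_dns_vip(resolv_conf: str, dns_vip: str) -> bool:
    """True if dns_vip appears as a nameserver (compared as parsed addresses)."""
    vip = ipaddress.ip_address(dns_vip)
    return any(
        ipaddress.ip_address(strip_zone(server)) == vip
        for server in parse_nameservers(resolv_conf)
    )


# resolv.conf as first observed on the bootstrap VM (from this report).
BEFORE = """# Generated by NetworkManager
nameserver fe80::5054:ff:fe3a:f406%ens3
nameserver fd2e:6f44:5dd8:c956::1
"""

# resolv.conf after the manual workaround (from this report).
AFTER = """# Generated by NetworkManager
nameserver fd2e:6f44:5dd8:c956::2
nameserver fe80::5054:ff:fe3a:f406%ens3
nameserver fd2e:6f44:5dd8:c956::1
"""

DNS_VIP = "fd2e:6f44:5dd8:c956::2"

if __name__ == "__main__":
    for label, text in (("before", BEFORE), ("after", AFTER)):
        print(label, parse_nameservers(text), lists_dns_vip(text, DNS_VIP))
```

This only inspects the file contents; it does not confirm that the VIP actually answers queries for the etcd records.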

Comment 1 Russell Bryant 2020-02-06 15:44:07 UTC
This should be resolved by https://github.com/openshift/installer/pull/2982

This fix was accidentally dropped from the 4.3-ipv6 fork for a time. It is being re-added now and will be in a new 4.3-ipv6 build soon.

Comment 2 Russell Bryant 2020-02-06 18:16:22 UTC
This should be fixed in 4.3.0-0.nightly-2020-02-06-120247-ipv6.3

Comment 5 Scott Dodson 2020-02-14 16:56:00 UTC
*** Bug 1801873 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2020-03-10 23:53:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0676