Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1803782

Summary: wrong initial host-etcd endpoints in ipv6 cluster
Product: OpenShift Container Platform
Reporter: Dan Winship <danw>
Component: Etcd Operator
Assignee: Sam Batschelet <sbatsche>
Status: CLOSED DUPLICATE
QA Contact: ge liu <geliu>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 4.4
CC: deads, mfojtik, skolicha
Target Milestone: ---
Target Release: 4.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-10 16:46:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Dan Winship 2020-02-17 12:54:43 UTC
When attempting to install an IPv6 cluster (https://github.com/openshift/installer/pull/2847), the installer creates an initial set of openshift-etcd/host-etcd endpoints containing the IPv4 address of the bootstrap node:

danw@p50:~> oc get endpoints -n openshift-etcd
NAME        ENDPOINTS                                                 AGE
etcd                                                                  16h
host-etcd   10.0.0.8:2379,192.0.2.2:2379,192.0.2.3:2379 + 1 more...   16h

This value comes from "BOOTSTRAP_IP=$(hostname -I | awk '{ print $1 }')" in bootkube.sh... I'm not sure what information bootkube.sh has available to figure out that this should be a single-stack IPv6 cluster and adjust its logic accordingly. (The node has both IPv4 and IPv6 addresses, but we want it to use the IPv6 address here.)
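
For illustration, one way bootkube.sh could prefer the IPv6 address (hostname -I prints every assigned address, so filtering on ':' selects the IPv6 ones); this is only a sketch, not the actual fix, and it still assumes something else has already told the script the cluster is single-stack IPv6:

    # Sketch only: take the first IPv6 address rather than the first address overall.
    BOOTSTRAP_IP=$(hostname -I | tr ' ' '\n' | grep ':' | head -n1)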


(There's another problem: the endpoints never get updated to point to the masters. That's probably cluster-etcd-operator's fault rather than the installer's, but I can't tell whether it's partly related to this. E.g., cluster-etcd-operator is pod-network, not host-network, so if it tries to connect to the existing endpoints from a single-stack IPv6 pod, it will fail.)
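
For what it's worth, one quick way to confirm whether the operator pod is host-network or pod-network (the openshift-etcd-operator namespace here is an assumption for this release):

    oc -n openshift-etcd-operator get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.hostNetwork}{"\n"}{end}'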

Comment 1 Dan Winship 2020-02-20 18:13:48 UTC
I tried quickly hacking bootkube.sh.template to change ETCD_ENDPOINTS, but that doesn't fix things because, I think, the certificates are only being generated with the IPv4 address in them, e.g.:

    + kube-client-agent request --kubeconfig=/etc/kubernetes/kubeconfig --orgname=system:etcd-peers --assetsdir=/etc/ssl/etcd --dnsnames=dwinship-ipv6.sdn.azure.devcluster.openshift.com --commonname=system:etcd-peer:dwinship-ipv6-h6pzx-bootstrap --ipaddrs=10.0.0.5

and then the server rejects connections from itself:

    2020-02-20 17:26:39.261342 I | embed: rejected connection from "[fc00::5]:43686" (error "remote error: tls: bad certificate", ServerName "")
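
Assuming the generated peer certificate ends up under the --assetsdir from above (/etc/ssl/etcd; the exact filename here is illustrative), the IP SANs it actually contains can be confirmed with:

    openssl x509 -in /etc/ssl/etcd/system:etcd-peer:dwinship-ipv6-h6pzx-bootstrap.crt -noout -text | grep -A1 'Subject Alternative Name'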


(With the IPv4 address in the initial endpoints, the bootstrap etcd comes up fine, but then the first real etcd can't talk to it: it ends up trying to reach the bootstrap etcd's IPv4 address, while the bootstrap etcd is only configured to accept connections from the real master's IPv6 address.)
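
For reference, an illustrative version of that hack using the bootstrap address from the log above; this is a sketch, not the fix that actually landed, and it assumes --ipaddrs accepts an IPv6 literal:

    # Sketch only: bracket the IPv6 literal in the endpoint URL, and request the
    # peer certificate with the IPv6 address so the SAN matches.
    ETCD_ENDPOINTS=https://[fc00::5]:2379
    kube-client-agent request --kubeconfig=/etc/kubernetes/kubeconfig --orgname=system:etcd-peers --assetsdir=/etc/ssl/etcd --dnsnames=dwinship-ipv6.sdn.azure.devcluster.openshift.com --commonname=system:etcd-peer:dwinship-ipv6-h6pzx-bootstrap --ipaddrs=fc00::5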

Comment 3 Dan Winship 2020-03-02 13:58:02 UTC
I think this is fixed, but it might make more sense to just mark this as a duplicate of bug 1804913.

Comment 4 Suresh Kolichala 2020-03-10 16:46:01 UTC
This is fixed in both 4.4 and 4.5. Closing it as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1810694 (the 4.4 clone of bug 1804913).

*** This bug has been marked as a duplicate of bug 1810694 ***

Comment 5 Red Hat Bugzilla 2023-09-14 05:52:44 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days