Bug 1459505
| Summary: | atomic-openshift-master-controllers reports etcd cluster is unavailable or misconfigured; error #0: Forbidden | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jaspreet Kaur <jkaur> |
| Component: | Node | Assignee: | Seth Jennings <sjenning> |
| Status: | CLOSED DUPLICATE | QA Contact: | DeShuai Ma <dma> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 3.5.1 | CC: | aos-bugs, jokerman, mchappel, mmccomas |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-06-19 17:56:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jaspreet Kaur
2017-06-07 10:15:28 UTC
The environment was working; the only change made was an upgrade from 3.4 to 3.5.

After some more poking on our side we think we've found what's going on. Jaspreet earlier made comments about having seen this when proxies started getting in the way. We looked at the configs, but the hostnames for our etcd servers were in the no_proxy configs, so we expected everything to behave.

BUT... I ran an strace on the master-controllers process and noticed that it was connecting to the proxy servers rather than to the etcd servers. On a hunch I tried adding the etcd IP addresses to the no_proxy lists, and this seems to have cleared the error.

So it would appear that for some reason it was connecting to the etcd servers by IP rather than by hostname, thus ignoring the no_proxy setting. Additionally, no_proxy doesn't handle CIDRs, so having 10.X.Y.Z/24 in there didn't help.

As an educated guess, the list of etcd cluster members is now being pulled from etcd after making the initial connection, and then it's connecting by IP, which is how etcd seems to store cluster members internally.

Not sure what I'd consider the correct fix here, but the change in behaviour from hostname -> IP address will break previously running clusters.

Mark,

In the master-config.yaml, is the etcd URL specified with a hostname or an IP? Is the config for 3.4 specified with a hostname and the config for 3.5 with an IP?

Example:

etcdClientInfo:
  ca: ca-bundle.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
  - https://10.42.10.204:4001 <-- here

Also, did you use openshift-ansible (the atomic-openshift-installer) to do the upgrade? If so, there could have been a change in there that changed the etcd URL from hostname to IP during the 3.5 upgrade.
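To illustrate the no_proxy observation in the description, here is a minimal sketch of a matcher that only does exact and domain-suffix comparison. It is a simplified model written for this note, not the code used by master-controllers or by any particular HTTP library; the function name `bypasses_proxy` and the hostname `etcd1.example.com` are hypothetical. Under that assumption, a hostname entry does nothing for a connection made by IP, and a CIDR entry is just a string that never equals an address.

```python
def bypasses_proxy(host, no_proxy):
    """Return True if host matches a no_proxy entry exactly or by domain suffix.

    Simplified model for illustration only: entries are compared as plain
    strings, so a CIDR such as 10.42.10.0/24 never matches an address.
    """
    host = host.lower()
    for entry in no_proxy.split(","):
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue
        if host == entry or host.endswith("." + entry):
            return True
    return False


# Hypothetical values: the hostname and the CIDR are placeholders.
no_proxy = "etcd1.example.com,.cluster.local,10.42.10.0/24"

# Connecting by hostname is exempted from the proxy.
print(bypasses_proxy("etcd1.example.com", no_proxy))
# Connecting by IP is not: no hostname entry matches and the CIDR is not expanded.
print(bypasses_proxy("10.42.10.204", no_proxy))
# Listing the IP explicitly exempts it.
print(bypasses_proxy("10.42.10.204", no_proxy + ",10.42.10.204"))
```

With a matcher of this kind, the only way to exempt a connection made by IP is to list that exact address, which is consistent with what cleared the error above.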
Seth,

The master configs, both before and after the upgrade, list the hostnames and not the IP addresses.

We did use the openshift-ansible playbooks to perform the upgrades, specifically "playbooks/byo/openshift-cluster/upgrades/v3_5/upgrade_control_plane.yml".

I will attempt to find what caused this change in behavior between 3.4 and 3.5. It seems there is a workaround, however (adding the etcd IPs to NO_PROXY).

OK, I think I know what happened here. The etcd server changed how it stores peer URL endpoints underneath us.

https://github.com/openshift/origin/blob/master/vendor/github.com/coreos/etcd/client/client.go#L416-L468

In the etcd (v2) client, the endpoints are overwritten with the peer URL list from the server on the first Sync(). At some point, etcd moved from storing these URLs as hostnames to storing them as IP addresses. This change cascades down to the client, overwriting the user-provided list of endpoints.

Upstream issue for openshift-ansible:
https://github.com/openshift/openshift-ansible/issues/4490

ETCD_ADVERTISE_CLIENT_URLS was switched to using IPs instead of hostnames:
https://github.com/openshift/openshift-ansible/pull/1754

etcd prefers using IPs, so I doubt this change will be rolled back. There is work upstream to add IPs to NO_PROXY as part of the installer.

The workaround is to either:
1) change ETCD_ADVERTISE_CLIENT_URLS on the etcd members to hostnames, or
2) add the IPs to NO_PROXY.

I'm duping this to the documentation bug 1458660 for this issue, which is also tracking the installer change.

*** This bug has been marked as a duplicate of bug 1458660 ***
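For workaround 2, a small helper along these lines could build the extended NO_PROXY value from the URLs the etcd members advertise. This is only a sketch: the function name `extend_no_proxy`, the hostnames, and the second address are hypothetical, and how the resulting value is applied to the master services is outside its scope.

```python
from urllib.parse import urlsplit


def extend_no_proxy(no_proxy, advertised_urls):
    """Append the host of each advertised URL to a NO_PROXY string if missing."""
    entries = [e.strip() for e in no_proxy.split(",") if e.strip()]
    for url in advertised_urls:
        host = urlsplit(url).hostname  # drops scheme and port
        if host and host not in entries:
            entries.append(host)
    return ",".join(entries)


# Hypothetical inputs: existing hostname entries plus the IP-based URLs
# that the etcd members advertise after the upgrade.
current = "etcd1.example.com,etcd2.example.com"
advertised = ["https://10.42.10.204:4001", "https://10.42.10.205:4001"]

print(extend_no_proxy(current, advertised))
```

Existing entries are kept in order and each new host is added once, so the helper can be rerun without producing duplicates.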