Bug 1626812
Summary: | Install failed due to etcd connection url hostname is mismatched with the one in cert files | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Weihua Meng <wmeng> | ||||||
Component: | Installer | Assignee: | Vadim Rutkovsky <vrutkovs> | ||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Weihua Meng <wmeng> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | high | ||||||||
Version: | 3.11.0 | CC: | aos-bugs, jialiu, jokerman, mmccomas, shlao, vrutkovs, wmeng, wsun | ||||||
Target Milestone: | --- | Keywords: | Regression | ||||||
Target Release: | 3.11.0 | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: |
Cause: any OpenStack install assumed the OpenStack cloudprovider would be enabled
Consequence: OpenStack metadata was used to set hostnames, breaking upgrades on installs which didn't have the cloudprovider enabled
Fix: OpenStack metadata is used only when the OpenStack cloudprovider is enabled
Result: upgrade on OpenStack with custom hostnames and the cloudprovider disabled succeeds
|
Story Points: | --- | ||||||
Clone Of: | |||||||||
: | 1626935 | Environment: | |||||||
Last Closed: | 2018-12-21 15:23:44 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
Description
Weihua Meng
2018-09-09 01:37:08 UTC
Created attachment 1482256
installation log with inventory file embedded for .28 build
Created attachment 1482257
installation log with inventory file embedded for .32 build
This issue looks like mismatched facts for the hostname used for etcd and master. It seems it's caused by https://github.com/openshift/openshift-ansible/pull/9876.

Created PR to revert that in master - https://github.com/openshift/openshift-ansible/pull/9999

3.11 PR 9980 has been merged to openshift-ansible-3.11.2-1, please check the bug.

Fixed.
openshift-ansible-3.11.4-1.git.0.d727082.el7_5.noarch
Kernel Version: 3.10.0-862.11.6.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)

QE have two deployment of OSP, one is "snvl2", another is "qeos10". Today, when I was trying to install a cluster without cloudprovider enabled on the "snvl2" OSP, it failed due to the same issue as in the initial report.

[root@qe-jialiu-master-etcd-1 ~]# root_path="/etc/origin/master";
[root@qe-jialiu-master-etcd-1 ~]# ca_file=${root_path}/$(grep -A 6 "etcdClientInfo:" /etc/origin/master/master-config.yaml | grep "ca" | awk -F": " '{print $2}');
[root@qe-jialiu-master-etcd-1 ~]# cert_file=${root_path}/$(grep -A 6 "etcdClientInfo:" /etc/origin/master/master-config.yaml | grep "certFile" | awk -F": " '{print $2}');
[root@qe-jialiu-master-etcd-1 ~]# key_file=${root_path}/$(grep -A 6 "etcdClientInfo:" /etc/origin/master/master-config.yaml | grep "keyFile" | awk -F": " '{print $2}');
[root@qe-jialiu-master-etcd-1 ~]# for i in `grep -A 8 "etcdClientInfo:" /etc/origin/master/master-config.yaml | grep -A 3 "urls:" | grep -v "urls:" | awk -F"- " '{print $2}'`; do url="$url,$i"; done
[root@qe-jialiu-master-etcd-1 ~]# etcdctl --ca-file "${ca_file}" --cert-file "${cert_file}" --key-file "${key_file}" --endpoints ${url} cluster-health
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured; error #0: http: no Host in request URL
; error #1: x509: certificate is valid for qe-jialiu-master-etcd-2.openshift-snvl2.internal, not qe-jialiu-master-etcd-2
; error #2: x509: certificate is valid for qe-jialiu-master-etcd-3.openshift-snvl2.internal, not qe-jialiu-master-etcd-3
; error #3: x509: certificate is valid for qe-jialiu-master-etcd-1.openshift-snvl2.internal, not qe-jialiu-master-etcd-1

error #0: http: no Host in request URL
error #1: x509: certificate is valid for qe-jialiu-master-etcd-2.openshift-snvl2.internal, not qe-jialiu-master-etcd-2
error #2: x509: certificate is valid for qe-jialiu-master-etcd-3.openshift-snvl2.internal, not qe-jialiu-master-etcd-3
error #3: x509: certificate is valid for qe-jialiu-master-etcd-1.openshift-snvl2.internal, not qe-jialiu-master-etcd-1

But the install on "qeos10" has no such issue. Please go through the two install logs and search for "openshift_master_etcd_hosts"; you will find the difference: one is using a short hostname, the other is using an FQDN hostname.

When I was verifying https://bugzilla.redhat.com/show_bug.cgi?id=1623335, I was using the "qeos10" OSP, not "snvl2".

(In reply to Johnny Liu from comment #20)
> QE have two deployment of OSP, one is "snvl2", another is "qeos10".

Please open a new issue for that. There is no version, inventory or playbook logs to find out what's wrong with URLs or etcd certs

(In reply to Vadim Rutkovsky from comment #21)
> (In reply to Johnny Liu from comment #20)
> > QE have two deployment of OSP, one is "snvl2", another is "qeos10".
>
> Please open a new issue for that. There is no version, inventory or playbook
> logs to find out what's wrong with URLs or etcd certs

Done.
https://bugzilla.redhat.com/show_bug.cgi?id=1631368

Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.
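For anyone reproducing the check from the transcript above, the following is a minimal sketch of the same comparison in isolation. It is not part of openshift-ansible or etcdctl. The function names (join_endpoints, url_host, host_in_cert_names) are hypothetical, and port 2379 in the example URL is an assumption, since the thread does not show the actual etcd URLs. One side note: the loop in the transcript builds url="$url,$i", which leaves a leading comma and therefore an empty first endpoint. That would explain "error #0: http: no Host in request URL" and is unrelated to the certificate mismatch itself.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; none of these names come from
# the bug report, openshift-ansible, or etcdctl.

# Join endpoint URLs with commas, without a leading or trailing comma.
join_endpoints() {
    local IFS=,
    printf '%s\n' "$*"
}

# Extract the host part of a URL such as https://host:2379.
# IPv6 literals are not handled in this sketch.
url_host() {
    local rest="${1#*://}"        # drop the scheme, if any
    rest="${rest%%/*}"            # drop any path
    printf '%s\n' "${rest%%:*}"   # drop the port
}

# Return 0 when the host exactly matches one of the given certificate names.
# Wildcard names are not handled in this sketch.
host_in_cert_names() {
    local host="$1" name
    shift
    for name in "$@"; do
        [ "$host" = "$name" ] && return 0
    done
    return 1
}

# Example with the hostnames reported in this bug (port 2379 is assumed).
host=$(url_host "https://qe-jialiu-master-etcd-1:2379")
if host_in_cert_names "$host" "qe-jialiu-master-etcd-1.openshift-snvl2.internal"; then
    echo "match: $host"
else
    echo "mismatch: $host is not among the certificate names"
fi
```

The certificate names passed to host_in_cert_names would have to be read from the etcd server certificate first, for example by inspecting the Subject Alternative Name section printed by "openssl x509 -noout -text -in <cert file>", where the certificate path depends on the installation.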