Description of problem:
Installation of 3.11.0-0.32.0 failed because the API pod keeps restarting.

Version-Release number of the following components:
openshift-ansible-3.11.0-0.32.0.git.0.b27b349.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install OCP with openshift-ansible-3.11.0-0.32.0.git.0.b27b349.el7.noarch

Actual results:
Install failed

Expected results:
Install succeeds

Additional info:
1. Hosts:    host-xxxx.host.centralci.eng.rdu2.redhat.com
   Play:     Configure masters
   Task:     Report control plane errors
   Message:  Control plane pods didn't come up
Created attachment 1482256 [details] installation log with inventory file embedded for .28 build
Created attachment 1482257 [details] installation log with inventory file embedded for .32 build
This looks like a mismatch between the hostname facts used for etcd and those used for the master.
It seems to be caused by https://github.com/openshift/openshift-ansible/pull/9876. Created a PR to revert that in master: https://github.com/openshift/openshift-ansible/pull/9999
The 3.11 PR 9980 has been merged into openshift-ansible-3.11.2-1. Please check the bug.
Fixed.

openshift-ansible-3.11.4-1.git.0.d727082.el7_5.noarch
Kernel Version: 3.10.0-862.11.6.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
QE has two deployments of OSP: one is "snvl2", the other is "qeos10". Today, when I tried to install a cluster without cloudprovider enabled on the "snvl2" OSP, it failed with the same issue as in the initial report.

[root@qe-jialiu-master-etcd-1 ~]# root_path="/etc/origin/master";
[root@qe-jialiu-master-etcd-1 ~]# ca_file=${root_path}/$(grep -A 6 "etcdClientInfo:" /etc/origin/master/master-config.yaml | grep "ca" | awk -F": " '{print $2}');
[root@qe-jialiu-master-etcd-1 ~]# cert_file=${root_path}/$(grep -A 6 "etcdClientInfo:" /etc/origin/master/master-config.yaml | grep "certFile" | awk -F": " '{print $2}');
[root@qe-jialiu-master-etcd-1 ~]# key_file=${root_path}/$(grep -A 6 "etcdClientInfo:" /etc/origin/master/master-config.yaml | grep "keyFile" | awk -F": " '{print $2}');
[root@qe-jialiu-master-etcd-1 ~]# for i in `grep -A 8 "etcdClientInfo:" /etc/origin/master/master-config.yaml | grep -A 3 "urls:" | grep -v "urls:" | awk -F"- " '{print $2}'`; do url="$url,$i"; done
[root@qe-jialiu-master-etcd-1 ~]# etcdctl --ca-file "${ca_file}" --cert-file "${cert_file}" --key-file "${key_file}" --endpoints ${url} cluster-health
cluster may be unhealthy: failed to list members
Error:  client: etcd cluster is unavailable or misconfigured; error #0: http: no Host in request URL
; error #1: x509: certificate is valid for qe-jialiu-master-etcd-2.openshift-snvl2.internal, not qe-jialiu-master-etcd-2
; error #2: x509: certificate is valid for qe-jialiu-master-etcd-3.openshift-snvl2.internal, not qe-jialiu-master-etcd-3
; error #3: x509: certificate is valid for qe-jialiu-master-etcd-1.openshift-snvl2.internal, not qe-jialiu-master-etcd-1

error #0: http: no Host in request URL
error #1: x509: certificate is valid for qe-jialiu-master-etcd-2.openshift-snvl2.internal, not qe-jialiu-master-etcd-2
error #2: x509: certificate is valid for qe-jialiu-master-etcd-3.openshift-snvl2.internal, not qe-jialiu-master-etcd-3
error #3: x509: certificate is valid for qe-jialiu-master-etcd-1.openshift-snvl2.internal, not qe-jialiu-master-etcd-1

Installing on "qeos10" does not show this issue. Please go through the two install logs and search for "openshift_master_etcd_hosts"; you will find the difference: one uses a short hostname, the other uses an FQDN.

When I verified https://bugzilla.redhat.com/show_bug.cgi?id=1623335, I was using the "qeos10" OSP, not "snvl2".
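For anyone reproducing this, here is a minimal sketch of how the endpoint hostnames could be classified as short names or FQDNs before calling etcdctl. The function names (join_endpoints, host_of_url, classify_host) are hypothetical helpers, not part of openshift-ansible or etcdctl, and port 2379 in the sample URLs is an assumption. The sketch also joins the URLs without a leading comma; the `url="$url,$i"` loop above starts from an empty variable, which would plausibly explain the "error #0: http: no Host in request URL" entry.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Join all arguments with commas, with no leading or trailing separator.
join_endpoints() {
  joined=""
  for ep in "$@"; do
    if [ -z "$joined" ]; then
      joined="$ep"
    else
      joined="$joined,$ep"
    fi
  done
  printf '%s\n' "$joined"
}

# Strip scheme, path, and port from a URL such as https://host:2379.
# Assumes no IPv6 literals and no userinfo part in the URL.
host_of_url() {
  rest="${1#*://}"
  rest="${rest%%/*}"
  printf '%s\n' "${rest%%:*}"
}

# Report "fqdn" if the name contains a dot, otherwise "short".
# Note: an IPv4 address would also be reported as "fqdn" by this simple test.
classify_host() {
  case "$1" in
    *.*) echo "fqdn" ;;
    *)   echo "short" ;;
  esac
}

# Sample URLs; the hostnames come from the errors above, the port is assumed.
for u in \
  "https://qe-jialiu-master-etcd-1:2379" \
  "https://qe-jialiu-master-etcd-1.openshift-snvl2.internal:2379"
do
  h=$(host_of_url "$u")
  echo "$h: $(classify_host "$h")"
done
```

An endpoint reported as "short" would not match a certificate issued only for the FQDN, which is what the x509 errors above describe.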
(In reply to Johnny Liu from comment #20)
> QE have two deployment of OSP, one is "snvl2", another is "qeos10".

Please open a new issue for that. There is no version, inventory or playbook logs to find out what's wrong with URLs or etcd certs
(In reply to Vadim Rutkovsky from comment #21)
> (In reply to Johnny Liu from comment #20)
> > QE have two deployment of OSP, one is "snvl2", another is "qeos10".
>
> Please open a new issue for that. There is no version, inventory or playbook
> logs to find out what's wrong with URLs or etcd certs

Done. https://bugzilla.redhat.com/show_bug.cgi?id=1631368
Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.