Description of problem:

The installer fails while checking for existing pods:

failed: [master0.openshift.cloud] (item=etcd) => {"attempts": 60, "changed": false, "failed": true, "item": "etcd", "results": {"cmd": "/bin/oc get pod master-etcd-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-etcd-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}
failed: [master0.openshift.cloud] (item=api) => {"attempts": 60, "changed": false, "failed": true, "item": "api", "results": {"cmd": "/bin/oc get pod master-api-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-api-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}
failed: [master0.openshift.cloud] (item=controllers) => {"attempts": 60, "changed": false, "failed": true, "item": "controllers", "results": {"cmd": "/bin/oc get pod master-controllers-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-controllers-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}

That's because it's looking for the wrong pod names.
The pods themselves are already running but are named differently, without the domain part:

[cloud-user@master0 ~]$ oc get pods --all-namespaces
NAMESPACE     NAME                         READY     STATUS    RESTARTS   AGE
kube-system   master-api-master0           1/1       Running   0          23m
kube-system   master-controllers-master0   1/1       Running   0          23m
kube-system   master-etcd-master0          1/1       Running   0          23m

Version-Release number of selected component (if applicable):

[root@h1 ~]# openstack --version
openstack 3.14.1

[cloud-user@bastion ~]# ansible --version
ansible 2.4.6.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Feb 20 2018, 09:19:12) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

[cloud-user@master0 ~]$ oc version
oc v3.10.14
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://openshift.46.4.143.210.xip.io:8443
openshift v3.10.14
kubernetes v1.10.0+b81c8f8

How reproducible:

Run:
ansible-playbook -i /home/cloud-user/openshift-inventory --private-key=/home/cloud-user/admin.pem -vv /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml

Steps to Reproduce:
1.
2.
3.
Actual results:

The playbook run fails with:

PLAY RECAP ****************************************************************************************************************************************************
infra0.openshift.cloud     : ok=115  changed=55   unreachable=0    failed=0
localhost                  : ok=14   changed=0    unreachable=0    failed=0
master0.openshift.cloud    : ok=331  changed=143  unreachable=0    failed=1
node0.openshift.cloud      : ok=115  changed=55   unreachable=0    failed=0
node1.openshift.cloud      : ok=115  changed=55   unreachable=0    failed=0
node2.openshift.cloud      : ok=115  changed=55   unreachable=0    failed=0
node3.openshift.cloud      : ok=115  changed=55   unreachable=0    failed=0

Expected results:

Additional info:

It's OCP on OpenStack, if that makes a difference.
Version-Release number of the following components:

[cloud-user@bastion ~]$ rpm -q openshift-ansible
openshift-ansible-3.10.21-1.git.0.6446011.el7.noarch

[cloud-user@bastion ~]$ rpm -q ansible
ansible-2.4.6.0-1.el7ae.noarch

[cloud-user@bastion ~]$ ansible --version
ansible 2.4.6.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Feb 20 2018, 09:19:12) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

Actual results:

TASK [openshift_control_plane : Wait for all control plane pods to become ready] ******************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_control_plane/tasks/main.yml:256
FAILED - RETRYING: Wait for all control plane pods to become ready (60 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (59 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (58 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (57 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (56 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (55 retries left).
...
FAILED - RETRYING: Wait for all control plane pods to become ready (1 retries left).
failed: [master0.openshift.cloud] (item=etcd) => {"attempts": 60, "changed": false, "failed": true, "item": "etcd", "results": {"cmd": "/bin/oc get pod master-etcd-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-etcd-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}
FAILED - RETRYING: Wait for all control plane pods to become ready (60 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (59 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (58 retries left).
...
FAILED - RETRYING: Wait for all control plane pods to become ready (1 retries left).
failed: [master0.openshift.cloud] (item=api) => {"attempts": 60, "changed": false, "failed": true, "item": "api", "results": {"cmd": "/bin/oc get pod master-api-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-api-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}
FAILED - RETRYING: Wait for all control plane pods to become ready (60 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (59 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (58 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (57 retries left).
...
FAILED - RETRYING: Wait for all control plane pods to become ready (1 retries left).
failed: [master0.openshift.cloud] (item=controllers) => {"attempts": 60, "changed": false, "failed": true, "item": "controllers", "results": {"cmd": "/bin/oc get pod master-controllers-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-controllers-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}

NO MORE HOSTS LEFT ********************************************************************************************************************************************
 [WARNING]: Could not create retry file '/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry'. [Errno 13] Permission denied: u'/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry'

PLAY RECAP ****************************************************************************************************************************************************
infra0.openshift.cloud     : ok=115  changed=55   unreachable=0    failed=0
localhost                  : ok=14   changed=0    unreachable=0    failed=0
master0.openshift.cloud    : ok=331  changed=143  unreachable=0    failed=1
node0.openshift.cloud      : ok=115  changed=55   unreachable=0    failed=0
node1.openshift.cloud      : ok=115  changed=55   unreachable=0    failed=0
node2.openshift.cloud      : ok=115  changed=55   unreachable=0    failed=0
node3.openshift.cloud      : ok=115  changed=55   unreachable=0    failed=0

INSTALLER STATUS **********************************************************************************************************************************************
Initialization              : Complete (0:00:50)
Health Check                : Complete (0:00:02)
Node Bootstrap Preparation  : Complete (1:28:50)
etcd Install                : Complete (0:01:51)
Master Install              : In Progress (0:22:07)
        This phase can be restarted by running: playbooks/openshift-master/config.yml

Failure summary:

  1. Hosts:    master0.openshift.cloud
     Play:     Configure masters
     Task:     Wait for all control plane pods to become ready
     Message:  All items completed

[cloud-user@bastion ~]$

Expected results:

Pods are found and installer continues.
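
To illustrate the mismatch reported here, a minimal Python sketch follows. It assumes the wait task builds each pod name as master-<component>-<node name>, which is only inferred from the names in the failing `oc get pod` commands; the helper names and the hard-coded pod set are hypothetical and simply mirror the output in this report. This is not the installer's actual code.

```python
# Pod names copied from the `oc get pods --all-namespaces` output in this report.
RUNNING_PODS = {
    "master-api-master0",
    "master-controllers-master0",
    "master-etcd-master0",
}

# Components the failing task iterates over (item=etcd, item=api, item=controllers).
COMPONENTS = ("etcd", "api", "controllers")


def control_plane_pod_name(component, node_name):
    """Assumed naming scheme: master-<component>-<node name> (hypothetical helper)."""
    return "master-{}-{}".format(component, node_name)


def missing_pods(node_name, running=RUNNING_PODS):
    """Return the expected pod names that are not among the running pods."""
    expected = [control_plane_pod_name(c, node_name) for c in COMPONENTS]
    return [name for name in expected if name not in running]


if __name__ == "__main__":
    # Lookup by FQDN, as in the failing task: every expected name is missing.
    print(missing_pods("master0.openshift.cloud"))
    # Lookup by short hostname: every expected name is present.
    print(missing_pods("master0"))
```

With the FQDN, all three lookups come back missing; with the short hostname, none do, which matches the behaviour described in this report.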
Same root cause as bug 1614904.

*** This bug has been marked as a duplicate of bug 1614904 ***