Bug 1615754
| Summary: | Installer fails in Ansible TASK [openshift_control_plane : Wait for all control plane pods to become ready] | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Torben Jaeger <torben> |
| Component: | Installer | Assignee: | Scott Dodson <sdodson> |
| Status: | CLOSED DUPLICATE | QA Contact: | Johnny Liu <jialiu> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.10.0 | CC: | aos-bugs, jokerman, mmccomas |
| Target Milestone: | --- | ||
| Target Release: | 3.10.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-08-14 20:15:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Version-Release number of the following components:
[cloud-user@bastion ~]$ rpm -q openshift-ansible
openshift-ansible-3.10.21-1.git.0.6446011.el7.noarch
[cloud-user@bastion ~]$ rpm -q ansible
ansible-2.4.6.0-1.el7ae.noarch
[cloud-user@bastion ~]$ ansible --version
ansible 2.4.6.0
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.5 (default, Feb 20 2018, 09:19:12) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]
Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated
TASK [openshift_control_plane : Wait for all control plane pods to become ready] ******************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_control_plane/tasks/main.yml:256
FAILED - RETRYING: Wait for all control plane pods to become ready (60 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (59 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (58 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (57 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (56 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (55 retries left).
...
FAILED - RETRYING: Wait for all control plane pods to become ready (1 retries left).
failed: [master0.openshift.cloud] (item=etcd) => {"attempts": 60, "changed": false, "failed": true, "item": "etcd", "results": {"cmd": "/bin/oc get pod master-etcd-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-etcd-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}
FAILED - RETRYING: Wait for all control plane pods to become ready (60 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (59 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (58 retries left).
...
FAILED - RETRYING: Wait for all control plane pods to become ready (1 retries left).
failed: [master0.openshift.cloud] (item=api) => {"attempts": 60, "changed": false, "failed": true, "item": "api", "results": {"cmd": "/bin/oc get pod master-api-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-api-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}
FAILED - RETRYING: Wait for all control plane pods to become ready (60 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (59 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (58 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (57 retries left).
...
FAILED - RETRYING: Wait for all control plane pods to become ready (1 retries left).
failed: [master0.openshift.cloud] (item=controllers) => {"attempts": 60, "changed": false, "failed": true, "item": "controllers", "results": {"cmd": "/bin/oc get pod master-controllers-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-controllers-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}
NO MORE HOSTS LEFT ********************************************************************************************************************************************
[WARNING]: Could not create retry file '/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry'. [Errno 13] Permission denied:
u'/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry'
PLAY RECAP ****************************************************************************************************************************************************
infra0.openshift.cloud : ok=115 changed=55 unreachable=0 failed=0
localhost : ok=14 changed=0 unreachable=0 failed=0
master0.openshift.cloud : ok=331 changed=143 unreachable=0 failed=1
node0.openshift.cloud : ok=115 changed=55 unreachable=0 failed=0
node1.openshift.cloud : ok=115 changed=55 unreachable=0 failed=0
node2.openshift.cloud : ok=115 changed=55 unreachable=0 failed=0
node3.openshift.cloud : ok=115 changed=55 unreachable=0 failed=0
INSTALLER STATUS **********************************************************************************************************************************************
Initialization : Complete (0:00:50)
Health Check : Complete (0:00:02)
Node Bootstrap Preparation : Complete (1:28:50)
etcd Install : Complete (0:01:51)
Master Install : In Progress (0:22:07)
This phase can be restarted by running: playbooks/openshift-master/config.yml
Failure summary:
1. Hosts: master0.openshift.cloud
Play: Configure masters
Task: Wait for all control plane pods to become ready
Message: All items completed
[cloud-user@bastion ~]$
Expected results:
Pods are found and installer continues.
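When triaging output like the above, it can help to pull the queried pod name out of each `failed:` line instead of reading the JSON by eye. The following is a minimal sketch, assuming the `failed: [host] (item=...) => {json}` line shape shown in this report; `parse_failed_line` is a hypothetical helper and not part of openshift-ansible.

```python
import json
import re

# Matches lines such as:
#   failed: [master0.openshift.cloud] (item=etcd) => {...json payload...}
FAILED_LINE = re.compile(
    r"^failed: \[(?P<host>[^\]]+)\] \(item=(?P<item>[^)]+)\) => (?P<payload>\{.*\})\s*$"
)


def parse_failed_line(line):
    """Return a summary dict for a 'failed:' line, or None if the line does not match."""
    match = FAILED_LINE.match(line)
    if match is None:
        return None
    payload = json.loads(match.group("payload"))
    results = payload["results"]
    # The pod name is the token that follows "pod" in the recorded oc command.
    tokens = results["cmd"].split()
    pod_name = tokens[tokens.index("pod") + 1]
    return {
        "host": match.group("host"),
        "item": match.group("item"),
        "pod": pod_name,
        "attempts": payload["attempts"],
        "stderr": results["stderr"].strip(),
    }
```

Running this over the three `failed:` lines in this report would list the fully qualified pod names that the task asked for, which can then be compared against `oc get pods -n kube-system`.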
Same root cause as 1614904.
*** This bug has been marked as a duplicate of bug 1614904 ***
Description of problem:
The installer fails while checking for existing pods:

failed: [master0.openshift.cloud] (item=etcd) => {"attempts": 60, "changed": false, "failed": true, "item": "etcd", "results": {"cmd": "/bin/oc get pod master-etcd-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-etcd-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}
failed: [master0.openshift.cloud] (item=api) => {"attempts": 60, "changed": false, "failed": true, "item": "api", "results": {"cmd": "/bin/oc get pod master-api-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-api-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}
failed: [master0.openshift.cloud] (item=controllers) => {"attempts": 60, "changed": false, "failed": true, "item": "controllers", "results": {"cmd": "/bin/oc get pod master-controllers-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-controllers-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}

This happens because the installer is looking for the wrong pod names. The pods themselves are already running but are named differently, without the domain part:

[cloud-user@master0 ~]$ oc get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system master-api-master0 1/1 Running 0 23m
kube-system master-controllers-master0 1/1 Running 0 23m
kube-system master-etcd-master0 1/1 Running 0 23m

Version-Release number of selected component (if applicable):
[root@h1 ~]# openstack --version
openstack 3.14.1
[cloud-user@bastion ~]# ansible --version
ansible 2.4.6.0
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.5 (default, Feb 20 2018, 09:19:12) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]
[cloud-user@master0 ~]$ oc version
oc v3.10.14
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://openshift.46.4.143.210.xip.io:8443
openshift v3.10.14
kubernetes v1.10.0+b81c8f8

How reproducible:
Run:
ansible-playbook -i /home/cloud-user/openshift-inventory --private-key=/home/cloud-user/admin.pem -vv /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml

Actual results:
The playbook fails with:
PLAY RECAP ****************************************************************************************************************************************************
infra0.openshift.cloud : ok=115 changed=55 unreachable=0 failed=0
localhost : ok=14 changed=0 unreachable=0 failed=0
master0.openshift.cloud : ok=331 changed=143 unreachable=0 failed=1
node0.openshift.cloud : ok=115 changed=55 unreachable=0 failed=0
node1.openshift.cloud : ok=115 changed=55 unreachable=0 failed=0
node2.openshift.cloud : ok=115 changed=55 unreachable=0 failed=0
node3.openshift.cloud : ok=115 changed=55 unreachable=0 failed=0

Additional info:
This is OCP on OpenStack, if that makes a difference.
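The name mismatch described in this report can be checked mechanically. The sketch below assumes the pod naming pattern `master-<component>-<node name>`, which is inferred only from the names visible in this report and not taken from the installer source; `diagnose_pod_names` and its status labels are hypothetical.

```python
def expected_pod_name(component, node_name):
    # Assumed pattern, inferred from the pod names quoted in this report.
    return "master-{}-{}".format(component, node_name)


def diagnose_pod_names(components, fqdn, running_pods):
    """Classify each control plane component by the name its pod runs under.

    Returns a dict mapping component -> (status, pod_name), where status is
    "fqdn" (name the installer queried exists), "short" (pod exists only under
    the short hostname), or "missing" (neither name was found).
    """
    short_name = fqdn.split(".", 1)[0]
    report = {}
    for component in components:
        by_fqdn = expected_pod_name(component, fqdn)
        by_short = expected_pod_name(component, short_name)
        if by_fqdn in running_pods:
            report[component] = ("fqdn", by_fqdn)
        elif by_short in running_pods:
            report[component] = ("short", by_short)
        else:
            report[component] = ("missing", by_fqdn)
    return report


if __name__ == "__main__":
    # Pod names copied from the `oc get pods --all-namespaces` output in this report.
    running = {
        "master-api-master0",
        "master-controllers-master0",
        "master-etcd-master0",
    }
    result = diagnose_pod_names(
        ["etcd", "api", "controllers"], "master0.openshift.cloud", running
    )
    for component, (status, name) in sorted(result.items()):
        print(component, status, name)
```

With the pod names from this report, every component is classified as `short`, which matches the observation that the pods are running but registered without the domain part.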