Bug 1615754 - Installer fails in Ansible TASK [openshift_control_plane : Wait for all control plane pods to become ready]
Keywords:
Status: CLOSED DUPLICATE of bug 1614904
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.10.z
Assignee: Scott Dodson
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-08-14 07:35 UTC by Torben Jaeger
Modified: 2018-08-14 20:15 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-14 20:15:04 UTC
Target Upstream Version:
Embargoed:



Description Torben Jaeger 2018-08-14 07:35:19 UTC
Description of problem:

The installer fails while checking for existing pods:

failed: [master0.openshift.cloud] (item=etcd) => {"attempts": 60, "changed": false, "failed": true, "item": "etcd", "results": {"cmd": "/bin/oc get pod master-etcd-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-etcd-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}

failed: [master0.openshift.cloud] (item=api) => {"attempts": 60, "changed": false, "failed": true, "item": "api", "results": {"cmd": "/bin/oc get pod master-api-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-api-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}

failed: [master0.openshift.cloud] (item=controllers) => {"attempts": 60, "changed": false, "failed": true, "item": "controllers", "results": {"cmd": "/bin/oc get pod master-controllers-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-controllers-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}

That's because it is looking for the wrong pod names. The pods themselves are already running, but they are named differently, without the domain part:

[cloud-user@master0 ~]$ oc get pods --all-namespaces
NAMESPACE     NAME                         READY     STATUS    RESTARTS   AGE
kube-system   master-api-master0           1/1       Running   0          23m
kube-system   master-controllers-master0   1/1       Running   0          23m
kube-system   master-etcd-master0          1/1       Running   0          23m
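
The mismatch above can be checked mechanically. The following is a minimal sketch, not the installer's code: it assumes the naming pattern master-<component>-<node name> inferred from the failed commands and the `oc get pods` output in this report, and the helper names (`expected_pod_name`, `short_hostname`, `diagnose`) are hypothetical.

```python
# Sketch only: compares the pod names the installer asked for with the
# pod names that are actually running. The naming pattern is inferred
# from this report, not taken from openshift-ansible.

COMPONENTS = ("etcd", "api", "controllers")


def expected_pod_name(component, node_name):
    """Build the pod name the wait task appears to query."""
    return "master-{}-{}".format(component, node_name)


def short_hostname(fqdn):
    """Drop the domain part: master0.openshift.cloud -> master0."""
    return fqdn.split(".", 1)[0]


def diagnose(node_name, running_pods):
    """Map each queried pod name to 'found', 'running as <name>', or 'missing'."""
    report = {}
    for component in COMPONENTS:
        wanted = expected_pod_name(component, node_name)
        alternative = expected_pod_name(component, short_hostname(node_name))
        if wanted in running_pods:
            report[wanted] = "found"
        elif alternative in running_pods:
            report[wanted] = "running as " + alternative
        else:
            report[wanted] = "missing"
    return report


# Pod names copied from the `oc get pods --all-namespaces` output above.
running = {
    "master-api-master0",
    "master-controllers-master0",
    "master-etcd-master0",
}

for name, status in sorted(diagnose("master0.openshift.cloud", running).items()):
    print(name, "->", status)
```

With the values from this report, every queried name resolves to a pod that is running under the short hostname, which matches the observation that the pods are up but not found.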



Version-Release number of selected component (if applicable):

[root@h1 ~]# openstack --version
openstack 3.14.1

[cloud-user@bastion ~]# ansible --version
ansible 2.4.6.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Feb 20 2018, 09:19:12) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

[cloud-user@master0 ~]$ oc version
oc v3.10.14
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://openshift.46.4.143.210.xip.io:8443
openshift v3.10.14
kubernetes v1.10.0+b81c8f8


How reproducible:

Always. Run:

ansible-playbook -i /home/cloud-user/openshift-inventory --private-key=/home/cloud-user/admin.pem -vv /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml

Steps to Reproduce:
1.
2.
3.

Actual results:

The playbook fails with:

PLAY RECAP ****************************************************************************************************************************************************
infra0.openshift.cloud     : ok=115  changed=55   unreachable=0    failed=0
localhost                  : ok=14   changed=0    unreachable=0    failed=0
master0.openshift.cloud    : ok=331  changed=143  unreachable=0    failed=1   
node0.openshift.cloud      : ok=115  changed=55   unreachable=0    failed=0
node1.openshift.cloud      : ok=115  changed=55   unreachable=0    failed=0
node2.openshift.cloud      : ok=115  changed=55   unreachable=0    failed=0
node3.openshift.cloud      : ok=115  changed=55   unreachable=0    failed=0

Expected results:


Additional info:

This is OCP on OpenStack, if that makes a difference.


Comment 1 Torben Jaeger 2018-08-14 07:48:01 UTC
Version-Release number of the following components:

[cloud-user@bastion ~]$ rpm -q openshift-ansible
openshift-ansible-3.10.21-1.git.0.6446011.el7.noarch
[cloud-user@bastion ~]$ rpm -q ansible
ansible-2.4.6.0-1.el7ae.noarch
[cloud-user@bastion ~]$ ansible --version
ansible 2.4.6.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Feb 20 2018, 09:19:12) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

Actual results:

TASK [openshift_control_plane : Wait for all control plane pods to become ready] ******************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_control_plane/tasks/main.yml:256
FAILED - RETRYING: Wait for all control plane pods to become ready (60 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (59 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (58 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (57 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (56 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (55 retries left).
...
FAILED - RETRYING: Wait for all control plane pods to become ready (1 retries left).
failed: [master0.openshift.cloud] (item=etcd) => {"attempts": 60, "changed": false, "failed": true, "item": "etcd", "results": {"cmd": "/bin/oc get pod master-etcd-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-etcd-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}
FAILED - RETRYING: Wait for all control plane pods to become ready (60 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (59 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (58 retries left).
...
FAILED - RETRYING: Wait for all control plane pods to become ready (1 retries left).
failed: [master0.openshift.cloud] (item=api) => {"attempts": 60, "changed": false, "failed": true, "item": "api", "results": {"cmd": "/bin/oc get pod master-api-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-api-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}
FAILED - RETRYING: Wait for all control plane pods to become ready (60 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (59 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (58 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (57 retries left).
...
FAILED - RETRYING: Wait for all control plane pods to become ready (1 retries left).
failed: [master0.openshift.cloud] (item=controllers) => {"attempts": 60, "changed": false, "failed": true, "item": "controllers", "results": {"cmd": "/bin/oc get pod master-controllers-master0.openshift.cloud -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-controllers-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}

NO MORE HOSTS LEFT ********************************************************************************************************************************************
 [WARNING]: Could not create retry file '/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry'.         [Errno 13] Permission denied:
u'/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry'
                        
                        
PLAY RECAP ****************************************************************************************************************************************************
infra0.openshift.cloud     : ok=115  changed=55   unreachable=0    failed=0
localhost                  : ok=14   changed=0    unreachable=0    failed=0
master0.openshift.cloud    : ok=331  changed=143  unreachable=0    failed=1   
node0.openshift.cloud      : ok=115  changed=55   unreachable=0    failed=0
node1.openshift.cloud      : ok=115  changed=55   unreachable=0    failed=0
node2.openshift.cloud      : ok=115  changed=55   unreachable=0    failed=0
node3.openshift.cloud      : ok=115  changed=55   unreachable=0    failed=0
                        
                        
INSTALLER STATUS **********************************************************************************************************************************************
Initialization              : Complete (0:00:50)
Health Check                : Complete (0:00:02)
Node Bootstrap Preparation  : Complete (1:28:50)
etcd Install                : Complete (0:01:51)
Master Install              : In Progress (0:22:07)
        This phase can be restarted by running: playbooks/openshift-master/config.yml
                        
                        
Failure summary:        
                        
                        
  1. Hosts:    master0.openshift.cloud
     Play:     Configure masters
     Task:     Wait for all control plane pods to become ready
     Message:  All items completed
[cloud-user@bastion ~]$
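
For triage, the `failed:` lines in the output above can be parsed to pull out the queried pod name, the return code, and the error text. This is a rough sketch that assumes the line layout seen in this log (`failed: [host] (item=...) => {json}`); the function name `parse_failed_line` is hypothetical and other Ansible versions or callbacks may format the line differently.

```python
import json
import re

# Layout assumed from the log in this report:
#   failed: [<host>] (item=<item>) => {<json payload>}
FAILED_LINE = re.compile(
    r"^failed: \[(?P<host>[^\]]+)\] \(item=(?P<item>[^)]+)\) => (?P<payload>\{.*\})$"
)


def parse_failed_line(line):
    """Return the interesting fields of a 'failed:' line, or None if it does not match."""
    match = FAILED_LINE.match(line.strip())
    if match is None:
        return None
    payload = json.loads(match.group("payload"))
    results = payload.get("results", {})
    words = results.get("cmd", "").split()
    # The pod name is the word after "pod" in "/bin/oc get pod <name> ...".
    pod = words[words.index("pod") + 1] if "pod" in words[:-1] else None
    return {
        "host": match.group("host"),
        "item": match.group("item"),
        "attempts": payload.get("attempts"),
        "pod": pod,
        "returncode": results.get("returncode"),
        "stderr": results.get("stderr", "").strip(),
    }


# The etcd failure line from this report, split only for readability.
SAMPLE = (
    r'failed: [master0.openshift.cloud] (item=etcd) => {"attempts": 60, "changed": false, '
    r'"failed": true, "item": "etcd", "results": {"cmd": "/bin/oc get pod '
    r'master-etcd-master0.openshift.cloud -o json -n kube-system", "results": [{}], '
    r'"returncode": 0, "stderr": "Error from server (NotFound): pods '
    r'\"master-etcd-master0.openshift.cloud\" not found\n", "stdout": ""}, "state": "list"}'
)

info = parse_failed_line(SAMPLE)
print(info["pod"], info["returncode"], info["stderr"])
```

Note that the payload reports a return code of 0 even though stderr carries a NotFound error, so filtering on the return code alone would miss this failure.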


Expected results:

Pods are found and installer continues.

Comment 2 Scott Dodson 2018-08-14 20:15:04 UTC
Same root cause as 1614904

*** This bug has been marked as a duplicate of bug 1614904 ***

