Bug 1645323

Summary:

Director-deployed OCP 3.11: scaling out with an additional master node fails during TASK [openshift_control_plane : Wait for all control plane pods to become ready]

Product:

Red Hat OpenStack

Reporter:

Marius Cornea <mcornea>

Component:

openstack-tripleo-heat-templates

Assignee:

Martin André <m.andre>

Status:

CLOSED ERRATA

QA Contact:

Marius Cornea <mcornea>

Severity:

urgent

Priority:

urgent

Version:

14.0 (Rocky)

CC:

athomas, dbecker, m.andre, mburns, morazi, sclewis

Keywords:

Triaged

Target Release:

14.0 (Rocky)

Hardware:

Unspecified

OS:

Unspecified

Fixed In Version:

openstack-tripleo-heat-templates-9.0.1-0.20181013060891.el7ost

Doc Type:

If docs needed, set a value

Last Closed:

2019-01-11 11:54:26 UTC

Type:

Bug

Attachments:

Description	Flags
openshift.tar.gz	none

Description Marius Cornea 2018-11-01 23:43:33 UTC

Created attachment 1500279
openshift.tar.gz

Description of problem:
On a director-deployed OCP 3.11 cluster, scaling out with an additional master node fails during TASK [openshift_control_plane : Wait for all control plane pods to become ready]:

TASK [openshift_control_plane : Wait for all control plane pods to become ready] ***
FAILED - RETRYING: Wait for all control plane pods to become ready (2 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (1 retries left).
failed: [openshift-master-3] (item=etcd) => {"attempts": 60, "changed": false, "item": "etcd", "results": {"cmd": "/bin/oc get pod master-etcd-openshift-master-3 -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-etcd-openshift-master-3\" not found\n", "stdout": ""}, "state": "list"}
ok: [openshift-master-3] => (item=api)
ok: [openshift-master-3] => (item=controllers)

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
localhost                  : ok=36   changed=0    unreachable=0    failed=0   
openshift-infra-0          : ok=26   changed=5    unreachable=0    failed=0   
openshift-infra-1          : ok=26   changed=5    unreachable=0    failed=0   
openshift-master-0         : ok=52   changed=7    unreachable=0    failed=0   
openshift-master-1         : ok=52   changed=7    unreachable=0    failed=0   
openshift-master-2         : ok=92   changed=7    unreachable=0    failed=0   
openshift-master-3         : ok=321  changed=126  unreachable=0    failed=1   
openshift-worker-0         : ok=26   changed=5    unreachable=0    failed=0   
openshift-worker-1         : ok=26   changed=5    unreachable=0    failed=0   


INSTALLER STATUS ***************************************************************
Initialization              : Complete (0:01:57)
Node Bootstrap Preparation  : Complete (0:03:45)
Master Install              : In Progress (0:08:52)
	This phase can be restarted by running: playbooks/openshift-master/config.yml


Failure summary:


  1. Hosts:    openshift-master-3
     Play:     Configure masters
     Task:     Wait for all control plane pods to become ready
     Message:  All items completed
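For reference, the failing task polls the API server for each control plane static pod on the new master. Below is a rough Python sketch of that loop, assuming the pod naming (master-<component>-<host>), the kube-system namespace, the item list (etcd, api, controllers) and the 60-attempt limit that appear in the log above. The delay value, the Ready-condition check and all function names are illustrative assumptions, not the openshift-ansible implementation.

```python
import json
import subprocess
import time
from typing import Callable, List, Tuple

# Hypothetical runner signature: takes argv, returns (returncode, stdout, stderr).
Runner = Callable[[List[str]], Tuple[int, str, str]]


def run_oc(argv: List[str]) -> Tuple[int, str, str]:
    """Run a command and capture its output (the log shows /bin/oc being used)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def pod_is_ready(pod: dict) -> bool:
    """Return True if the pod reports a Ready condition with status "True"."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False


def wait_for_control_plane(host: str,
                           components=("etcd", "api", "controllers"),
                           retries: int = 60,
                           delay: float = 5.0,  # assumed delay between attempts
                           runner: Runner = run_oc,
                           sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Poll each master-<component>-<host> pod; return the components that never became ready."""
    failed = []
    for component in components:
        name = f"master-{component}-{host}"
        argv = ["oc", "get", "pod", name, "-o", "json", "-n", "kube-system"]
        for attempt in range(1, retries + 1):
            _rc, out, _err = runner(argv)
            # A NotFound error leaves stdout empty, so the pod counts as not ready.
            if out.strip() and pod_is_ready(json.loads(out)):
                break
            if attempt < retries:
                sleep(delay)
        else:
            # The retry loop was exhausted without a break.
            failed.append(component)
    return failed
```

With a stub runner that returns NotFound for the etcd pod and a Ready pod for the others, only etcd is reported as failed. This matches the recap: api and controllers came up, but no master-etcd-openshift-master-3 pod was ever created.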


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-9.0.1-0.20181013060867.ffbe879.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy an environment with 3 master, 2 infra and 2 worker nodes.
2. Add one more master node and re-run the overcloud deploy command.

Actual results:
Deployment fails.

Expected results:
No failures.

Additional info:
Attaching /var/lib/mistral.

Comment 1 Martin André 2018-11-07 08:15:27 UTC

I tried to reproduce this issue twice; both times the deployment failed earlier for me, with a different error:

TASK [etcd : Ensure CA certificate exists on etcd_ca_host] *********************
ok: [openshift-openshiftmaster-1 -> 192.168.24.24]

TASK [etcd : fail] *************************************************************
fatal: [openshift-openshiftmaster-1]: FAILED! => {"changed": false, "msg": "CA certificate /etc/etcd/ca/ca.crt doesn't exist on CA host openshift-openshiftmaster-1. Apply 'etcd_ca' action from `etcd` role to openshift-openshiftmaster-1.\n"}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
localhost                  : ok=39   changed=0    unreachable=0    failed=0   
openshift-openshiftinfra-0 : ok=27   changed=5    unreachable=0    failed=0   
openshift-openshiftinfra-1 : ok=27   changed=5    unreachable=0    failed=0   
openshift-openshiftinfra-2 : ok=27   changed=5    unreachable=0    failed=0   
openshift-openshiftmaster-0 : ok=53   changed=7    unreachable=0    failed=0   
openshift-openshiftmaster-1 : ok=242  changed=71   unreachable=0    failed=1   
openshift-openshiftworker-0 : ok=27   changed=5    unreachable=0    failed=0   
openshift-openshiftworker-1 : ok=27   changed=5    unreachable=0    failed=0   
openshift-openshiftworker-2 : ok=27   changed=5    unreachable=0    failed=0   


INSTALLER STATUS ***************************************************************
Initialization              : Complete (0:01:14)
Node Bootstrap Preparation  : Complete (0:04:51)


Failure summary:


  1. Hosts:    openshift-openshiftmaster-1
     Play:     Create etcd client certificates for master hosts
     Task:     etcd : fail
     Message:  CA certificate /etc/etcd/ca/ca.crt doesn't exist on CA host openshift-openshiftmaster-1. Apply 'etcd_ca' action from `etcd` role to openshift-openshiftmaster-1.

Comment 3 Martin André 2018-11-21 13:01:49 UTC

The upstream patch at https://review.openstack.org/616584 should fix the issue.

Comment 10 Martin André 2019-01-10 10:19:37 UTC

No doc text required.

Comment 11 errata-xmlrpc 2019-01-11 11:54:26 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045