Bug 1645323 - Director deployed OCP 3.11: scaling out with an additional master node fails during TASK [openshift_control_plane : Wait for all control plane pods to become ready]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 14.0 (Rocky)
Assignee: Martin André
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-11-01 23:43 UTC by Marius Cornea
Modified: 2019-01-11 11:54 UTC

Fixed In Version: openstack-tripleo-heat-templates-9.0.1-0.20181013060891.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-11 11:54:26 UTC
Target Upstream Version:
Embargoed:


Attachments
openshift.tar.gz (2.93 MB, application/x-gzip)
2018-11-01 23:43 UTC, Marius Cornea


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Launchpad               1802319         0        None      None    None     2018-11-08 14:57:14 UTC
OpenStack gerrit        616584          0        None      None    None     2018-11-08 15:30:22 UTC
Red Hat Product Errata  RHEA-2019:0045  0        None      None    None     2019-01-11 11:54:40 UTC

Description Marius Cornea 2018-11-01 23:43:33 UTC
Created attachment 1500279
openshift.tar.gz

Description of problem:
Director deployed OCP 3.11: scaling out with an additional master node fails during TASK [openshift_control_plane : Wait for all control plane pods to become ready]:

TASK [openshift_control_plane : Wait for all control plane pods to become ready] ***
FAILED - RETRYING: Wait for all control plane pods to become ready (2 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (1 retries left).
failed: [openshift-master-3] (item=etcd) => {"attempts": 60, "changed": false, "item": "etcd", "results": {"cmd": "/bin/oc get pod master-etcd-openshift-master-3 -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-etcd-openshift-master-3\" not found\n", "stdout": ""}, "state": "list"}
ok: [openshift-master-3] => (item=api)
ok: [openshift-master-3] => (item=controllers)

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
localhost                  : ok=36   changed=0    unreachable=0    failed=0   
openshift-infra-0          : ok=26   changed=5    unreachable=0    failed=0   
openshift-infra-1          : ok=26   changed=5    unreachable=0    failed=0   
openshift-master-0         : ok=52   changed=7    unreachable=0    failed=0   
openshift-master-1         : ok=52   changed=7    unreachable=0    failed=0   
openshift-master-2         : ok=92   changed=7    unreachable=0    failed=0   
openshift-master-3         : ok=321  changed=126  unreachable=0    failed=1   
openshift-worker-0         : ok=26   changed=5    unreachable=0    failed=0   
openshift-worker-1         : ok=26   changed=5    unreachable=0    failed=0   


INSTALLER STATUS ***************************************************************
Initialization              : Complete (0:01:57)
Node Bootstrap Preparation  : Complete (0:03:45)
Master Install              : In Progress (0:08:52)
	This phase can be restarted by running: playbooks/openshift-master/config.yml


Failure summary:


  1. Hosts:    openshift-master-3
     Play:     Configure masters
     Task:     Wait for all control plane pods to become ready
     Message:  All items completed
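
For readers unfamiliar with the failing task, the following is a minimal Python sketch of the kind of polling loop the log suggests. It is not the openshift-ansible implementation. The command line and the 60 attempts come from the log above; the delay, the readiness test on `status.containerStatuses[].ready`, and the names `wait_for_control_plane_pod`, `pod_is_ready`, and `run_oc` are assumptions made for illustration.

```python
import json
import subprocess
import time


def run_oc(argv):
    """Run a command and return (returncode, stdout, stderr)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def pod_is_ready(stdout):
    """Return True if the pod JSON reports every container as ready.

    Assumption: readiness is read from status.containerStatuses[].ready.
    """
    if not stdout.strip():
        # Matches the log above: a NotFound error leaves stdout empty.
        return False
    pod = json.loads(stdout)
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return bool(statuses) and all(s.get("ready") for s in statuses)


def wait_for_control_plane_pod(item, host, retries=60, delay=5.0,
                               runner=run_oc, sleep=time.sleep):
    """Poll for the static pod master-<item>-<host> in kube-system.

    retries=60 mirrors "attempts": 60 in the log; delay is an assumed value.
    """
    argv = ["/bin/oc", "get", "pod", f"master-{item}-{host}",
            "-o", "json", "-n", "kube-system"]
    for attempt in range(1, retries + 1):
        _returncode, stdout, _stderr = runner(argv)
        if pod_is_ready(stdout):
            return True
        if attempt < retries:
            sleep(delay)
    return False
```

The sketch inspects stdout rather than the return code because the log reports "returncode": 0 together with a NotFound error on stderr. Under these assumptions, `wait_for_control_plane_pod("etcd", "openshift-master-3")` would exhaust its retries in the situation reported here, since the etcd static pod never appears on the new master.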


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-9.0.1-0.20181013060867.ffbe879.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy an environment with 3 x master + 2 x infra + 2 x worker nodes
2. Add an additional master node and re-run the overcloud deploy command

Actual results:
Deployment fails.

Expected results:
No failures.

Additional info:
Attaching /var/lib/mistral.

Comment 1 Martin André 2018-11-07 08:15:27 UTC
I've tried to reproduce this issue twice, and both times it failed earlier for me with a different error:

TASK [etcd : Ensure CA certificate exists on etcd_ca_host] *********************
ok: [openshift-openshiftmaster-1 -> 192.168.24.24]

TASK [etcd : fail] *************************************************************
fatal: [openshift-openshiftmaster-1]: FAILED! => {"changed": false, "msg": "CA certificate /etc/etcd/ca/ca.crt doesn't exist on CA host openshift-openshiftmaster-1. Apply 'etcd_ca' action from `etcd` role to openshift-openshiftmaster-1.\n"}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
localhost                  : ok=39   changed=0    unreachable=0    failed=0   
openshift-openshiftinfra-0 : ok=27   changed=5    unreachable=0    failed=0   
openshift-openshiftinfra-1 : ok=27   changed=5    unreachable=0    failed=0   
openshift-openshiftinfra-2 : ok=27   changed=5    unreachable=0    failed=0   
openshift-openshiftmaster-0 : ok=53   changed=7    unreachable=0    failed=0   
openshift-openshiftmaster-1 : ok=242  changed=71   unreachable=0    failed=1   
openshift-openshiftworker-0 : ok=27   changed=5    unreachable=0    failed=0   
openshift-openshiftworker-1 : ok=27   changed=5    unreachable=0    failed=0   
openshift-openshiftworker-2 : ok=27   changed=5    unreachable=0    failed=0   


INSTALLER STATUS ***************************************************************
Initialization              : Complete (0:01:14)
Node Bootstrap Preparation  : Complete (0:04:51)


Failure summary:


  1. Hosts:    openshift-openshiftmaster-1
     Play:     Create etcd client certificates for master hosts
     Task:     etcd : fail
     Message:  CA certificate /etc/etcd/ca/ca.crt doesn't exist on CA host openshift-openshiftmaster-1. Apply 'etcd_ca' action from `etcd` role to openshift-openshiftmaster-1.
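
When comparing runs like the two above, it can help to pull the failing hosts out of the PLAY RECAP block automatically. The following Python sketch assumes the recap line layout shown in this report (`host : ok=N changed=N unreachable=N failed=N`); the helper name `failed_hosts` is hypothetical and not part of Ansible or TripleO.

```python
import re

# One PLAY RECAP line, as printed in the logs in this report.
RECAP_LINE = re.compile(
    r"^(?P<host>\S+)\s*:\s*ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)\s+"
    r"unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)\s*$"
)


def failed_hosts(recap_text):
    """Return hosts whose recap line shows failed or unreachable tasks."""
    hosts = []
    for line in recap_text.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match is None:
            continue  # skip headers, blank lines, and other output
        if int(match["failed"]) > 0 or int(match["unreachable"]) > 0:
            hosts.append(match["host"])
    return hosts
```

Feeding it the recap from comment 1 should single out openshift-openshiftmaster-1, and the recap from the original description should single out openshift-master-3.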

Comment 3 Martin André 2018-11-21 13:01:49 UTC
The upstream patch at https://review.openstack.org/616584 should fix the issue.

Comment 10 Martin André 2019-01-10 10:19:37 UTC
No doc text required.

Comment 11 errata-xmlrpc 2019-01-11 11:54:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

