1734793 – [openshift-ansible]/openshift-logging/config.yaml failed provisioning additional instances of Elasticsearch


Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Logging
Sub Component:
Version:	3.11.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	3.11.z
Assignee:	Jeff Cantrill
QA Contact:	Anping Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-07-31 13:18 UTC by Radomir Ludva
Modified:	2020-02-02 01:29 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-02-02 01:29:36 UTC
Target Upstream Version:
Embargoed:

Attachments
Inventory file (second variant with es_cluster_size=3) (5.94 KB, text/plain) 2019-07-31 13:18 UTC, Radomir Ludva
Log from ansible playbook (4.26 MB, text/plain) 2019-07-31 13:19 UTC, Radomir Ludva

Description Radomir Ludva 2019-07-31 13:18:32 UTC

Created attachment 1595084
Inventory file (second variant with es_cluster_size=3)

Description of problem:
After provisioning OCP v3.11.129 with:
openshift_logging_es_cluster_size=1
openshift_logging_es_number_of_shards=1
openshift_logging_es_number_of_replicas=1

I then realized that I needed:
openshift_logging_es_cluster_size=3
openshift_logging_es_number_of_shards=1
openshift_logging_es_number_of_replicas=1

After I ran the openshift-logging playbook again, it failed to deploy the remaining two ES pods. I had to run oc rollout latest <es-deployment> manually in the openshift-logging namespace. Only then were the two additional ES pods created, but this should be handled by the ansible playbook.
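
The manual workaround described above could be scripted roughly as follows. This is only a sketch: the function name rollout_es_deployments is hypothetical, and it assumes that the Elasticsearch DeploymentConfigs carry the same component=es label as the pods queried by the playbook.

```shell
# Sketch of the manual workaround: trigger a new rollout for every
# Elasticsearch DeploymentConfig in the logging namespace.
# Assumption: the ES DeploymentConfigs are labelled component=es.

rollout_es_deployments() {
    # $1: namespace (defaults to openshift-logging)
    local namespace="${1:-openshift-logging}"
    local dc
    # "oc get dc -o name" prints one resource reference per line
    for dc in $(oc get dc -l component=es -n "$namespace" -o name); do
        echo "Rolling out ${dc}"
        oc rollout latest "$dc" -n "$namespace"
    done
}

# Usage (needs a logged-in oc session with access to the namespace):
#   rollout_es_deployments openshift-logging
```

This only reproduces what was done by hand; it does not explain why the playbook itself did not trigger the rollouts.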

Version-Release number of the following components:
# rpm -q openshift-ansible
openshift-ansible-3.11.123-1.git.0.db681ba.el7.noarch

# rpm -q ansible
ansible-2.6.18-1.el7ae.noarch

# ansible --version
ansible 2.6.18
  config file = /usr/share/ansible/openshift-ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Jun 11 2019, 12:19:05) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]


How reproducible:
Deploy OCP with only one instance of ES, then set openshift_logging_es_cluster_size=3 in the inventory file and run the openshift-logging playbook again.
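
As a rough illustration of these steps, the inventory change can be made with a small helper before re-running the playbook. The helper name set_es_cluster_size, the inventory path and the playbook path under the openshift-ansible directory are assumptions for this sketch, not values taken from the attached files.

```shell
# Sketch of the reproduction steps.
# Assumptions: the inventory is an INI-style file that already contains an
# openshift_logging_es_cluster_size=... line; paths in the usage note are
# placeholders.

set_es_cluster_size() {
    # $1: inventory file, $2: new cluster size
    local inventory="$1"
    local size="$2"
    local tmp
    tmp="$(mktemp)" || return 1
    # Rewrite the existing setting, then copy the result back in place
    sed "s/^openshift_logging_es_cluster_size=.*/openshift_logging_es_cluster_size=${size}/" \
        "$inventory" > "$tmp" && cat "$tmp" > "$inventory"
    local rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Usage:
#   set_es_cluster_size /etc/ansible/hosts 3
#   ansible-playbook -i /etc/ansible/hosts \
#       /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml
```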

Actual results:
FAILED - RETRYING: openshift_logging_elasticsearch : command (3 retries left).
FAILED - RETRYING: openshift_logging_elasticsearch : command (2 retries left).
FAILED - RETRYING: openshift_logging_elasticsearch : command (1 retries left).
fatal: [torii-ichi-master.local.nutius.com]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "get", "pod", "-l", "component=es,provider=openshift", "-n", "openshift-logging", "-o", "jsonpath={.items[?(@.status.phase==\"Running\")].metadata.name}"], "delta": "0:00:00.182890", "end": "2019-07-31 14:48:27.233188", "rc": 0, "start": "2019-07-31 14:48:27.050298", "stderr": "", "stderr_lines": [], "stdout": "logging-es-data-master-vevnrhov-1-g8k89", "stdout_lines": ["logging-es-data-master-vevnrhov-1-g8k89"]}
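
The failed task returned rc 0 with a single pod name in stdout, so the query itself worked and only one ES pod was in the Running phase. The check can be repeated by hand with a sketch like the one below. The helper names count_running_pods and es_cluster_ready are hypothetical, and comparing the count with the expected cluster size is an assumption about what the playbook waits for.

```shell
# Sketch: compare the number of running ES pods with the expected cluster size.

count_running_pods() {
    # $1: stdout of the jsonpath query (pod names separated by whitespace)
    # Intentionally unquoted so the names are split into positional parameters
    set -- $1
    echo "$#"
}

es_cluster_ready() {
    # $1: stdout of the jsonpath query, $2: expected cluster size
    [ "$(count_running_pods "$1")" -eq "$2" ]
}

# Usage, with the same query as the failing task:
#   running="$(oc --config=/etc/origin/master/admin.kubeconfig get pod \
#       -l component=es,provider=openshift -n openshift-logging \
#       -o jsonpath='{.items[?(@.status.phase=="Running")].metadata.name}')"
#   es_cluster_ready "$running" 3 || \
#       echo "only $(count_running_pods "$running") of 3 ES pods are running"
```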

Expected results:
The two additional ES instances should be created by the ansible playbook without any issue.

Additional information:
I was simulating a situation in which a customer has an existing ES instance and needs to extend it for high availability, without removing the existing storage or the existing ES pod.

Comment 1 Radomir Ludva 2019-07-31 13:19:21 UTC

Created attachment 1595085
Log from ansible playbook

Comment 2 Radomir Ludva 2019-07-31 13:24:40 UTC

Additional info
---------------
The install/uninstall playbooks have been run on this cluster many times. Before the last installation, however, OCP was cleanly uninstalled by the ansible playbook.

So the workflow was: Uninstall OCP -> Install OCP -> set es_cluster_size=3 -> deploy openshift-logging again
