Bug 1738766 - Upgrade fails when the logging-es-ops size differs from the logging-es size
Summary: Upgrade fails when the logging-es-ops size differs from the logging-es size
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Jeff Cantrill
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-08 06:09 UTC by Anping Li
Modified: 2023-10-06 18:28 UTC (History)
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-03 21:00:52 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
The playbook logs (32.64 KB, application/gzip)
2019-08-08 06:11 UTC, Anping Li

Description Anping Li 2019-08-08 06:09:42 UTC
Description of problem:
If the logging-es-ops cluster size differs from the logging-es cluster size, the logging-es-ops upgrade fails at https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_logging_elasticsearch/tasks/restart_cluster.yml#L2

The playbook uses openshift_logging_es_cluster_size when upgrading the logging-es-ops cluster.
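A minimal Python sketch of the variable-selection bug described above, assuming the restart handler waits until the number of Running pods equals the configured cluster size (function names here are illustrative, not from the playbook):

```python
# Sketch: restart_cluster.yml waits for Running pods to match the cluster
# size, but reads openshift_logging_es_cluster_size even for the ops cluster.

def expected_size_buggy(inventory, ops):
    # Buggy behavior: always reads the non-ops size.
    return inventory["openshift_logging_es_cluster_size"]

def expected_size_fixed(inventory, ops):
    # Fixed behavior: pick the ops-specific size when restarting logging-es-ops.
    key = ("openshift_logging_es_ops_cluster_size" if ops
           else "openshift_logging_es_cluster_size")
    return inventory[key]

inventory = {
    "openshift_logging_es_cluster_size": 2,
    "openshift_logging_es_ops_cluster_size": 3,
}

running_ops_pods = 3  # logging-es-ops actually runs 3 pods

# The buggy lookup waits for 2 Running pods, never matches the 3 that
# exist, and the wait task exhausts its retries.
print(expected_size_buggy(inventory, ops=True) == running_ops_pods)  # False
print(expected_size_fixed(inventory, ops=True) == running_ops_pods)  # True
```

With equal cluster sizes the two lookups coincide, which is why the bug only surfaces when logging-es and logging-es-ops are sized differently.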

Version-Release number of selected component (if applicable):
openshift3/ose-ansible:v3.11.135

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging using different size for logging-es-ops and logging-es
2. Upgrade logging with the following inventory variables:
openshift_logging_install_logging=true
openshift_logging_es_cluster_size=2
openshift_logging_es_number_of_replicas=1
openshift_logging_es_number_of_shards=1
openshift_logging_es_memory_limit=2Gi
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra": "true"}

openshift_logging_use_ops=true
openshift_logging_es_ops_cluster_size=3
openshift_logging_es_ops_number_of_replicas=1
openshift_logging_es_ops_number_of_shards=1
openshift_logging_es_ops_memory_limit=2Gi
openshift_logging_es_ops_nodeselector={"node-role.kubernetes.io/compute": "true"}
openshift_logging_elasticsearch_storage_type=hostmount

Actual results:
The logging upgrade fails at https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_logging_elasticsearch/tasks/restart_cluster.yml#L2.

The debug output shows that openshift_logging_es_cluster_size=2 is used:

RUNNING HANDLER [openshift_logging_elasticsearch : debug] *********************************************************************************************************************************************************
ok: [ec2-54-161-31-32.compute-1.amazonaws.com] => {
    "msg": "the es-ops number is  2"
}

RUNNING HANDLER [openshift_logging_elasticsearch : command] *******************************************************************************************************************************************************
FAILED - RETRYING: openshift_logging_elasticsearch : command (120 retries left).
<---snip--->
<---snip--->
FAILED - RETRYING: openshift_logging_elasticsearch : command (2 retries left).
FAILED - RETRYING: openshift_logging_elasticsearch : command (1 retries left).
fatal: [ec2-54-161-31-32.compute-1.amazonaws.com]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["/usr/local/bin/oc", "--config=/etc/origin/master/admin.kubeconfig", "get", "pod", "-l", "component=es-ops,provider=openshift", "-n", "openshift-logging", "-o", "jsonpath={.items[?(@.status.phase==\"Running\")].metadata.name}"], "delta": "0:00:00.223996", "end": "2019-08-08 05:46:42.404059", "rc": 0, "start": "2019-08-08 05:46:42.180063", "stderr": "", "stderr_lines": [], "stdout": "logging-es-ops-data-master-0fr84k1a-4-hwb42 logging-es-ops-data-master-9961o92h-5-j5bxj logging-es-ops-data-master-o7nhcbo4-5-b7stm", "stdout_lines": ["logging-es-ops-data-master-0fr84k1a-4-hwb42 logging-es-ops-data-master-9961o92h-5-j5bxj logging-es-ops-data-master-o7nhcbo4-5-b7stm"]}
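A short Python sketch of why the task above kept retrying, assuming the handler's `until` condition compares the Running-pod count against the configured size (an assumption based on the log, not confirmed from the playbook source):

```python
# The oc command itself succeeds (rc=0) and returns three Running es-ops
# pods, but the handler compares against the non-ops size of 2, so the
# retry loop never exits and fails after 120 attempts.

stdout = ("logging-es-ops-data-master-0fr84k1a-4-hwb42 "
          "logging-es-ops-data-master-9961o92h-5-j5bxj "
          "logging-es-ops-data-master-o7nhcbo4-5-b7stm")

running = stdout.split()
expected = 2  # wrongly taken from openshift_logging_es_cluster_size

print(len(running))              # 3
print(len(running) == expected)  # False -> task retries until attempts=120
```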

Expected results:
The upgrade succeeds.

Additional info:

Comment 1 Anping Li 2019-08-08 06:11:45 UTC
Created attachment 1601702 [details]
The playbook logs

Comment 2 Anping Li 2019-08-08 06:30:05 UTC
The workaround is to manually roll-restart the logging-es-ops pods after the upgrade: https://docs.openshift.com/container-platform/3.11/install_config/aggregate_logging.html#manual-elasticsearch-rollouts

Comment 4 Anping Li 2019-10-22 03:00:49 UTC
Sorry for the missed needinfo. I will try your code in the 3.11 testing.

Comment 6 Jeff Cantrill 2020-02-02 01:32:51 UTC
Closing DEFERRED. Please reopen if problem persists and there are open customer cases.

