Bug 1738766

Summary: Upgrade fails when the logging-es-ops cluster size differs from the logging-es cluster size
Product: OpenShift Container Platform
Component: Logging
Version: 3.11.0
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Status: CLOSED WONTFIX
Severity: medium
Priority: unspecified
Reporter: Anping Li <anli>
Assignee: Jeff Cantrill <jcantril>
QA Contact: Anping Li <anli>
Docs Contact:
CC: aos-bugs, ckoep, jcantril, nhosoi, rmeggins
Keywords: Reopened
Target Milestone: ---
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Type: Bug
Last Closed: 2020-06-03 21:00:52 UTC
Attachments: The playbook logs

Description Anping Li 2019-08-08 06:09:42 UTC
Description of problem:
If the logging-es-ops cluster size differs from the logging-es cluster size, the logging-es-ops upgrade fails at https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_logging_elasticsearch/tasks/restart_cluster.yml#L2

The playbook uses openshift_logging_es_cluster_size when upgrading logging-es-ops instead of openshift_logging_es_ops_cluster_size.
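
A minimal sketch of the failing wait loop, reconstructed from the failure output below rather than from the role itself; the task name, register variable, and delay are assumptions, while the oc command and the 120 retries come from the log. The point is that the expected pod count is taken from openshift_logging_es_cluster_size (2) even while the es-ops cluster (size 3) is being restarted, so the until condition can never be met:

- name: Wait until the expected number of Elasticsearch pods is running
  # Sketch only; the real task in restart_cluster.yml may use different
  # variable names. With 3 running es-ops pods and an expected count of 2,
  # this loop exhausts all 120 retries and fails.
  command: >
    oc --config=/etc/origin/master/admin.kubeconfig get pod
    -l component=es-ops,provider=openshift -n openshift-logging
    -o 'jsonpath={.items[?(@.status.phase=="Running")].metadata.name}'
  register: _running_pods
  until: _running_pods.stdout.split(' ') | count == openshift_logging_es_cluster_size | int
  retries: 120
  delay: 30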

Version-Release number of selected component (if applicable):
openshift3/ose-ansible:v3.11.135

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging with different cluster sizes for logging-es-ops and logging-es
2. Upgrade logging with the following inventory variables:
openshift_logging_install_logging=true
openshift_logging_es_cluster_size=2
openshift_logging_es_number_of_replicas=1
openshift_logging_es_number_of_shards=1
openshift_logging_es_memory_limit=2Gi
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra": "true"}

openshift_logging_use_ops=true
openshift_logging_es_ops_cluster_size=3
openshift_logging_es_ops_number_of_replicas=1
openshift_logging_es_ops_number_of_shards=1
openshift_logging_es_ops_memory_limit=2Gi
openshift_logging_es_ops_nodeselector={"node-role.kubernetes.io/compute": "true"}
openshift_logging_elasticsearch_storage_type=hostmount

Actual results:
The logging upgrade fails at https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_logging_elasticsearch/tasks/restart_cluster.yml#L2.

The debug code shows that openshift_logging_es_cluster_size=2 is used.

RUNNING HANDLER [openshift_logging_elasticsearch : debug] *********************************************************************************************************************************************************
ok: [ec2-54-161-31-32.compute-1.amazonaws.com] => {
    "msg": "the es-ops number is  2"
}

RUNNING HANDLER [openshift_logging_elasticsearch : command] *******************************************************************************************************************************************************
FAILED - RETRYING: openshift_logging_elasticsearch : command (120 retries left).
<---snip--->
<---snip--->
FAILED - RETRYING: openshift_logging_elasticsearch : command (2 retries left).
FAILED - RETRYING: openshift_logging_elasticsearch : command (1 retries left).
fatal: [ec2-54-161-31-32.compute-1.amazonaws.com]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["/usr/local/bin/oc", "--config=/etc/origin/master/admin.kubeconfig", "get", "pod", "-l", "component=es-ops,provider=openshift", "-n", "openshift-logging", "-o", "jsonpath={.items[?(@.status.phase==\"Running\")].metadata.name}"], "delta": "0:00:00.223996", "end": "2019-08-08 05:46:42.404059", "rc": 0, "start": "2019-08-08 05:46:42.180063", "stderr": "", "stderr_lines": [], "stdout": "logging-es-ops-data-master-0fr84k1a-4-hwb42 logging-es-ops-data-master-9961o92h-5-j5bxj logging-es-ops-data-master-o7nhcbo4-5-b7stm", "stdout_lines": ["logging-es-ops-data-master-0fr84k1a-4-hwb42 logging-es-ops-data-master-9961o92h-5-j5bxj logging-es-ops-data-master-o7nhcbo4-5-b7stm"]}

Expected results:
The upgrade succeeds.

Additional info:

Comment 1 Anping Li 2019-08-08 06:11:45 UTC
Created attachment 1601702 [details]
The playbook logs

Comment 2 Anping Li 2019-08-08 06:30:05 UTC
The workaround is to manually perform a rolling restart of the logging-es-ops pods after the upgrade: https://docs.openshift.com/container-platform/3.11/install_config/aggregate_logging.html#manual-elasticsearch-rollouts
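
A hedged sketch of that workaround, assuming the es-ops DeploymentConfigs carry the same component=es-ops,provider=openshift labels as the pods in the failure log; the linked documentation also covers disabling shard rebalancing around each rollout, which is omitted here:

# Roll the es-ops DeploymentConfigs one at a time, waiting for each new
# deployment to finish before starting the next.
for dc in $(oc -n openshift-logging get dc -l component=es-ops,provider=openshift -o name); do
  oc -n openshift-logging rollout latest "$dc"
  oc -n openshift-logging rollout status -w "$dc"
done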

Comment 4 Anping Li 2019-10-22 03:00:49 UTC
Sorry for the missed needinfo. I will try your code in 3.11 testing.

Comment 6 Jeff Cantrill 2020-02-02 01:32:51 UTC
Closing as DEFERRED. Please reopen if the problem persists and there are open customer cases.