Bug 1738766 - Upgrade fails when the logging-es-ops size differs from the logging-es size
Summary: Upgrade fails when the logging-es-ops size differs from the logging-es size
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Jeff Cantrill
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-08 06:09 UTC by Anping Li
Modified: 2023-10-06 18:28 UTC (History)
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-03 21:00:52 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
The playbook logs (32.64 KB, application/gzip)
2019-08-08 06:11 UTC, Anping Li

Description Anping Li 2019-08-08 06:09:42 UTC
Description of problem:
If the logging-es-ops cluster size differs from the logging-es cluster size, the logging-es-ops upgrade fails at https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_logging_elasticsearch/tasks/restart_cluster.yml#L2

The playbook uses openshift_logging_es_cluster_size when upgrading the logging-es-ops cluster.
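A minimal Python sketch of the variable-selection bug described above, assuming the restart handler waits until the number of Running pods equals the configured cluster size (function names here are illustrative, not from the playbook):

```python
# Sketch: restart_cluster.yml waits for Running pods to match the cluster
# size, but reads openshift_logging_es_cluster_size even for the ops cluster.

def expected_size_buggy(inventory, ops):
    # Buggy behavior: always reads the non-ops size.
    return inventory["openshift_logging_es_cluster_size"]

def expected_size_fixed(inventory, ops):
    # Fixed behavior: pick the ops-specific size when restarting logging-es-ops.
    key = ("openshift_logging_es_ops_cluster_size" if ops
           else "openshift_logging_es_cluster_size")
    return inventory[key]

inventory = {
    "openshift_logging_es_cluster_size": 2,
    "openshift_logging_es_ops_cluster_size": 3,
}

running_ops_pods = 3  # logging-es-ops actually runs 3 pods

# The buggy lookup waits for 2 Running pods, never matches the 3 that
# exist, and the wait task exhausts its retries.
print(expected_size_buggy(inventory, ops=True) == running_ops_pods)  # False
print(expected_size_fixed(inventory, ops=True) == running_ops_pods)  # True
```

With equal cluster sizes the two lookups coincide, which is why the bug only surfaces when logging-es and logging-es-ops are sized differently.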

Version-Release number of selected component (if applicable):
openshift3/ose-ansible:v3.11.135

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging using different size for logging-es-ops and logging-es
2. Upgrade logging with the following inventory variables:
openshift_logging_install_logging=true
openshift_logging_es_cluster_size=2
openshift_logging_es_number_of_replicas=1
openshift_logging_es_number_of_shards=1
openshift_logging_es_memory_limit=2Gi
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra": "true"}

openshift_logging_use_ops=true
openshift_logging_es_ops_cluster_size=3
openshift_logging_es_ops_number_of_replicas=1
openshift_logging_es_ops_number_of_shards=1
openshift_logging_es_ops_memory_limit=2Gi
openshift_logging_es_ops_nodeselector={"node-role.kubernetes.io/compute": "true"}
openshift_logging_elasticsearch_storage_type=hostmount

Actual results:
The logging upgrade fails at https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_logging_elasticsearch/tasks/restart_cluster.yml#L2.

The debug output shows that openshift_logging_es_cluster_size=2 is used:

RUNNING HANDLER [openshift_logging_elasticsearch : debug] *********************************************************************************************************************************************************
ok: [ec2-54-161-31-32.compute-1.amazonaws.com] => {
    "msg": "the es-ops number is  2"
}

RUNNING HANDLER [openshift_logging_elasticsearch : command] *******************************************************************************************************************************************************
FAILED - RETRYING: openshift_logging_elasticsearch : command (120 retries left).
<---snip--->
<---snip--->
FAILED - RETRYING: openshift_logging_elasticsearch : command (2 retries left).
FAILED - RETRYING: openshift_logging_elasticsearch : command (1 retries left).
fatal: [ec2-54-161-31-32.compute-1.amazonaws.com]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["/usr/local/bin/oc", "--config=/etc/origin/master/admin.kubeconfig", "get", "pod", "-l", "component=es-ops,provider=openshift", "-n", "openshift-logging", "-o", "jsonpath={.items[?(@.status.phase==\"Running\")].metadata.name}"], "delta": "0:00:00.223996", "end": "2019-08-08 05:46:42.404059", "rc": 0, "start": "2019-08-08 05:46:42.180063", "stderr": "", "stderr_lines": [], "stdout": "logging-es-ops-data-master-0fr84k1a-4-hwb42 logging-es-ops-data-master-9961o92h-5-j5bxj logging-es-ops-data-master-o7nhcbo4-5-b7stm", "stdout_lines": ["logging-es-ops-data-master-0fr84k1a-4-hwb42 logging-es-ops-data-master-9961o92h-5-j5bxj logging-es-ops-data-master-o7nhcbo4-5-b7stm"]}
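A short Python sketch of why the task above kept retrying, assuming the handler's `until` condition compares the Running-pod count against the configured size (an assumption based on the log, not confirmed from the playbook source):

```python
# The oc command itself succeeds (rc=0) and returns three Running es-ops
# pods, but the handler compares against the non-ops size of 2, so the
# retry loop never exits and fails after 120 attempts.

stdout = ("logging-es-ops-data-master-0fr84k1a-4-hwb42 "
          "logging-es-ops-data-master-9961o92h-5-j5bxj "
          "logging-es-ops-data-master-o7nhcbo4-5-b7stm")

running = stdout.split()
expected = 2  # wrongly taken from openshift_logging_es_cluster_size

print(len(running))              # 3
print(len(running) == expected)  # False -> task retries until attempts=120
```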

Expected results:
The upgrade succeeds.

Additional info:

Comment 1 Anping Li 2019-08-08 06:11:45 UTC
Created attachment 1601702 [details]
The playbook logs

Comment 2 Anping Li 2019-08-08 06:30:05 UTC
The workaround is to manually roll-restart the logging-es-ops pods after the upgrade: https://docs.openshift.com/container-platform/3.11/install_config/aggregate_logging.html#manual-elasticsearch-rollouts

Comment 4 Anping Li 2019-10-22 03:00:49 UTC
Sorry for the missed needinfo. I will try your code in the 3.11 testing.

Comment 6 Jeff Cantrill 2020-02-02 01:32:51 UTC
Closing DEFERRED. Please reopen if problem persists and there are open customer cases.

