Bug 1738766

Summary: Upgrade fails when the logging-es-ops cluster size differs from the logging-es cluster size
Product: OpenShift Container Platform
Component: Logging
Version: 3.11.0
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Status: CLOSED WONTFIX
Severity: medium
Priority: unspecified
Reporter: Anping Li <anli>
Assignee: Jeff Cantrill <jcantril>
QA Contact: Anping Li <anli>
Docs Contact:
CC: aos-bugs, ckoep, jcantril, nhosoi, rmeggins
Keywords: Reopened
Target Milestone: ---
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Type: Bug
Last Closed: 2020-06-03 21:00:52 UTC
Attachments: The playbook logs

Description Anping Li 2019-08-08 06:09:42 UTC
Description of problem:
If the logging-es-ops cluster size differs from the logging-es cluster size, the logging-es-ops upgrade fails at https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_logging_elasticsearch/tasks/restart_cluster.yml#L2

The playbook uses openshift_logging_es_cluster_size when upgrading logging-es-ops instead of openshift_logging_es_ops_cluster_size.
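
A minimal sketch of the failing wait loop, reconstructed from the failure output below rather than from the role itself; the task name, register variable, and delay are assumptions, while the oc command and the 120 retries come from the log. The point is that the expected pod count is taken from openshift_logging_es_cluster_size (2) even while the es-ops cluster (size 3) is being restarted, so the until condition can never be met:

- name: Wait until the expected number of Elasticsearch pods is running
  # Sketch only; the real task in restart_cluster.yml may use different
  # variable names. With 3 running es-ops pods and an expected count of 2,
  # this loop exhausts all 120 retries and fails.
  command: >
    oc --config=/etc/origin/master/admin.kubeconfig get pod
    -l component=es-ops,provider=openshift -n openshift-logging
    -o 'jsonpath={.items[?(@.status.phase=="Running")].metadata.name}'
  register: _running_pods
  until: _running_pods.stdout.split(' ') | count == openshift_logging_es_cluster_size | int
  retries: 120
  delay: 30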

Version-Release number of selected component (if applicable):
openshift3/ose-ansible:v3.11.135

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging with different cluster sizes for logging-es-ops and logging-es
2. Upgrade logging with the following inventory variables:
openshift_logging_install_logging=true
openshift_logging_es_cluster_size=2
openshift_logging_es_number_of_replicas=1
openshift_logging_es_number_of_shards=1
openshift_logging_es_memory_limit=2Gi
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra": "true"}

openshift_logging_use_ops=true
openshift_logging_es_ops_cluster_size=3
openshift_logging_es_ops_number_of_replicas=1
openshift_logging_es_ops_number_of_shards=1
openshift_logging_es_ops_memory_limit=2Gi
openshift_logging_es_ops_nodeselector={"node-role.kubernetes.io/compute": "true"}
openshift_logging_elasticsearch_storage_type=hostmount

Actual results:
The logging upgrade fails at https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_logging_elasticsearch/tasks/restart_cluster.yml#L2.

The debug code shows that openshift_logging_es_cluster_size=2 is used.

RUNNING HANDLER [openshift_logging_elasticsearch : debug] *********************************************************************************************************************************************************
ok: [ec2-54-161-31-32.compute-1.amazonaws.com] => {
    "msg": "the es-ops number is  2"
}

RUNNING HANDLER [openshift_logging_elasticsearch : command] *******************************************************************************************************************************************************
FAILED - RETRYING: openshift_logging_elasticsearch : command (120 retries left).
<---snip--->
<---snip--->
FAILED - RETRYING: openshift_logging_elasticsearch : command (2 retries left).
FAILED - RETRYING: openshift_logging_elasticsearch : command (1 retries left).
fatal: [ec2-54-161-31-32.compute-1.amazonaws.com]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["/usr/local/bin/oc", "--config=/etc/origin/master/admin.kubeconfig", "get", "pod", "-l", "component=es-ops,provider=openshift", "-n", "openshift-logging", "-o", "jsonpath={.items[?(@.status.phase==\"Running\")].metadata.name}"], "delta": "0:00:00.223996", "end": "2019-08-08 05:46:42.404059", "rc": 0, "start": "2019-08-08 05:46:42.180063", "stderr": "", "stderr_lines": [], "stdout": "logging-es-ops-data-master-0fr84k1a-4-hwb42 logging-es-ops-data-master-9961o92h-5-j5bxj logging-es-ops-data-master-o7nhcbo4-5-b7stm", "stdout_lines": ["logging-es-ops-data-master-0fr84k1a-4-hwb42 logging-es-ops-data-master-9961o92h-5-j5bxj logging-es-ops-data-master-o7nhcbo4-5-b7stm"]}

Expected results:
The upgrade succeeds.

Additional info:

Comment 1 Anping Li 2019-08-08 06:11:45 UTC
Created attachment 1601702 [details]
The playbook logs

Comment 2 Anping Li 2019-08-08 06:30:05 UTC
The workaround is to manually perform a rolling restart of the logging-es-ops pods after the upgrade: https://docs.openshift.com/container-platform/3.11/install_config/aggregate_logging.html#manual-elasticsearch-rollouts
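
A hedged sketch of that workaround, assuming the es-ops DeploymentConfigs carry the same component=es-ops,provider=openshift labels as the pods in the failure log; the linked documentation also covers disabling shard rebalancing around each rollout, which is omitted here:

# Roll the es-ops DeploymentConfigs one at a time, waiting for each new
# deployment to finish before starting the next.
for dc in $(oc -n openshift-logging get dc -l component=es-ops,provider=openshift -o name); do
  oc -n openshift-logging rollout latest "$dc"
  oc -n openshift-logging rollout status -w "$dc"
done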

Comment 4 Anping Li 2019-10-22 03:00:49 UTC
Sorry for the missed needinfo. I will try your code in 3.11 testing.

Comment 6 Jeff Cantrill 2020-02-02 01:32:51 UTC
Closing as DEFERRED. Please reopen if the problem persists and there are open customer cases.