Description of problem:
If the logging-es-ops cluster size differs from the logging-es cluster size, the logging-es-ops upgrade fails at https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_logging_elasticsearch/tasks/restart_cluster.yml#L2
The playbook uses openshift_logging_es_cluster_size when upgrading logging-es-ops.

Version-Release number of selected component (if applicable):
openshift3/ose-ansible:v3.11.135

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging using different sizes for logging-es-ops and logging-es.
2. Upgrade logging.

Inventory variables:
openshift_logging_install_logging=true
openshift_logging_es_cluster_size=2
openshift_logging_es_number_of_replicas=1
openshift_logging_es_number_of_shards=1
openshift_logging_es_memory_limit=2Gi
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_use_ops=true
openshift_logging_es_ops_cluster_size=3
openshift_logging_es_ops_number_of_replicas=1
openshift_logging_es_ops_number_of_shards=1
openshift_logging_es_ops_memory_limit=2Gi
openshift_logging_es_ops_nodeselector={"node-role.kubernetes.io/compute": "true"}
openshift_logging_elasticsearch_storage_type=hostmount

Actual results:
The logging upgrade failed at https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_logging_elasticsearch/tasks/restart_cluster.yml#L2. The debug output shows that openshift_logging_es_cluster_size=2 is used.

RUNNING HANDLER [openshift_logging_elasticsearch : debug] **********************
ok: [ec2-54-161-31-32.compute-1.amazonaws.com] => {
    "msg": "the es-ops number is 2"
}

RUNNING HANDLER [openshift_logging_elasticsearch : command] ********************
FAILED - RETRYING: openshift_logging_elasticsearch : command (120 retries left).
<---snip--->
<---snip--->
FAILED - RETRYING: openshift_logging_elasticsearch : command (2 retries left).
FAILED - RETRYING: openshift_logging_elasticsearch : command (1 retries left).
fatal: [ec2-54-161-31-32.compute-1.amazonaws.com]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["/usr/local/bin/oc", "--config=/etc/origin/master/admin.kubeconfig", "get", "pod", "-l", "component=es-ops,provider=openshift", "-n", "openshift-logging", "-o", "jsonpath={.items[?(@.status.phase==\"Running\")].metadata.name}"], "delta": "0:00:00.223996", "end": "2019-08-08 05:46:42.404059", "rc": 0, "start": "2019-08-08 05:46:42.180063", "stderr": "", "stderr_lines": [], "stdout": "logging-es-ops-data-master-0fr84k1a-4-hwb42 logging-es-ops-data-master-9961o92h-5-j5bxj logging-es-ops-data-master-o7nhcbo4-5-b7stm", "stdout_lines": ["logging-es-ops-data-master-0fr84k1a-4-hwb42 logging-es-ops-data-master-9961o92h-5-j5bxj logging-es-ops-data-master-o7nhcbo4-5-b7stm"]}

Expected results:
The upgrade succeeds.

Additional info:
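The failure pattern can be illustrated with a small Python sketch. It assumes, without having confirmed it against the role source, that the handler keeps retrying until the number of Running pods equals the expected cluster size. The function name pods_ready is hypothetical; the pod names and sizes are taken from the output and inventory above.

```python
# Sketch only: models the suspected retry condition, not the actual Ansible task.

def pods_ready(stdout: str, expected_size: int) -> bool:
    """Return True when the count of Running pods equals the expected size."""
    running = stdout.split()
    return len(running) == expected_size


# stdout of the "oc get pod -l component=es-ops,..." command from the log above
ES_OPS_STDOUT = (
    "logging-es-ops-data-master-0fr84k1a-4-hwb42 "
    "logging-es-ops-data-master-9961o92h-5-j5bxj "
    "logging-es-ops-data-master-o7nhcbo4-5-b7stm"
)

ES_CLUSTER_SIZE = 2      # openshift_logging_es_cluster_size
ES_OPS_CLUSTER_SIZE = 3  # openshift_logging_es_ops_cluster_size

if __name__ == "__main__":
    # Comparing against the logging-es size never matches the three es-ops pods.
    print(pods_ready(ES_OPS_STDOUT, ES_CLUSTER_SIZE))
    # Comparing against the logging-es-ops size matches.
    print(pods_ready(ES_OPS_STDOUT, ES_OPS_CLUSTER_SIZE))
```

Under that assumption, all three es-ops pods are Running and the command returns rc 0, yet the check can never pass while it compares against a size of 2, which would explain why all 120 retries are exhausted.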
Created attachment 1601702: The playbook logs
The workaround is to perform a rolling restart of the logging-es-ops pods manually after the upgrade. See https://docs.openshift.com/container-platform/3.11/install_config/aggregate_logging.html#manual-elasticsearch-rollouts
Sorry for missing the needinfo. I will try your code in the 3.11 testing.
Closing DEFERRED. Please reopen if problem persists and there are open customer cases.