Description of problem:
During an upgrade of free-int to v3.10.0-0.37.0, the logging upgrade playbooks timed out:

RUNNING HANDLER [openshift_logging_elasticsearch : command] ********************
Friday 11 May 2018 17:06:49 +0000 (0:00:00.473)       0:09:18.097 ************
FAILED - RETRYING: Waiting for ES node logging-es-data-master-t7rrl3te health to be in ['green', 'yellow'] (40 retries left).
FAILED - RETRYING: Waiting for ES node logging-es-data-master-t7rrl3te health to be in ['green', 'yellow'] (39 retries left).
FAILED - RETRYING: Waiting for ES node logging-es-data-master-t7rrl3te health to be in ['green', 'yellow'] (38 retries left).
FAILED - RETRYING: Waiting for ES node logging-es-data-master-t7rrl3te health to be in ['green', 'yellow'] (37 retries left).
FAILED - RETRYING: Waiting for ES node logging-es-data-master-t7rrl3te health to be in ['green', 'yellow'] (36 retries left).
FAILED - RETRYING: Waiting for ES node logging-es-data-master-t7rrl3te health to be in ['green', 'yellow'] (35 retries left).
FAILED - RETRYING: Waiting for ES node logging-es-data-master-t7rrl3te health to be in ['green', 'yellow'] (34 retries left).
FAILED - RETRYING: Waiting for ES node logging-es-data-master-t7rrl3te health to be in ['green', 'yellow'] (33 retries left).
....
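For reference, the failing handler is essentially a cluster-health poll with a bounded retry count. A minimal sketch of such a task, assuming the ES endpoint is reachable from the Ansible host as es_url with the admin cert/key extracted locally (the actual role runs curl inside the ES pod instead, and the retry/delay values here are illustrative):

- name: Wait for ES node health to be in ['green', 'yellow']
  uri:
    url: "{{ es_url }}/_cluster/health"
    method: GET
    client_cert: /etc/elasticsearch/secret/admin-cert   # assumed local path
    client_key: /etc/elasticsearch/secret/admin-key     # assumed local path
    validate_certs: false
    return_content: true
  register: es_health
  # Retry until the cluster reports green or yellow; a cluster stuck in red
  # exhausts the retries and produces the timeout shown above.
  until: es_health.json is defined and es_health.json.status in ['green', 'yellow']
  retries: 40
  delay: 30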
@ewolinet, Thanks. The ES status is red! Did the ES pod restart change the shard allocation setting to enabled for one ES node while it was still disabled for the other nodes?
@Anping, One of the changes we are making is to disable shard allocation before the rollout of a node and re-enable it once the node is available again, but before waiting for the cluster to return to 'green'. The issue we are seeing is that when a new index is created while shard allocation is set to 'none', none of the shards for that index can be placed, which automatically puts the cluster into a 'red' state. This change should allow the cluster to return to 'green' between restarts.
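For clarity, the sequence described above boils down to two cluster-settings calls around the node rollout. A minimal sketch, assuming the same es_url and locally available admin certs as in the earlier sketch (task names are illustrative, not the role's actual handlers):

- name: Disable shard allocation before rolling the node
  uri:
    url: "{{ es_url }}/_cluster/settings"
    method: PUT
    body_format: json
    body:
      transient:
        # With 'none', shards of any index created during the rollout cannot
        # be allocated, which is what drives the cluster to 'red'.
        cluster.routing.allocation.enable: "none"
    client_cert: /etc/elasticsearch/secret/admin-cert
    client_key: /etc/elasticsearch/secret/admin-key
    validate_certs: false

# ... roll the deployment config and wait for the new pod to be Ready ...

- name: Re-enable shard allocation once the node is back
  uri:
    url: "{{ es_url }}/_cluster/settings"
    method: PUT
    body_format: json
    body:
      transient:
        cluster.routing.allocation.enable: "all"
    client_cert: /etc/elasticsearch/secret/admin-cert
    client_key: /etc/elasticsearch/secret/admin-key
    validate_certs: false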
https://github.com/openshift/openshift-ansible/pull/8415
Shall we use a persistent setting? I think the transient setting may be lost when the ES cluster restarts.
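If the persistent route is taken, the call would look roughly like the sketch below: a persistent setting survives a full cluster restart, while a transient one does not, and setting the transient key to null clears any leftover transient override. Same assumptions as above about es_url and the cert paths.

- name: Re-enable shard allocation with a persistent setting
  uri:
    url: "{{ es_url }}/_cluster/settings"
    method: PUT
    body_format: json
    body:
      persistent:
        cluster.routing.allocation.enable: "all"
      transient:
        # null resets the transient override so the persistent value applies
        cluster.routing.allocation.enable: null
    client_cert: /etc/elasticsearch/secret/admin-cert
    client_key: /etc/elasticsearch/secret/admin-key
    validate_certs: false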
Upgraded from v3.9 to v3.10 via openshift-ansible-3.10.0-0.53.0. The ES cluster was not restarted. The playbook reports: "Cluster logging-es was not in an optimal state and will not be automatically restarted. Please see documentation regarding doing a rolling cluster restart."
Created attachment 1445323 [details]
The ansible logs for logging upgrade

The cluster_pods.stdout_lines is 1. It should be 3. All ansible logs are attached.

RUNNING HANDLER [openshift_logging_elasticsearch : debug] *********************
ok: [qe-anli310master-etcd-1.0529-l0l.qe.rhcloud.com] => {
    "msg": "Cluster logging-es was not in an optimal state and will not be automatically restarted. Please see documentation regarding doing a rolling cluster restart."
}

RUNNING HANDLER [openshift_logging_elasticsearch : debug] *********************
ok: [qe-anli310master-etcd-1.0529-l0l.qe.rhcloud.com] => {
    "msg": "pod status is green, number_of_nodes is 3, cluster_pods.stdout_lines is 1"
}
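The debug message suggests the role compares the number of running ES pods against number_of_nodes from _cluster/health and skips the automatic restart when they differ. A rough sketch of that comparison, where the namespace variable, label selector, and task names are assumptions rather than the role's exact code:

- name: Get the names of running ES pods
  shell: oc -n {{ logging_namespace }} get pods -l component=es --no-headers | awk '$3 == "Running" {print $1}'
  register: cluster_pods

- name: Fetch cluster health
  uri:
    url: "{{ es_url }}/_cluster/health"
    return_content: true
    client_cert: /etc/elasticsearch/secret/admin-cert
    client_key: /etc/elasticsearch/secret/admin-key
    validate_certs: false
  register: es_health

# A full restart is only considered safe when every ES node in the cluster is
# backed by a running pod; otherwise the role emits the message seen above.
- debug:
    msg: >-
      pod status is {{ es_health.json.status }},
      number_of_nodes is {{ es_health.json.number_of_nodes }},
      cluster_pods.stdout_lines is {{ cluster_pods.stdout_lines | length }}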
I saw red indices in v3.9 testing today. While I was redeploying logging, an automation script was creating and deleting projects, and some project indices became red; the .operations and .orphaned indices also became red. Not sure if it is the same issue, just leaving a note here.
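To narrow down which indices go red during such a redeploy, something like the following can be run against one of the ES pods. The pod name, namespace variable, container name, and secret paths are assumptions about this environment:

- name: List indices and their health from inside an ES pod
  command: >-
    oc -n {{ logging_namespace }} exec {{ es_pod }} -c elasticsearch --
    curl -s --cacert /etc/elasticsearch/secret/admin-ca
    --cert /etc/elasticsearch/secret/admin-cert
    --key /etc/elasticsearch/secret/admin-key
    https://localhost:9200/_cat/indices?v
  register: index_health

# The health column is first in _cat/indices output, so red indices are the
# lines that start with "red".
- debug:
    msg: "{{ index_health.stdout_lines | select('match', 'red') | list }}"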
The upgrade works well with 3.10.0-0.60.0.