Description of problem:

ElasticSearch v5 stores indices on persistent volumes differently than earlier versions (using a hash value instead of the name of the index, I believe).

When ElasticSearch is upgraded to v5, the new pods are not considered ready until the Searchguard index becomes green. Especially on large clusters this can take a VERY long time to complete, but the rollout strategy has a 30-minute default timeout before terminating the pods and rolling back to the previous version.

This can leave ElasticSearch indices partially converted to v5 format, which earlier ElasticSearch versions can't deal with. Further attempts to upgrade ElasticSearch to v5 can further corrupt the partially-upgraded indices, making them extremely difficult to recover.

Version-Release number of selected component (if applicable):

Upgrade from v3.9.41 -> v3.10.45 -> v3.11.43
(In reply to Matthew Barnes from comment #0)
> Description of problem:
>
> ElasticSearch v5 stores indices on persistent volumes differently than
> earlier versions (using a hash value instead of the name of the index, I
> believe).
>
> When ElasticSearch is upgraded to v5, the new pods are not considered ready
> until the Searchguard index becomes green. Especially on large clusters
> this can take a VERY long time to complete, but the rollout strategy has a
> 30-minute default timeout before terminating the pods and rolling back to
> the previous version.

This is not accurate. The state of the Searchguard index is not involved in determination of readiness. My assumption is the pod(s) are rolled back because the storage from the previous deployment is not released by AWS and attached to the new deployment before the rollback time is exceeded. This was fixed by [1].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1655675
This issue has been open for a long time, but after talking to PM we may resolve it via documentation. Customers must verify that the following script [1] returns no results before upgrading their ES clusters; otherwise their data may not be recoverable.

[1] https://github.com/jcantrill/cluster-logging-tools/blob/release-3.x/scripts/dots-in-field-names
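To illustrate the kind of check described above, here is a minimal sketch. It is not the content of the linked script. It assumes the index mappings were saved as JSON in the shape Elasticsearch 2.x returns from `GET /_mapping` (index, then `mappings`, then type, then `properties`), and the function names `dotted_fields` and `scan_mappings` are hypothetical.

```python
from typing import Any, Dict, Iterator, List


def dotted_fields(properties: Dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield the path of every field whose own name contains a dot.

    Path segments are joined with "/" so a dot inside a name stays visible.
    """
    for name, spec in properties.items():
        path = f"{prefix}/{name}" if prefix else name
        if "." in name:
            yield path
        # Object fields carry their children under a nested "properties" key.
        nested = spec.get("properties") if isinstance(spec, dict) else None
        if isinstance(nested, dict):
            yield from dotted_fields(nested, path)


def scan_mappings(mappings: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each index name to the sorted list of offending field paths.

    Assumes the pre-5.x layout: {index: {"mappings": {type: {"properties": ...}}}}.
    Indices with no offending fields are left out of the report.
    """
    report: Dict[str, List[str]] = {}
    for index, body in mappings.items():
        hits: List[str] = []
        for type_body in body.get("mappings", {}).values():
            hits.extend(dotted_fields(type_body.get("properties", {})))
        if hits:
            report[index] = sorted(hits)
    return report
```

An empty report from `scan_mappings` would correspond to the "returns no results" condition; any index listed in it would need attention before the upgrade.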
@lukas, I'm looking to turn this into a doc issue. I expect to reference the content of the script in #c3 and want to reference the relevant ES changes. I found the mapping explosion [1] but I don't see a reference to dots in field names. Do you have a link?

[1] https://www.elastic.co/guide/en/elasticsearch/reference/5.6/breaking_50_mapping_changes.html#breaking_50_mapping_changes
Documentation PR: https://github.com/openshift/openshift-docs/pull/17931
Moving to ON_QA for validation which may have already occurred given QE has been involved in reviewing the docs
LGTM
Changes are live:

https://docs.openshift.com/container-platform/3.11/upgrading/automated_upgrades.html#upgrading-efk-logging-stack

https://access.redhat.com/documentation/en-us/openshift_container_platform/3.11/html/upgrading_clusters/install-config-upgrading-automated-upgrades#upgrading-efk-logging-stack
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days