Bug 1844097
| Summary: | The ES pods couldn't be READY during upgrade. | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Anping Li <anli> |
| Component: | Logging | Assignee: | Jeff Cantrill <jcantril> |
| Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.5 | CC: | aos-bugs, cruhm, lvlcek |
| Target Milestone: | --- | | |
| Target Release: | 4.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | backport:4.5 | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-27 16:05:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1845118 | | |
| Attachments: | | | |
Description
Anping Li
2020-06-04 15:39:25 UTC
Created attachment 1695231 [details]
Upgrade steps or logs
Created attachment 1695233 [details]
elasticsearch pod log
[anli@preserve-docker-slave 96583]$ oc get pods
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-568599f687-8prlw       1/1     Running     0          18m
curator-1591363200-t8jrs                        0/1     Completed   0          15m
curator-1591363800-fshbz                        1/1     Running     0          5m1s
elasticsearch-cdm-dkx6l77h-1-5bfc78ffd-r5psk    1/2     Running     0          6m48s
elasticsearch-cdm-dkx6l77h-2-589999f69f-bpwtf   1/2     Running     0          5m35s
elasticsearch-cdm-dkx6l77h-3-846df5674d-4rgl7   1/2     Running     0          5m

oc exec -c elasticsearch elasticsearch-cdm-dkx6l77h-1-5bfc78ffd-r5psk -- es_util '--query=_cluster/settings?pretty'
{
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "primaries"
        }
      }
    },
    "discovery" : {
      "zen" : {
        "minimum_master_nodes" : "2"
      }
    }
  },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "all"
        }
      }
    }
  }
}

{"level":"info","ts":1591363697.9201612,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"kibana-controller","worker count":1}
time="2020-06-05T13:28:19Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 3 shards in preparation for cluster restart"
time="2020-06-05T13:28:22Z" level=info msg="Waiting for all nodes to rejoin cluster \"elasticsearch\" in namespace \"openshift-logging\""
time="2020-06-05T13:28:53Z" level=warning msg="when trying to perform full cluster restart: Timed out waiting for elasticsearch-cdm-dkx6l77h-1 to rejoin cluster elasticsearch"
time="2020-06-05T13:29:30Z" level=info msg="Completed full cluster restart for cert redeploy on elasticsearch"
time="2020-06-05T13:29:34Z" level=info msg="Beginning full cluster restart on elasticsearch"
time="2020-06-05T13:30:06Z" level=info msg="Waiting for all nodes to rejoin cluster \"elasticsearch\" in namespace \"openshift-logging\""
time="2020-06-05T13:30:37Z" level=warning msg="when trying to perform full cluster restart: Timed out waiting for elasticsearch-cdm-dkx6l77h-2 to rejoin cluster elasticsearch"

Why do we have the same settings set at both the transient and persistent levels?
Are we aware of https://www.elastic.co/guide/en/elasticsearch/reference/6.8/cluster-update-settings.html#_order_of_precedence ? Transient settings take precedence over persistent ones, which makes the persistent "cluster.routing.allocation.enable" : "primaries" setting effectively a no-op.

Verified:
clusterlogging.4.4.0-202006061254 -> clusterlogging.v4.6.0
elasticsearch-operator.4.4.0-202006061254 -> elasticsearch-operator.v4.6.0

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
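
To illustrate the precedence point raised in the comments, here is a minimal Python sketch that models how the settings reported by `_cluster/settings` in this bug would resolve. It is only a model of the documented rule (transient overrides persistent), not Elasticsearch's or the operator's actual code; the helper names `deep_merge` and `effective_settings` are hypothetical and introduced only for this example.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict in which values from `override` win over `base`, recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def effective_settings(cluster_settings):
    """Model the documented precedence: transient settings override persistent ones."""
    persistent = cluster_settings.get("persistent", {})
    transient = cluster_settings.get("transient", {})
    return deep_merge(persistent, transient)


# The settings exactly as shown in the es_util output in this report.
reported = {
    "persistent": {
        "cluster": {"routing": {"allocation": {"enable": "primaries"}}},
        "discovery": {"zen": {"minimum_master_nodes": "2"}},
    },
    "transient": {
        "cluster": {"routing": {"allocation": {"enable": "all"}}},
    },
}

effective = effective_settings(reported)
allocation = effective["cluster"]["routing"]["allocation"]["enable"]
print(allocation)  # prints "all": the persistent "primaries" value is shadowed

# With the transient entry absent, the persistent value becomes effective.
persistent_only = effective_settings({"persistent": reported["persistent"]})
print(persistent_only["cluster"]["routing"]["allocation"]["enable"])  # prints "primaries"
```

Under this model, the persistent `primaries` value only takes effect once the transient entry is absent, which matches the observation that it is effectively a no-op while both are set.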