Bug 1840909 - Elasticsearch operator sets shard allocation to "none" during restart for certs
Summary: Elasticsearch operator sets shard allocation to "none" during restart for certs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.5.0
Assignee: ewolinet
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks: 1844302 1845993
 
Reported: 2020-05-27 21:04 UTC by ewolinet
Modified: 2020-07-13 17:42 UTC (History)
1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1845993
Environment:
Last Closed: 2020-07-13 17:42:22 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift elasticsearch-operator pull 365 0 None closed Bug 1840909: Updating cert restarts to also adhere to primaries instead of none 2020-12-30 12:32:41 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:42:37 UTC

Description ewolinet 2020-05-27 21:04:35 UTC
Description of problem:
The EO currently sets shard allocation to "none" when performing a cert restart, whereas the other restart paths use "primaries".

Version-Release number of selected component (if applicable):
4.5

How reproducible:
Always

Steps to Reproduce:
1. Trigger cert redeployment
2. Check EO status while it is restarting
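A sketch of the reproduction steps in CLI form. The namespace, secret name, and resource names below are assumptions based on a default openshift-logging deployment, not something confirmed by this report:

```shell
# Assumption: default namespace and secret names for a cluster-logging install.
NS=openshift-logging
SECRET=elasticsearch

# Step 1: deleting the certificate secret makes the operator regenerate
# certs and take the cert-restart code path on its next reconcile
# (requires cluster access, so shown commented out):
#   oc -n "$NS" delete secret "$SECRET"

# Step 2: watch the Elasticsearch CR status while the restart runs:
#   oc -n "$NS" get elasticsearch elasticsearch -o yaml
```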

Actual results:
Shard allocation is set to "none"

Expected results:
Shard allocation should be set to "primaries"
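For reference, the difference between actual and expected behavior is the value of `cluster.routing.allocation.enable` that the operator PUTs before taking nodes down. A minimal sketch; the `_cluster/settings` endpoint is the standard Elasticsearch API, but the transient scope and the illustrative curl invocation are assumptions:

```shell
# What the cert-restart path currently sends (actual):
actual='{"transient":{"cluster.routing.allocation.enable":"none"}}'

# What the other restart paths send, and what this bug asks for (expected):
expected='{"transient":{"cluster.routing.allocation.enable":"primaries"}}'

# Applied against the cluster it would look roughly like (illustrative
# URL and CA path, shown commented out):
#   curl -s -XPUT --cacert /etc/elasticsearch/secret/admin-ca \
#     https://elasticsearch.openshift-logging.svc:9200/_cluster/settings \
#     -H 'Content-Type: application/json' -d "$expected"

echo "$expected"
```

With "primaries", replica shards stop relocating during the restart but primary shards can still be assigned, so the cluster keeps accepting writes; "none" blocks all allocation.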

Additional info:

Comment 3 Anping Li 2020-05-30 13:22:47 UTC
The EO reports the message 'Unable to set shard allocation to primaries' [1]. It also reports 'Timed out waiting for elasticsearch-cdm-g56b2tbr-xxx to leave the cluster' [2]. Have we changed the logic for ES cluster upgrades? Do we still need the 'set shard allocation to primaries' step?

[1]
time="2020-05-30T12:58:02Z" level=info msg="Beginning full cluster restart for cert redeploy on elasticsearch"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to set shard allocation to primaries: Put https://elasticsearch.openshift-logging.svc:9200/_cluster/settings: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"openshift-cluster-logging-signer\")"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to perform synchronized flush: Post https://elasticsearch.openshift-logging.svc:9200/_flush/synced: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"openshift-cluster-logging-signer\")"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to get cluster size prior to restart for elasticsearch-cdm-g56b2tbr-1"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to get cluster size prior to restart for elasticsearch-cdm-g56b2tbr-2"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to get cluster size prior to restart for elasticsearch-cdm-g56b2tbr-3"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to list existing templates in order to reconcile stale ones: Get https://elasticsearch.openshift-logging.svc:9200/_template: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"openshift-cluster-logging-signer\")"
time="2020-05-30T12:58:52Z" level=info msg="Kibana status successfully updated"

[2]
time="2020-05-30T12:59:04Z" level=info msg="Timed out waiting for elasticsearch-cdm-g56b2tbr-1 to leave the cluster"
time="2020-05-30T12:59:22Z" level=info msg="skipping deleting kibana 5 image because kibana 6 installed"
time="2020-05-30T12:59:52Z" level=info msg="skipping kibana migrations: no index \".kibana\" available"
time="2020-05-30T12:59:52Z" level=info msg="Kibana status successfully updated"
time="2020-05-30T13:00:22Z" level=info msg="skipping deleting kibana 5 image because kibana 6 installed"
time="2020-05-30T13:00:48Z" level=info msg="Timed out waiting for elasticsearch-cdm-g56b2tbr-2 to leave the cluster"
time="2020-05-30T13:00:52Z" level=info msg="skipping kibana migrations: no index \".kibana\" available"
time="2020-05-30T13:00:52Z" level=info msg="Kibana status successfully updated"
time="2020-05-30T13:01:22Z" level=info msg="skipping deleting kibana 5 image because kibana 6 installed"
time="2020-05-30T13:01:52Z" level=info msg="skipping kibana migrations: no index \".kibana\" available"
time="2020-05-30T13:01:52Z" level=info msg="Kibana status successfully updated"
time="2020-05-30T13:02:20Z" level=info msg="Timed out waiting for elasticsearch-cdm-g56b2tbr-3 to leave the cluster"
time="2020-05-30T13:02:23Z" level=info msg="skipping deleting kibana 5 image because kibana 6 installed"
time="2020-05-30T13:02:53Z" level=info msg="skipping kibana migrations: no index \".kibana\" available"
time="2020-05-30T13:02:53Z" level=info msg="Kibana status successfully updated"

time="2020-05-30T13:05:24Z" level=info msg="Waiting for cluster to complete recovery: yellow / green"
time="2020-05-30T13:05:25Z" level=info msg="Waiting for cluster to complete recovery: yellow / green"
time="2020-05-30T13:05:53Z" level=info msg="Waiting for cluster to complete recovery: yellow / green"
time="2020-05-30T13:05:54Z" level=info msg="skipping deleting kibana 5 image because kibana 6 installed"
time="2020-05-30T13:05:54Z" level=info msg="Waiting for cluster to complete recovery: yellow / green"
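The "Waiting for cluster to complete recovery" lines correspond to polling cluster health until it returns to green. The same check can be done by hand with the standard health API; the in-pod invocation via the image's helper script is an assumption:

```shell
# Standard Elasticsearch health endpoint; blocks up to the timeout until
# the cluster reaches at least the requested status:
path='_cluster/health?wait_for_status=green&timeout=60s'

# From inside an ES pod (pod name elided, as in the logs above):
#   oc exec -n openshift-logging -c elasticsearch <es-pod> -- \
#     es_util --query="$path"

echo "$path"
```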

Comment 4 Anping Li 2020-06-04 04:07:56 UTC
Moving to VERIFIED, as shard allocation was not set to "none" during cert regeneration.

# oc get csv
NAME                                        DISPLAY                  VERSION              REPLACES                                    PHASE
clusterlogging.4.5.0-202006032057           Cluster Logging          4.5.0-202006032057   clusterlogging.4.4.0-202006011837           Succeeded
elasticsearch-operator.4.5.0-202006031723   Elasticsearch Operator   4.5.0-202006031723   elasticsearch-operator.4.4.0-202006011837   Succeeded

Comment 5 errata-xmlrpc 2020-07-13 17:42:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

