Bug 1840909 - Elasticsearch operator sets shard allocation to "none" during restart for certs
Summary: Elasticsearch operator sets shard allocation to "none" during restart for certs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.5.0
Assignee: ewolinet
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks: 1844302 1845993
 
Reported: 2020-05-27 21:04 UTC by ewolinet
Modified: 2020-07-13 17:42 UTC (History)
1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1845993
Environment:
Last Closed: 2020-07-13 17:42:22 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift elasticsearch-operator pull 365 0 None closed Bug 1840909: Updating cert restarts to also adhere to primaries instead of none 2020-12-30 12:32:41 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:42:37 UTC

Description ewolinet 2020-05-27 21:04:35 UTC
Description of problem:
The EO currently sets shard allocation to "none" when performing a cert restart, whereas the other restart paths use "primaries".

Version-Release number of selected component (if applicable):
4.5

How reproducible:
Always

Steps to Reproduce:
1. Trigger cert redeployment
2. Check EO status while it is restarting
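A sketch of the reproduction steps in CLI form. The namespace, secret name, and resource names below are assumptions based on a default openshift-logging deployment, not something confirmed by this report:

```shell
# Assumption: default namespace and secret names for a cluster-logging install.
NS=openshift-logging
SECRET=elasticsearch

# Step 1: deleting the certificate secret makes the operator regenerate
# certs and take the cert-restart code path on its next reconcile
# (requires cluster access, so shown commented out):
#   oc -n "$NS" delete secret "$SECRET"

# Step 2: watch the Elasticsearch CR status while the restart runs:
#   oc -n "$NS" get elasticsearch elasticsearch -o yaml
```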

Actual results:
Shard allocation is set to "none"

Expected results:
Shard allocation should be set to "primaries"
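For reference, the difference between actual and expected behavior is the value of `cluster.routing.allocation.enable` that the operator PUTs before taking nodes down. A minimal sketch; the `_cluster/settings` endpoint is the standard Elasticsearch API, but the transient scope and the illustrative curl invocation are assumptions:

```shell
# What the cert-restart path currently sends (actual):
actual='{"transient":{"cluster.routing.allocation.enable":"none"}}'

# What the other restart paths send, and what this bug asks for (expected):
expected='{"transient":{"cluster.routing.allocation.enable":"primaries"}}'

# Applied against the cluster it would look roughly like (illustrative
# URL and CA path, shown commented out):
#   curl -s -XPUT --cacert /etc/elasticsearch/secret/admin-ca \
#     https://elasticsearch.openshift-logging.svc:9200/_cluster/settings \
#     -H 'Content-Type: application/json' -d "$expected"

echo "$expected"
```

With "primaries", replica shards stop relocating during the restart but primary shards can still be assigned, so the cluster keeps accepting writes; "none" blocks all allocation.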

Additional info:

Comment 3 Anping Li 2020-05-30 13:22:47 UTC
The EO reports the message 'Unable to set shard allocation to primaries' [1]. It also reports 'Timed out waiting for elasticsearch-cdm-g56b2tbr-xxx to leave the cluster' [2]. Have we changed the logic for ES cluster upgrades? Do we still need the 'set shard allocation to primaries' step?

[1]
time="2020-05-30T12:58:02Z" level=info msg="Beginning full cluster restart for cert redeploy on elasticsearch"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to set shard allocation to primaries: Put https://elasticsearch.openshift-logging.svc:9200/_cluster/settings: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"openshift-cluster-logging-signer\")"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to perform synchronized flush: Post https://elasticsearch.openshift-logging.svc:9200/_flush/synced: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"openshift-cluster-logging-signer\")"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to get cluster size prior to restart for elasticsearch-cdm-g56b2tbr-1"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to get cluster size prior to restart for elasticsearch-cdm-g56b2tbr-2"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to get cluster size prior to restart for elasticsearch-cdm-g56b2tbr-3"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to list existing templates in order to reconcile stale ones: Get https://elasticsearch.openshift-logging.svc:9200/_template: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"openshift-cluster-logging-signer\")"
time="2020-05-30T12:58:52Z" level=info msg="Kibana status successfully updated"

[2]
time="2020-05-30T12:59:04Z" level=info msg="Timed out waiting for elasticsearch-cdm-g56b2tbr-1 to leave the cluster"
time="2020-05-30T12:59:22Z" level=info msg="skipping deleting kibana 5 image because kibana 6 installed"
time="2020-05-30T12:59:52Z" level=info msg="skipping kibana migrations: no index \".kibana\" available"
time="2020-05-30T12:59:52Z" level=info msg="Kibana status successfully updated"
time="2020-05-30T13:00:22Z" level=info msg="skipping deleting kibana 5 image because kibana 6 installed"
time="2020-05-30T13:00:48Z" level=info msg="Timed out waiting for elasticsearch-cdm-g56b2tbr-2 to leave the cluster"
time="2020-05-30T13:00:52Z" level=info msg="skipping kibana migrations: no index \".kibana\" available"
time="2020-05-30T13:00:52Z" level=info msg="Kibana status successfully updated"
time="2020-05-30T13:01:22Z" level=info msg="skipping deleting kibana 5 image because kibana 6 installed"
time="2020-05-30T13:01:52Z" level=info msg="skipping kibana migrations: no index \".kibana\" available"
time="2020-05-30T13:01:52Z" level=info msg="Kibana status successfully updated"
time="2020-05-30T13:02:20Z" level=info msg="Timed out waiting for elasticsearch-cdm-g56b2tbr-3 to leave the cluster"
time="2020-05-30T13:02:23Z" level=info msg="skipping deleting kibana 5 image because kibana 6 installed"
time="2020-05-30T13:02:53Z" level=info msg="skipping kibana migrations: no index \".kibana\" available"
time="2020-05-30T13:02:53Z" level=info msg="Kibana status successfully updated"

time="2020-05-30T13:05:24Z" level=info msg="Waiting for cluster to complete recovery: yellow / green"
time="2020-05-30T13:05:25Z" level=info msg="Waiting for cluster to complete recovery: yellow / green"
time="2020-05-30T13:05:53Z" level=info msg="Waiting for cluster to complete recovery: yellow / green"
time="2020-05-30T13:05:54Z" level=info msg="skipping deleting kibana 5 image because kibana 6 installed"
time="2020-05-30T13:05:54Z" level=info msg="Waiting for cluster to complete recovery: yellow / green"
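The "Waiting for cluster to complete recovery" lines correspond to polling cluster health until it returns to green. The same check can be done by hand with the standard health API; the in-pod invocation via the image's helper script is an assumption:

```shell
# Standard Elasticsearch health endpoint; blocks up to the timeout until
# the cluster reaches at least the requested status:
path='_cluster/health?wait_for_status=green&timeout=60s'

# From inside an ES pod (pod name elided, as in the logs above):
#   oc exec -n openshift-logging -c elasticsearch <es-pod> -- \
#     es_util --query="$path"

echo "$path"
```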

Comment 4 Anping Li 2020-06-04 04:07:56 UTC
Moving to VERIFIED, as shard allocation was not set to "none" during cert regeneration.

# oc get csv
NAME                                        DISPLAY                  VERSION              REPLACES                                    PHASE
clusterlogging.4.5.0-202006032057           Cluster Logging          4.5.0-202006032057   clusterlogging.4.4.0-202006011837           Succeeded
elasticsearch-operator.4.5.0-202006031723   Elasticsearch Operator   4.5.0-202006031723   elasticsearch-operator.4.4.0-202006011837   Succeeded

Comment 5 errata-xmlrpc 2020-07-13 17:42:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

