Bug 1840909

Summary: Elasticsearch operator sets shard allocation to "none" during restart for certs
Product: OpenShift Container Platform
Component: Logging
Version: 4.5
Target Release: 4.5.0
Reporter: ewolinet
Assignee: ewolinet
QA Contact: Anping Li <anli>
CC: aos-bugs
Severity: low
Priority: unspecified
Status: CLOSED ERRATA
Type: Bug
Last Closed: 2020-07-13 17:42:22 UTC
Clones: 1845993 (view as bug list)
Bug Blocks: 1844302, 1845993

Description ewolinet 2020-05-27 21:04:35 UTC
Description of problem:
The Elasticsearch operator (EO) currently sets shard allocation to "none" when performing a full cluster restart for a certificate redeploy, unlike the other restart paths, which use "primaries".
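The toggle in question is the standard Elasticsearch cluster setting `cluster.routing.allocation.enable`. As a minimal sketch (the helper function below is hypothetical, not the operator's actual code; only the setting name and values come from the stock ES API), the payload the operator PUTs to `_cluster/settings` before a restart would look like:

```python
import json

def cluster_settings_body(enable: str = "primaries") -> str:
    """Build a _cluster/settings payload for cluster.routing.allocation.enable.

    "primaries" keeps primary shards allocatable during a rolling restart;
    "none" (the value the cert-redeploy path was using) blocks all shard
    allocation, which is what this bug corrects.
    """
    return json.dumps(
        {"transient": {"cluster.routing.allocation.enable": enable}}
    )

# Payload for the expected behavior:
# {"transient": {"cluster.routing.allocation.enable": "primaries"}}
```

With "none", replicas and primaries alike cannot be (re)assigned while nodes restart, so recovery stalls; "primaries" still lets primary shards come back while only replica allocation is deferred.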

Version-Release number of selected component (if applicable):
4.5

How reproducible:
Always

Steps to Reproduce:
1. Trigger cert redeployment
2. Check EO status while it is restarting

Actual results:
Shard allocation is set to "none"

Expected results:
Shard allocation should be set to "primaries"

Additional info:

Comment 3 Anping Li 2020-05-30 13:22:47 UTC
The EO reports the message 'Unable to set shard allocation to primaries' [1]. It also reports 'Timed out waiting for elasticsearch-cdm-g56b2tbr-xxx to leave the cluster' [2]. Have we changed the logic for ES cluster upgrades? Do we still need the 'set shard allocation to primaries' step?

[1]
time="2020-05-30T12:58:02Z" level=info msg="Beginning full cluster restart for cert redeploy on elasticsearch"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to set shard allocation to primaries: Put https://elasticsearch.openshift-logging.svc:9200/_cluster/settings: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"openshift-cluster-logging-signer\")"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to perform synchronized flush: Post https://elasticsearch.openshift-logging.svc:9200/_flush/synced: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"openshift-cluster-logging-signer\")"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to get cluster size prior to restart for elasticsearch-cdm-g56b2tbr-1"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to get cluster size prior to restart for elasticsearch-cdm-g56b2tbr-2"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to get cluster size prior to restart for elasticsearch-cdm-g56b2tbr-3"
time="2020-05-30T12:58:02Z" level=warning msg="Unable to list existing templates in order to reconcile stale ones: Get https://elasticsearch.openshift-logging.svc:9200/_template: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"openshift-cluster-logging-signer\")"
time="2020-05-30T12:58:52Z" level=info msg="Kibana status successfully updated"

[2]
time="2020-05-30T12:59:04Z" level=info msg="Timed out waiting for elasticsearch-cdm-g56b2tbr-1 to leave the cluster"
time="2020-05-30T12:59:22Z" level=info msg="skipping deleting kibana 5 image because kibana 6 installed"
time="2020-05-30T12:59:52Z" level=info msg="skipping kibana migrations: no index \".kibana\" available"
time="2020-05-30T12:59:52Z" level=info msg="Kibana status successfully updated"
time="2020-05-30T13:00:22Z" level=info msg="skipping deleting kibana 5 image because kibana 6 installed"
time="2020-05-30T13:00:48Z" level=info msg="Timed out waiting for elasticsearch-cdm-g56b2tbr-2 to leave the cluster"
time="2020-05-30T13:00:52Z" level=info msg="skipping kibana migrations: no index \".kibana\" available"
time="2020-05-30T13:00:52Z" level=info msg="Kibana status successfully updated"
time="2020-05-30T13:01:22Z" level=info msg="skipping deleting kibana 5 image because kibana 6 installed"
time="2020-05-30T13:01:52Z" level=info msg="skipping kibana migrations: no index \".kibana\" available"
time="2020-05-30T13:01:52Z" level=info msg="Kibana status successfully updated"
time="2020-05-30T13:02:20Z" level=info msg="Timed out waiting for elasticsearch-cdm-g56b2tbr-3 to leave the cluster"
time="2020-05-30T13:02:23Z" level=info msg="skipping deleting kibana 5 image because kibana 6 installed"
time="2020-05-30T13:02:53Z" level=info msg="skipping kibana migrations: no index \".kibana\" available"
time="2020-05-30T13:02:53Z" level=info msg="Kibana status successfully updated"

time="2020-05-30T13:05:24Z" level=info msg="Waiting for cluster to complete recovery: yellow / green"
time="2020-05-30T13:05:25Z" level=info msg="Waiting for cluster to complete recovery: yellow / green"
time="2020-05-30T13:05:53Z" level=info msg="Waiting for cluster to complete recovery: yellow / green"
time="2020-05-30T13:05:54Z" level=info msg="skipping deleting kibana 5 image because kibana 6 installed"
time="2020-05-30T13:05:54Z" level=info msg="Waiting for cluster to complete recovery: yellow / green"

Comment 4 Anping Li 2020-06-04 04:07:56 UTC
Moving to verified, as shard allocation was not set to "none" during cert regeneration.

# oc get csv
NAME                                        DISPLAY                  VERSION              REPLACES                                    PHASE
clusterlogging.4.5.0-202006032057           Cluster Logging          4.5.0-202006032057   clusterlogging.4.4.0-202006011837           Succeeded
elasticsearch-operator.4.5.0-202006031723   Elasticsearch Operator   4.5.0-202006031723   elasticsearch-operator.4.4.0-202006011837   Succeeded

Comment 5 errata-xmlrpc 2020-07-13 17:42:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409