1836452 – Elasticsearch Operator does not continue upgrade on 'yellow', does not allow primaries to be created

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Logging
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.5.0
Assignee:	ewolinet
QA Contact:	Anping Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-05-15 23:06 UTC by ewolinet
Modified:	2020-07-13 17:39 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-07-13 17:39:31 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift elasticsearch-operator pull 355	0	None	closed	Bug 1836452: Updating upgrade policy to allow yellow and set only primaries	2020-10-05 13:23:30 UTC
Red Hat Product Errata	RHBA-2020:2409	0	None	None	None	2020-07-13 17:39:51 UTC

Description ewolinet 2020-05-15 23:06:25 UTC

Description of problem:
During an upgrade, the EO waits for the cluster to go 'green' each time; however, waiting for 'yellow' is sufficient.

Also, per the Elasticsearch documentation, we should set shard allocation to "primaries", not "none".
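
The two changes requested above can be sketched roughly as follows. This is only an illustration in Python, not the operator's actual code: the helper names `health_allows_upgrade` and `allocation_settings` are hypothetical, and the choice of the "transient" settings scope is an assumption. The setting key `cluster.routing.allocation.enable` and its values ("all", "primaries", "new_primaries", "none") are the ones Elasticsearch documents for the cluster settings API.

```python
import json

# Cluster health states, ordered from worst to best.
HEALTH_ORDER = {"red": 0, "yellow": 1, "green": 2}

# Values Elasticsearch accepts for cluster.routing.allocation.enable.
ALLOCATION_MODES = {"all", "primaries", "new_primaries", "none"}


def health_allows_upgrade(status, minimum="yellow"):
    """Return True when the reported status is at least `minimum`.

    Hypothetical helper: with minimum="yellow" the upgrade may continue
    on 'yellow' or 'green' instead of waiting for 'green' only.
    """
    if status not in HEALTH_ORDER:
        return False
    return HEALTH_ORDER[status] >= HEALTH_ORDER[minimum]


def allocation_settings(mode, scope="transient"):
    """Build a JSON body for PUT _cluster/settings.

    The "transient" scope is an assumption made for this sketch.
    """
    if mode not in ALLOCATION_MODES:
        raise ValueError("unsupported allocation mode: %s" % mode)
    return json.dumps({scope: {"cluster.routing.allocation.enable": mode}})


if __name__ == "__main__":
    print(health_allows_upgrade("yellow"))
    print(allocation_settings("primaries"))
```

With "primaries", primary shards can still be allocated while a node is restarted, which "none" would prevent; that is the behavior the bug title refers to.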

How reproducible:
Always

Comment 3 Anping Li 2020-05-23 09:12:50 UTC

Moving to VERIFIED.
1) Put ES into 'yellow' status (this is easy because of https://bugzilla.redhat.com/show_bug.cgi?id=1838153):
1.1) Install 4.4 Cluster Logging with nodeCount=1 and ZeroRedundancy.
1.2) Upgrade CLO to 4.5.
1.3) Check the ES status:

# oc get csv
NAME                                        DISPLAY                  VERSION              REPLACES                            PHASE
clusterlogging.4.5.0-202005221517           Cluster Logging          4.5.0-202005221517   clusterlogging.4.4.0-202005221357   Succeeded
elasticsearch-operator.4.4.0-202005220258   Elasticsearch Operator   4.4.0-202005220258                                       Succeeded


# oc exec -c elasticsearch elasticsearch-cdm-a9h73w86-1-758c54d858-b9zb6 -- es_cluster_health
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 16,
  "active_shards" : 16,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 10,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 1,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 61.53846153846154
}
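
As a small sketch of how the health output above could be checked programmatically, the Python snippet below parses a trimmed copy of that JSON and reports whether the status is one an upgrade could proceed on. The function name `summarize_health` is hypothetical. The percentage formula (active shards divided by active plus initializing plus unassigned shards) is an assumption that happens to reproduce the reported 61.538...% for this sample; it is not taken from the Elasticsearch source.

```python
import json

# Trimmed copy of the es_cluster_health output shown above.
SAMPLE_HEALTH = """
{
  "cluster_name": "elasticsearch",
  "status": "yellow",
  "number_of_nodes": 1,
  "active_primary_shards": 16,
  "active_shards": 16,
  "initializing_shards": 0,
  "unassigned_shards": 10
}
"""


def summarize_health(raw):
    """Parse a cluster health document and summarize upgrade readiness."""
    health = json.loads(raw)
    active = health["active_shards"]
    pending = health["initializing_shards"] + health["unassigned_shards"]
    total = active + pending
    return {
        "status": health["status"],
        # 'yellow' means every primary is allocated but some replicas are not.
        "can_continue": health["status"] in ("yellow", "green"),
        "active_percent": 100.0 * active / total if total else 100.0,
    }


if __name__ == "__main__":
    print(summarize_health(SAMPLE_HEALTH))
```

For the sample, 16 active shards out of 26 total gives roughly 61.5%, and the 'yellow' status is accepted.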

2) Upgrade EO to 4.5.
3) Check the ES pods. ES has been upgraded to 4.5 (see the ose-elasticsearch-proxy image below). (Note: the ES pod cannot run because of https://bugzilla.redhat.com/show_bug.cgi?id=1838929. I don't think that blocks this bug.)
$ oc get pods
NAME                                            READY   STATUS             RESTARTS   AGE
cluster-logging-operator-558d8f8f7-4w6nm        1/1     Running            0          11m
curator-1590223200-pplnv                        0/1     Completed          0          17m
curator-1590223800-rtd2f                        0/1     Error              0          7m18s
elasticsearch-cdm-a9h73w86-1-64fdf5c8f5-pvbxv   1/2     CrashLoopBackOff   5          5m23s


$ oc get pods elasticsearch-cdm-a9h73w86-1-64fdf5c8f5-pvbxv -o yaml | grep 'image:'
    image: quay.io/openshift-qe-optional-operators/ose-logging-elasticsearch6@sha256:1d2c67ad5a6bbebfc4d44c6e943b3c1727cb33731f67c35e69d4436ff8b46774
    image: quay.io/openshift-qe-optional-operators/ose-elasticsearch-proxy@sha256:cc93bc0d0e7a5c92f6380fde91b6bade54994b1d03949441d49a719fbfd55e23

Comment 4 Anping Li 2020-05-23 12:19:37 UTC

As a workaround for BZ1838929, I used registry.svc.ci.openshift.org/origin/4.5:logging-elasticsearch6 instead of the downstream image. With that image, the ES pod runs.

Comment 5 errata-xmlrpc 2020-07-13 17:39:31 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
