Bug 1836452

Summary: Elasticsearch Operator does not continue upgrade on 'yellow', does not allow primaries to be created
Product: OpenShift Container Platform
Component: Logging
Version: 4.5
Target Release: 4.5.0
Status: CLOSED ERRATA
Type: Bug
Reporter: ewolinet
Assignee: ewolinet
QA Contact: Anping Li <anli>
CC: aos-bugs, periklis
Severity: medium
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2020-07-13 17:39:31 UTC

Description ewolinet 2020-05-15 23:06:25 UTC
Description of problem:
During an upgrade, the EO waits for the cluster to return to 'green' after each node restart; however, it can safely continue once the cluster reaches 'yellow'.

Also, per the Elasticsearch documentation, we should set shard allocation to "primaries" rather than "none" during the restart.
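
For reference, the allocation setting in question maps to the standard Elasticsearch cluster-settings API. A minimal sketch of the equivalent manual call (the es_util helper shipped in the logging Elasticsearch image is assumed, and the pod name is illustrative):

$ oc exec -c elasticsearch <es-pod> -- es_util --query="_cluster/settings" -X PUT \
    -d '{"transient":{"cluster.routing.allocation.enable":"primaries"}}'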

How reproducible:
Always

Comment 3 Anping Li 2020-05-23 09:12:50 UTC
Move to verified.
1) Put the ES cluster into Yellow status (this is easy to reproduce, see https://bugzilla.redhat.com/show_bug.cgi?id=1838153):
1.1) Install 4.4 Cluster Logging with nodeCount=1 and ZeroRedundancy.
1.2) Upgrade CLO to 4.5.
1.3) Check the ES status:

# oc get csv
NAME                                        DISPLAY                  VERSION              REPLACES                            PHASE
clusterlogging.4.5.0-202005221517           Cluster Logging          4.5.0-202005221517   clusterlogging.4.4.0-202005221357   Succeeded
elasticsearch-operator.4.4.0-202005220258   Elasticsearch Operator   4.4.0-202005220258                                       Succeeded


# oc exec -c elasticsearch elasticsearch-cdm-a9h73w86-1-758c54d858-b9zb6 -- es_cluster_health
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 16,
  "active_shards" : 16,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 10,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 1,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 61.53846153846154
}
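
A 'yellow' status here means all primary shards are assigned and only replica shards are not, which is expected with nodeCount=1 and ZeroRedundancy. For reference, the "wait for yellow" check the fix relies on can be issued directly against the health API (a minimal sketch; es_util is assumed from the image, pod name as above):

# oc exec -c elasticsearch elasticsearch-cdm-a9h73w86-1-758c54d858-b9zb6 -- \
    es_util --query="_cluster/health?wait_for_status=yellow&timeout=30s&pretty"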

2) Upgrade EO to 4.5.
3) Check the ES pods. You can see the ES has been upgraded to 4.5 (the pod now uses the ose-elasticsearch-proxy image). (Note: the ES pod couldn't reach Running status because of https://bugzilla.redhat.com/show_bug.cgi?id=1838929; I don't think that blocks this bug.)
$ oc get pods
NAME                                            READY   STATUS             RESTARTS   AGE
cluster-logging-operator-558d8f8f7-4w6nm        1/1     Running            0          11m
curator-1590223200-pplnv                        0/1     Completed          0          17m
curator-1590223800-rtd2f                        0/1     Error              0          7m18s
elasticsearch-cdm-a9h73w86-1-64fdf5c8f5-pvbxv   1/2     CrashLoopBackOff   5          5m23s


$ oc get pods elasticsearch-cdm-a9h73w86-1-64fdf5c8f5-pvbxv -o yaml | grep 'image:'
    image: quay.io/openshift-qe-optional-operators/ose-logging-elasticsearch6@sha256:1d2c67ad5a6bbebfc4d44c6e943b3c1727cb33731f67c35e69d4436ff8b46774
    image: quay.io/openshift-qe-optional-operators/ose-elasticsearch-proxy@sha256:cc93bc0d0e7a5c92f6380fde91b6bade54994b1d03949441d49a719fbfd55e23
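
The container images can also be listed without grepping the full YAML (a sketch using jsonpath; pod name as above):

$ oc get pod elasticsearch-cdm-a9h73w86-1-64fdf5c8f5-pvbxv \
    -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.image}{"\n"}{end}'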

Comment 4 Anping Li 2020-05-23 12:19:37 UTC
As a workaround for BZ1838929, I used registry.svc.ci.openshift.org/origin/4.5:logging-elasticsearch6 instead of the downstream image, and after that the ES pod can reach Running status.
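
For anyone reproducing this workaround, one possible way to swap the image is to point the Elasticsearch deployment at the CI image directly (a sketch only; the deployment and container names are inferred from the pod above, and the operator may reconcile the image back unless it is made unmanaged first):

$ oc set image deployment/elasticsearch-cdm-a9h73w86-1 \
    elasticsearch=registry.svc.ci.openshift.org/origin/4.5:logging-elasticsearch6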

Comment 5 errata-xmlrpc 2020-07-13 17:39:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409