Bug 1892005

Summary: Elasticsearch Operator is unable to update ES pods when they are in crashloopbackoff state
Product: OpenShift Container Platform Reporter: ewolinet
Component: Logging Assignee: ewolinet
Status: CLOSED ERRATA QA Contact: Giriyamma <gkarager>
Severity: low Docs Contact:
Priority: unspecified    
Version: 4.5 CC: anli, aos-bugs, periklis
Target Milestone: ---   
Target Release: 4.6.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: logging-exploration
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1892002 Environment:
Last Closed: 2020-11-09 14:09:33 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1892002    
Bug Blocks: 1891019    

Description ewolinet 2020-10-27 20:47:00 UTC
+++ This bug was initially created as a clone of Bug #1892002 +++

Description of problem:
In EO 4.5, if ES pods are in a crashloopbackoff state, the operator does not treat this as an 'unschedulable' node. It tries to communicate with the ES cluster before making changes, which is not possible while the pods are crash-looping, so the cluster ends up in a 'wedged' state.


Version-Release number of selected component (if applicable):
4.5

How reproducible:
Always


Steps to Reproduce:
1. Force the ES pods into a 'crashloopbackoff' state (with a 4.5+ EO, update the CSV to use an Elasticsearch 5 image).
2. Make a change for EO to apply (update the CSV to use an Elasticsearch 6 image).
3. Observe that EO is unable to apply this change to the ES deployments.

Actual results:
EO does not update the deployments.


Expected results:
EO will update the deployments so the pods can correctly start.


Additional info:
This should already be fixed in EO 4.6+. The logic to treat crashloopbackoff as an unschedulable condition was added in 4.6 as part of a feature.
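
A rough sketch of the kind of check described above, not the operator's actual code. The types are cut-down, hypothetical stand-ins for the Kubernetes pod status structures, and the function names (isCrashLooping, isUnschedulable, skipClusterCheck) are made up for illustration. The assumption is that a pod whose container is waiting with reason "CrashLoopBackOff" gets handled the same way as a pod that could not be scheduled, so the operator can update the deployment without first talking to the ES cluster.

```go
package main

import "fmt"

// Hypothetical, reduced stand-ins for the Kubernetes pod status types.
// Only the fields this sketch needs are included.
type ContainerStateWaiting struct{ Reason string }

type ContainerState struct{ Waiting *ContainerStateWaiting }

type ContainerStatus struct {
	Name  string
	State ContainerState
}

type PodCondition struct {
	Type   string
	Status string
	Reason string
}

type PodStatus struct {
	Conditions        []PodCondition
	ContainerStatuses []ContainerStatus
}

// isCrashLooping reports whether any container is waiting in CrashLoopBackOff.
func isCrashLooping(s PodStatus) bool {
	for _, cs := range s.ContainerStatuses {
		if cs.State.Waiting != nil && cs.State.Waiting.Reason == "CrashLoopBackOff" {
			return true
		}
	}
	return false
}

// isUnschedulable reports whether the scheduler could not place the pod.
func isUnschedulable(s PodStatus) bool {
	for _, c := range s.Conditions {
		if c.Type == "PodScheduled" && c.Status == "False" && c.Reason == "Unschedulable" {
			return true
		}
	}
	return false
}

// skipClusterCheck is the assumed decision: if the pod can never answer
// cluster requests in its current state, apply the deployment change directly.
func skipClusterCheck(s PodStatus) bool {
	return isUnschedulable(s) || isCrashLooping(s)
}

func main() {
	crashing := PodStatus{
		ContainerStatuses: []ContainerStatus{
			{Name: "elasticsearch", State: ContainerState{
				Waiting: &ContainerStateWaiting{Reason: "CrashLoopBackOff"},
			}},
		},
	}
	fmt.Println("skip cluster check for crash-looping pod:", skipClusterCheck(crashing))
}
```

With a check like this, the 4.5 behaviour in this bug corresponds to skipClusterCheck only looking at the unschedulable case, which leaves the operator waiting on a cluster that cannot respond.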

Comment 2 Giriyamma 2020-11-04 18:07:49 UTC
Verified this fix using clusterlogging.4.6.0-202010311441.p0, elasticsearch-operator.4.6.0-202010311441.p0.

Comment 4 errata-xmlrpc 2020-11-09 14:09:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.3 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4341